Abstract
The rise of edge intelligence is driving distributed machine learning toward a new paradigm of edge-collaborative computing. To overcome the severe communication bottleneck in this paradigm, In-Network Aggregation is a critical enabling technology. However, its effectiveness is fundamentally undermined by the profound resource heterogeneity of edge networks. Specifically, edge devices, adapting to hardware constraints, operate at varying numerical precisions, leading to significant data inflation as gradients are aggregated. Compounding this, unevenly distributed network resources and traditional, precision-oblivious routing strategies often misallocate critical, high-precision gradients to low-quality paths. This mismatch creates severe network congestion, crippling the efficiency of distributed training. To address this, we propose the Precision-Aware Hierarchical In-Network Aggregation (PAHInA) framework, the first, to our knowledge, to perform routing optimization for in-network aggregation that explicitly considers precision heterogeneity. The core of PAHInA is an intelligent control-plane scheduler that co-optimizes for gradient priority and path cost, dynamically planning the most cost-effective aggregation strategy for each flow. This fine-grained scheduling guarantees that high-priority gradients are routed through premium, low-latency paths, minimizing global communication overhead. On the data plane, we leverage the eXpress Data Path (XDP) for high-performance packet processing to reduce aggregation-induced overhead. Extensive simulations show that, compared to state-of-the-art baselines, PAHInA significantly mitigates network congestion, reducing end-to-end communication time by up to 33% and boosting overall training throughput by approximately 30%.
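The abstract describes the scheduler only at a high level, so the following is a minimal illustrative sketch of precision-aware path assignment, not the PAHInA algorithm itself. It assumes that a gradient's bit width stands in for its priority, that path latency stands in for path cost, and that each path has a fixed remaining capacity. The names `Flow`, `Path`, and `assign_paths` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    bits: int        # numerical precision of the gradient (assumed priority proxy)
    size_mb: float   # payload size in megabytes


@dataclass
class Path:
    name: str
    latency_ms: float    # assumed stand-in for path cost
    capacity_mb: float   # assumed capacity available on this path


def assign_paths(flows, paths):
    """Greedy sketch: serve the highest-precision flows first and give each
    one the lowest-latency path that still has room for its payload."""
    remaining = {p.name: p.capacity_mb for p in paths}
    by_cost = sorted(paths, key=lambda p: p.latency_ms)
    assignment = {}
    for flow in sorted(flows, key=lambda f: f.bits, reverse=True):
        for path in by_cost:
            if remaining[path.name] >= flow.size_mb:
                remaining[path.name] -= flow.size_mb
                assignment[flow.name] = path.name
                break
        else:
            # No path has enough remaining capacity for this flow.
            assignment[flow.name] = None
    return assignment


flows = [
    Flow("int8-worker", 8, 10.0),
    Flow("fp32-worker", 32, 40.0),
    Flow("fp16-worker", 16, 20.0),
]
paths = [Path("slow", 20.0, 100.0), Path("fast", 2.0, 50.0)]

result = assign_paths(flows, paths)
for flow_name, path_name in result.items():
    print(flow_name, "->", path_name)
```

In this toy setup the 32-bit flow is placed first and claims the low-latency path, while lower-precision flows take whatever capacity is left. A precision-oblivious ordering could instead let low-precision traffic fill the fast path first, which is the mismatch the abstract describes.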
Recommended Citation
Y. Nian et al., "PAHInA: Precision-Aware Hierarchical In-Network Aggregation for Edge Distributed Training," IEEE Transactions on Networking, Institute of Electrical and Electronics Engineers, Jan 2026.
The definitive version is available at https://doi.org/10.1109/TON.2026.3660333
Department(s)
Computer Science
Publication Status
Early Access
Keywords and Phrases
distributed training; edge computing; heterogeneous precision; in-network aggregation; XDP
International Standard Serial Number (ISSN)
2998-4157
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2026 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Jan 2026
