Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing

Read original: arXiv:2406.19613 - Published 7/1/2024 by Rui Li, Tao Ouyang, Liekang Zeng, Guocheng Liao, Zhi Zhou, Xu Chen


Overview

This paper focuses on the optimization of Deep Neural Network (DNN) inference in a collaborative edge computing environment.
The authors propose an online optimization framework to dynamically allocate workloads and route requests to edge devices in order to maximize the overall network utility.
The framework deals with the challenge of an unknown utility function and uses an online mirror descent algorithm to adaptively learn the optimal allocation strategy.

Plain English Explanation

The paper addresses a problem in edge computing, a distributed computing architecture in which computing resources are placed close to the data sources. In this setting, multiple edge devices work together to process data and run AI models, such as deep neural networks, to provide intelligent services to users.

The key challenge the researchers address is how to best distribute the workload of running these AI models across the different edge devices. This is important because the performance and efficiency of the overall system depend on how well the workload is allocated. However, the "utility", or benefit, of a given allocation strategy is often unknown ahead of time.

To tackle this problem, the researchers developed an online optimization framework. This framework continuously adjusts how it allocates workloads and routes requests to the edge devices, adapting to the unknown utility function. It uses a technique called online mirror descent to learn the optimal allocation strategy over time.

The significance of this work is that it allows edge computing systems to dynamically optimize their performance without needing to know the exact benefits of different allocation strategies in advance. This can help edge computing systems run more efficiently and provide better services to users.

Technical Explanation

The paper proposes an online optimization framework for dynamically allocating DNN inference workloads and routing requests in a collaborative edge computing environment. The key challenge is that the utility function, which captures the overall benefit of different workload allocation strategies, is unknown a priori.

To address this challenge, the authors characterize the collaborative edge computing system as a flow model and formulate an online learning problem in the form of a cross-layer optimization. They first propose a nested-loop algorithm that solves workload allocation and distributed routing iteratively, using gradient sampling and online mirror descent, so that the allocation policy is learned from observed system performance rather than from a known utility function. To improve the convergence rate over the nested-loop version, they then devise a single-loop algorithm.
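The online mirror descent update mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the decision is a single vector of workload fractions on the probability simplex, uses the entropic mirror map (which gives a multiplicative-weights update), and stands in a generic two-point finite-difference estimate for the gradient, since only utility values are observable. The names `estimate_gradient`, `omd_step`, `run_omd`, and the toy utility are hypothetical, and the utility is assumed to be evaluable slightly off the simplex.

```python
import numpy as np


def estimate_gradient(utility, x, delta, rng):
    """Two-point gradient estimate that only needs utility observations."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)  # random unit direction
    diff = utility(x + delta * u) - utility(x - delta * u)
    return (d / (2.0 * delta)) * diff * u


def omd_step(x, grad, eta):
    """One entropic mirror ascent step; the result stays on the simplex."""
    logits = np.log(np.clip(x, 1e-12, None)) + eta * grad
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def run_omd(utility, d, steps=2000, eta=0.02, delta=0.01, seed=0):
    """Run the loop and return the average iterate."""
    rng = np.random.default_rng(seed)
    x = np.full(d, 1.0 / d)  # start from a uniform split
    avg = np.zeros(d)
    for t in range(1, steps + 1):
        grad = estimate_gradient(utility, x, delta, rng)
        x = omd_step(x, grad, eta)
        avg += (x - avg) / t  # running mean of the iterates
    return avg


if __name__ == "__main__":
    weights = np.array([1.0, 2.0, 3.0])  # hypothetical per-device weights

    def toy_utility(x):
        # Concave toy utility, assumed purely for illustration.
        return float(np.sum(weights * np.log1p(x)))

    print(run_omd(toy_utility, d=3))
```

The entropic mirror map is chosen here only because it keeps every iterate a valid split of the workload without an explicit projection step; the step size `eta` and perturbation radius `delta` are untuned illustrative values.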

Through theoretical analysis, the authors show that their online optimization framework can achieve sublinear regret, meaning the average per-round gap to the best offline strategy shrinks toward zero as the system runs longer. They also report extensive numerical simulations in which their approach outperforms baseline strategies in terms of overall network utility.
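For readers unfamiliar with the term, sublinear regret is commonly written as follows for a utility maximization problem. The notation is assumed for illustration and is not taken from the paper: U is the utility, x_t is the decision in round t, and X is the feasible set.

```latex
\[
  \mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} U(x^{\star}) \;-\; \sum_{t=1}^{T} U(x_t),
  \qquad
  x^{\star} \in \arg\max_{x \in \mathcal{X}} U(x),
\]
\[
  \text{sublinear regret:}\quad \lim_{T \to \infty} \frac{\mathrm{Reg}(T)}{T} = 0 .
\]
```

Under this definition, the time-averaged utility of the online decisions approaches that of the best fixed decision chosen in hindsight.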

Critical Analysis

The paper presents a well-designed online optimization framework that can effectively handle the challenge of unknown utility functions in collaborative edge computing environments. The authors provide a solid theoretical foundation and validate their approach through empirical evaluations.

One potential limitation of the work is that it assumes the availability of accurate models for the system dynamics, which may not always be the case in practice. Further research could explore the use of more data-driven techniques, such as graph neural networks, to learn the system model from observed data.

Additionally, the paper does not consider the impact of various system-level constraints, such as communication delays, resource limitations, or security and privacy concerns. Incorporating these factors into the optimization framework could make the solution more practical and applicable to real-world edge computing deployments.

Overall, the paper presents an important contribution to the field of collaborative edge computing, demonstrating the potential of online optimization techniques to address the challenges of unknown utility functions and dynamic workload allocation. Further research building upon this work could lead to more robust and adaptive edge computing systems that optimize energy efficiency and provide reliable services.

Conclusion

This paper introduces an online optimization framework for dynamic workload allocation and request routing in collaborative edge computing environments. By addressing the challenge of an unknown utility function, the proposed approach can adaptively learn the optimal allocation strategy and maximize the overall network utility.

The significance of this work lies in its potential to improve the efficiency and performance of edge computing systems, which are becoming increasingly important for delivering intelligent services closer to the data sources. The online optimization technique demonstrated in this paper can be a valuable tool for building more adaptive and resilient edge computing architectures.

Future research building upon this work could explore ways to further enhance the framework, such as by incorporating additional system-level constraints and leveraging more data-driven modeling techniques. Ultimately, this research contributes to the ongoing efforts to unlock the full potential of edge computing and bring the benefits of AI-powered services to users in a more reliable and energy-efficient manner.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing

Rui Li, Tao Ouyang, Liekang Zeng, Guocheng Liao, Zhi Zhou, Xu Chen

Collaborative Edge Computing (CEC) is an emerging paradigm that collaborates heterogeneous edge devices as a resource pool to compute DNN inference tasks in proximity such as edge video analytics. Nevertheless, as the key knob to improve network utility in CEC, existing works mainly focus on the workload routing strategies among edge devices with the aim of minimizing the routing cost, remaining an open question for joint workload allocation and routing optimization problem from a system perspective. To this end, this paper presents a holistic, learned optimization for CEC towards maximizing the total network utility in an online manner, even though the utility functions of task input rates are unknown a priori. In particular, we characterize the CEC system in a flow model and formulate an online learning problem in a form of cross-layer optimization. We propose a nested-loop algorithm to solve workload allocation and distributed routing iteratively, using the tools of gradient sampling and online mirror descent. To improve the convergence rate over the nested-loop version, we further devise a single-loop algorithm. Rigorous analysis is provided to show its inherent convexity, efficient convergence, as well as algorithmic optimality. Finally, extensive numerical simulations demonstrate the superior performance of our solutions.

7/1/2024

Edge-device Collaborative Computing for Multi-view Classification

Marco Palena, Tania Cerquitelli, Carla Fabiana Chiasserini

Motivated by the proliferation of Internet-of-Things (IoT) devices and the rapid advances in the field of deep learning, there is a growing interest in pushing deep learning computations, conventionally handled by the cloud, to the edge of the network to deliver faster responses to end users, reduce bandwidth consumption to the cloud, and address privacy concerns. However, to fully realize deep learning at the edge, two main challenges still need to be addressed: (i) how to meet the high resource requirements of deep learning on resource-constrained devices, and (ii) how to leverage the availability of multiple streams of spatially correlated data, to increase the effectiveness of deep learning and improve application-level performance. To address the above challenges, we explore collaborative inference at the edge, in which edge nodes and end devices share correlated data and the inference computational burden by leveraging different ways to split computation and fuse data. Besides traditional centralized and distributed schemes for edge-end device collaborative inference, we introduce selective schemes that decrease bandwidth resource consumption by effectively reducing data redundancy. As a reference scenario, we focus on multi-view classification in a networked system in which sensing nodes can capture overlapping fields of view. The proposed schemes are compared in terms of accuracy, computational expenditure at the nodes, communication overhead, inference latency, robustness, and noise sensitivity. Experimental results highlight that selective collaborative schemes can achieve different trade-offs between the above performance metrics, with some of them bringing substantial communication savings (from 18% to 74% of the transmitted data with respect to centralized inference) while still keeping the inference accuracy well above 90%.

9/25/2024

Multi-Source Coflow Scheduling in Collaborative Edge Computing with Multihop Network

Yuvraj Sahni, Jiannong Cao, Lei Yang, Shengwei Wang

Collaborative edge computing has become a popular paradigm where edge devices collaborate by sharing resources. Data dissemination is a fundamental problem in CEC to decide what data is transmitted from which device and how. Existing works on data dissemination have not focused on coflow scheduling in CEC, which involves deciding the order of flows within and across coflows at network links. Coflow implies a set of parallel flows with a shared objective. The existing works on coflow scheduling in data centers usually assume a non-blocking switch and do not consider congestion at different links in the multi-hop path in CEC, leading to increased coflow completion time (CCT). Furthermore, existing works do not consider multiple flow sources that cannot be ignored, as data can have duplicate copies at different edge devices. This work formulates the multi-source coflow scheduling problem in CEC, which includes jointly deciding the source and flow ordering for multiple coflows to minimize the sum of CCT. This problem is shown to be NP-hard and challenging as each flow can have multiple dependent conflicts at multiple links. We propose a source and coflow-aware search and adjust (SCASA) heuristic that first provides an initial solution considering the coflow characteristics. SCASA further improves the initial solution using the source search and adjust heuristic by leveraging the knowledge of both coflows and network congestion at links. Evaluation done using simulation experiments shows that SCASA leads to up to 83% reduction in the sum of CCT compared to benchmarks without a joint solution.

5/30/2024

Heterogeneity-Aware Cooperative Federated Edge Learning with Adaptive Computation and Communication Compression

Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong

Motivated by the drawbacks of cloud-based federated learning (FL), cooperative federated edge learning (CFEL) has been proposed to improve efficiency for FL over mobile edge networks, where multiple edge servers collaboratively coordinate the distributed model training across a large number of edge devices. However, CFEL faces critical challenges arising from dynamic and heterogeneous device properties, which slow down the convergence and increase resource consumption. This paper proposes a heterogeneity-aware CFEL scheme called Heterogeneity-Aware Cooperative Edge-based Federated Averaging (HCEF) that aims to maximize the model accuracy while minimizing the training time and energy consumption via adaptive computation and communication compression in CFEL. By theoretically analyzing how local update frequency and gradient compression affect the convergence error bound in CFEL, we develop an efficient online control algorithm for HCEF to dynamically determine local update frequencies and compression ratios for heterogeneous devices. Experimental results show that compared with prior schemes, the proposed HCEF scheme can maintain higher model accuracy while reducing training latency and improving energy efficiency simultaneously.

9/9/2024