CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing

2403.09671

Published 5/21/2024 by Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu

Abstract

Multi-edge cooperative computing, which combines the constrained resources of multiple edges into a powerful resource pool, has the potential to deliver great benefits, such as tremendous computing power, improved response time, and more diversified services. However, the composition of massive heterogeneous resources and the lack of scheduling strategies make modeling and cooperation in multi-edge computing systems particularly complicated. This paper first proposes a system-level state evaluation model to shield the complex hardware configurations and redefine the different service capabilities at heterogeneous edges. Second, an integer linear programming model is designed to optimally dispatch the distributed arriving requests. Finally, a learning-based lightweight real-time scheduler, CoRaiS, is proposed. CoRaiS embeds the real-time states of the multi-edge system and the request information, and combines the embeddings with a policy network to schedule the requests, so that the response time of all requests can be minimized. Evaluation results verify that CoRaiS can make high-quality scheduling decisions in real time and can be generalized to other multi-edge computing systems, regardless of system scale. Characteristic validation also demonstrates that CoRaiS successfully learns to balance loads, perceive real-time state, and recognize heterogeneity while scheduling.

Overview

This paper presents CoRaiS, a lightweight real-time scheduler for multi-edge cooperative computing.
CoRaiS aims to efficiently manage the scheduling of tasks across multiple edge devices to enable real-time applications like deep learning inference.
The paper highlights the importance of edge computing and the challenges of coordinating computation and communication across a distributed edge infrastructure.

Plain English Explanation

In the world of computing, there is a growing trend towards edge computing. Instead of processing all data in centralized cloud servers, edge computing brings computation closer to the devices and sensors that generate the data. This allows for faster response times and reduced data transmission costs.

One application of edge computing is multi-edge cooperative computing, where multiple edge devices work together to perform complex tasks like deep learning inference. Coordinating the scheduling of these tasks across the edge devices is a key challenge that this paper addresses.

The researchers developed a system called CoRaiS, a lightweight, learning-based scheduler that decides in real time which edge should handle each incoming request. According to the abstract, it bases those decisions on the real-time state of the multi-edge system and on information about the requests, with the aim of minimizing the response time of all requests.

The goal of CoRaiS is to enable real-time applications that require low latency, such as autonomous vehicles or augmented reality. By intelligently scheduling tasks across the edge devices, CoRaiS can ensure that these time-sensitive applications receive the computations they need in a timely manner.
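To make the idea of dispatching requests across edges concrete, here is a minimal sketch of a greedy dispatcher that sends each request to the edge with the smallest estimated response time. It is not CoRaiS itself, which is learning-based; the `Edge` fields, the response-time estimate, and the example numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    service_rate: float  # work units per second (hypothetical capability score)
    backlog: float       # work units already queued on this edge


def estimated_response_time(edge, workload, transfer_delay):
    """Transfer delay plus the time to drain the backlog and serve the request."""
    return transfer_delay + (edge.backlog + workload) / edge.service_rate


def dispatch(edges, workload, transfer_delays):
    """Pick the edge with the smallest estimate and queue the work there."""
    best = min(
        edges,
        key=lambda e: estimated_response_time(e, workload, transfer_delays[e.name]),
    )
    best.backlog += workload
    return best.name


# Hypothetical two-edge system: a fast but busy edge and a slow but idle one.
edges = [Edge("edge-a", 10.0, 30.0), Edge("edge-b", 4.0, 0.0)]
delays = {"edge-a": 0.0, "edge-b": 0.5}  # seconds to forward a request
choices = [dispatch(edges, 8.0, delays) for _ in range(3)]
print(choices)  # ['edge-b', 'edge-a', 'edge-b']
```

The choice alternates because each dispatch changes the backlog, which is the kind of real-time state a scheduler has to track. A learned scheduler replaces the hand-written estimate with a trained policy.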

Technical Explanation

The paper presents the design and evaluation of CoRaiS, a lightweight real-time scheduler for multi-edge cooperative computing. The authors highlight the growing importance of edge computing and the challenges of coordinating computation and communication across a distributed edge infrastructure.

CoRaiS is designed to schedule requests across multiple edges in real time. According to the abstract, it embeds the real-time states of the multi-edge system together with the request information and combines these embeddings with a policy network to make scheduling decisions that minimize the response time of all requests.
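As a rough illustration of how embeddings of edge states and request features could feed a policy that picks an edge, the sketch below scores each edge against the request with a scaled dot product and turns the scores into probabilities. The feature layout, the embedding size `DIM`, the linear embeddings, and the random untrained weights are all assumptions; the paper's actual network architecture is not described in this summary.

```python
import math
import random


def embed(features, weights):
    """Linear embedding: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(request, edge_states, w_req, w_edge):
    """Return one selection probability per edge for a single request."""
    query = embed(request, w_req)
    keys = [embed(state, w_edge) for state in edge_states]
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, key)) / scale for key in keys]
    return softmax(scores)


rng = random.Random(0)
DIM = 4  # hypothetical embedding size
# Untrained weights; a real scheduler would learn these.
w_req = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(DIM)]
w_edge = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(DIM)]

request = [0.8, 1.0]  # assumed features: normalized size, origin flag
edge_states = [       # assumed features: service rate, backlog, transfer delay
    [1.0, 0.3, 0.0],
    [0.4, 0.0, 0.5],
    [0.7, 0.9, 0.2],
]
probs = policy(request, edge_states, w_req, w_edge)
chosen = max(range(len(probs)), key=probs.__getitem__)
print(probs, chosen)
```

Because the same weights score every edge, this kind of policy works for any number of edges, which is one plausible reason a scheduler of this style could transfer across system scales.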

The key technical contributions of the paper include:

A system-level state evaluation model that shields the complex hardware configurations and redefines the service capabilities of heterogeneous edges.
An integer linear programming model for optimally dispatching the requests that arrive at different edges.
CoRaiS, a learning-based lightweight real-time scheduler that embeds the real-time system states and request information and combines the embeddings with a policy network to minimize the response time of all requests.
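For readers who want to see what a request-dispatch problem looks like as an integer linear program, here is a simplified sketch. It is not the paper's model: the symbols are assumptions introduced for illustration. $x_{ij}$ is 1 if request $i$ is dispatched to edge $j$, $p_{ij}$ is the estimated time for edge $j$ to serve request $i$ (including any transfer delay), $b_j$ is the time edge $j$ needs to clear its current backlog, and $T$ is the time at which the last request finishes.

```latex
\begin{aligned}
\min_{x,\,T} \quad & T \\
\text{s.t.} \quad & \sum_{j=1}^{M} x_{ij} = 1, && i = 1,\dots,N \\
& b_j + \sum_{i=1}^{N} p_{ij}\, x_{ij} \le T, && j = 1,\dots,M \\
& x_{ij} \in \{0,1\}, && i = 1,\dots,N,\; j = 1,\dots,M
\end{aligned}
```

The first constraint sends every request to exactly one edge, and the second bounds the finishing time of each edge by $T$. Solving such a program exactly gets expensive as $N$ and $M$ grow, which is the usual motivation for a fast learned scheduler.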

The abstract reports that CoRaiS makes high-quality scheduling decisions in real time and generalizes to other multi-edge computing systems regardless of system scale. A characteristic validation further indicates that the scheduler learns to balance loads, perceive real-time state, and recognize heterogeneity.
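One common way to judge the quality of a scheduling decision on small instances is to compare it against an exhaustive optimum. The sketch below does this for a toy problem. The completion-time objective, the round-robin baseline, and the numbers are assumptions for illustration and are not the paper's evaluation protocol.

```python
from itertools import product


def makespan(assignment, work, rates, backlog):
    """Time until the busiest edge finishes, for one request-to-edge assignment."""
    load = list(backlog)
    for req, edge in enumerate(assignment):
        load[edge] += work[req]
    return max(load_j / rate_j for load_j, rate_j in zip(load, rates))


def optimal(work, rates, backlog):
    """Exhaustive search over all assignments; only practical for tiny instances."""
    candidates = product(range(len(rates)), repeat=len(work))
    return min(candidates, key=lambda a: makespan(a, work, rates, backlog))


def round_robin(work, rates):
    """Baseline that ignores edge heterogeneity and load."""
    return tuple(i % len(rates) for i in range(len(work)))


work = [6.0, 2.0, 4.0, 4.0]  # hypothetical request sizes
rates = [1.0, 2.0]           # hypothetical edge service rates
backlog = [0.0, 0.0]

best_t = makespan(optimal(work, rates, backlog), work, rates, backlog)
base_t = makespan(round_robin(work, rates), work, rates, backlog)
gap = (base_t - best_t) / best_t
print(best_t, base_t, round(gap, 3))  # 6.0 10.0 0.667
```

Here the baseline is about 67% worse than the optimum because it sends the largest request to the slowest edge. The same gap measure can be applied to any scheduler's output.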

Critical Analysis

The paper addresses an important problem in the context of edge computing and multi-edge cooperative computing. The CoRaiS system provides a viable solution for efficiently scheduling tasks across multiple edge devices to enable real-time applications.

However, the abstract does not address some potential challenges and limitations of the proposed approach. For example, it does not say how network failures or dynamic changes in task requirements affect scheduling performance, although it does report that CoRaiS learns to recognize heterogeneity.

Additionally, the abstract states that CoRaiS generalizes to multi-edge systems regardless of scale. It would still be interesting to see how the scheduler's decision time and overhead grow in larger deployments.

Further research could also explore the integration of CoRaiS with other edge computing frameworks or the potential for extending the scheduling algorithms to support additional factors, such as energy consumption or fairness.

Conclusion

This paper presents CoRaiS, a lightweight real-time scheduler for multi-edge cooperative computing. CoRaiS aims to efficiently manage the scheduling of tasks across multiple edge devices to enable real-time applications like deep learning inference.

The key contributions of the paper are a system-level state evaluation model, an integer linear programming model for dispatching requests, and the learning-based scheduler CoRaiS, which uses embeddings and a policy network to minimize the response time of all requests.

The reported results indicate that CoRaiS makes high-quality scheduling decisions in real time and generalizes across system scales, making it a promising solution for enabling real-time applications at the edge. While the paper addresses an important problem, further research is needed to address potential limitations in dynamic edge computing environments.

Related Papers

Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales

Lujie Tang, Minxian Xu, Chengzhong Xu, Kejiang Ye

Due to the limited resource capacity of edge servers and the high purchase costs of edge resources, service providers are facing the new challenge of how to take full advantage of the constrained edge resources for Internet of Things (IoT) service hosting and task scheduling to maximize system performance. In this paper, we study the joint optimization problem on service placement, resource provisioning, and workloads scheduling under resource and budget constraints, which is formulated as a mixed integer non-linear programming problem. Given that the frequent service placement and resource provisioning will significantly increase system configuration costs and instability, we propose a two-timescale framework for resource management and workloads scheduling, named RMWS. RMWS consists of a Gibbs sampling algorithm and an alternating minimization algorithm to determine the service placement and resource provisioning on large timescales. And a sub-gradient descent method has been designed to solve the workload scheduling challenge on small timescales. We conduct comprehensive experiments under different parameter settings. The RMWS consistently ensures a minimum 10% performance enhancement compared to other algorithms, showcasing its superiority. Theoretical proofs are also provided accordingly.

6/3/2024

cs.DC

Efficient Multi-Processor Scheduling in Increasingly Realistic Models

Pál András Papp, Georg Anegg, Aikaterini Karanasiou, A. N. Yzelman

We study the problem of efficiently scheduling a computational DAG on multiple processors. The majority of previous works have developed and compared algorithms for this problem in relatively simple models; in contrast to this, we analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures. For this we extend the well-established BSP model of parallel computing with non-uniform memory access (NUMA) effects. We then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP). We combine these algorithms into a single framework, and conduct experiments on a diverse set of real-world computational DAGs to show that the resulting scheduler significantly outperforms both academic and practical baselines. In particular, even without NUMA effects, our scheduler finds solutions of 24%-44% smaller cost on average than the baselines, and in case of NUMA effects, it achieves up to a factor $2.5\times$ improvement compared to the baselines. Finally, we also develop a multilevel scheduling algorithm, which provides up to almost a factor $5\times$ improvement in the special case when the problem is dominated by very high communication costs.

4/24/2024

cs.DC

Multi-Source Coflow Scheduling in Collaborative Edge Computing with Multihop Network

Yuvraj Sahni, Jiannong Cao, Lei Yang, Shengwei Wang

Collaborative edge computing has become a popular paradigm where edge devices collaborate by sharing resources. Data dissemination is a fundamental problem in CEC to decide what data is transmitted from which device and how. Existing works on data dissemination have not focused on coflow scheduling in CEC, which involves deciding the order of flows within and across coflows at network links. Coflow implies a set of parallel flows with a shared objective. The existing works on coflow scheduling in data centers usually assume a non-blocking switch and do not consider congestion at different links in the multi-hop path in CEC, leading to increased coflow completion time (CCT). Furthermore, existing works do not consider multiple flow sources that cannot be ignored, as data can have duplicate copies at different edge devices. This work formulates the multi-source coflow scheduling problem in CEC, which includes jointly deciding the source and flow ordering for multiple coflows to minimize the sum of CCT. This problem is shown to be NP-hard and challenging as each flow can have multiple dependent conflicts at multiple links. We propose a source and coflow-aware search and adjust (SCASA) heuristic that first provides an initial solution considering the coflow characteristics. SCASA further improves the initial solution using the source search and adjust heuristic by leveraging the knowledge of both coflows and network congestion at links. Evaluation done using simulation experiments shows that SCASA leads to up to 83% reduction in the sum of CCT compared to benchmarks without a joint solution.

5/30/2024

cs.NI cs.DC

Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network

Chong Zheng, Yongming Huang, Cheng Zhang, Tony Q. S. Quek

In this paper, we aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system by jointly considering the multi-node computing resources cooperation and allocation, the transmission resource blocks (RBs) allocation, and the time-varying dynamicity of the system. To this end, we abstract the system into a weighted undirected topology graph and then propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy. Therein, the graph neural network (GCN) and the deep deterministic policy gradient (DDPG) are combined to effectively extract spatial features from the equivalent topology graph. Furthermore, a novel time recurrent reinforcement learning framework is designed in the proposed RGRL algorithm by incorporating the action output of the policy network at the previous moment into the state input of the policy network at the subsequent moment, so as to cope with the time-varying and contextual network environment. In addition, we explore two use case scenarios to discuss the universal superiority of the proposed RGRL algorithm. Simulation results demonstrate the superiority of the proposed algorithm in terms of the average SSR, the performance stability, and the network complexity.

5/29/2024

cs.NI cs.AI cs.LG