ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Read original: arXiv:2404.16812 - Published 4/26/2024 by Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

Overview

Interest in running machine learning inference on serverless computing is growing, driven by its auto-scaling and cost-effective properties
Existing serverless platforms lack effective job scheduling methods for the scheduling space expanded by GPU sharing, task batching, and inter-task relations
Prior solutions sidestep the problem by neglecting some of these factors, leaving performance potential untapped
This paper presents ESG, a new scheduling algorithm that addresses these difficulties directly

Plain English Explanation

The paper discusses a new scheduling algorithm called ESG that aims to improve the performance and cost-effectiveness of machine learning workloads running on serverless computing platforms.

Serverless computing is appealing because it can automatically scale resources up and down as needed, and only charges users for the compute time they actually use. However, existing serverless systems struggle to effectively schedule machine learning jobs, especially when they involve sharing GPUs, batching tasks together, or have dependencies between tasks.

Prior solutions have tried to address this problem, but have overlooked important factors, leaving significant room for improvement.

The ESG algorithm proposed in this paper takes a more comprehensive approach. It treats GPU sharing as a first-class concern in the scheduling process, and uses an optimality-guided adaptive search technique to efficiently explore the expanded scheduling space without compromising the quality of the schedules. ESG also introduces a novel method, called dominator-based SLO distribution, to ensure the scalability of the scheduler as the workload grows.

The reported results show that ESG improves the rate at which service-level objectives (SLOs) are met by 61-80%, while saving 47-187% in cost relative to prior approaches.

Technical Explanation

The paper presents ESG, a new scheduling algorithm for machine learning workloads on serverless computing platforms. ESG directly addresses the challenges posed by the expanded scheduling space created by GPU sharing, task batching, and inter-task dependencies.
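To give a feel for why the scheduling space grows so quickly, the sketch below counts the joint configurations of a linear pipeline when every stage can independently pick a batch size and a GPU share. The option lists `BATCH_SIZES` and `GPU_SHARES` and the helper names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical per-stage options; none of these values come from the paper.
BATCH_SIZES = [1, 2, 4, 8]
GPU_SHARES = [0.25, 0.5, 1.0]  # fraction of one GPU given to a stage


def stage_configs():
    """All (batch size, GPU share) choices available to a single stage."""
    return list(product(BATCH_SIZES, GPU_SHARES))


def space_size(num_stages):
    """Number of joint configurations for a pipeline of num_stages stages."""
    return len(stage_configs()) ** num_stages


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} stage(s): {space_size(n)} configurations")
```

With these assumed options a single stage has 12 choices, so a five-stage pipeline already has 12^5 = 248,832 joint configurations, which is why exhaustive enumeration does not scale.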

ESG employs an optimality-guided adaptive search method that combines A*-search with a novel dual-blade pruning technique. This allows it to dramatically reduce the scheduling space that needs to be explored without compromising the quality of the generated schedules.
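The general idea of combining A*-search with pruning can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the summary does not describe how dual-blade pruning works, so the two pruning rules here (a latency feasibility check and a dominance check) are generic stand-ins, and the `PROFILE` table of per-stage latencies and costs is invented for the example.

```python
import heapq

# Hypothetical profile: per stage, options of (config name, latency in ms, cost).
PROFILE = [
    [("b1/g0.25", 40, 1.0), ("b4/g0.5", 25, 2.0), ("b8/g1.0", 15, 4.0)],
    [("b1/g0.25", 60, 1.5), ("b4/g0.5", 35, 3.0), ("b8/g1.0", 20, 6.0)],
    [("b1/g0.25", 30, 0.5), ("b4/g0.5", 20, 1.0), ("b8/g1.0", 10, 2.0)],
]


def suffix_min(profile, field):
    """out[i] = sum over stages i.. of the minimum value of tuple index `field`."""
    out = [0.0] * (len(profile) + 1)
    for i in range(len(profile) - 1, -1, -1):
        out[i] = out[i + 1] + min(opt[field] for opt in profile[i])
    return out


def astar_schedule(profile, slo_ms):
    """Return (plan, latency, cost) of the cheapest plan meeting slo_ms, or None."""
    min_lat = suffix_min(profile, 1)   # fastest possible remaining latency
    min_cost = suffix_min(profile, 2)  # admissible heuristic: cheapest remaining cost
    # Heap entries: (f = g + h, g = cost so far, latency so far, stage, plan).
    heap = [(min_cost[0], 0.0, 0.0, 0, ())]
    expanded = {}  # stage -> list of (latency, cost) pairs already expanded
    while heap:
        _, cost, lat, stage, plan = heapq.heappop(heap)
        if stage == len(profile):
            return list(plan), lat, cost
        # Stand-in prune A: skip states dominated in both latency and cost.
        if any(l <= lat and c <= cost for l, c in expanded.get(stage, [])):
            continue
        expanded.setdefault(stage, []).append((lat, cost))
        for name, opt_lat, opt_cost in profile[stage]:
            new_lat = lat + opt_lat
            # Stand-in prune B: drop branches that cannot meet the SLO
            # even if every remaining stage uses its fastest option.
            if new_lat + min_lat[stage + 1] > slo_ms:
                continue
            new_cost = cost + opt_cost
            f = new_cost + min_cost[stage + 1]
            heapq.heappush(heap, (f, new_cost, new_lat, stage + 1, plan + (name,)))
    return None


if __name__ == "__main__":
    print(astar_schedule(PROFILE, slo_ms=100))
    print(astar_schedule(PROFILE, slo_ms=40))
```

Because the heuristic never overestimates the remaining cost and both pruning rules only discard states that cannot lead to a better feasible plan, the first complete plan popped from the heap is the cheapest one that meets the SLO. With the assumed profile and a 100 ms SLO, that plan costs 5.0 at 95 ms; with a 40 ms SLO no plan is feasible and the function returns `None`.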

Additionally, ESG introduces a novel "dominator-based SLO distribution" approach to ensure the scalability of the scheduler as the workload grows. This method distributes SLO targets across tasks in a way that accounts for their relative importance and interdependencies.
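One way to picture a dominator-based split is sketched below. The summary does not spell out the paper's method, so this is an assumption-laden illustration: it uses the standard graph-theory notion of a dominator (a node that lies on every path from the entry to a given node), finds the dominators of the exit stage, and divides the end-to-end SLO among the segments between them in proportion to each segment's longest-path latency. The workflow `GRAPH`, the `LATENCY` estimates, and the proportional rule are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow DAG (node -> successors) and profiled latencies in ms.
GRAPH = {
    "decode": ["detect", "classify"],
    "detect": ["merge"],
    "classify": ["merge"],
    "merge": ["caption"],
    "caption": [],
}
LATENCY = {"decode": 10, "detect": 40, "classify": 20, "merge": 10, "caption": 40}


def predecessors(graph):
    preds = {n: set() for n in graph}
    for node, succs in graph.items():
        for s in succs:
            preds[s].add(node)
    return preds


def dominators(graph, entry):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over preds p.

    Assumes every non-entry node is reachable from `entry`.
    """
    preds = predecessors(graph)
    dom = {n: set(graph) for n in graph}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in graph:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def distribute_slo(graph, latency, entry, exit_node, slo_ms):
    """Split slo_ms across the exit node's dominators (assumed proportional rule)."""
    preds = predecessors(graph)
    order = list(TopologicalSorter(preds).static_order())
    # Longest-path latency from the entry up to and including each node.
    dist = {}
    for n in order:
        dist[n] = latency[n] + max((dist[p] for p in preds[n]), default=0)
    exit_doms = dominators(graph, entry)[exit_node]
    chain = [n for n in order if n in exit_doms]
    budgets, prev = {}, 0
    for d in chain:
        budgets[d] = slo_ms * (dist[d] - prev) / dist[exit_node]
        prev = dist[d]
    return budgets


if __name__ == "__main__":
    print(distribute_slo(GRAPH, LATENCY, "decode", "caption", slo_ms=200))
```

In this toy workflow the dominators of `caption` are `decode`, `merge`, and `caption`, so a 200 ms SLO is split into 20 ms, 100 ms, and 80 ms. The appeal of such a decomposition is that each segment can then be scheduled against its own budget rather than searching the whole graph at once.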

The authors evaluate ESG against prior scheduling approaches such as deep reinforcement learning-based scheduling and multi-processor scheduling. The reported results show that ESG improves SLO hit rates by 61-80% while saving 47-187% in cost over prior work.

Critical Analysis

The paper provides a comprehensive and thoughtful solution to the challenge of scheduling machine learning workloads on serverless platforms. The authors have clearly identified the key limitations of existing approaches and designed ESG to directly address them.

One potential area for further research mentioned in the paper is extending ESG to handle more complex task dependencies, such as directed acyclic graphs (DAGs) instead of just linear dependencies. This could further expand the applicability of the algorithm.

Additionally, while the authors have demonstrated the scalability of ESG through the dominator-based SLO distribution approach, it would be interesting to see how the algorithm performs under even greater workload pressure or more heterogeneous computing resources.

Overall, the ESG algorithm represents a significant advance in job scheduling for machine learning on serverless platforms. The authors' focus on the challenges specific to this domain, and the solutions they propose, are likely to influence performance modeling and resource optimization in this space.

Conclusion

This paper introduces ESG, a new scheduling algorithm designed to improve the performance and cost-effectiveness of machine learning workloads running on serverless computing platforms. ESG addresses the key limitations of existing scheduling approaches by treating GPU sharing as a first-class concern, using an optimality-guided adaptive search technique, and introducing a novel method for distributing service-level objectives.

The results demonstrate that ESG can significantly enhance SLO hit rates while reducing costs compared to prior solutions. This represents an important advancement in the field of job scheduling for machine learning on serverless infrastructure, with the potential to unlock new performance and efficiency gains for a wide range of applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve the SLO hit rates 61%-80% while saving 47%-187% costs over prior work.

4/26/2024

SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads

Amir Fakhim Babaei, Thidapat Chantem

Deep Neural Networks (DNNs) are useful in many applications, including transportation, healthcare, and speech recognition. Despite various efforts to improve accuracy, few works have studied DNN in the context of real-time requirements. Coarse resource allocation and sequential execution in existing frameworks result in underutilization. In this work, we conduct GPU speedup gain analysis and propose SGPRS, the first real-time GPU scheduler considering zero configuration partition switch. The proposed scheduler not only meets more deadlines for parallel tasks but also sustains overall performance beyond the pivot point.

6/17/2024

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang

Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive performance in optimizing DL job performance and cluster utilization through periodic reallocation or selection of GPU resources, they also encounter challenges such as preemption and migration overhead, along with potential DL accuracy degradation. Nonetheless, few explore the potential benefits of GPU sharing to improve resource utilization and reduce job queuing times. Motivated by these insights, we present a job scheduling model allowing multiple jobs to share the same set of GPUs without altering job training settings. We introduce SJF-BSBF (shortest job first with best sharing benefit first), a straightforward yet effective heuristic scheduling algorithm. SJF-BSBF intelligently selects job pairs for GPU resource sharing and runtime settings (sub-batch size and scheduling time point) to optimize overall performance while ensuring DL convergence accuracy through gradient accumulation. In experiments with both physical DL workloads and trace-driven simulations, even as a preemption-free policy, SJF-BSBF reduces the average job completion time by 27-33% relative to the state-of-the-art preemptive DL schedulers. Moreover, SJF-BSBF can wisely determine the optimal resource sharing settings, such as the sharing time point and sub-batch size for gradient accumulation, outperforming the aggressive GPU sharing approach (baseline SJF-FFS policy) by up to 17% in large-scale traces.

7/19/2024

Learning Interpretable Scheduling Algorithms for Data Processing Clusters

Zhibo Hu, Chen Wang, Helen (Hye-Young) Paik, Yanfeng Shu, Liming Zhu

Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like decima) have been attempted to optimise DAG job scheduling and demonstrate clear performance gain in comparison to traditional algorithms. However, reinforcement learning (RL) approaches face their own problems in real-world deployment. In particular, their black-box decision making processes and generalizability in unseen workloads may add a non-trivial burden to the cluster administrators. Moreover, adapting RL models on unseen workloads often requires significant amount of training data, which leaves edge cases run in a sub-optimal mode. To fill the gap, we propose a new method to distill a simple scheduling policy based on observations of the behaviours of a complex deep learning model. The simple model not only provides interpretability of scheduling decisions, but also adaptive to edge cases easily through tuning. We show that our method achieves high fidelity to the decisions made by deep learning models and outperforms these models when additional heuristics are taken into account.

5/30/2024