Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems

2404.08950

Published 4/16/2024 by Francesco G. Blanco, Enrico Russo, Maurizio Palesi, Davide Patti, Giuseppe Ascia, Vincenzo Catania

cs.AR cs.DC cs.LG

Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems

Abstract

Currently, there is a growing trend of outsourcing the execution of DNNs to cloud services. For service providers, managing multi-tenancy and ensuring high-quality service delivery, particularly in meeting stringent execution time constraints, assumes paramount importance, all while endeavoring to maintain cost-effectiveness. In this context, the utilization of heterogeneous multi-accelerator systems becomes increasingly relevant. This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments, taking into account the dataflow heterogeneity of accelerators and memory bandwidths contentions. By doing so, service providers can employ the most efficient scheduling policy for user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and enhancing hardware utilization. The application of RELMAS to a heterogeneous multi-accelerator system composed of various instances of Simba and Eyeriss sub-accelerators resulted in up to a 173% improvement in SLA satisfaction rate compared to state-of-the-art scheduling techniques across different workload scenarios, with less than a 1.5% energy overhead.

Create account to get full access

Overview

This paper presents a deep reinforcement learning-based online scheduling policy for managing the allocation of deep neural network workloads on multi-tenant, multi-accelerator systems.
The proposed approach aims to optimize resource utilization and minimize latency for individual tasks while considering fairness across multiple tenants.
The research explores the use of reinforcement learning techniques to develop a dynamic scheduling algorithm that adapts to the changing workload and system conditions.

Plain English Explanation

In modern computing environments, multiple users or "tenants" often share access to powerful hardware accelerators like GPUs to run their deep learning models. However, efficiently managing the allocation of these resources can be a complex challenge. This paper introduces a new approach that uses deep reinforcement learning to create an intelligent scheduling system.

The key idea is to train an AI agent that can learn how to best assign deep learning tasks to the available accelerators over time. This agent observes the current state of the system, including the pending tasks, resource utilization, and performance metrics. Based on this information, it decides which task to schedule next in order to optimize overall system performance.

The goal is to minimize the latency experienced by individual tasks while also ensuring fair access to resources across all the users. This is important because some users may have more urgent or compute-intensive workloads than others, and the system needs to balance their needs.

By using reinforcement learning, the scheduling agent can adaptively learn the best policies over time as the workload patterns and system conditions change. This makes the approach more robust and responsive compared to static scheduling algorithms.

Technical Explanation

The paper formulates the multi-tenant, multi-accelerator scheduling problem as a Markov Decision Process (MDP), which is a standard framework for modeling sequential decision-making tasks in reinforcement learning. The state of the MDP represents the current configuration of the system, including the set of pending tasks, the available accelerators, and their utilization levels.

The agents' actions correspond to scheduling decisions - i.e., which task to assign to which accelerator next. The reward function is designed to capture the competing objectives of minimizing task latency and maintaining fairness across tenants.

The authors propose a novel deep reinforcement learning algorithm called ODRA (Online DRL-based Resource Allocation) to train the scheduling agent. ODRA combines elements of deep Q-learning and policy gradient methods to efficiently explore the large state-action space.

Extensive experiments are conducted using synthetic workloads as well as real-world traces from production GPU clusters. The results demonstrate that ODRA significantly outperforms several baseline scheduling policies in terms of average task latency, fairness, and resource utilization.

Critical Analysis

The paper presents a compelling approach to the challenging problem of multi-tenant resource management in accelerator-based systems. The use of reinforcement learning to develop an adaptive scheduling policy is a promising direction, as it can potentially capture complex, time-varying dependencies that are difficult to model explicitly.

However, the authors acknowledge several limitations and areas for future work. For example, the current formulation assumes that task characteristics and resource requirements are known a priori, which may not always be the case in practice. Extending the approach to handle uncertainty or online task arrival would be an important next step.

Additionally, the experiments are conducted on simulated workloads, and validating the approach on real-world production systems with diverse and dynamic workloads would be valuable to assess its practical efficacy. Integrating the scheduling algorithm with other system-level mechanisms, such as task preemption or migration, could also be an interesting direction to explore.

Overall, this work represents an important contribution to the field of resource management for multi-tenant accelerator-based systems. The use of deep reinforcement learning techniques to develop adaptive, performance-optimized scheduling policies is a promising direction that warrants further investigation and validation.

Conclusion

This paper presents a deep reinforcement learning-based scheduling policy for managing the allocation of deep neural network workloads across multi-tenant, multi-accelerator systems. The proposed approach, called ODRA, aims to optimize resource utilization and task latency while maintaining fairness across tenants.

The key innovation is the use of reinforcement learning to adaptively learn scheduling policies that can adapt to changing workload patterns and system conditions. Extensive experiments demonstrate the effectiveness of ODRA in outperforming baseline scheduling algorithms on various performance metrics.

While the current work has some limitations, it represents an important step towards developing more intelligent and responsive resource management systems for accelerator-based computing environments. Further research is needed to address the identified challenges and validate the approach on real-world production workloads.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Learning Interpretable Scheduling Algorithms for Data Processing Clusters

Zhibo Hu (Hye-Young), Chen Wang (Hye-Young), Helen (Hye-Young), Paik, Yanfeng Shu, Liming Zhu

Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like decima) have been attempted to optimise DAG job scheduling and demonstrate clear performance gain in comparison to traditional algorithms. However, reinforcement learning (RL) approaches face their own problems in real-world deployment. In particular, their black-box decision making processes and generalizability in unseen workloads may add a non-trivial burden to the cluster administrators. Moreover, adapting RL models on unseen workloads often requires significant amount of training data, which leaves edge cases run in a sub-optimal mode. To fill the gap, we propose a new method to distill a simple scheduling policy based on observations of the behaviours of a complex deep learning model. The simple model not only provides interpretability of scheduling decisions, but also adaptive to edge cases easily through tuning. We show that our method achieves high fidelity to the decisions made by deep learning models and outperforms these models when additional heuristics are taken into account.

5/30/2024

cs.DC

An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying not to affect users' experience for cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with a discounted price to users. For this type of deferrable jobs, users are allowed to submit jobs that will run for a specific uninterrupted duration in a flexible range of time in the future with a great discount. With these deferrable jobs to be scheduled under the remaining capacity after deploying those on-demand jobs, it remains a challenge to achieve high resource utilization and meanwhile shorten the waiting time for users as much as possible in an online manner. In this paper, we propose an online deferrable job scheduling method called textit{Online Scheduling for DEferrable jobs in Cloud} (OSDEC{}), where a deep reinforcement learning model is adopted to learn the scheduling policy, and several auxiliary tasks are utilized to provide better state representations and improve the performance of the model. With the integrated reinforcement learning framework, the proposed method can well plan the deployment schedule and achieve a short waiting time for users while maintaining a high resource utilization for the platform. The proposed method is validated on a public dataset and shows superior performance.

6/4/2024

cs.DC cs.AI cs.LG

New!Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud

Motahare Mounesan, Mauro Lemus, Hemanth Yeddulapalli, Prasad Calyam, Saptarshi Debroy

In recent times, Volunteer Edge-Cloud (VEC) has gained traction as a cost-effective, community computing paradigm to support data-intensive scientific workflows. However, due to the highly distributed and heterogeneous nature of VEC resources, centralized workflow task scheduling remains a challenge. In this paper, we propose a Reinforcement Learning (RL)-driven data-intensive scientific workflow scheduling approach that takes into consideration: i) workflow requirements, ii) VEC resources' preference on workflows, and iii) diverse VEC resource policies, to ensure robust resource allocation. We formulate the long-term average performance optimization problem as a Markov Decision Process, which is solved using an event-based Asynchronous Advantage Actor-Critic RL approach. Our extensive simulations and testbed implementations demonstrate our approach's benefits over popular baseline strategies in terms of workflow requirement satisfaction, VEC preference satisfaction, and available VEC resource utilization.

7/2/2024

cs.DC cs.AI

Online Frequency Scheduling by Learning Parallel Actions

Anastasios Giovanidis, Mathieu Leconte, Sabrine Aroua, Tor Kvernvik, David Sandberg

Radio Resource Management is a challenging topic in future 6G networks where novel applications create strong competition among the users for the available resources. In this work we consider the frequency scheduling problem in a multi-user MIMO system. Frequency resources need to be assigned to a set of users while allowing for concurrent transmissions in the same sub-band. Traditional methods are insufficient to cope with all the involved constraints and uncertainties, whereas reinforcement learning can directly learn near-optimal solutions for such complex environments. However, the scheduling problem has an enormous action space accounting for all the combinations of users and sub-bands, so out-of-the-box algorithms cannot be used directly. In this work, we propose a scheduler based on action-branching over sub-bands, which is a deep Q-learning architecture with parallel decision capabilities. The sub-bands learn correlated but local decision policies and altogether they optimize a global reward. To improve the scaling of the architecture with the number of sub-bands, we propose variations (Unibranch, Graph Neural Network-based) that reduce the number of parameters to learn. The parallel decision making of the proposed architecture allows to meet short inference time requirements in real systems. Furthermore, the deep Q-learning approach permits online fine-tuning after deployment to bridge the sim-to-real gap. The proposed architectures are evaluated against relevant baselines from the literature showing competitive performance and possibilities of online adaptation to evolving environments.

6/10/2024

cs.NI cs.LG cs.MA