SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads

Read original: arXiv:2406.09425 - Published 6/17/2024 by Amir Fakhim Babaei, Thidapat Chantem


Overview

Deep Neural Networks (DNNs) are widely used in various applications, including transportation, healthcare, and speech recognition.
Existing DNN frameworks often underutilize the GPU because they allocate resources coarsely and execute work sequentially.
This paper proposes SGPRS, a real-time GPU scheduler built around zero-configuration partition switching, so that more parallel DNN tasks meet their deadlines.

Plain English Explanation

Deep Neural Networks (DNNs) are powerful machine learning models that have become increasingly useful in a variety of applications, such as transportation, healthcare, and speech recognition. While researchers have made efforts to improve the accuracy of DNNs, few have studied how to effectively run them in real-time settings.

Existing DNN frameworks often struggle with efficiently utilizing computing resources, such as GPUs, due to their coarse resource allocation and sequential execution approaches. This can lead to underutilization of hardware, which can negatively impact the overall performance of DNN-based applications.

To address this issue, the researchers propose a new GPU scheduler called SGPRS. SGPRS is designed around the real-time requirements of DNN workloads and uses zero-configuration partition switching to improve resource utilization. According to the authors, SGPRS meets more deadlines for parallel DNN tasks and also sustains overall performance beyond what they call the "pivot point."

Technical Explanation

The researchers conducted a GPU speedup gain analysis to understand the performance characteristics of DNN workloads. Based on their findings, they developed SGPRS, a real-time GPU scheduler for parallel DNN tasks.
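A speedup analysis of this kind can be sketched in a few lines of Python. This is a rough illustration, not the paper's method: the function name `speedup_gains`, the idea of indexing latency by the number of GPU compute units given to a workload, and the sample latencies are all assumptions made for the example.

```python
def speedup_gains(latency_ms):
    """Summarize how latency scales with allocated GPU compute units.

    latency_ms maps a (hypothetical) compute-unit count to a measured
    latency in milliseconds. The smallest allocation is the baseline.
    """
    units = sorted(latency_ms)
    base_units = units[0]
    base_latency = latency_ms[base_units]
    rows = []
    for n in units:
        speedup = base_latency / latency_ms[n]
        # Efficiency is 1.0 when latency shrinks in proportion to the
        # extra resources, and falls below 1.0 when returns diminish.
        efficiency = speedup / (n / base_units)
        rows.append({"units": n, "speedup": speedup, "efficiency": efficiency})
    return rows


# Hypothetical measurements, for illustration only.
SAMPLE_LATENCY_MS = {10: 40.0, 20: 22.0, 40: 14.0, 80: 12.0}

if __name__ == "__main__":
    for row in speedup_gains(SAMPLE_LATENCY_MS):
        print(row)
```

When efficiency drops quickly as the allocation grows, a single network is not making good use of the whole GPU, which is the situation where running several tasks side by side on smaller partitions becomes attractive.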

SGPRS employs a zero-configuration partition switching approach, which allows it to dynamically adjust the GPU resources allocated to different DNN tasks without the need for manual configuration. This flexibility helps SGPRS better utilize the available GPU resources and meet the real-time requirements of DNN applications.
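One way to picture partition switching without reconfiguration is a pool of GPU partitions created once at startup, with each ready piece of DNN work handed to whichever partition frees up first. The Python sketch below is illustrative only and is not the paper's implementation: the `Partition` and `Stage` classes, the `dispatch` function, the earliest-deadline-first ordering, and the linear scaling of run time with partition share are all assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    share: float              # fraction of GPU compute, fixed at startup
    busy_until: float = 0.0   # time (ms) at which the partition is free


@dataclass(order=True)
class Stage:
    deadline: float                      # only field used for ordering
    task: str = field(compare=False)
    work_ms: float = field(compare=False)  # run time on the full GPU


def dispatch(stages, partitions, now=0.0):
    """Assign stages in deadline order to the earliest-free partition.

    Partitions are never resized or recreated here: a stage simply
    runs on whichever pre-created partition is available next.
    """
    queue = list(stages)
    heapq.heapify(queue)
    plan = []
    while queue:
        stage = heapq.heappop(queue)
        part = min(partitions, key=lambda p: p.busy_until)
        start = max(now, part.busy_until)
        # Simplifying assumption: run time scales inversely with share.
        finish = start + stage.work_ms / part.share
        part.busy_until = finish
        plan.append((stage.task, part.name, start, finish))
    return plan


if __name__ == "__main__":
    pool = [Partition("p0", 0.5), Partition("p1", 0.5)]
    ready = [Stage(10.0, "A", 2.0), Stage(5.0, "B", 1.0), Stage(20.0, "C", 3.0)]
    for entry in dispatch(ready, pool):
        print(entry)
```

The point of the design is that the scheduling decision reduces to choosing among partitions that already exist, so no setup cost is paid when a task moves from one partition to another.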

The researchers evaluated SGPRS using various DNN workloads and compared its performance to existing GPU scheduling approaches. Their results indicate that SGPRS meets more deadlines for parallel tasks and sustains overall performance beyond the "pivot point."
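The deadline metric used in comparisons like this can be illustrated with a small simulation of periodic tasks. The sketch below is a simplified model written for this summary, not the paper's evaluation setup: the `miss_rate` function, the non-preemptive earliest-deadline-first policy, the implicit deadline equal to the period, and every number in the example are assumptions.

```python
import heapq


def miss_rate(tasks, servers, horizon_ms):
    """Fraction of jobs that finish after their deadline.

    tasks is a list of (period_ms, exec_ms) pairs; each job's deadline
    is the end of its period. Jobs run without preemption on `servers`
    identical servers, earliest deadline first.
    """
    jobs = []
    for period, exec_ms in tasks:
        release = 0.0
        while release < horizon_ms:
            jobs.append((release, release + period, exec_ms))
            release += period
    jobs.sort()                      # order by release time
    free_at = [0.0] * servers        # min-heap of server free times
    ready = []                       # min-heap keyed by deadline
    missed, done, i = 0, 0, 0
    while done < len(jobs):
        t = heapq.heappop(free_at)
        # An idle server with nothing queued waits for the next release.
        if not ready and jobs[i][0] > t:
            t = jobs[i][0]
        while i < len(jobs) and jobs[i][0] <= t:
            release, deadline, exec_ms = jobs[i]
            heapq.heappush(ready, (deadline, release, exec_ms))
            i += 1
        deadline, release, exec_ms = heapq.heappop(ready)
        finish = max(t, release) + exec_ms   # never start before release
        if finish > deadline:
            missed += 1
        heapq.heappush(free_at, finish)
        done += 1
    return missed / len(jobs)


# Hypothetical workload: three tasks with a 10 ms period.
# Run one at a time on the whole GPU, each job takes 4 ms.
SEQUENTIAL = [(10.0, 4.0)] * 3
# Run side by side on three partitions, each job is assumed to take 6 ms.
PARTITIONED = [(10.0, 6.0)] * 3

if __name__ == "__main__":
    print(miss_rate(SEQUENTIAL, servers=1, horizon_ms=100.0))
    print(miss_rate(PARTITIONED, servers=3, horizon_ms=100.0))
```

In this toy workload the sequential setup needs 12 ms of work every 10 ms and falls behind, while the partitioned setup finishes every job within its period even though each individual job runs slower.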

Critical Analysis

The paper contributes to DNN optimization under real-time requirements. The proposed SGPRS scheduler targets a concrete problem in existing DNN frameworks: GPU resources are underutilized because allocation is coarse and execution is sequential.

However, the paper does not explore the potential limitations or caveats of the SGPRS approach. For example, it would be interesting to understand how SGPRS performs under different workload characteristics or resource constraints, and whether there are any edge cases where its benefits may not be as pronounced.

Additionally, the paper could have delved deeper into the potential implications of SGPRS for various DNN-based applications, such as how it might impact the overall performance and responsiveness of real-time systems in domains like transportation, healthcare, or speech recognition.

Conclusion

This paper presents a novel GPU scheduler, SGPRS, that addresses the underutilization of computing resources in existing DNN frameworks. By employing a zero-configuration partition switching approach, SGPRS can efficiently manage parallel DNN tasks and meet their real-time requirements, leading to improved overall performance.

The proposed SGPRS scheduler represents an important step forward in optimizing the execution of DNN workloads, particularly in time-sensitive applications. As the use of DNNs continues to grow across various domains, techniques like SGPRS will become increasingly crucial for ensuring the reliable and responsive deployment of these powerful machine learning models.



Related Papers


SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads

Amir Fakhim Babaei, Thidapat Chantem

Deep Neural Networks (DNNs) are useful in many applications, including transportation, healthcare, and speech recognition. Despite various efforts to improve accuracy, few works have studied DNN in the context of real-time requirements. Coarse resource allocation and sequential execution in existing frameworks result in underutilization. In this work, we conduct GPU speedup gain analysis and propose SGPRS, the first real-time GPU scheduler considering zero configuration partition switch. The proposed scheduler not only meets more deadlines for parallel tasks but also sustains overall performance beyond the pivot point.

6/17/2024


ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve the SLO hit rates 61%-80% while saving 47%-187% costs over prior work.

4/26/2024


Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

Urvij Saroliya, Eishi Arima, Dai Liu, Martin Schulz

GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in the same generation do. However, as the available resources in GPUs have increased exponentially over the past decades, it has become increasingly difficult for a single program to fully utilize them. As a consequence, the industry has started supporting several resource partitioning features in order to improve the resource utilization by co-scheduling multiple programs on the same GPU die at the same time. Driven by the technological trend, this paper focuses on hierarchical resource partitioning on modern GPUs, and as an example, we utilize a combination of two different features available on recent NVIDIA GPUs in a hierarchical manner: MPS (Multi-Process Service), a finer-grained logical partitioning; and MIG (Multi-Instance GPU), a coarse-grained physical partitioning. We propose a method for comprehensively co-optimizing the setup of hierarchical partitioning and the selection of co-scheduling groups from a given set of jobs, based on reinforcement learning using their profiles. Our thorough experimental results demonstrate that our approach can successfully set up job concurrency, partitioning, and co-scheduling group selections simultaneously. This results in a maximum throughput improvement by a factor of 1.87 compared to the time-sharing scheduling.

5/15/2024

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang

Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive performance in optimizing DL job performance and cluster utilization through periodic reallocation or selection of GPU resources, they also encounter challenges such as preemption and migration overhead, along with potential DL accuracy degradation. Nonetheless, few explore the potential benefits of GPU sharing to improve resource utilization and reduce job queuing times. Motivated by these insights, we present a job scheduling model allowing multiple jobs to share the same set of GPUs without altering job training settings. We introduce SJF-BSBF (shortest job first with best sharing benefit first), a straightforward yet effective heuristic scheduling algorithm. SJF-BSBF intelligently selects job pairs for GPU resource sharing and runtime settings (sub-batch size and scheduling time point) to optimize overall performance while ensuring DL convergence accuracy through gradient accumulation. In experiments with both physical DL workloads and trace-driven simulations, even as a preemption-free policy, SJF-BSBF reduces the average job completion time by 27-33% relative to the state-of-the-art preemptive DL schedulers. Moreover, SJF-BSBF can wisely determine the optimal resource sharing settings, such as the sharing time point and sub-batch size for gradient accumulation, outperforming the aggressive GPU sharing approach (baseline SJF-FFS policy) by up to 17% in large-scale traces.

7/19/2024