Portable acceleration of CMS computing workflows with coprocessors as a service

Read original: arXiv:2402.15366 - Published 9/9/2024 by CMS Collaboration

Portable acceleration of CMS computing workflows with coprocessors as a service

Overview

This paper explores the use of coprocessors as a service to accelerate computing workflows for the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC).
The authors aim to leverage coprocessors, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), to speed up computationally intensive tasks in CMS computing workflows.
The research focuses on developing a portable and scalable system that can efficiently utilize coprocessor resources across different computing environments.

Plain English Explanation

The CMS detector is a massive experiment that collects and analyzes vast amounts of data from the LHC. To make sense of this data, CMS researchers rely on complex software and computing infrastructure. However, some of these tasks can be incredibly computationally intensive, slowing down the analysis process.

The researchers in this paper propose a solution to this problem: using coprocessors as a service. Coprocessors are specialized hardware, like GPUs and FPGAs, that can take on specific types of computations much faster than regular computer processors. The idea is to create a system that allows CMS researchers to easily access and use these coprocessors to speed up their workflows, without having to worry about the underlying hardware and software complexity.

The key benefits of this approach are portability and scalability. The system is designed to work across different computing environments, so researchers can use it on their local machines, in cloud-based infrastructure, or on shared high-performance computing resources. And as the computational demands of CMS workflows grow, the system can scale up to utilize more coprocessor resources.

Technical Explanation

The paper presents a system architecture that allows CMS computing workflows to leverage coprocessors as a service. The main components of the system include:

Coprocessor Pools: These are collections of coprocessors (GPUs, FPGAs, etc.) that can be accessed and utilized by the CMS workflows.
Workflow Orchestrator: This component manages the execution of CMS workflows, identifying tasks that can be accelerated by coprocessors and submitting them to the appropriate coprocessor pool.
Coprocessor Scheduler: This module is responsible for efficiently scheduling and executing tasks on the available coprocessor resources, ensuring optimal utilization.
Workflow Profiler: This component analyzes the performance characteristics of CMS workflows and their individual tasks, providing insights to the Workflow Orchestrator and Coprocessor Scheduler.

The researchers evaluate the performance of this system using a range of CMS computing workflows, including event reconstruction, data analysis, and machine learning tasks. They demonstrate significant speedups compared to CPU-only execution, highlighting the benefits of leveraging coprocessors for computationally intensive tasks.

Critical Analysis

The paper provides a comprehensive and well-designed solution for accelerating CMS computing workflows using coprocessors. The authors have addressed important challenges such as portability, scalability, and efficient resource utilization.

However, the paper does not fully address the potential challenges of deploying and managing such a system in a large-scale, distributed computing environment like the CMS infrastructure. Issues such as fault tolerance, load balancing, and integration with existing CMS computing frameworks may require further investigation.

Additionally, the paper focuses primarily on the technical aspects of the system and does not delve deeply into the broader implications or potential limitations of this approach. For example, the reliance on specialized hardware (GPUs, FPGAs) may introduce additional complexity and costs, which could be a concern for some CMS computing centers.

Conclusion

This research represents a significant step forward in leveraging coprocessors to accelerate CMS computing workflows. By providing a portable and scalable system, the authors have created a valuable tool that can help CMS researchers analyze their data more efficiently and effectively.

As the computational demands of the CMS experiment continue to grow, solutions like this will become increasingly important. The insights and techniques presented in this paper could also have broader applications in other fields that rely on large-scale, computationally intensive data processing and analysis.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Portable acceleration of CMS computing workflows with coprocessors as a service

CMS Collaboration

Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades. To complement the future performance increases of software running on central processing units (CPUs), explorations of coprocessor usage in data processing hold great potential and interest. Coprocessors are a class of computer processors that supplement CPUs, often improving the execution of certain functions due to architectural design choices. We explore the approach of Services for Optimized Network Inference on Coprocessors (SONIC) and study the deployment of this as-a-service approach in large-scale data processing. In the studies, we take a data processing workflow of the CMS experiment and run the main workflow on CPUs, while offloading several machine learning (ML) inference tasks onto either remote or local coprocessors, specifically graphics processing units (GPUs). With experiments performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two, we demonstrate the acceleration of these ML algorithms individually on coprocessors and the corresponding throughput improvement for the entire workflow. This approach can be easily generalized to different types of coprocessors and deployed on local CPUs without decreasing the throughput performance. We emphasize that the SONIC approach enables high coprocessor usage and enables the portability to run workflows on different types of coprocessors.

9/9/2024

🤯

Inference Acceleration for Large Language Models on CPUs

Ditto PS, Jithin VG, Adarsh MS

In recent years, large language models have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, deploying these models for real-world applications often requires efficient inference solutions to handle the computational demands. In this paper, we explore the utilization of CPUs for accelerating the inference of large language models. Specifically, we introduce a parallelized approach to enhance throughput by 1) Exploiting the parallel processing capabilities of modern CPU architectures, 2) Batching the inference request. Our evaluation shows the accelerated inference engine gives an 18-22x improvement in the generated token per sec. The improvement is more with longer sequence and larger models. In addition to this, we can also run multiple workers in the same machine with NUMA node isolation to further improvement in tokens/s. Table 2, we have received 4x additional improvement with 4 workers. This would also make Gen-AI based products and companies environment friendly, our estimates shows that CPU usage for Inference could reduce the power consumption of LLMs by 48.9% while providing production ready throughput and latency.

6/13/2024

🤷

The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond

Antonio Perez-Calero Yzquierdo, Marco Mascheroni, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem

While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will be increasingly employing HPC and Cloud facilities to process the vast amounts of data expected during the LHC Run 3 and the future HL-LHC phase. These facilities often feature diverse compute resources, including alternative CPU architectures like ARM and IBM Power, as well as a variety of GPU specifications. Using these heterogeneous resources efficiently is thus essential for the LHC collaborations reaching their future scientific goals. The Submission Infrastructure (SI) is a central element in CMS Computing, enabling resource acquisition and exploitation by CMS data processing, simulation and analysis tasks. The SI must therefore be adapted to ensure access and optimal utilization of this heterogeneous compute capacity. Some steps in this evolution have been already taken, as CMS is currently using opportunistically a small pool of GPU slots provided mainly at the CMS WLCG sites. Additionally, Power9 processors have been validated for CMS production at the Marconi-100 cluster at CINECA. This note will describe the updated capabilities of the SI to continue ensuring the efficient allocation and use of computing resources by CMS, despite their increasing diversity. The next steps towards a full integration and support of heterogeneous resources according to CMS needs will also be reported.

5/24/2024

📉

HPC resources for CMS offline computing: An integration and scalability challenge for the Submission Infrastructure

Antonio Perez-Calero Yzquierdo, Marco Mascheroni, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem

The computing resource needs of LHC experiments are expected to continue growing significantly during the Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The future years present a challenge for the experiments' resource provisioning models, both in terms of scalability and increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents firstly an integration challenge, as HPC centers are much more diverse compared to Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HLLHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge for the SI. To preventively address future potential scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI derived from the increased scale of operations are described, and the most recent results of scalability test on the CMS SI are reported.

5/24/2024