NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

Read original: arXiv:2404.11788 - Published 4/19/2024 by Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon
Total Score

0

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

Sign in to get full access

or

If you already have an account, we'll log you in

Overview

• This paper, "NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads," examines the performance of modern machine learning (ML) workloads that do not rely heavily on matrix multiplication (GEMM) operations.

• The researchers developed a new benchmark suite called NonGEMM Bench to measure the performance of these non-GEMM ML workloads on various hardware platforms, including Balanced Data Placement for GEMV Acceleration in Processing-in-Memory, GNNBench: Fair and Productive Benchmarking of Single-GPU Graph Neural Networks, Megaverse: Benchmarking Large Language Models Across Languages, and NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLMs.

Plain English Explanation

• The paper focuses on a specific type of machine learning (ML) workloads that do not rely heavily on matrix multiplication, which is a common operation in many ML models. These non-GEMM workloads are becoming increasingly important as the field of ML advances.

• The researchers created a new benchmark suite called NonGEMM Bench to measure the performance of these non-GEMM ML workloads on different hardware platforms, including some specialized hardware like processing-in-memory (PIM) and neural processing units (NPUs).

• The goal of this research is to understand the performance capabilities and limitations of these newer ML workloads that are not well-captured by traditional benchmarks focused on GEMM operations. This will help guide the development of hardware and software optimizations to better support these emerging ML workloads.

Technical Explanation

• The paper introduces NonGEMM Bench, a new suite of benchmarks designed to measure the performance of modern ML workloads that do not rely heavily on GEMM operations. These workloads include graph neural networks, language models, and other ML models that utilize a variety of operations beyond just matrix multiplication.

• The researchers evaluated the performance of these non-GEMM workloads on a range of hardware platforms, including Balanced Data Placement for GEMV Acceleration in Processing-in-Memory, GNNBench: Fair and Productive Benchmarking of Single-GPU Graph Neural Networks, Megaverse: Benchmarking Large Language Models Across Languages, and NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLMs.

• The results show that the performance of these non-GEMM workloads can vary significantly from traditional GEMM-centric benchmarks, highlighting the need for more comprehensive benchmarking approaches to guide the development of hardware and software optimizations for emerging ML workloads.

Critical Analysis

• The paper provides a valuable contribution by introducing a new benchmark suite specifically designed for non-GEMM ML workloads, which are becoming increasingly important in the field of ML.

• One potential limitation of the research is that the benchmark suite may not capture the full diversity of non-GEMM ML workloads, as the authors acknowledge. There may be additional types of operations or architectural requirements that are not well-represented in the current suite.

• Additionally, the paper does not delve deeply into the specific reasons for the performance differences observed between GEMM-centric and non-GEMM workloads, which could be an area for further investigation. Understanding the underlying architectural and algorithmic factors driving these performance differences could lead to more targeted hardware and software optimizations.

Conclusion

• This paper presents a new benchmark suite called NonGEMM Bench that aims to better understand the performance characteristics of modern ML workloads that do not rely heavily on GEMM operations.

• The findings from this research highlight the importance of considering a wider range of ML workloads when designing and optimizing hardware and software for emerging AI applications. The development of more comprehensive benchmarking approaches, like NonGEMM Bench, can help guide the future of ML hardware and software co-design.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
Total Score

0

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon

Machine Learning (ML) operators are the building blocks to design ML models with various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive requiring billions of multiply-and-accumulate. Therefore, significant effort has been put to study and optimize the GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators have not been studied as thoroughly as GEMMs. Therefore, this paper describes bench, a benchmark to study NonGEMM operators. We first construct bench using popular ML workloads from different domains, then perform case studies on various grade GPU platforms to analyze the behavior of NonGEMM operators in GPU accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community with potential new optimization directions.

Read more

4/19/2024

Enabling Accelerators for Graph Computing
Total Score

0

Enabling Accelerators for Graph Computing

Kaustubh Shivdikar

The advent of Graph Neural Networks (GNNs) has revolutionized the field of machine learning, offering a novel paradigm for learning on graph-structured data. Unlike traditional neural networks, GNNs are capable of capturing complex relationships and dependencies inherent in graph data, making them particularly suited for a wide range of applications including social network analysis, molecular chemistry, and network security. GNNs, with their unique structure and operation, present new computational challenges compared to conventional neural networks. This requires comprehensive benchmarking and a thorough characterization of GNNs to obtain insight into their computational requirements and to identify potential performance bottlenecks. In this thesis, we aim to develop a better understanding of how GNNs interact with the underlying hardware and will leverage this knowledge as we design specialized accelerators and develop new optimizations, leading to more efficient and faster GNN computations. A pivotal component within GNNs is the Sparse General Matrix-Matrix Multiplication (SpGEMM) kernel, known for its computational intensity and irregular memory access patterns. In this thesis, we address the challenges posed by SpGEMM by implementing a highly optimized hashing-based SpGEMM kernel tailored for a custom accelerator. Synthesizing these insights and optimizations, we design state-of-the-art hardware accelerators capable of efficiently handling various GNN workloads. Our accelerator architectures are built on our characterization of GNN computational demands, providing clear motivation for our approaches. This exploration into novel models underlines our comprehensive approach, as we strive to enable accelerators that are not just performant, but also versatile, able to adapt to the evolving landscape of graph computing.

Read more

5/7/2024

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication
Total Score

0

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sanjali Yadav, Bahar Asgari

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes - inner, outer, and row-wise but often perform suboptimally when the actual sparsity deviates from these predetermined patterns. As the use of SpGEMM expands across various domains, each with distinct sparsity characteristics, the demand for hardware accelerators that can efficiently handle a range of sparsity patterns is increasing. This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks with diverse sparsity patterns. By employing decision trees and deep reinforcement learning, we explore the potential of these techniques to surpass heuristic-based methods in identifying optimal dataflow schemes. We evaluate our models by comparing their performance with that of a heuristic, highlighting the strengths and weaknesses of each approach. Our findings suggest that using machine learning for dynamic dataflow selection in hardware accelerators can provide upto 28 times gains.

Read more

8/30/2024

🧠

Total Score

0

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou

The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power consumption, especially when the hardware processor needs to support and execute different neural networks. In this study, we introduce NeuralMatrix, which elastically transforms the computations of entire DNNs into linear matrix operations. This transformation allows seamless execution of various DNN models all with matrix operations and paves the way for running versatile DNN models with a single General Matrix Multiplication (GEMM) accelerator.Extensive experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models, achieving 2.17-38.72 times computation efficiency (i.e., throughput per power) compared to CPUs, GPUs, and SoC platforms. This level of efficiency is usually only attainable with the accelerator designed for a specific neural network.

Read more

8/21/2024