MPIrigen: MPI Code Generation through Domain-Specific Language Models

2402.09126

Published 4/24/2024 by Nadav Schneider, Niranjan Hasabnis, Vy A. Vo, Tal Kadosh, Neva Krien, Mihai Capotă, Guy Tamir, Ted Willke, Nesreen Ahmed, Yuval Pinter and 2 others

cs.DC cs.AI cs.CL cs.LG cs.SE

Abstract

The imperative need to scale computation across numerous nodes highlights the significance of efficient parallel computing, particularly in the realm of Message Passing Interface (MPI) integration. The challenging parallel programming task of generating MPI-based parallel programs has remained unexplored. This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs. Findings reveal that widely used models such as GPT-3.5 and PolyCoder (specialized multi-lingual code models) exhibit notable performance degradation when generating MPI-based programs compared to general-purpose programs. In contrast, domain-specific models such as MonoCoder, which are pretrained on the MPI-related programming languages C and C++, outperform larger models. Subsequently, we introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI. We call the resulting model MPIrigen. We propose an innovative preprocessing for completion only after observing the whole code, thus enabling better completion with a wider context. Comparative analysis against GPT-3.5 zero-shot performance, using a novel HPC-oriented evaluation method, demonstrates that MPIrigen excels in generating accurate MPI functions, with up to 0.8 accuracy in location and function predictions and more than 0.9 accuracy for argument predictions. The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation, paving the way for a new generation of automatic parallelization tools. The sources of this work are available at our GitHub MPIrigen repository: https://github.com/Scientific-Computing-Lab-NRCN/MPI-rigen

Overview

The paper explores the challenges of generating efficient MPI-based parallel programs, which are crucial for scaling computation across multiple nodes.
It investigates the performance of state-of-the-art language models, including GPT-3.5 and PolyCoder, in generating MPI-based programs, finding notable performance degradation compared to general-purpose programs.
The paper introduces a dedicated model, MPIrigen, obtained by fine-tuning MonoCoder (a model pretrained on C and C++ code) on the HPCorpusMPI dataset, and reports that it generates MPI functions more accurately than GPT-3.5 in a zero-shot setting.

Plain English Explanation

Running computations on a single computer can only go so far. To handle really big problems, we need to split the work across many computers working together in parallel. This is where Message Passing Interface (MPI) comes in – it's a way for different computers to communicate and coordinate their efforts.

However, writing the code to make all these computers work together efficiently is a challenging task. The researchers in this paper looked at how well the latest language models, like GPT-3.5 and PolyCoder, can generate this kind of parallel programming code. Surprisingly, these powerful models struggled when it came to MPI-based programs, not performing as well as they do for more general-purpose code.

To address this, the researchers developed a new model called MPIrigen by fine-tuning MonoCoder, a model pretrained on C and C++ code, on a corpus of MPI programs (HPCorpusMPI). In the paper's comparison against zero-shot GPT-3.5, MPIrigen reaches up to 0.8 (80%) accuracy in predicting where an MPI call belongs and which function it is, and more than 0.9 (90%) accuracy for the call's arguments.

The success of this domain-specific model highlights the importance of focusing on the unique challenges of parallel computing when developing AI-powered tools for automatic code generation. This work paves the way for a new generation of tools that can help make parallel programming more accessible and efficient.

Technical Explanation

The paper investigates the performance of state-of-the-art language models, including GPT-3.5 and PolyCoder, in generating MPI-based parallel programs. The findings reveal that these widely used models exhibit notable performance degradation when generating MPI-based programs compared to general-purpose programs.

To address this challenge, the researchers introduce a dedicated downstream task of MPI-based program generation by fine-tuning the MonoCoder model on the HPCorpusMPI dataset. The resulting model, called MPIrigen, demonstrates superior performance in generating accurate MPI functions.

The paper proposes an innovative preprocessing approach for MPIrigen, where completion is performed only after observing the whole code, enabling better completion with a wider context. Comparative analysis against GPT-3.5's zero-shot performance, using a novel HPC-oriented evaluation method, shows that MPIrigen excels in generating MPI functions with up to 0.8 accuracy in location and function predictions, and more than 0.9 accuracy for argument predictions.
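One way to picture this preprocessing, as a minimal sketch and not the authors' actual pipeline: remove the MPI calls from a program, give the model the whole remaining code as input, and ask it to produce the removed calls together with their locations. The regular expression, the `line: call` target format, and the helper name `make_training_pair` are all assumptions made for illustration; the sketch only handles MPI calls that occupy a full line.

```python
import re

# Assumed pattern: an MPI call that fills a whole line, e.g. "MPI_Init(&argc, &argv);"
MPI_CALL = re.compile(r"^\s*(MPI_\w+)\s*\((.*)\)\s*;\s*$")


def make_training_pair(source: str) -> tuple[str, str]:
    """Split C source into (code without MPI calls, listing of the removed calls).

    Hypothetical helper: the target lists each removed call with the line
    number it had in the original file.
    """
    kept, removed = [], []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if MPI_CALL.match(line):
            removed.append(f"{line_no}: {line.strip()}")
        else:
            kept.append(line)
    return "\n".join(kept), "\n".join(removed)


EXAMPLE = """\
#include <mpi.h>
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Finalize();
    return 0;
}"""

code_without_mpi, target = make_training_pair(EXAMPLE)
print(code_without_mpi)  # model input: the full program, MPI calls stripped
print(target)            # model output: the MPI calls and where they go
```

Because the model sees the entire stripped program before emitting any call, a late call such as `MPI_Finalize` can be placed with knowledge of everything that precedes it, which is the "wider context" the paper refers to.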

Critical Analysis

The paper presents MPIrigen's results as evidence for the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation. However, it does not discuss the potential limitations or caveats of this approach.

For example, the paper does not address the scalability of the MPIrigen model or its performance on larger and more complex MPI-based programs. Additionally, the evaluation method used in the paper, while novel, may not capture all the nuances of real-world parallel programming tasks.
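To make the concern about evaluation concrete, the following is a hypothetical scoring sketch, not the paper's metric. It assumes each prediction is a `(line, function, args)` triple, counts a call as found when the function name matches within a chosen line tolerance, and scores arguments only for found calls. The `MpiCall` type, the `score` function, and the `line_tolerance` parameter are invented for this illustration.

```python
from typing import NamedTuple


class MpiCall(NamedTuple):
    line: int
    function: str
    args: tuple[str, ...]


def score(predicted, reference, line_tolerance=0):
    """Return (location+function accuracy, argument accuracy).

    Hypothetical metric: a reference call is matched by the first unused
    prediction with the same function name within `line_tolerance` lines.
    """
    matched = 0
    args_correct = 0
    unused = list(predicted)
    for ref in reference:
        for pred in unused:
            same_place = abs(pred.line - ref.line) <= line_tolerance
            if pred.function == ref.function and same_place:
                matched += 1
                args_correct += pred.args == ref.args
                unused.remove(pred)
                break
    call_acc = matched / len(reference) if reference else 0.0
    arg_acc = args_correct / matched if matched else 0.0
    return call_acc, arg_acc


reference = [
    MpiCall(4, "MPI_Init", ("&argc", "&argv")),
    MpiCall(5, "MPI_Comm_rank", ("MPI_COMM_WORLD", "&rank")),
    MpiCall(6, "MPI_Finalize", ()),
]
predicted = [
    MpiCall(4, "MPI_Init", ("&argc", "&argv")),
    MpiCall(5, "MPI_Comm_rank", ("MPI_COMM_WORLD", "&size")),  # wrong argument
    MpiCall(7, "MPI_Finalize", ()),  # one line later than the reference
]

strict = score(predicted, reference)
lenient = score(predicted, reference, line_tolerance=1)
```

Under strict matching the `MPI_Finalize` call counts as a miss even though placing it one line later may be equally valid, while allowing one line of tolerance counts it as found. A purely textual metric like this one also says nothing about whether the resulting program compiles or runs correctly.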

Furthermore, the paper does not explore potential ways to further improve the performance of language models in generating MPI-based programs, such as by incorporating automated multi-language to English translation techniques or exploring the use of MLIR-based compilers to optimize the generated code.

Conclusion

The paper presents a significant step forward in the development of AI-powered tools for parallel computing code generation. By introducing MPIrigen, a model fine-tuned specifically for generating MPI code in C and C++, the researchers have demonstrated the importance of domain-specific fine-tuning in optimizing language models for specialized tasks.

This work paves the way for a new generation of automatic parallelization tools that can help make parallel programming more accessible and efficient, ultimately enabling researchers and engineers to tackle increasingly complex computational challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

HPC-Coder: Modeling Parallel Programs using Large Language Models

Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software even more burdensome for developers. One way to alleviate some of these burdens is with automated development and analysis tools. Such tools can perform complex and/or remedial tasks for developers that increase their productivity and decrease the chance for error. Until recently, such tools for code development and performance analysis have been limited in the complexity of tasks they can perform, especially for parallel programs. However, with recent advancements in language modeling, and the availability of large amounts of open-source code related data, these tools have started to utilize predictive language models to automate more complex tasks. In this paper, we show how large language models (LLMs) can be applied to tasks specific to high performance and scientific codes. We introduce a new dataset of HPC and scientific codes and use it to fine-tune several pre-trained models. We compare several pre-trained LLMs on HPC-related tasks and introduce a new model, HPC-Coder, fine-tuned on parallel codes. In our experiments, we show that this model can auto-complete HPC functions where generic models cannot, decorate for loops with OpenMP pragmas, and model performance changes in scientific application repositories as well as programming competition solutions.

5/15/2024

cs.DC cs.AI

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP

Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo, Ali Jannesari

Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are trained extensively on vast repositories of code and programming languages. While the generic abilities of these code LLMs are useful for many programmers in tasks like code generation, the area of high-performance computing (HPC) has a narrower set of requirements that make a smaller and more domain-specific model a smarter choice. This paper presents OMPGPT, a novel domain-specific model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation. Furthermore, we leverage prompt engineering techniques from the NLP domain to create Chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive evaluations demonstrate that OMPGPT outperforms existing large language models specialized in OpenMP tasks and maintains a notably smaller size, aligning it more closely with the typical hardware constraints of HPC environments. We consider our contribution as a pivotal bridge, connecting the advantage of language models with the specific demands of HPC tasks.

6/26/2024

cs.SE cs.DC cs.LG

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

Indraneil Paul, Goran Glavaš, Iryna Gurevych

Code understanding and generation have fast become some of the most popular applications of language models (LMs). Nonetheless, research on multilingual aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual transfer between different programming languages, language-specific data augmentation, and post-hoc LM adaptation, alongside exploitation of data sources other than the original textual content, has been much sparser than for their natural language counterparts. In particular, most mainstream Code-LMs have been pre-trained on source code files alone. In this work, we investigate the prospect of leveraging readily available compiler intermediate representations (IR) - shared across programming languages - to improve the multilingual capabilities of Code-LMs and facilitate cross-lingual transfer. To this end, we first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files coupled with respective intermediate representations. Next, starting from various base Code-LMs (ranging in size from 1.1B to 7.3B parameters), we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to (1) learn the IR language and (2) align the IR constructs with respective constructs of various programming languages. Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics, including prompt robustness, multilingual code completion, code understanding, and instruction following.

4/16/2024

cs.AI cs.CL cs.PL

Can Large Language Models Write Parallel Code?

Daniel Nichols, Joshua H. Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele

Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle to generate code for complex programs. In this paper, we study the capabilities of state-of-the-art language models to generate parallel code. In order to evaluate language models, we create a benchmark, ParEval, consisting of prompts that represent 420 different coding tasks related to scientific and parallel computing. We use ParEval to evaluate the effectiveness of several state-of-the-art open- and closed-source language models on these tasks. We introduce novel metrics for evaluating the performance of generated code, and use them to explore how well each large language model performs for 12 different computational problem types and six different parallel programming models.

5/15/2024

cs.DC cs.AI