RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

Read original: arXiv:2312.15698 - Published 6/10/2024 by André Silva, Sen Fang, Martin Monperrus

Overview

Presents a new approach called "RepairLLaMA" for efficient fine-tuning of large language models (LLMs) like LLaMA for program repair tasks
Introduces novel code representations and parameter-efficient fine-tuning techniques to improve the performance of LLMs on program repair benchmarks
Reports that RepairLLaMA outperforms all baselines on the Defects4J v2 and HumanEval-Java benchmarks while training only small adapter modules rather than the full model

Plain English Explanation

The paper introduces a new system called "RepairLLaMA" that aims to make large language models (LLMs) like LLaMA more efficient and effective at the task of program repair. Program repair is the process of automatically detecting and fixing bugs or errors in computer code.

The key ideas behind RepairLLaMA are:

Novel Code Representations: The researchers developed new ways to represent code that allow the LLM to better understand and reason about programming languages. This helps the model perform better on program repair tasks.
Parameter-Efficient Fine-Tuning: Instead of updating every weight in the LLM, the researchers use a technique called "parameter-efficient fine-tuning". It adapts the LLM to program repair by training only a small number of additional parameters, which makes the process much more efficient.
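To make the first idea more concrete, here is a minimal sketch of what a repair-specific code representation could look like. It assumes that one useful repair signal is the location of the suspected bug, marked with comments in the input, and that the model only returns replacement lines for the marked region. The marker strings and the helper names build_repair_input and apply_patch are hypothetical and are not taken from the paper.

```python
# Hypothetical marker strings; the source does not specify the actual tokens.
BUG_START = "// buggy lines start"
BUG_END = "// buggy lines end"


def build_repair_input(function_lines, bug_start, bug_end):
    """Wrap the suspected buggy lines (0-based, inclusive) in marker comments."""
    before = function_lines[:bug_start]
    buggy = function_lines[bug_start:bug_end + 1]
    after = function_lines[bug_end + 1:]
    return "\n".join(before + [BUG_START] + buggy + [BUG_END] + after)


def apply_patch(function_lines, bug_start, bug_end, fixed_lines):
    """Replace only the marked region with the proposed replacement lines."""
    return function_lines[:bug_start] + fixed_lines + function_lines[bug_end + 1:]


buggy_function = [
    "int max(int a, int b) {",
    "    return a < b ? a : b;",
    "}",
]

# Input representation: the whole function plus markers around the bug.
prompt = build_repair_input(buggy_function, 1, 1)

# Stand-in for a model output: only the replacement for the marked region.
model_output = ["    return a > b ? a : b;"]
fixed_function = apply_patch(buggy_function, 1, 1, model_output)

print(prompt)
print("\n".join(fixed_function))
```

In this sketch the model sees the full function for context but only has to generate the changed lines, which keeps the output short and easy to apply.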

By incorporating these innovations, the researchers show that RepairLLaMA outperforms previous state-of-the-art methods for automated program repair, while training far fewer parameters than full fine-tuning would require.

Technical Explanation

The paper presents the "RepairLLaMA" approach, which builds on top of the LLaMA large language model. The key technical contributions are:

Novel Code Representations: The authors identify code representations that are specific to program repair. According to the abstract, fine-tuning with these representations lets the model use meaningful repair signals and produce better patches.
Parameter-Efficient Fine-Tuning: Instead of updating all of the LLaMA model's weights, the authors use a parameter-efficient fine-tuning approach. This involves adding small "adapter" modules to the LLaMA model and training only those adapters while the original weights stay frozen. This makes the fine-tuning process much more efficient.
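The adapter idea above can be sketched in a few lines. The summary does not say which adapter technique is used, so the example below assumes a LoRA-style low-rank adapter, one common form of parameter-efficient fine-tuning. The class name LowRankAdapterLinear and the layer sizes are illustrative only, and no training loop is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LowRankAdapterLinear:
    """A frozen linear layer plus a small trainable low-rank update (assumed LoRA-style)."""

    def __init__(self, d_in, d_out, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weights: never updated during fine-tuning.
        self.frozen_w = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable adapter: down-projection A (rank x d_in) and up-projection B (d_out x rank).
        self.a = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially behaves like the frozen layer.
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = scale

    def forward(self, x):
        base = matvec(self.frozen_w, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)

    def frozen_params(self):
        return sum(len(row) for row in self.frozen_w)


layer = LowRankAdapterLinear(d_in=64, d_out=64, rank=4)
x = [1.0] * 64
output = layer.forward(x)

print(layer.trainable_params(), layer.frozen_params())  # prints 512 4096
```

With these toy sizes only 512 adapter values would be trained against 4096 frozen weights. The gap grows with layer width because the adapter size scales with the rank rather than with the full weight matrix.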

The authors evaluate RepairLLaMA on two program repair benchmarks, Defects4J v2 and HumanEval-Java. According to the paper's abstract, it correctly fixes 144 Defects4J v2 bugs and 109 HumanEval-Java bugs, outperforming all baselines.

Critical Analysis

The paper presents a well-designed and thorough evaluation of the RepairLLaMA approach, providing compelling evidence for its effectiveness. However, a few potential limitations or areas for further research are worth noting:

Generalization to Diverse Codebases: The evaluation is primarily focused on a limited set of program repair benchmarks. It would be valuable to see how well RepairLLaMA generalizes to a more diverse range of codebases and programming languages.
Interpretability and Explainability: As with many deep learning approaches, the inner workings of RepairLLaMA may be difficult to interpret. Providing more insight into how the model reasons about and repairs code could be valuable for building trust and understanding.
Scalability and Deployment Considerations: While the parameter-efficient fine-tuning approach is a strength, the authors do not extensively discuss the practical considerations of deploying RepairLLaMA at scale, such as computational requirements, inference times, and integration with existing developer workflows.

Overall, the RepairLLaMA approach represents a promising step forward in making large language models more efficient and effective for the challenging task of automated program repair. Further research exploring the model's limitations and real-world applicability would be valuable.

Conclusion

The RepairLLaMA paper presents a novel approach for fine-tuning large language models like LLaMA to perform efficient and effective automated program repair. By introducing new code representations and a parameter-efficient fine-tuning technique, the researchers demonstrate significant improvements over previous state-of-the-art methods, while requiring far fewer resources.

This work represents an important step forward in the field of automated program repair, showing the potential for large language models to be adapted for specialized tasks like code correction and bug fixing. As language models continue to grow in capability, innovations like RepairLLaMA will be crucial for making these models more practical and accessible for real-world software development and maintenance tasks.

Related Papers

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

André Silva, Sen Fang, Martin Monperrus

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers a state-of-the-art parameter-efficient fine-tuning (PEFT) technique for program repair. This results in RepairLLaMA producing a highly effective 'program repair adapter' for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program repair specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning to converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2 and 109 HumanEval-Java bugs, outperforming all baselines.

6/10/2024

RePair: Automated Program Repair with Process-based Feedback

Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while simultaneously diminishing the financial burden of manual repairs. Commercial-scale language models (LMs) have taken APR to unprecedented levels. However, for models with fewer than 100B parameters, single-step modifications may not achieve the desired effect. Moreover, humans interact with the LM through explicit prompts, which hinders the LM from receiving feedback from the compiler and test cases to automatically optimize its repair policies. In this paper, we explore how small-scale LMs (fewer than 20B parameters) achieve excellent performance through process supervision and feedback. We start by constructing a dataset named CodeNet4Repair, replete with multiple repair records, which supervises the fine-tuning of a foundational model. Building upon the encouraging outcomes of reinforcement learning, we develop a reward model that serves as a critic, providing feedback for the fine-tuned LM's actions and progressively optimizing its policy. During inference, we require the LM to generate solutions iteratively until the repair effect no longer improves or the maximum step limit is reached. The results show that process-based feedback not only outperforms larger outcome-based generation methods, but also nearly matches the performance of closed-source commercial large-scale LMs.

8/22/2024

Aligning LLMs for FL-free Program Repair

Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of locating and repairing bugs end-to-end when using the related artifacts (e.g., test cases) as input, existing methods regard them as separate tasks and ask LLMs to generate patches at fixed locations. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first performing fault localization. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLM's pre-trained capability, and (2) replacing the traditional localize-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

4/16/2024

Automated Program Repair: Emerging trends pose and expose problems for benchmarks

Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest

Machine learning (ML) now pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work. Evaluations and comparisons must take care to ensure that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.

5/10/2024