RePair: Automated Program Repair with Process-based Feedback

Read original: arXiv:2408.11296 - Published 8/22/2024 by Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su


Overview

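This paper introduces RePair, a new approach to automated program repair.
RePair uses process-based feedback, derived from compiler and test-case results and scored by a reward model, to guide the repair process.
The key components are a small language model (under 20B parameters) fine-tuned on a dataset of multi-step repair records, and a reward model that acts as a critic during reinforcement learning.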

Plain English Explanation

RePair: Automated Program Repair with Process-based Feedback is a research paper that presents a new technique for automatically fixing bugs in computer programs. The researchers developed a system called RePair that takes a buggy program as input and tries to modify it to fix the bug.

The core idea behind RePair is to supervise and reward the repair process, not just its outcome. Rather than producing a fix in a single step and judging only the final result, RePair repairs a program over several steps, and each intermediate attempt receives feedback based on compiler and test-case results. This additional information helps RePair make more informed decisions about how to modify the program next.
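To make "feedback" concrete, the sketch below shows the kind of signal a repair loop can obtain from a compiler and from test cases, the two feedback sources the paper's abstract names. It assumes the candidate repair is a Python function given as source text; the helper `repair_feedback` and its return format are hypothetical and are not RePair's actual evaluation harness.

```python
from typing import Callable, Sequence, Tuple


def repair_feedback(source: str, func_name: str,
                    tests: Sequence[Tuple[tuple, object]]) -> dict:
    """Score a candidate repair with compiler and test-case feedback.

    Hypothetical helper for illustration only. A real harness should run
    untrusted code in a sandbox rather than calling exec() directly.
    """
    # Compiler feedback: a syntax error means the candidate cannot run at all.
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as err:
        return {"compiles": False, "pass_rate": 0.0, "error": str(err)}

    namespace: dict = {}
    try:
        exec(code, namespace)
        func: Callable = namespace[func_name]
    except Exception as err:  # failure while defining, or function missing
        return {"compiles": True, "pass_rate": 0.0, "error": repr(err)}

    # Test-case feedback: the fraction of (args, expected) pairs that pass.
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a test that raises counts as a failure
    pass_rate = passed / len(tests) if tests else 0.0
    return {"compiles": True, "pass_rate": pass_rate, "error": None}


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0), ((5, -5), 0)]
    buggy = "def add(a, b):\n    return a - b\n"
    fixed = "def add(a, b):\n    return a + b\n"
    broken = "def add(a, b)\n    return a + b\n"  # missing colon
    print(repair_feedback(buggy, "add", tests)["pass_rate"])   # 1 of 3 tests pass
    print(repair_feedback(fixed, "add", tests)["pass_rate"])   # 1.0
    print(repair_feedback(broken, "add", tests)["compiles"])   # False
```

A partial pass rate such as 1 of 3 is more informative than a plain pass/fail bit, which is what makes step-by-step feedback possible.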

The paper describes the technical details of how RePair works, including the fine-tuned repair model and the reward model that supplies process-based feedback. The researchers also report experiments in which RePair, built on models with fewer than 20B parameters, outperforms larger outcome-based repair methods and nearly matches closed-source commercial large language models.

Technical Explanation

The paper begins by outlining the motivation and key innovations of the RePair approach. The researchers note that language models with fewer than 100B parameters often struggle to produce a correct fix in a single modification, and that prompt-based interaction prevents a model from using compiler and test-case feedback to improve its repair policy. RePair instead adopts process supervision: it evaluates each step of a multi-step repair, rather than only the final output, to guide the repair process.
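The difference between outcome-based and process-based feedback can be illustrated on a single repair trajectory. In this minimal sketch, each repair step is summarized by its test pass rate (an assumption), and the per-step reward is simply the change in pass rate. RePair itself learns a reward model to play this role, so `outcome_reward` and `process_rewards` are illustrative stand-ins, not the paper's reward functions.

```python
from typing import List


def outcome_reward(pass_rates: List[float]) -> float:
    """Outcome-based signal: a single reward for the final program only."""
    return 1.0 if pass_rates and pass_rates[-1] == 1.0 else 0.0


def process_rewards(pass_rates: List[float], start: float) -> List[float]:
    """Process-based signal: one reward per repair step.

    Illustrative stand-in only: each step is rewarded by the change in test
    pass rate it produced. RePair trains a reward model as its critic instead.
    """
    rewards = []
    previous = start  # pass rate of the original buggy program
    for rate in pass_rates:
        rewards.append(rate - previous)
        previous = rate
    return rewards


# Buggy program passes 25% of tests; three repair attempts follow.
trajectory = [0.5, 0.25, 1.0]
print(outcome_reward(trajectory))           # 1.0: the final fix passes everything
print(process_rewards(trajectory, 0.25))    # [0.25, -0.25, 0.75]: step 2 regressed
```

The outcome signal rates this trajectory as a success, while the per-step signal also exposes the regression at step 2, which is the kind of information a process-based critic can feed back into training.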

The data collection section describes how the researchers built CodeNet4Repair, a dataset of buggy programs in which each program is accompanied by multiple repair records, that is, successive attempts at fixing it. This dataset supervises the fine-tuning of a foundation model.
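Multi-step repair records let fine-tuning supervise every intermediate step rather than only the final fix. The sketch below assumes a hypothetical record schema (`RepairRecord`) and prompt template; neither is taken from CodeNet4Repair, whose exact format is not described in this summary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RepairRecord:
    """One buggy program plus the sequence of repair attempts that followed.

    Hypothetical schema: the field names are illustrative only.
    """
    problem: str       # natural-language task description
    buggy_code: str    # the original failing program
    attempts: List[str] = field(default_factory=list)  # successive revisions


def to_sft_pairs(record: RepairRecord) -> List[Tuple[str, str]]:
    """Turn a multi-step record into (prompt, target) pairs, one per step.

    Each prompt shows the current program and the target is the next
    revision, so every intermediate step becomes a supervised example.
    """
    pairs = []
    current = record.buggy_code
    for step, revision in enumerate(record.attempts, start=1):
        prompt = (
            f"### Problem\n{record.problem}\n"
            f"### Program (step {step - 1})\n{current}\n"
            f"### Revised program\n"
        )
        pairs.append((prompt, revision))
        current = revision  # the next prompt starts from this revision
    return pairs


record = RepairRecord(
    "Return the sum of a and b.",
    "def add(a, b):\n    return a - b\n",
    ["def add(a, b):\n    return a * b\n", "def add(a, b):\n    return a + b\n"],
)
for prompt, target in to_sft_pairs(record):
    print(prompt + target)
```

A record with two attempts yields two training pairs, and the second prompt contains the first revision, so the model learns to continue from a partially repaired program.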

The core technical details are then presented. After supervised fine-tuning, the researchers train a reward model that serves as a critic, scoring the fine-tuned model's repair actions; reinforcement learning then uses this feedback to progressively optimize the model's repair policy. At inference time, the model generates repairs iteratively until the repair no longer improves or a maximum step limit is reached.
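The iterative inference procedure, generating repairs until the repair effect stops improving or a step limit is reached, can be sketched as a simple loop. Here `propose` stands in for the fine-tuned model and `score` for the feedback signal; both are hypothetical callables, and revising the best program found so far at each step is an assumption about how successive attempts are chained.

```python
from typing import Callable, List, Tuple


def iterative_repair(
    buggy_code: str,
    propose: Callable[[str], str],   # hypothetical: the fine-tuned LM
    score: Callable[[str], float],   # hypothetical: e.g. test pass rate
    max_steps: int = 5,
) -> Tuple[str, List[float]]:
    """Refine a program until the score stops improving or max_steps is hit."""
    best_code, best_score = buggy_code, score(buggy_code)
    history = [best_score]
    for _ in range(max_steps):
        candidate = propose(best_code)   # revise the best program so far
        candidate_score = score(candidate)
        history.append(candidate_score)
        if candidate_score <= best_score:
            break  # repair effect no longer improves: stop early
        best_code, best_score = candidate, candidate_score
    return best_code, history


# Toy stand-ins: a fixed chain of revisions with precomputed scores.
next_version = {"buggy": "v1", "v1": "v2", "v2": "v3"}
scores = {"buggy": 0.0, "v1": 0.5, "v2": 1.0, "v3": 1.0}
print(iterative_repair("buggy", next_version.__getitem__, scores.__getitem__))
# ('v2', [0.0, 0.5, 1.0, 1.0])
```

Stopping at the first non-improving step follows the stopping rule stated in the abstract; a more patient variant could tolerate a few non-improving steps before giving up.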

The experimental evaluation section reports the results of testing RePair on benchmark programs. The process-based approach outperforms larger outcome-based generation methods and nearly matches the performance of closed-source commercial large language models, which the authors present as evidence for the effectiveness of process-based feedback.

Critical Analysis

The paper provides a thorough technical explanation of the RePair system and presents convincing experimental results. However, the authors acknowledge certain limitations and areas for further research. For example, the current implementation of RePair is limited to repairing single-line bugs, and the system may struggle with more complex, multi-line bugs.

Additionally, the paper does not address potential ethical concerns or societal implications of automated program repair technology. As this technology becomes more advanced and widely adopted, it will be important to consider issues such as the responsibility and accountability for bugs introduced by automated repair systems.

Conclusion

Overall, the RePair paper presents a novel and promising approach to automated program repair. Incorporating process-based feedback into the repair process appears to let comparatively small models (under 20B parameters) outperform larger outcome-based methods. While the current implementation has some limitations, the research highlights the potential of this approach and opens new avenues for further exploration in automated program repair.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

RePair: Automated Program Repair with Process-based Feedback

Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

The gap between concerns about program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while reducing the financial burden of manual repairs. Commercial-scale language models (LMs) have taken APR to unprecedented levels. However, it has been observed that for models with fewer than 100B parameters, single-step modifications may not achieve the desired effect. Moreover, humans interact with the LM through explicit prompts, which prevents the LM from receiving feedback from the compiler and test cases to automatically optimize its repair policy. In this paper, we explore how small-scale LMs (less than 20B parameters) can achieve excellent performance through process supervision and feedback. We start by constructing a dataset named CodeNet4Repair, replete with multiple repair records, which supervises the fine-tuning of a foundation model. Building upon the encouraging outcomes of reinforcement learning, we develop a reward model that serves as a critic, providing feedback on the fine-tuned LM's actions and progressively optimizing its policy. During inference, we require the LM to generate solutions iteratively until the repair effect no longer improves or hits the maximum step limit. The results show that process-based generation not only outperforms larger outcome-based generation methods, but also nearly matches the performance of closed-source commercial large-scale LMs.

8/22/2024


Automated Program Repair: Emerging trends pose and expose problems for benchmarks

Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest

Machine learning (ML) now pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work. Evaluations and comparisons must take care to ensure that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.

5/10/2024

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

André Silva, Sen Fang, Martin Monperrus

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions that have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers a state-of-the-art parameter-efficient fine-tuning (PEFT) technique for program repair. This results in RepairLLaMA producing a highly effective 'program repair adapter' for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program-repair-specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning to converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2 and 109 HumanEval-Java bugs, outperforming all baselines.

6/10/2024

Aligning LLMs for FL-free Program Repair

Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next-token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of locating and repairing bugs end-to-end when using related artifacts (e.g., test cases) as input, existing methods regard these as separate tasks and ask LLMs to generate patches at fixed locations. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLMs' APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first performing fault localization. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLMs' pre-trained capability, and (2) replacing the traditional localize-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

4/16/2024