CodeR: Issue Resolving with Multi-Agent and Task Graphs

Read original: arXiv:2406.01304 - Published 6/12/2024 by Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev and 7 others


Overview

Addresses the challenge of resolving software issues using a multi-agent system and task graphs
Proposes a framework called "CodeR" that leverages collaborative problem-solving among various agents
Focuses on efficient issue resolution through task planning and coordination

Plain English Explanation

The paper introduces a system called "CodeR" that aims to streamline the process of resolving software issues. Instead of a single developer working alone, CodeR employs a team of specialized agents, each with their own expertise and capabilities. These agents collaborate to identify the root cause of a problem, plan the necessary tasks, and coordinate their efforts to efficiently fix the issue.

The key idea is to represent the software issue and the potential resolution steps as a task graph. This graph allows the agents to visualize the problem, break it down into smaller, manageable tasks, and assign those tasks to the most appropriate agents. By working together, the agents can tackle complex problems more effectively than a single developer could.

For example, imagine a bug in a web application's login system. One agent might specialize in analyzing the code and identifying the root cause, another might be an expert in security protocols, and a third might be skilled in user interface design. These agents would work together to understand the problem, determine the necessary steps to fix it, and execute those steps in a coordinated manner.
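The login-bug example can be sketched as a small dependency graph. This is only an illustration: the subtask names are hypothetical and are not taken from the paper, and Python's standard `graphlib` module stands in for whatever planning mechanism CodeR actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for the login-bug example (not from the paper).
# Each key maps to the set of subtasks that must finish before it starts.
LOGIN_BUG_GRAPH = {
    "reproduce_bug": set(),
    "locate_fault": {"reproduce_bug"},
    "review_security": {"locate_fault"},
    "patch_code": {"locate_fault", "review_security"},
    "verify_fix": {"patch_code"},
}


def execution_order(graph):
    """Return an ordering of subtasks that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(LOGIN_BUG_GRAPH)
print(order)
```

Because every subtask here depends on the one before it, only one ordering is valid. A graph with independent branches would allow several subtasks to be worked on in parallel.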

Technical Explanation

The paper proposes a framework called "CodeR" that utilizes a multi-agent system and task graphs to address software issues. The framework consists of several key components:

Issue Representation: The work needed to resolve an issue is expressed as a pre-defined task graph, which captures the dependencies and relationships between the subtasks required to resolve the problem.
Agent Modeling: The system includes various specialized agents, each with their own capabilities, knowledge, and problem-solving strategies. These agents collaborate to tackle the software issues.
Task Planning and Coordination: The agents use the task graph to plan the necessary steps to resolve the issue, assign tasks to the most suitable agents, and coordinate their efforts to execute the plan efficiently.
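As a rough illustration of how these three components might fit together, the sketch below walks a task graph in dependency order and hands each task to the agent role assigned to it. Everything in it is an assumption made for illustration: the `Task` class, the `run_plan` function, the role names, and the stub agents are hypothetical and do not reflect CodeR's actual implementation.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


@dataclass
class Task:
    """One node of the task graph (hypothetical structure)."""
    name: str
    agent: str  # role responsible for this task
    depends_on: Set[str] = field(default_factory=set)


# An agent is modeled as a function from the shared context to a result.
Agent = Callable[[Dict[str, str]], str]


def run_plan(tasks: List[Task], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Run each task after its dependencies, passing earlier results along."""
    graph = {t.name: t.depends_on for t in tasks}
    by_name = {t.name: t for t in tasks}
    context: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        context[name] = agents[task.agent](context)
    return context


# Stub agents standing in for LLM-backed roles (illustrative only).
agents: Dict[str, Agent] = {
    "localizer": lambda ctx: "suspect: auth/session.py",
    "editor": lambda ctx: f"patched ({ctx['locate']})",
    "verifier": lambda ctx: "tests pass" if "patched" in ctx["edit"] else "tests fail",
}

tasks = [
    Task("locate", "localizer"),
    Task("edit", "editor", {"locate"}),
    Task("verify", "verifier", {"edit"}),
]

results = run_plan(tasks, agents)
print(results["verify"])
```

The shared `context` dictionary is one simple way to let later agents build on earlier results. A real system would also need to handle failed steps and retries, which this sketch leaves out.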

The authors evaluate this collaborative approach on SWE-bench lite, where CodeR resolves 28.33% of issues when submitting only once for each issue. They also examine the performance impact of each design choice in CodeR and offer insights for future work in this direction.

Critical Analysis

The paper presents a promising approach to addressing software issues by leveraging a multi-agent system and task graphs. The authors have identified a valid problem and proposed a thoughtful solution that takes advantage of collaborative problem-solving.

One potential limitation of the research is the reliance on a specific set of agent types and their predefined capabilities. In real-world software development, the range of expertise and problem-solving strategies required to resolve issues can be highly diverse and dynamic. Exploring more flexible and adaptable agent models could enhance the system's ability to handle a broader set of software problems.

Additionally, the evaluation reported in the abstract is confined to SWE-bench lite, with a single submission per issue. Although SWE-bench is built from real GitHub issues, further evaluation in day-to-day software development contexts would be valuable to assess the practical applicability and potential challenges of deploying such a system in industry settings.

Conclusion

The CodeR framework offers a novel approach to software issue resolution by harnessing the power of collaborative multi-agent systems and task graphs. By breaking down problems into smaller, manageable tasks and assigning them to specialized agents, the system has the potential to improve the efficiency and effectiveness of software troubleshooting.

While the paper reports promising benchmark results, further exploration of more flexible agent models and real-world implementation challenges could help refine and strengthen the CodeR approach. Overall, this research represents an interesting step towards developing more intelligent and collaborative software development tools.



Related Papers

CodeR: Issue Resolving with Multi-Agent and Task Graphs

Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

6/12/2024

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far only focused on Python version. However, supporting more programming languages is also important, as there is a strong demand in industry. As a first step toward multilingual support, we have developed a Java version of SWE-bench, called SWE-bench-java. We have publicly released the dataset, along with the corresponding Docker-based evaluation environment and leaderboard, which will be continuously maintained and updated in the coming months. To verify the reliability of SWE-bench-java, we implement a classic method SWE-agent and test several powerful LLMs on it. As is well known, developing a high-quality multi-lingual benchmark is time-consuming and labor-intensive, so we welcome contributions through pull requests or collaboration to accelerate its iteration and refinement, paving the way for fully automated programming.

8/27/2024


SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.

4/9/2024


SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.

6/3/2024