CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

2406.13381

Published 6/21/2024 by Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao

Abstract

Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning agent, to comprehend the problem scope, formulate macro-level plans and provide detailed sub-task descriptions to local execution agents, which serves as the initial rendition of a global plan. (2) A local execution agent, to operate within the multi-tier task execution structure, focusing on detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.

Overview

This paper presents CoAct, a framework in which LLM-based agents collaborate through a global-local hierarchy.
The key idea is to split the work between a global planning agent, which comprehends the problem scope and formulates a macro-level plan with detailed sub-task descriptions, and a local execution agent, which carries out the individual sub-tasks within that plan.
On the WebArena benchmark, the authors report that CoAct can re-arrange its process trajectory when it encounters failures and that it outperforms baseline methods on long-horizon web tasks.

Plain English Explanation

The paper discusses a new way for autonomous software agents to work together, called the CoAct framework. Autonomous agents are computer programs that can make decisions and take actions on their own, without being directly controlled by humans.

In the CoAct framework, the agents operate at two different levels:

Global level: A global planning agent works out the scope of the problem, draws up a high-level plan, and writes a detailed description of each sub-task.
Local level: A local execution agent takes those sub-tasks and handles the detailed work of carrying them out.

This division of labor mirrors how human organizations separate planning from execution. The paper borrows that pattern to help LLM agents handle complex, real-world tasks that remain difficult even with prompting strategies such as CoT and ReAct. When a step fails, CoAct can re-arrange the course of action it is following.

The paper describes how this global-local hierarchy is implemented and evaluates it on WebArena, a benchmark of web tasks. According to the abstract, CoAct achieves better performance than baseline methods on long-horizon tasks.

Technical Explanation

The CoAct framework introduces a global-local hierarchy for LLM agents. The global planning agent comprehends the problem scope, formulates a macro-level plan, and provides detailed sub-task descriptions to the local execution agent; this output serves as the initial rendition of the global plan. The local execution agent operates within the multi-tier task execution structure and focuses on the detailed execution of specific sub-tasks within that plan.

The abstract frames this hierarchy as a transfer of the hierarchical planning and collaboration patterns of human society to LLM systems, and positions CoAct against prompting strategies such as CoT and ReAct.

The authors evaluate CoAct on the WebArena benchmark. The reported results show that CoAct can re-arrange its process trajectory when facing failures and that it achieves superior performance over baseline methods on long-horizon web tasks.
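The division of labor described in the abstract (a global planner that produces sub-task descriptions, a local executor that carries them out, and a revised plan after a failure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SubTask`, `Result`, `run_global_local`, and `max_replans`, as well as the choice to pass failure notes back to the planner, are assumptions made for illustration, and the planner and executor are plain callables standing in for LLM agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubTask:
    """One sub-task description handed from the planner to the executor."""
    description: str


@dataclass
class Result:
    """Outcome of executing a single sub-task."""
    success: bool
    detail: str = ""


# Hypothetical signatures: in a real system both would wrap LLM calls.
Planner = Callable[[str, List[str]], List[SubTask]]  # (task, failure notes) -> plan
Executor = Callable[[SubTask], Result]


def run_global_local(
    task: str,
    plan_fn: Planner,
    execute_fn: Executor,
    max_replans: int = 2,  # assumed cap on how often the plan may be revised
) -> Tuple[bool, List[str]]:
    """Plan globally, execute locally, and re-plan when a sub-task fails."""
    failures: List[str] = []
    for _ in range(max_replans + 1):
        # Global level: build (or rebuild) the plan, given what failed so far.
        plan = plan_fn(task, failures)
        for sub in plan:
            # Local level: carry out one sub-task of the current plan.
            result = execute_fn(sub)
            if not result.success:
                failures.append(f"{sub.description}: {result.detail}")
                break  # abandon this plan and ask the planner for a new one
        else:
            return True, failures  # every sub-task in the plan succeeded
    return False, failures


# Toy stand-ins for the two agents.
def toy_planner(task: str, failures: List[str]) -> List[SubTask]:
    if not failures:
        return [SubTask("open site"), SubTask("search by name")]
    return [SubTask("open site"), SubTask("browse category")]


def toy_executor(sub: SubTask) -> Result:
    if sub.description == "search by name":
        return Result(False, "no results")
    return Result(True)


ok, notes = run_global_local("find product", toy_planner, toy_executor)
```

In this toy run the first plan fails at its second sub-task, the planner receives the failure note, and the revised plan succeeds. Feeding failure notes back to the planner is only one possible way to trigger a revised plan; the abstract does not specify the mechanism CoAct uses.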

Critical Analysis

The CoAct framework represents a promising approach to long-horizon tasks for LLM agents. Separating macro-level planning from detailed execution lets each agent focus on one kind of reasoning, and the reported ability to re-arrange the trajectory after a failure is directly relevant to long-horizon web tasks.

However, the abstract leaves several questions open:

Generalization beyond WebArena: The reported results come from the WebArena benchmark, so it is unclear how the framework would perform in other environments or on tasks with many more sub-tasks.
Interpretability and explainability: Decisions made by LLM-based planning and execution agents may be difficult to understand and explain, which could be a concern in high-stakes applications.
Limits of re-planning: CoAct is reported to re-arrange its process trajectory when facing failures, but the abstract does not say how well this holds up under unexpected changes in the environment over time.

Further research could explore ways to address these limitations, as well as investigate applications of the CoAct framework in real-world settings beyond web tasks.

Conclusion

The CoAct framework proposed in this paper applies the hierarchical planning and collaboration patterns of human society to LLM agents. A global planning agent sets the macro-level plan, and a local execution agent carries out the sub-tasks within it.

The experimental results on WebArena indicate that this structure helps on long-horizon web tasks and lets the system re-arrange its trajectory after failures. While further research is needed to address some of the limitations, CoAct offers a promising direction for developing more capable LLM agent systems.

As the use of autonomous agents continues to grow in various domains, the CoAct approach could inform the design and deployment of these systems, particularly where tasks are too long or complex for a single agent to plan and execute in one pass.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration

Bo Pan, Jiaying Lu, Ke Wang, Li Zheng, Zhen Wen, Yingchaojie Feng, Minfeng Zhu, Wei Chen

The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach.

4/19/2024

cs.HC

AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor Jiang, Chengfei Lv, Huajun Chen

Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agent learning framework for QA that does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models (e.g., GPT-4). Given limited data with a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. Then, AutoAct leverages a division-of-labor strategy to automatically differentiate based on the target task information and synthesized trajectories, producing a sub-agent group to complete the task. We conduct comprehensive experiments with different LLMs, which demonstrates that AutoAct yields better or parallel performance compared to various strong baselines. Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct generally outperforming that of others. Code will be available at https://github.com/zjunlp/AutoAct.

5/28/2024

cs.CL cs.AI cs.HC cs.LG cs.MA

CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation

Xinbei Ma, Zhuosheng Zhang, Hai Zhao

Multimodal large language models (MLLMs) have shown remarkable potential as human-like autonomous language agents to interact with real-world environments, especially for graphical user interface (GUI) automation. However, those GUI agents require comprehensive cognition ability including exhaustive perception and reliable action response. We propose a Comprehensive Cognitive LLM Agent, CoCo-Agent, with two novel approaches, comprehensive environment perception (CEP) and conditional action prediction (CAP), to systematically improve the GUI automation performance. First, CEP facilitates the GUI perception through different aspects and granularity, including screenshots and complementary detailed layouts for the visual channel and historical actions for the textual channel. Second, CAP decomposes the action prediction into sub-problems: action type prediction and action target conditioned on the action type. With our technical design, our agent achieves new state-of-the-art performance on AITW and META-GUI benchmarks, showing promising abilities in realistic scenarios. Code is available at https://github.com/xbmxb/CoCo-Agent.

6/4/2024

cs.CL

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang

The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by Large Language Models (LLMs) make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain. This benchmark evaluates LLMs through two distinct tasks: (1) Agentic Coordination, where LLMs act as proactive participants for cooperation in 4 pure coordination games; (2) Coordination Question Answering (QA), where LLMs are prompted to answer 198 multiple-choice questions from the 4 games for evaluation of three key reasoning abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Furthermore, to enable LLMs for multi-agent coordination, we introduce a Cognitive Architecture for Coordination (CAC) framework that can easily integrate different LLMs as plug-and-play modules for pure coordination games. Our findings indicate that LLM agents equipped with GPT-4-turbo achieve comparable performance to state-of-the-art reinforcement learning methods in games that require commonsense actions based on the environment. Besides, zero-shot coordination experiments reveal that, unlike RL methods, LLM agents are robust to new unseen partners. However, results on Coordination QA show a large room for improvement in the Theory of Mind reasoning and joint planning abilities of LLMs. The analysis also sheds light on how the ability of LLMs to understand their environment and their partner's beliefs and intentions plays a part in their ability to plan for coordination. Our code is available at https://github.com/eric-ai-lab/llm_coordination.

4/4/2024

cs.CL cs.MA