HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

arXiv:2408.09559, published 8/20/2024 by Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo


Overview

Large language model (LLM) agents can operate in various domains, processing environmental observations to generate executable actions for target tasks.
The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs.
Memory can be categorized into two types: cross-trial memory (accumulated across multiple attempts) and in-trial memory (working memory, accumulated within a single attempt).
While cross-trial memory has been extensively optimized, enhancing agent performance through improved working memory utilization remains underexplored.
Existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks.
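To make the redundancy concrete, here is a minimal Python sketch of that baseline. It assumes a plain-text prompt format, and the names `Step`, `full_history_prompt`, and `total_pairs_sent` are hypothetical illustrations, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One action-observation pair from the current attempt (hypothetical type)."""
    action: str
    observation: str


def full_history_prompt(task: str, history: list[Step]) -> str:
    """Baseline context: re-send every past action-observation pair at each step."""
    lines = [f"Task: {task}"]
    for step in history:
        lines.append(f"Action: {step.action}")
        lines.append(f"Observation: {step.observation}")
    lines.append("Next action:")
    return "\n".join(lines)


def total_pairs_sent(num_steps: int) -> int:
    """Pairs sent over an episode if step t (0-based) re-sends all t earlier pairs."""
    return sum(range(num_steps))  # 0 + 1 + ... + (num_steps - 1)
```

For a 30-step episode, `total_pairs_sent(30)` is 435: the prompt grows linearly with each step, so the total amount of history re-read over the episode grows quadratically, even when many early pairs are irrelevant to the current decision.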

Plain English Explanation

Large language model (LLM) agents are computer systems that process information from their environment and take actions to achieve specific goals. How well they perform depends heavily on how they store and use their past experiences.

Agents can store two types of memories: cross-trial memory and in-trial memory (working memory). Cross-trial memory is what the agent has learned over multiple attempts at a task, while working memory is what the agent remembers during a single attempt.
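The distinction between the two memory types can be sketched as a small data structure. This is an illustrative assumption, not the paper's code: `AgentMemory` is a hypothetical name, and storing cross-trial memory as plain-text "lessons" is just one possible choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Cross-trial memory: survives between attempts (here, hypothetical text lessons).
    cross_trial: list[str] = field(default_factory=list)
    # In-trial (working) memory: action-observation pairs from the current attempt only.
    working: list[tuple[str, str]] = field(default_factory=list)

    def record_step(self, action: str, observation: str) -> None:
        """Append one action-observation pair to working memory."""
        self.working.append((action, observation))

    def end_trial(self, lesson: str | None = None) -> None:
        """Close an attempt: optionally keep a lesson, then clear working memory."""
        if lesson is not None:
            self.cross_trial.append(lesson)
        self.working.clear()
```

The key design point is the lifetime: `working` is emptied at the end of every attempt, while `cross_trial` keeps growing across attempts. HiAgent targets the first of these.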

Researchers have done a lot of work to optimize cross-trial memory, but improving how agents use their working memory has received much less attention. Existing approaches often feed the agent's entire history of actions and observations into the LLM at every step, which becomes redundant and inefficient on long-horizon tasks that take many steps.

The paper introduces a new framework called HiAgent that aims to help LLM agents use their working memory more effectively. HiAgent prompts the LLM to break a task into smaller sub-goals. When the LLM decides to move on from a sub-goal, the detailed history for that sub-goal is replaced by a short summary, so the agent keeps full detail only for the current sub-goal rather than for the entire history. This helps the agent perform better on long-horizon tasks.

Technical Explanation

The paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal.
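A minimal Python sketch of this hierarchical working memory follows, assuming a plain-text context. All names (`SubgoalChunk`, `HierarchicalMemory`, the `[done]`/`[current]` markers) and the toy task are hypothetical. In HiAgent the LLM itself writes the summary and decides when to switch subgoals; here the summary is an injected callable and the switch is simply a call to `start_subgoal`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SubgoalChunk:
    """Working-memory chunk: one subgoal plus the action-observation pairs taken for it."""
    subgoal: str
    pairs: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class HierarchicalMemory:
    # Hypothetical stand-in for the LLM call that summarizes a finished chunk.
    summarize: Callable[[SubgoalChunk], str]
    completed: list[tuple[str, str]] = field(default_factory=list)  # (subgoal, summary)
    current: Optional[SubgoalChunk] = None

    def start_subgoal(self, subgoal: str) -> None:
        """Open a new subgoal; the previous chunk is collapsed into a summary."""
        if self.current is not None:
            self.completed.append((self.current.subgoal, self.summarize(self.current)))
        self.current = SubgoalChunk(subgoal)

    def record_step(self, action: str, observation: str) -> None:
        """Attach an action-observation pair to the current subgoal only."""
        if self.current is None:
            raise RuntimeError("formulate a subgoal before acting")
        self.current.pairs.append((action, observation))

    def render(self, task: str) -> str:
        """Context for the next LLM call: summaries for past subgoals, full pairs for the current one."""
        lines = [f"Task: {task}"]
        for subgoal, summary in self.completed:
            lines.append(f"[done] Subgoal: {subgoal} | Summary: {summary}")
        if self.current is not None:
            lines.append(f"[current] Subgoal: {self.current.subgoal}")
            for action, observation in self.current.pairs:
                lines.append(f"Action: {action}")
                lines.append(f"Observation: {observation}")
        return "\n".join(lines)


# Usage with a toy summarizer that keeps only the chunk's last observation.
memory = HierarchicalMemory(summarize=lambda c: c.pairs[-1][1] if c.pairs else "(no steps)")
memory.start_subgoal("find the mug")
memory.record_step("go to counter", "You see a mug.")
memory.record_step("take mug", "You pick up the mug.")
memory.start_subgoal("heat the mug")
memory.record_step("go to microwave", "The microwave is closed.")
print(memory.render("heat a mug"))
```

In the rendered context, the two pairs from "find the mug" appear only as a one-line summary, while the pair for the current subgoal "heat the mug" is kept in full. Injecting the summarizer keeps the memory structure independent of whichever LLM or prompt produces the summaries.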

Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. The analysis also shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability.

Critical Analysis

The paper acknowledges that while considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. This presents an important gap in the existing literature that the HiAgent framework aims to address.

One potential limitation mentioned is the reliance on LLMs to formulate subgoals, which could introduce biases or errors if the LLM is not sufficiently capable. Additionally, the paper does not explore the scalability of HiAgent as the complexity of tasks or the size of the LLM increases.

Further research could investigate alternative approaches to working memory management, such as incorporating external memory modules or exploring the use of memory-augmented neural networks. Comparisons to other working memory-focused techniques would also help contextualize the contributions of HiAgent.

Conclusion

The HiAgent framework represents a promising step towards enhancing the performance of LLM-based agents by improving the utilization of their working memory. By prompting LLMs to formulate subgoals and selectively retain relevant action-observation pairs, HiAgent doubles the success rate and reduces the average number of steps by 3.8 across five long-horizon tasks.

This work highlights the importance of working memory management in the development of more capable and efficient LLM-based agents, which have the potential to impact a wide range of applications, from robotics and planning to interactive software systems and game AI.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo

Large Language Model (LLM)-based agents exhibit significant potential across various domains, operating as interactive systems that process environmental observations to generate executable actions for target tasks. The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs. We categorize memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. Instead, existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. Additionally, our analysis shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability. Project Page: https://github.com/HiAgent2024/HiAgent .

8/20/2024


Empowering Working Memory for Large Language Model Agents

Jing Guo, Nan Li, Jianchuan Qi, Hang Yang, Ruiqiao Li, Yuzhen Feng, Si Zhang, Ming Xu

Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.

5/29/2024

A Survey on the Memory Mechanism of Large Language Model based Agents

Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen

Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective designing patterns for inspiring future studies. To bridge this gap, in this paper, we propose a comprehensive survey on the memory mechanism of LLM-based agents. In specific, we first discuss "what is" and "why do we need" the memory in LLM-based agents. Then, we systematically review previous studies on how to design and evaluate the memory module. In addition, we also present many agent applications, where the memory module plays an important role. At last, we analyze the limitations of existing work and show important future directions. To keep up with the latest advances in this field, we create a repository at https://github.com/nuster1128/LLM_Agent_Memory_Survey.

4/23/2024

Memory Sharing for Large Language Model based Agents

Hang Gao, Yongfeng Zhang

The adaptation of Large Language Model (LLM)-based agents to execute tasks via natural language prompts represents a significant advancement, notably eliminating the need for explicit retraining or fine tuning, but such agents are constrained by the comprehensiveness and diversity of the provided examples, leading to outputs that often diverge significantly from expected results, especially when it comes to open-ended questions. This paper introduces Memory Sharing (MS), a framework which integrates real-time memory filtering, storage, and retrieval to enhance the In-Context Learning process. This framework allows for the sharing of memories among multiple agents, whereby the interactions and shared memories between different agents effectively enhance the diversity of the memories. The collective self-enhancement through interactive learning among multiple agents facilitates the evolution from individual intelligence to collective intelligence. Besides, the dynamically growing memory pool is utilized not only to improve the quality of responses but also to train and enhance the retriever. We evaluated our framework across three distinct domains involving specialized tasks of agents. The experimental results demonstrate that the MS framework significantly improves the agents' performance in addressing open-ended questions.

7/8/2024