Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Read original: arXiv:2302.01560 - Published 7/9/2024 by Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang

Overview

The paper investigates the challenge of task planning for multi-task embodied agents in open-world environments like Minecraft.
Two main difficulties are identified: 1) executing plans in an open-world environment requires accurate, multi-step reasoning because tasks are long-horizon, and 2) vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, which can lead to inefficient or infeasible plans.
To address these challenges, the paper proposes DEPS, an interactive planning approach based on Large Language Models (LLMs).

Plain English Explanation

DEPS is designed to help embodied agents, like virtual robots, accomplish complex tasks in open-world environments like the video game Minecraft. The key problems it aims to solve are:

Plans for long-term tasks in these environments need to be very detailed and account for many steps, which is challenging for traditional planning algorithms.
When creating a plan with multiple sub-tasks that can be done in parallel, typical planners don't consider how easy each sub-task is for the agent to complete. This can lead to inefficient or even impossible plans.

To address these issues, DEPS uses large language models to generate and refine plans. It does this by:

Describing the plan execution process and providing self-explanations when the plan encounters failures, which helps with error correction.
Including a "goal selector" module that ranks parallel sub-tasks based on how easy they are for the agent to complete, allowing it to create better plans.
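
The ranking idea behind the goal selector can be illustrated with a minimal sketch. It assumes each parallel sub-goal already carries an estimated number of steps to completion; in the paper that estimate comes from a trainable module, whereas here the `SubGoal` type, the function names, and the numbers are hypothetical and hard-coded for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubGoal:
    """A candidate sub-goal (hypothetical type, not from the paper's code)."""
    name: str
    # Assumed to come from a learned estimator; hard-coded in this sketch.
    estimated_steps: float


def rank_goals(candidates: List[SubGoal]) -> List[SubGoal]:
    """Order parallel sub-goals from fewest to most estimated steps."""
    return sorted(candidates, key=lambda goal: goal.estimated_steps)


def select_goal(candidates: List[SubGoal]) -> SubGoal:
    """Pick the sub-goal the agent is estimated to finish soonest."""
    if not candidates:
        raise ValueError("no candidate sub-goals to choose from")
    return rank_goals(candidates)[0]


if __name__ == "__main__":
    # Made-up step estimates for three sub-goals that could run in any order.
    goals = [
        SubGoal("mine iron ore", 120.0),
        SubGoal("collect logs", 15.0),
        SubGoal("find sheep", 60.0),
    ]
    print(select_goal(goals).name)
```

Sorting by estimated steps is only one plausible reading of "ranks parallel candidate sub-goals"; the actual selector may score candidates differently.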

The paper shows that this approach allows a virtual agent to robustly accomplish over 70 different Minecraft tasks, nearly doubling the overall performance compared to previous methods. DEPS also works well in other domains like tabletop manipulation and the ALFWorld environment.

Technical Explanation

The key technical components of DEPS are:

Description: This module generates a natural language description of the plan execution process, which helps with error correction when the plan encounters failures.
Explanation: This component provides self-explanations to give feedback on why the plan is not working as expected, further aiding the planning process.
Plan: The planning module uses large language models to generate an initial plan for accomplishing the task, and revises that plan using the description and explanation when execution fails.
Selector: This trainable module ranks parallel candidate sub-goals based on estimated steps to completion, refining the initial plan.
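
How the first three components interact can be sketched as a simple re-planning loop. This is an assumption-laden toy, not the authors' implementation: the LLM calls are replaced by stub functions, the function names and the `requires` table are hypothetical, and the whole plan is re-executed after each revision for simplicity.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def run_interactive_planning(
    task: str,
    planner: Callable[[str, Optional[str]], Plan],
    executor: Callable[[str], bool],
    describer: Callable[[str], str],
    explainer: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[bool, Plan]:
    """Plan, execute, and on failure describe, explain, and re-plan."""
    plan = planner(task, None)  # initial plan, no feedback yet
    for _ in range(max_rounds):
        failed = next((goal for goal in plan if not executor(goal)), None)
        if failed is None:
            return True, plan  # every sub-goal succeeded
        description = describer(failed)       # what happened, in text
        explanation = explainer(description)  # why the plan failed
        plan = planner(task, explanation)     # revised plan
    return False, plan


def demo() -> Tuple[bool, Plan]:
    """Toy world: stone cannot be mined before a pickaxe is crafted."""
    inventory = set()
    requires = {"mine stone": "craft wooden pickaxe"}  # hypothetical rule

    def executor(goal: str) -> bool:
        need = requires.get(goal)
        if need is not None and need not in inventory:
            return False
        inventory.add(goal)
        return True

    def describer(failed_goal: str) -> str:
        return f"Failed sub-goal: {failed_goal}; completed: {sorted(inventory)}"

    def explainer(description: str) -> str:
        # Stub standing in for an LLM: look up the missing prerequisite.
        for goal, need in requires.items():
            if goal in description:
                return need
        return ""

    def planner(task: str, feedback: Optional[str]) -> Plan:
        # Stub standing in for an LLM planner.
        return ["mine stone"] if not feedback else [feedback, "mine stone"]

    return run_interactive_planning("obtain stone", planner, executor, describer, explainer)


if __name__ == "__main__":
    print(demo())
```

In this toy run the first plan fails, the stub explainer names the missing prerequisite, and the revised plan succeeds on the second round.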

The paper's experiments show that this interactive planning approach, combining language modeling and a goal-prioritizing selector, allows for much more robust and effective task planning in open-world environments compared to previous methods. Further analysis reveals the general effectiveness of DEPS in other popular domains as well.

Critical Analysis

The paper presents a novel and promising approach to task planning for embodied agents in complex, open-world environments. The use of large language models to generate and refine plans, along with the goal-prioritizing selector, appears to be a significant advancement over traditional planning algorithms.

However, the paper does not address some potential limitations of the approach. For example, it's unclear how well DEPS would scale to environments with an extremely large number of possible actions and sub-goals, or how sensitive the performance is to the specific language model used. Additionally, the paper does not delve into the computational costs or real-time performance of the DEPS system, which could be important factors for real-world deployment.

Further research could explore ways to make the DEPS approach more efficient and robust, perhaps by incorporating adaptive reinforcement learning techniques or investigating ways to better integrate the language model with the planning components. Evaluating DEPS in even more diverse and challenging environments would also help demonstrate its general applicability.

Conclusion

The DEPS approach presented in this paper represents an important step forward in task planning for embodied agents in open-world environments. By leveraging large language models and a novel goal-prioritizing selector, the system can generate and refine plans much more effectively than previous methods, enabling a virtual agent to accomplish a wide range of complex tasks in Minecraft and other domains.

While the open questions noted above warrant further research, the overall results suggest that this type of interactive, language-based planning could be a key component in developing more capable and adaptable embodied AI systems. As the field of AI continues to advance, techniques like DEPS may play a crucial role in allowing artificial agents to navigate and interact with the real world in increasingly sophisticated ways.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang

We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose Describe, Explain, Plan and Select (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan by integrating description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.

7/9/2024

NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions

Elliot Gestrin, Marco Kuhlmann, Jendrik Seipp

Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.

5/8/2024

Ask-before-Plan: Proactive Language Agents for Real-World Planning

Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction, invoke external tools to collect valid information, and generate a plan to fulfill the user's demands. To study this practical problem, we establish a new benchmark dataset, Ask-before-Plan. To tackle the deficiency of LLMs in proactive planning, we propose a novel multi-agent framework, Clarification-Execution-Planning (CEP), which consists of three agents specialized in clarification, execution, and planning. We introduce the trajectory tuning scheme for the clarification agent and static execution agent, as well as the memory recollection mechanism for the dynamic execution agent. Extensive evaluations and comprehensive analyses conducted on the Ask-before-Plan dataset validate the effectiveness of our proposed framework.

6/19/2024

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024