PDDLEGO: Iterative Planning in Textual Environments

2405.19793

Published 5/31/2024 by Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, Niket Tandon

Abstract

Planning in textual environments has been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially-observed environments where there is initially not enough information to plan for the end-goal. We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98%) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4%).

Overview

This paper, titled "PDDLEGO: Iterative Planning in Textual Environments," explores an approach that enables language models to plan iteratively in textual environments.
The key idea is to have an LLM iteratively construct a formal planning representation that a symbolic planner can solve for a sub-goal; accomplishing the sub-goal reveals more information, which augments the representation until the end-goal can be reached.
The authors evaluate few-shot PDDLEGO on the Coin Collector and Cooking World simulations, comparing it against LLMs that generate plans end-to-end.

Plain English Explanation

The paper focuses on getting language models, the AI systems that can understand and generate human-like text, to solve problems step by step when they cannot see everything up front. Instead of producing a complete plan in one shot, the system plans toward a nearer sub-goal, acts, and uses what it learns to plan again.

Imagine you ask a language model to help you plan a trip. A typical model might provide a high-level itinerary. But the model described in this paper would engage in a more interactive process, breaking down the trip planning into smaller steps, considering alternative options, and refining the plan based on your feedback. This allows the model to tackle more complex, open-ended problems that require thoughtful planning and problem-solving.

The key innovation is pairing the language model with a symbolic planner: the model writes a formal description of what it currently knows, and the planner turns that description into a plan for the next sub-goal. The authors evaluate the approach on two text-based simulations, Coin Collector and Cooking World, and report that it produces more efficient plans than asking a language model for the plan directly.

This research represents an important step towards building language models that can engage in more intelligent, flexible, and goal-oriented reasoning, moving beyond simply generating text to actually solving problems in a strategic way. It lays the groundwork for language models that can be true collaborative partners in tackling complex, real-world challenges.

Technical Explanation

The paper proposes an approach called PDDLEGO that enables large language models to plan iteratively within partially-observed textual environments. At a high level, an LLM generates a formal representation of the environment, and a symbolic planner solves that representation to produce a partial plan for a given sub-goal.

Specifically, the approach can be read as a loop with three stages:

Representation construction: The LLM builds a planning representation from what has been observed so far.
Partial planning: A symbolic planner solves that representation for a sub-goal, yielding a partial plan.
Representation update: Accomplishing the sub-goal reveals more information, which augments the representation until the end-goal can be planned for.
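
The iterative plan, act, and update loop can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the room layout, the function names (observe, update_representation, plan, run), and the "take coin" action are all hypothetical. A breadth-first search stands in for the symbolic planner, and a plain dictionary update stands in for the LLM step that would write the planning representation.

```python
from collections import deque

# Hypothetical toy world (not from the paper): the coin is only seen on entering its room.
WORLD = {
    "kitchen": {"east": "hallway"},
    "hallway": {"west": "kitchen", "north": "study", "east": "pantry"},
    "study": {"south": "hallway"},
    "pantry": {"west": "hallway"},
}
COIN_ROOM = "pantry"


def observe(room):
    """Return what is visible from `room`: its exits and whether the coin is here."""
    return {"room": room, "exits": dict(WORLD[room]), "coin_here": room == COIN_ROOM}


def update_representation(known, obs):
    """Stand-in for the LLM step: fold a new observation into the known facts."""
    known["exits"][obs["room"]] = obs["exits"]
    known["visited"].add(obs["room"])
    if obs["coin_here"]:
        known["coin_room"] = obs["room"]


def plan(known, start, is_goal):
    """Stand-in for the symbolic planner: BFS over known exits to a goal room."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        room, path = queue.popleft()
        if is_goal(room):
            return path
        for direction, nxt in known["exits"].get(room, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(direction, nxt)]))
    return None  # no plan exists with the current knowledge


def run(start="kitchen", max_iters=20):
    """Alternate between updating the representation, planning, and acting."""
    known = {"exits": {}, "visited": set(), "coin_room": None}
    room, trace = start, []
    for _ in range(max_iters):
        update_representation(known, observe(room))
        if known["coin_room"] is not None:
            # Enough is known to plan for the end-goal.
            steps = plan(known, room, lambda r: r == known["coin_room"])
            return trace + [d for d, _ in steps] + ["take coin"]
        # Otherwise plan for a sub-goal: reach any room not yet visited.
        steps = plan(known, room, lambda r: r not in known["visited"])
        if steps is None:
            return None
        for direction, nxt in steps:
            trace.append(direction)
            room = nxt
    return None
```

Calling run() explores room by room and stops once the coin's location enters the known facts. The design choice the sketch highlights is that the planner is never asked for the end-goal until the representation contains enough information to make it solvable.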

The authors use the LLM in a few-shot setting, and they compare PDDLEGO against a baseline in which an LLM generates plans end-to-end without a symbolic planner.

To evaluate their approach, the authors test PDDLEGO on two textual simulations. On Coin Collector, plans produced by few-shot PDDLEGO are 43% more efficient than plans generated end-to-end. On the more complex Cooking World, PDDLEGO reaches strong performance (98%), whereas end-to-end LLMs fail to generate coherent plans (4%).

The authors also analyze the model's planning behavior, showing that it can capture high-level strategies as well as low-level, step-by-step actions required to solve the problems. This suggests that the iterative planning approach enables the language model to engage in more intelligent, goal-oriented reasoning beyond simple text generation.

Critical Analysis

The research presented in this paper represents an important step towards developing language models that can engage in more sophisticated, strategic problem-solving. By equipping language models with iterative planning capabilities, the authors have shown how these systems can move beyond simple text generation to tackle complex, open-ended challenges in a more intelligent and goal-oriented manner.

However, the paper also acknowledges several limitations and areas for further work. For example, the authors note that the current approach is limited to textual environments, and it would be valuable to explore how these iterative planning capabilities could be extended to other modalities, such as visual or multimodal environments. Additionally, the authors highlight the need for more robust evaluation methods to assess the model's planning behavior and generalization to novel tasks and domains.

Another point worth examining is how the approach handles uncertainty and partial observability, which are common in real-world problem-solving scenarios. Unlike earlier methods that rely on a fully-observed environment, PDDLEGO targets settings where the information needed to plan for the end-goal is not available at the start. How well sub-goal-driven exploration holds up in larger or noisier environments is a natural question for follow-up work.
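
To make planning under partial observability more concrete, the sketch below shows one way a planning problem could be written out as text with a goal that depends on what is currently known. It is an illustration under assumptions, not the paper's prompts or files: the predicate names (at, visited, connected, coin-at, has), the domain name, and the helpers choose_goal and render_problem are all hypothetical.

```python
def choose_goal(facts, end_goal, needed_prefix, sub_goal):
    """Use the end-goal only once a fact it depends on has been observed."""
    if any(fact.startswith(needed_prefix) for fact in facts):
        return end_goal
    return sub_goal


def render_problem(name, domain, objects, facts, goal):
    """Render a PDDL-style problem as text from the currently known facts."""
    lines = [
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {' '.join(objects)})",
        "  (:init",
        *[f"    {fact}" for fact in sorted(facts)],
        "  )",
        f"  (:goal {goal})",
        ")",
    ]
    return "\n".join(lines)


# Hypothetical usage: nothing is known about the coin yet, so the sub-goal is used.
facts = {"(at kitchen)", "(visited kitchen)", "(connected kitchen hallway)"}
goal = choose_goal(facts, "(has coin)", "(coin-at ", "(visited hallway)")
print(render_problem("p1", "coin-collector", ["kitchen", "hallway", "coin"], facts, goal))
```

Once an observation adds a fact such as (coin-at hallway), the same call to choose_goal returns the end-goal instead, and the regenerated problem can be handed to a planner.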

Finally, the authors mention the potential for the iterative planning approach to enable more transparent and interpretable decision-making, as the step-by-step planning process could be inspected and understood. Exploring ways to leverage this property to improve model transparency and accountability would be a valuable direction for future research.

Overall, this paper represents an important contribution to the field of language model-based planning and problem-solving. By having an LLM iteratively construct a planning representation that a symbolic planner can solve, the authors have laid the groundwork for more goal-oriented systems that can plan even when the environment is only partially observed. As the field continues to progress, addressing the limitations and expanding the capabilities of these systems will be crucial for realizing their full potential.

Conclusion

The paper "PDDLEGO: Iterative Planning in Textual Environments" presents an approach that lets large language models plan iteratively within partially-observed textual environments. By combining an LLM that writes a formal planning representation with a symbolic planner that solves it, the authors have developed a system that plans for sub-goals step by step and augments its representation until the end-goal is achieved.

The results demonstrate the effectiveness of this iterative planning approach on the Coin Collector and Cooking World simulations. This represents an important step towards building language models that can move beyond simple text generation to tackle more complex, open-ended challenges in a strategic and goal-oriented manner.

As the field of language model-based planning continues to evolve, addressing the limitations and expanding the capabilities of these systems will be crucial. Exploring extensions to other modalities, handling uncertainty and partial observability, and leveraging the transparency of the iterative planning process are all promising directions for future research. By continuing to push the boundaries of what language models can achieve, researchers can pave the way for the development of intelligent, collaborative AI systems that can work alongside humans to solve real-world problems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions

Elliot Gestrin, Marco Kuhlmann, Jendrik Seipp

Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.

5/8/2024

cs.AI

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models

Houjun Liu

While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily in problems which require multi-step reasoning. Previous approaches to mitigate this involve breaking a larger, multi-step task into sub-tasks and asking the language model to generate proposals (thoughts) for each sub-task and using exhaustive planning approaches such as DFS to compose a solution. In this work, we leverage this idea to introduce two new contributions: first, we formalize a planning-based approach to perform multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches while also offering better anytime performance characteristics than fixed tree-search which is used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.

5/1/2024

cs.CL

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang

In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, we consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. Under proper assumptions on the pretraining data, we prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning. Additionally, we highlight the necessity for exploration beyond the subgoals derived from BAIL by proving that naively executing the subgoals returned by LLM leads to a linear regret. As a remedy, we introduce an $\epsilon$-greedy exploration strategy to BAIL, which is proven to incur sublinear regret when the pretraining error is small. Finally, we extend our theoretical framework to include scenarios where the LLM Planner serves as a world model for inferring the transition model of the environment and to multi-agent settings, enabling coordination among multiple Actors.

5/31/2024

cs.LG cs.AI cs.CL

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI