Language Models can Infer Action Semantics for Classical Planners from Environment Feedback

2406.02791

Published 6/6/2024 by Wang Zhu, Ishika Singh, Robin Jia, Jesse Thomason


Abstract

Classical planning approaches guarantee finding a set of actions that can achieve a given goal state when possible, but require an expert to specify logical action semantics that govern the dynamics of the environment. Researchers have shown that Large Language Models (LLMs) can be used to directly infer planning steps based on commonsense knowledge and minimal domain information alone, but such plans often fail on execution. We bring together the strengths of classical planning and LLM commonsense inference to perform domain induction, learning and validating action pre- and post-conditions based on closed-loop interactions with the environment itself. We propose PSALM, which leverages LLM inference to heuristically complete partial plans emitted by a classical planner given partial domain knowledge, as well as to infer the semantic rules of the domain in a logical language based on environment feedback after execution. Our analysis on 7 environments shows that with just one expert-curated example plan, using LLMs as heuristic planners and rule predictors achieves lower environment execution steps and environment resets than random exploration while simultaneously recovering the underlying ground truth action semantics of the domain.


Overview

This paper explores how large language models (LLMs) can learn the semantics of actions in classical planning domains (their preconditions and effects) from environment feedback, instead of relying on an expert to write that knowledge out in full.
The researchers developed a framework, PSALM, in which an LLM infers action semantics by interacting with environments and observing the effects of executed actions.
This approach could help bridge the gap between the symbolic world of classical planning and the connectionist representations of LLMs, enabling them to perform more complex reasoning and problem-solving tasks.

Plain English Explanation

In this research, the authors wanted to see if large language models (LLMs), the AI systems that can generate human-like text, could learn the meaning and effects of actions in classical planning domains. Classical planning models a problem with a set of defined actions, each with preconditions that must hold before it runs and effects that describe how it changes the world.
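To make "actions, preconditions, and effects" concrete, here is a minimal sketch of a STRIPS-style action, assuming a state is a set of ground facts written as tuples. The `Action` class and the Blocks World-style `pickup(a)` action are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style ground action: preconditions plus add/delete effects.

    A state is a frozenset of ground facts such as ("clear", "a").
    """
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

    def applicable(self, state: frozenset) -> bool:
        # The action can run only if every precondition holds in the state.
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        # Remove the delete effects, then insert the add effects.
        return (state - self.del_effects) | self.add_effects


# Hypothetical Blocks World-style action: pick block a up off the table.
pickup_a = Action(
    name="pickup(a)",
    preconditions=frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)}),
    add_effects=frozenset({("holding", "a")}),
    del_effects=frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)}),
)

s0 = frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)})
s1 = pickup_a.apply(s0)
print(sorted(s1))               # [('holding', 'a')]
print(pickup_a.applicable(s1))  # False: a is no longer clear or on the table
```

Writing effects as separate add and delete sets keeps the state update a pure set operation, which is also what makes these rules easy for a planner to search over.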

Typically, a human expert has to hand-code this action knowledge for a classical planner. The researchers wondered whether an LLM could instead learn these action semantics by interacting with environments and observing the results of its actions. [link to "Learning Planning Abstractions from Language"]

The key idea is that by watching how the environment changes when actions are executed, the LLM can start to infer the underlying logic and rules governing those actions. This could allow a planner to handle complex tasks without an expert first writing every action definition by hand.
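One simple way to picture "inferring rules from how the environment changes" is a set-based baseline that compares the state before and after each successful execution. This is a swapped-in, classic learning-from-observation heuristic, not PSALM's LLM-based rule prediction; the function name `infer_semantics` and the fact encoding are hypothetical.

```python
def infer_semantics(transitions):
    """Estimate STRIPS-style rules from (action, before, after) observations.

    Each observation is one successful execution; states are frozensets of facts.
    Preconditions: facts true before every observed execution.
    Add effects: facts that appeared in every observed execution.
    Delete effects: facts that disappeared in every observed execution.
    """
    rules = {}
    for action, before, after in transitions:
        added, removed = after - before, before - after
        if action not in rules:
            rules[action] = (set(before), set(added), set(removed))
        else:
            pre, add, delete = rules[action]
            # Intersect with the new evidence so only consistent facts survive.
            rules[action] = (pre & before, add & added, delete & removed)
    return rules


# Two hypothetical observations of the same action in different states.
obs = [
    ("pickup(a)",
     frozenset({("clear", "a"), ("ontable", "a"), ("handempty",),
                ("clear", "b"), ("ontable", "b")}),
     frozenset({("holding", "a"), ("clear", "b"), ("ontable", "b")})),
    ("pickup(a)",
     frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)}),
     frozenset({("holding", "a")})),
]
pre, add, delete = infer_semantics(obs)["pickup(a)"]
print(sorted(pre))  # [('clear', 'a'), ('handempty',), ('ontable', 'a')]
print(sorted(add))  # [('holding', 'a')]
```

The facts about block b drop out of the precondition estimate because the second observation lacks them. With only successful executions, though, the estimate can still contain facts that merely happened to be true; failed executions, and in the paper's approach LLM commonsense, are what narrow it further.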

The researchers developed a framework to enable this process of action semantics learning. They showed that, given just one expert-curated example plan, their approach recovers the ground-truth action semantics of each domain, the same kind of knowledge that traditional planning systems have hard-coded. [link to "Generating Consistent PDDL Domains from Large Language Models"]

This is an important step towards bridging the gap between the symbolic world of classical planning and the more fluid, connectionist representations used in LLMs. If LLMs can learn to reason about actions and their effects, it could unlock their ability to tackle more complex, real-world problem-solving tasks. [link to "From Words to Actions: Unveiling the Theoretical Underpinnings"]

Technical Explanation

The paper proposes a framework, PSALM, that allows large language models (LLMs) to learn the semantics of actions in classical planning domains through interaction with simulated environments. [link to "What's the Plan? Evaluating and Developing Planning-Aware Techniques"]

The key components of the framework are:

A classical planner that searches for a plan using the current, partial domain knowledge.
An LLM that acts as a heuristic planner, completing the partial plans emitted by the classical planner so they can be executed in the environment.
An LLM rule predictor that reads the environment feedback after execution and infers the actions' pre- and post-conditions in a logical language, refining the domain knowledge for the next round.
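The execution-and-feedback step of this loop can be sketched with a toy environment that hides its true rules and only reports whether each action succeeded. `ToyEnv`, `execute_plan`, and the rule encoding are hypothetical, and the LLM plan completion and rule prediction are omitted. The step and reset counters mirror the two costs the abstract reports.

```python
class ToyEnv:
    """Hypothetical environment that hides its true action rules.

    rules maps an action name to (preconditions, add_effects, del_effects),
    all frozensets of facts. The learner only sees execution feedback.
    """

    def __init__(self, initial_state, rules):
        self.initial_state = frozenset(initial_state)
        self._rules = rules
        self.state = self.initial_state
        self.steps = 0   # environment execution steps used so far
        self.resets = 0  # environment resets used so far

    def reset(self):
        self.resets += 1
        self.state = self.initial_state
        return self.state

    def step(self, action):
        """Try one action; return (success, resulting state)."""
        self.steps += 1
        pre, add, delete = self._rules[action]
        if not pre <= self.state:
            return False, self.state  # failed: state is unchanged
        self.state = (self.state - delete) | add
        return True, self.state


def execute_plan(env, plan):
    """Run a plan from a fresh reset and report feedback.

    Returns (index of the first failed action or None, list of transitions).
    The transitions are the kind of evidence a rule predictor can use.
    """
    state = env.reset()
    transitions = []
    for i, action in enumerate(plan):
        ok, new_state = env.step(action)
        if not ok:
            return i, transitions
        transitions.append((action, state, new_state))
        state = new_state
    return None, transitions


init = {("clear", "a"), ("ontable", "a"), ("clear", "b"),
        ("ontable", "b"), ("handempty",)}
rules = {
    "pickup(a)": (frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)}),
                  frozenset({("holding", "a")}),
                  frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)})),
    "stack(a,b)": (frozenset({("holding", "a"), ("clear", "b")}),
                   frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
                   frozenset({("holding", "a"), ("clear", "b")})),
}
env = ToyEnv(init, rules)
failed_at, _ = execute_plan(env, ["stack(a,b)"])  # an incomplete plan
done_at, trace = execute_plan(env, ["pickup(a)", "stack(a,b)"])
print(failed_at, done_at, len(trace), env.steps, env.resets)  # 0 None 2 3 2
```

Every attempted action costs a step and every fresh attempt costs a reset, so a better plan completer (such as an LLM drawing on commonsense) lowers both counters compared with trying actions at random.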

The authors evaluate their framework on 7 environments. With just one expert-curated example plan, using LLMs as heuristic planners and rule predictors requires fewer environment execution steps and environment resets than random exploration, while still recovering the ground-truth action semantics of each domain.

Additionally, because the learned action semantics are expressed in a logical language, the classical planner can use them directly to search for plans. This suggests that their approach could help bridge the gap between the symbolic representations of classical planning and the more fluid, connectionist representations used in LLMs.
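Once rules are written as preconditions plus add/delete effects, a classical planner can search over them directly. Below is a minimal breadth-first forward-search planner, assuming the same set-of-facts encoding. It is a stand-in for illustration only; real systems typically hand a PDDL domain to an off-the-shelf planner, and `bfs_plan` is a hypothetical name.

```python
from collections import deque


def bfs_plan(initial_state, goal, rules):
    """Breadth-first forward search over STRIPS-style rules.

    rules maps an action name to (preconditions, add_effects, del_effects).
    Returns the shortest list of action names reaching a state that contains
    every goal fact, or None if no plan exists.
    """
    start = frozenset(initial_state)
    goal = frozenset(goal)
    frontier = deque([start])
    parents = {start: None}  # state -> (previous state, action), None at start
    while frontier:
        state = frontier.popleft()
        if goal <= state:
            # Walk parent links back to the start, then reverse.
            plan = []
            while parents[state] is not None:
                state, action = parents[state]
                plan.append(action)
            return plan[::-1]
        for action, (pre, add, delete) in rules.items():
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in parents:  # skip states already discovered
                    parents[nxt] = (state, action)
                    frontier.append(nxt)
    return None


init = {("clear", "a"), ("ontable", "a"), ("clear", "b"),
        ("ontable", "b"), ("handempty",)}
rules = {
    "pickup(a)": (frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)}),
                  frozenset({("holding", "a")}),
                  frozenset({("clear", "a"), ("ontable", "a"), ("handempty",)})),
    "pickup(b)": (frozenset({("clear", "b"), ("ontable", "b"), ("handempty",)}),
                  frozenset({("holding", "b")}),
                  frozenset({("clear", "b"), ("ontable", "b"), ("handempty",)})),
    "stack(a,b)": (frozenset({("holding", "a"), ("clear", "b")}),
                   frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
                   frozenset({("holding", "a"), ("clear", "b")})),
}
plan = bfs_plan(init, {("on", "a", "b")}, rules)
print(plan)                                       # ['pickup(a)', 'stack(a,b)']
print(bfs_plan(init, {("on", "b", "a")}, rules))  # None: no stack(b,a) rule
```

The second call shows why recovering the complete action semantics matters: if a rule is missing from the learned domain, the planner cannot find plans that depend on it.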

Critical Analysis

The paper presents a promising approach for enabling LLMs to learn action semantics and reasoning capabilities that are traditionally the domain of classical planners. By allowing the LLM to interact with simulated environments and learn from the feedback, the researchers have shown that it is possible to build up an understanding of actions and their effects without hard-coding this knowledge.

However, the paper does not address some potential limitations and areas for further research. For example, the framework was only evaluated on relatively simple planning domains, and it's unclear how well it would scale to more complex, real-world problems. [link to "Large Language Models as Planning Domain Generators"]

Additionally, the paper does not discuss the robustness of the learned action semantics, for example how well the LLM would perform if the environment or task were changed slightly. There may be concerns about the stability and generalization of the learned knowledge.

Further research could also explore ways to make the learning process more sample-efficient: although it needs fewer execution steps and resets than random exploration, it still depends on repeated interaction with the environment. Techniques from sample-efficient reinforcement learning could potentially be applied to this problem.

Overall, the paper presents an interesting and promising approach, but there are still many open questions and areas for future work to fully realize the potential of using LLMs for classical planning and reasoning tasks.

Conclusion

This paper demonstrates a novel framework that allows large language models (LLMs) to learn the semantics of actions in classical planning domains through interaction with simulated environments. By observing the effects of their actions and receiving feedback, the LLMs are able to build up an understanding of the underlying logic and rules governing those actions.

This is an important step towards bridging the gap between the symbolic world of classical planning and the more fluid, connectionist representations used in LLMs. If LLMs can learn to reason about actions and their effects, it could unlock their ability to tackle more complex, real-world problem-solving tasks that require advanced planning and reasoning capabilities.

While the paper presents promising results, there are still many open questions and areas for future research to fully realize the potential of this approach. Nonetheless, the work represents an exciting advancement in the field of AI planning and reasoning, with potential applications in a wide range of domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI


What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models

Eran Hirsch, Guy Uziel, Ateret Anaby-Tavor

Planning is a fundamental task in artificial intelligence that involves finding a sequence of actions that achieve a specified goal in a given environment. Large language models (LLMs) are increasingly used for applications that require planning capabilities, such as web or embodied agents. In line with recent studies, we demonstrate through experimentation that LLMs lack necessary skills required for planning. Based on these observations, we advocate for the potential of a hybrid approach that combines LLMs with classical planning methodology. Then, we introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup. Our extensive experiments across various planning domains demonstrate that SimPlan significantly outperforms existing LLM-based planners.

5/24/2024

cs.CL

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang

In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. Under proper assumptions on the pretraining data, we prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning. Additionally, we highlight the necessity for exploration beyond the subgoals derived from BAIL by proving that naively executing the subgoals returned by LLM leads to a linear regret. As a remedy, we introduce an $\epsilon$-greedy exploration strategy to BAIL, which is proven to incur sublinear regret when the pretraining error is small. Finally, we extend our theoretical framework to include scenarios where the LLM Planner serves as a world model for inferring the transition model of the environment and to multi-agent settings, enabling coordination among multiple Actors.

5/31/2024

cs.LG cs.AI cs.CL

Learning Planning Abstractions from Language

Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.

5/8/2024

cs.RO cs.AI