Generating consistent PDDL domains with Large Language Models

2404.07751

Published 4/12/2024 by Pavel Smirnov, Frank Joublin, Antonello Ceravola, Michael Gienger

Abstract

Large Language Models (LLMs) are capable of transforming natural language domain descriptions into plausibly looking PDDL markup. However, ensuring that actions are consistent within domains still remains a challenging task. In this paper we present a novel concept to significantly improve the quality of LLM-generated PDDL models by performing automated consistency checking during the generation process. Although the proposed consistency checking strategies still can't guarantee absolute correctness of generated models, they can serve as valuable source of feedback reducing the amount of correction efforts expected from a human in the loop. We demonstrate the capabilities of our error detection approach on a number of classical and custom planning domains (logistics, gripper, tyreworld, household, pizza).

Overview

This paper explores the use of large language models (LLMs) to generate consistent PDDL (Planning Domain Definition Language) domains for automated planning tasks.
PDDL is a widely used language for representing planning problems, but manually creating PDDL domains can be a tedious and error-prone process.
The researchers pair LLM-based generation with automated consistency checking performed during the generation process, so that detected errors can serve as feedback and less correction is left to a human in the loop.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. In this paper, the researchers explore using LLMs to automatically create PDDL domains, which are the building blocks for automated planning tasks. PDDL is a specialized language used to describe the available actions, objects, and constraints in a planning problem, but writing PDDL domains by hand can be a time-consuming and error-prone process.

The researchers hypothesize that LLMs, with their impressive language understanding and generation abilities, could be used to generate PDDL domains more efficiently and consistently than manual approaches. By providing an LLM with a high-level description of a planning problem, the model might be able to translate that into a valid PDDL domain, saving time and effort for researchers and practitioners working on automated planning tasks.
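
To make the idea of a PDDL action concrete, here is a minimal Python sketch that renders one action as PDDL text. The `Action` class and its field names are hypothetical helpers introduced only for this illustration; the `pick` action is a simplified version of the one found in the classical gripper domain, not output from the paper.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical container for one PDDL action (illustration only)."""
    name: str
    parameters: list[str]
    preconditions: list[str]
    effects: list[str]

    def to_pddl(self) -> str:
        """Render the action as a PDDL :action block."""
        params = " ".join(self.parameters)
        pre = " ".join(self.preconditions)
        eff = " ".join(self.effects)
        return (
            f"(:action {self.name}\n"
            f"  :parameters ({params})\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))"
        )


# Simplified 'pick' action: the robot picks up a ball with a free gripper.
pick = Action(
    name="pick",
    parameters=["?ball", "?room", "?gripper"],
    preconditions=["(at ?ball ?room)", "(at-robby ?room)", "(free ?gripper)"],
    effects=[
        "(carry ?ball ?gripper)",
        "(not (at ?ball ?room))",
        "(not (free ?gripper))",
    ],
)
print(pick.to_pddl())
```

The preconditions say what must hold before the action can be applied, and the effects say what changes afterwards. A full domain would also declare the predicates used here, which is where inconsistencies can creep in.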

Technical Explanation

The researchers propose performing automated consistency checks while an LLM generates a PDDL domain from a natural language description. As the abstract notes, LLMs can already produce plausible-looking PDDL markup, but keeping the actions consistent within a domain remains difficult. The checks are meant to detect such errors during generation rather than after the fact.

The approach is demonstrated on several classical and custom planning domains: logistics, gripper, tyreworld, household, and pizza.

The authors state that the consistency checking strategies cannot guarantee absolute correctness of the generated models. They can, however, serve as a source of feedback that reduces the correction effort expected from a human in the loop.
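
The abstract describes automated consistency checking during generation but this summary does not spell out the individual checks. As one illustration of what such a check could look like, the Python sketch below flags undeclared predicates, wrong argument counts, and variables that are not action parameters. The data layout, the `check_domain` function, and the deliberately broken `drop` action are all assumptions made for this example, not the authors' implementation.

```python
import re

# Hypothetical, simplified domain representation: declared predicates map to
# their arity; each action lists the atoms used in its precondition and effect.
DECLARED = {"at": 2, "at-robby": 1, "free": 1, "carry": 2}

ACTIONS = {
    "pick": {
        "parameters": ["?ball", "?room", "?gripper"],
        "atoms": ["(at ?ball ?room)", "(at-robby ?room)",
                  "(free ?gripper)", "(carry ?ball ?gripper)"],
    },
    "drop": {
        "parameters": ["?ball", "?room", "?gripper"],
        # Deliberate mistakes: an undeclared predicate and an unbound variable.
        "atoms": ["(holding ?ball)", "(at ?ball ?place)"],
    },
}


def check_domain(declared, actions):
    """Return readable error messages; an empty list means no issue was found."""
    errors = []
    for action_name, action in actions.items():
        params = set(action["parameters"])
        for atom in action["atoms"]:
            # Split "(at ?ball ?room)" into ["at", "?ball", "?room"].
            tokens = re.findall(r"[^\s()]+", atom)
            predicate, args = tokens[0], tokens[1:]
            if predicate not in declared:
                errors.append(f"{action_name}: undeclared predicate '{predicate}'")
            elif declared[predicate] != len(args):
                errors.append(
                    f"{action_name}: '{predicate}' expects "
                    f"{declared[predicate]} argument(s), got {len(args)}"
                )
            for arg in args:
                if arg.startswith("?") and arg not in params:
                    errors.append(
                        f"{action_name}: variable '{arg}' is not an action parameter"
                    )
    return errors


feedback = check_domain(DECLARED, ACTIONS)
for message in feedback:
    print(message)
```

In a generation loop, messages like these could be appended to the next prompt so the LLM can repair the flagged action. The prompting step is omitted here, and passing such checks does not show that a domain is correct, only that these particular inconsistencies are absent.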

Critical Analysis

The paper presents a promising approach for leveraging LLMs to automate the creation of PDDL domains, which could have significant practical benefits for the field of automated planning. However, the researchers acknowledge several limitations and areas for further exploration:

The quality and consistency of the generated PDDL domains may depend on the specific LLM used and the quality of the training data. More research is needed to understand the robustness of the approach across different LLM architectures and datasets.
The paper focuses on relatively simple planning domains, and it's unclear how well the approach would scale to more complex, real-world planning problems. "Exploring autonomous agents through the lens of large language models" and "GenChip: Generating Robot Policy Code with High Precision" discuss similar challenges in applying LLMs to more complex AI systems.
Because the consistency checks cannot guarantee absolute correctness, LLM-generated PDDL domains may still contain errors that the checks do not detect. Further research is needed to understand the reliability and safety implications of this approach.

Conclusion

This paper presents an approach for improving the quality of LLM-generated PDDL domains, a critical component of automated planning systems, through automated consistency checking during generation. The checks cannot guarantee correct domains, but they can reduce the correction effort required from a human in the loop.

While the approach shows promise, the researchers acknowledge several limitations and areas for further exploration. Continued research in this direction, as discussed in "Can Only LLMs Do Reasoning? The Potential of Small-Scale Models", "Large Language Models as Oracles: Instantiating Ontologies", and "GoEx: Perspectives and Designs Towards Runtime Autonomous LLM", could lead to more robust and reliable approaches for leveraging LLMs in automated planning and other AI domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI

Language Models can Infer Action Semantics for Classical Planners from Environment Feedback

Wang Zhu, Ishika Singh, Robin Jia, Jesse Thomason

Classical planning approaches guarantee finding a set of actions that can achieve a given goal state when possible, but require an expert to specify logical action semantics that govern the dynamics of the environment. Researchers have shown that Large Language Models (LLMs) can be used to directly infer planning steps based on commonsense knowledge and minimal domain information alone, but such plans often fail on execution. We bring together the strengths of classical planning and LLM commonsense inference to perform domain induction, learning and validating action pre- and post-conditions based on closed-loop interactions with the environment itself. We propose PSALM, which leverages LLM inference to heuristically complete partial plans emitted by a classical planner given partial domain knowledge, as well as to infer the semantic rules of the domain in a logical language based on environment feedback after execution. Our analysis on 7 environments shows that with just one expert-curated example plans, using LLMs as heuristic planners and rule predictors achieves lower environment execution steps and environment resets than random exploration while simultaneously recovering the underlying ground truth action semantics of the domain.

6/6/2024

cs.AI cs.CL cs.RO

NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions

Elliot Gestrin, Marco Kuhlmann, Jendrik Seipp

Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.

5/8/2024

cs.AI

Towards Logically Consistent Language Models via Probabilistic Reasoning

Diego Calanzone, Stefano Teso, Antonio Vergari

Large language models (LLMs) are a promising venue for natural language understanding and generation tasks. However, current LLMs are far from reliable: they are prone to generate non-factual information and, more crucially, to contradict themselves when prompted to reason about beliefs of the world. These problems are currently addressed with large scale fine-tuning or by delegating consistent reasoning to external tools. In this work, we strive for a middle ground and introduce a training objective based on principled probabilistic reasoning that teaches a LLM to be consistent with external knowledge in the form of a set of facts and rules. Fine-tuning with our loss on a limited set of facts enables our LLMs to be more logically consistent than previous baselines and allows them to extrapolate to unseen but semantically similar factual knowledge more systematically.

4/22/2024

cs.LG cs.CL