TIC: Translate-Infer-Compile for accurate text to plan using LLMs and Logical Representations

Read original: arXiv:2402.06608 - Published 7/2/2024 by Sudhir Agarwal, Anu Sreepathy

Overview

This research paper explores a method for generating plans from natural language planning task requests.
The authors leverage the strengths of both large language models (LLMs) and classical planning tools to address the limitations of each approach.
The proposed approach, TIC (Translate-Infer-Compile), uses an LLM only to generate a logically interpretable intermediate representation of the planning task. A logic reasoner then derives additional information from that representation, and the base and inferred information are compiled into the final task definition in the Planning Domain Definition Language (PDDL).

Plain English Explanation

The paper focuses on the problem of generating plans from natural language instructions. For example, imagine you need to plan a trip to the grocery store, but you don't want to write out a detailed set of instructions. Instead, you could describe the task in plain language, like "Go to the grocery store, buy milk, eggs, and bread, then come back home."

Traditional planning systems, called classical planners, are very good at generating plans, but they require the task to be described in a specific, structured language called PDDL. On the other hand, large language models (LLMs) like GPT-3 are excellent at processing natural language, but they don't perform well at actual planning tasks.
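To make "a specific, structured language" concrete, the following minimal sketch renders a small PDDL problem for the grocery example above. The domain name `errands`, the predicates `at` and `have`, and the helper `render_problem` are illustrative assumptions and do not come from the paper.

```python
def render_problem(name, domain, objects, init, goal):
    """Render a PDDL problem definition as a string.

    Facts are tuples such as ("have", "milk"), rendered as "(have milk)".
    """
    def atom(fact):
        return "(" + " ".join(fact) + ")"

    lines = [
        f"(define (problem {name})",
        f"  (:domain {domain})",
        "  (:objects " + " ".join(objects) + ")",
        "  (:init " + " ".join(atom(f) for f in init) + ")",
        "  (:goal (and " + " ".join(atom(f) for f in goal) + "))",
        ")",
    ]
    return "\n".join(lines)


# Hypothetical encoding of "go to the store, buy milk, eggs, and bread, come home".
problem = render_problem(
    name="grocery-run",
    domain="errands",
    objects=["home", "store", "milk", "eggs", "bread"],
    init=[("at", "home")],
    goal=[("have", "milk"), ("have", "eggs"), ("have", "bread"), ("at", "home")],
)
print(problem)
```

A classical planner would also need a matching domain file that defines the actions (for example, moving and buying); that part is omitted here.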

The authors propose a hybrid approach that combines the strengths of both techniques. First, they use an LLM to generate an intermediate representation of the natural language task description. Then, they use a logic reasoner to infer additional information from this intermediate representation. Finally, they compile this base and inferred information into the final PDDL task definition, which can be used by a classical planner to generate the plan.

The key benefit of this approach is that it reduces the errors introduced by the LLM, because the LLM is only responsible for generating the intermediate representation, not the final PDDL. The authors report that this lets the system achieve high accuracy on PDDL generation across the domains in their evaluation.

Technical Explanation

The paper presents a three-step approach called TIC (Translate-Infer-Compile) for generating plans from natural language task descriptions:

Translate: An LLM is used to generate an intermediate representation of the natural language task description. This intermediate representation is designed to be logically interpretable, making it easier to work with than the original natural language input.
Infer: A logic reasoner, specifically an Answer Set Programming (ASP) solver, is used to derive additional logically dependent information from the intermediate representation. This step helps to fill in any gaps or missing details in the original task description.
Compile: The base information from the intermediate representation and the inferred information are then used to generate the final PDDL task definition, which can be used by a classical planner to compute a plan.
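The three steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' implementation: `translate` is a stub that returns a canned result instead of calling an LLM, the fact format (tuples such as `("goal_on", "a", "b")`) is invented for this example, and `infer` uses two hand-written Python rules as a stand-in for an ASP solver. The blocks-world task is likewise a hypothetical example.

```python
def translate(request):
    """Translate step. Stand-in for the LLM call: returns canned facts."""
    # Hypothetical intermediate representation for the request below.
    return {
        ("block", "a"), ("block", "b"), ("block", "c"),
        ("init_ontable", "a"), ("init_ontable", "b"), ("init_ontable", "c"),
        ("goal_on", "a", "b"), ("goal_on", "b", "c"),
    }


def infer(facts):
    """Infer step. Hand-written rules; a real system would use an ASP solver."""
    blocks = {f[1] for f in facts if f[0] == "block"}
    derived = set(facts)
    # Rule 1: a block with nothing stacked on it initially is clear.
    covered = {f[2] for f in facts if f[0] == "init_on"}
    derived |= {("init_clear", b) for b in blocks - covered}
    # Rule 2: a block the goal does not stack on another block rests on the table.
    stacked = {f[1] for f in facts if f[0] == "goal_on"}
    derived |= {("goal_ontable", b) for b in blocks - stacked}
    # Rule 3: the hand is empty unless a block is stated to be held.
    if not any(f[0] == "init_holding" for f in facts):
        derived.add(("init_handempty",))
    return derived


def compile_pddl(facts, name="stack-task", domain="blocksworld"):
    """Compile step. Turn base and inferred facts into a PDDL problem string."""
    def atoms(prefix):
        # Drop the init_/goal_ prefix to obtain the PDDL predicate name.
        return " ".join(
            "(" + " ".join((f[0][len(prefix):],) + f[1:]) + ")"
            for f in sorted(facts) if f[0].startswith(prefix)
        )

    objects = " ".join(sorted(f[1] for f in facts if f[0] == "block"))
    return "\n".join([
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {objects})",
        f"  (:init {atoms('init_')})",
        f"  (:goal (and {atoms('goal_')}))",
        ")",
    ])


request = "Put block a on b and b on c. All blocks start on the table."
facts = infer(translate(request))
pddl = compile_pddl(facts)
print(pddl)
```

In this sketch the request never says that block c should end up on the table or that every block starts out clear; those facts come from the infer step, which is the kind of logically dependent information the LLM no longer has to produce itself.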

The authors compare their TIC approach to previous methods that used LLMs to generate PDDL directly, and they find that the TIC approach significantly reduces the errors introduced by the LLM. This is because the LLM is only responsible for the intermediate representation, which is easier to generate accurately than the full PDDL specification.

The authors evaluate their approach on seven planning domains and report that, for at least one of the LLMs tested, it achieves high accuracy on task PDDL generation in all seven.

Critical Analysis

The paper presents a novel and promising approach to leveraging the strengths of both LLMs and classical planners for natural language planning tasks. The key innovation is the use of an intermediate representation that is designed to be logically interpretable, which allows the system to reduce the errors introduced by the LLM.

However, the paper does not provide a comprehensive evaluation of the approach's performance compared to other state-of-the-art methods. The authors only compare their approach to a direct LLM-based method, and it would be valuable to see how it performs against other hybrid approaches or specialized natural language planning systems.

Additionally, the paper does not address the potential limitations or biases of the LLM used in the system. LLMs can sometimes produce outputs that are plausible but factually incorrect or reflect societal biases, and it would be important to understand how these issues might impact the performance of the TIC approach.

Finally, the paper does not discuss the computational complexity or runtime performance of the TIC approach, which could be an important consideration for real-world applications. As the system involves multiple steps, including the use of an ASP solver, the overall runtime and scalability of the approach should be examined.

Conclusion

The TIC approach presented in this paper represents an important step forward in the field of natural language planning. By leveraging the strengths of both LLMs and classical planners, the authors have developed a method that can generate accurate PDDL task definitions from natural language inputs, paving the way for more accessible and user-friendly planning systems.

While the paper leaves open questions about the approach's limitations and its performance relative to other methods, the core idea is promising: restricting the LLM to a logically interpretable intermediate representation, and leaving inference and PDDL generation to symbolic tools, offers a practical way to bridge natural language and formal planning representations. As work on large language models for planning continues to evolve, this research contributes to our understanding of how these models can be integrated into planning applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TIC: Translate-Infer-Compile for accurate text to plan using LLMs and Logical Representations

Sudhir Agarwal, Anu Sreepathy

We study the problem of generating plans for given natural language planning task requests. On one hand, LLMs excel at natural language processing but do not perform well on planning. On the other hand, classical planning tools excel at planning tasks but require input in a structured language such as the Planning Domain Definition Language (PDDL). We leverage the strengths of both the techniques by using an LLM for generating the PDDL representation (task PDDL) of planning task requests followed by using a classical planner for computing a plan. Unlike previous approaches that use LLMs for generating task PDDLs directly, our approach comprises of (a) translate: using an LLM only for generating a logically interpretable intermediate representation of natural language task description, (b) infer: deriving additional logically dependent information from the intermediate representation using a logic reasoner (currently, Answer Set Programming solver), and (c) compile: generating the target task PDDL from the base and inferred information. We observe that using an LLM to only output the intermediate representation significantly reduces LLM errors. Consequently, TIC approach achieves, for at least one LLM, high accuracy on task PDDL generation for all seven domains of our evaluation dataset.

7/2/2024

NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions

Elliot Gestrin, Marco Kuhlmann, Jendrik Seipp

Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.

5/8/2024

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

LLM+Reasoning+Planning for supporting incomplete user queries in presence of APIs

Sudhir Agarwal, Anu Sreepathy, David H. Alonso, Prarit Lamba

Recent availability of Large Language Models (LLMs) has led to the development of numerous LLM-based approaches aimed at providing natural language interfaces for various end-user tasks. These end-user tasks in turn can typically be accomplished by orchestrating a given set of APIs. In practice, natural language task requests (user queries) are often incomplete, i.e., they may not contain all the information required by the APIs. While LLMs excel at natural language processing (NLP) tasks, they frequently hallucinate on missing information or struggle with orchestrating the APIs. The key idea behind our proposed approach is to leverage logical reasoning and classical AI planning along with an LLM for accurately answering user queries including identification and gathering of any missing information in these queries. Our approach uses an LLM and ASP (Answer Set Programming) solver to translate a user query to a representation in Planning Domain Definition Language (PDDL) via an intermediate representation in ASP. We introduce a special API get_info_api for gathering missing information. We model all the APIs as PDDL actions in a way that supports dataflow between the APIs. Our approach then uses a classical AI planner to generate an orchestration of API calls (including calls to get_info_api) to answer the user query. Our evaluation results show that our approach significantly outperforms a pure LLM based approach by achieving over 95% success rate in most cases on a dataset containing complete and incomplete single goal and multi-goal queries where the multi-goal queries may or may not require dataflow among the APIs.

5/22/2024