LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

2402.01817

Published 6/13/2024 by Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

cs.AI cs.LG

Abstract

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

Overview

• This position paper argues that large language models (LLMs) cannot plan or verify their own plans by themselves, and proposes an "LLM-Modulo" framework that pairs LLMs with external model-based verifiers so that LLMs can still contribute usefully to planning.

Plain English Explanation

Large language models (LLMs) like GPT-3 have shown impressive capabilities in generating human-like text, answering questions, and even completing simple tasks. However, the paper argues that LLMs struggle with autonomous planning - the ability to generate executable plans to achieve specific goals.

The key limitation is that LLMs lack the structured reasoning and decision-making capabilities required for effective planning. They can generate descriptive text about plans, but cannot translate that into actionable steps that can be directly executed. This makes LLMs less suitable for fully autonomous planning tasks.

To address this, the paper introduces the "LLM-Modulo" framework, which combines LLMs with external model-based verifiers. In this approach, the LLM is not asked to plan on its own. It acts as an approximate knowledge source that proposes ideas, while the verifiers check those ideas and send criticism back, so the two sides interact in both directions. The paper also argues that LLMs can help build the models that those verifiers rely on.

By leveraging the complementary strengths of LLMs and specialized planning components, the LLM-Modulo framework aims to enable more effective and efficient planning, with the human remaining in control of the core decision-making and execution.

Technical Explanation

The paper first outlines the key limitations of LLMs in autonomous planning tasks. It explains that while LLMs can generate descriptive text about plans, they lack the structured reasoning and decision-making capabilities required to translate those plans into executable steps. This makes LLMs less suitable for fully autonomous planning.

To address this, the authors propose the "LLM-Modulo" framework, which combines LLMs with external model-based verifiers. In this approach, the LLM serves as an approximate knowledge source rather than attempting to plan or verify autonomously.

The framework relies on a tighter, bi-directional interaction between the LLM and the verifiers than a simple pipeline would provide: the LLM contributes candidate ideas, the verifiers check them against their models, and the resulting criticism is fed back to the LLM. The models that drive the verifiers can themselves be acquired with the help of LLMs. According to the authors, this arrangement extends model-based planning and reasoning towards more flexible knowledge, problem, and preference specifications.
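The interaction between an LLM and external verifiers that the abstract describes can be illustrated with a small generate-and-critique loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `stub_llm_propose` stands in for a real LLM call, `clear_before_pickup` is a toy hand-written critic for a made-up blocks scenario, and every name here (`llm_modulo`, `Plan`, `Critic`, `max_rounds`) is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]
# A critic returns None when it accepts the plan, otherwise a critique string.
Critic = Callable[[Plan], Optional[str]]


def stub_llm_propose(goal: str, feedback: List[str]) -> Plan:
    """Hypothetical stand-in for an LLM call; revises its guess using feedback."""
    plan = ["pick up block A", "stack A on B"]
    if any("unstack" in note for note in feedback):
        plan = ["unstack C from A", "put down C"] + plan
    return plan


def clear_before_pickup(plan: Plan) -> Optional[str]:
    """Toy model-based critic: C starts on top of A, so C must be removed first."""
    if "pick up block A" in plan:
        idx = plan.index("pick up block A")
        if "unstack C from A" not in plan[:idx]:
            return "A is not clear: unstack C from A before picking it up"
    return None


def llm_modulo(
    goal: str,
    propose: Callable[[str, List[str]], Plan],
    critics: List[Critic],
    max_rounds: int = 5,
) -> Tuple[Optional[Plan], int]:
    """Loop until every critic accepts a candidate or the round budget runs out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(goal, feedback)
        critiques = []
        for critic in critics:
            note = critic(candidate)
            if note is not None:
                critiques.append(note)
        if not critiques:
            return candidate, round_no  # all critics signed off on this plan
        feedback.extend(critiques)      # criticism flows back to the generator
    return None, max_rounds             # no verified plan within the budget


if __name__ == "__main__":
    plan, rounds = llm_modulo("A on B", stub_llm_propose, [clear_before_pickup])
    print(rounds, plan)
```

In this sketch a plan is returned only after every critic accepts it, so correctness rests on the critics rather than on the generator. That split of responsibilities is the design choice the sketch is meant to highlight.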

Related work explores settings where this kind of approach could be applied, such as "Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning", "Large Language Models as Planning Domain Generators", and "Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning".

Critical Analysis

The paper makes a compelling case that while LLMs have impressive capabilities, they are limited in their ability to autonomously generate executable plans. The authors acknowledge that LLMs can be useful in supporting human planning, but they correctly identify the need for specialized planning components to handle the structured reasoning and decision-making required.

One potential concern raised in the paper is the risk of overreliance on LLMs, which could lead to human planners becoming overly dependent on the language model's suggestions and losing critical thinking skills. The authors suggest that the LLM-Modulo framework is designed to mitigate this by keeping the human planner in the loop, but further research may be needed to fully understand the long-term implications of this approach.

Additionally, the paper does not explore the potential challenges of integrating LLMs with specialized planning modules, such as ensuring seamless communication and alignment of goals and constraints. This is an area that may warrant further investigation.

Conclusion

This paper highlights the limitations of large language models (LLMs) in autonomous planning tasks and proposes the "LLM-Modulo" framework as a solution. By combining the natural language abilities of LLMs with specialized planning modules, the LLM-Modulo framework aims to leverage the strengths of both components to enable more effective and efficient planning, with the human remaining in control of the core decision-making and execution.

The insights presented in this paper have important implications for the development and deployment of LLMs in real-world applications, particularly those involving complex planning and decision-making. As the field of AI continues to evolve, research like this will be crucial in understanding the limitations and potential of language models, and in developing innovative frameworks to unlock their full capabilities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We are using the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized such as extraction of useful critics and reformulator for critics.

6/3/2024

cs.AI

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI

Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning

Gawon Choi, Hyemin Ahn

In robotics, the use of Large Language Models (LLMs) is becoming prevalent, especially for understanding human commands. In particular, LLMs are utilized as domain-agnostic task planners for high-level human commands. LLMs are capable of Chain-of-Thought (CoT) reasoning, and this allows LLMs to be task planners. However, we need to consider that modern robots still struggle to perform complex actions, and the domains where robots can be deployed are limited in practice. This leads us to pose a question: If small LMs can be trained to reason in chains within a single domain, would even small LMs be good task planners for the robots? To train smaller LMs to reason in chains, we build 'COmmand-STeps datasets' (COST) consisting of high-level commands along with corresponding actionable low-level steps, via LLMs. We release not only our datasets but also the prompt templates used to generate them, to allow anyone to build datasets for their domain. We compare GPT3.5 and GPT4 with the finetuned GPT2 for task domains, in tabletop and kitchen environments, and the result shows that GPT2-medium is comparable to GPT3.5 for task planning in a specific domain. Our dataset, code, and more output samples can be found in https://github.com/Gawon-Choi/small-LMs-Task-Planning

4/8/2024

cs.RO cs.AI cs.LG

Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools

Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan

The recent advancements of Large Language Models (LLMs), with their abundant world knowledge and capabilities of tool-using and reasoning, fostered many LLM planning algorithms. However, LLMs have not been shown to be able to accurately solve complex combinatorial optimization problems. In Xie et al. (2024), the authors proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that LLMs themselves cannot make travel plans that satisfy user requirements, with a best success rate of 0.6%. In this work, we propose a framework that enables LLMs to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem and use SMT solvers to interactively and automatically solve the combinatorial search problem. The SMT solvers guarantee the satisfiability of input constraints and the LLMs can enable a language-based interaction with our framework. When the input constraints cannot be satisfied, our LLM-based framework will interactively offer suggestions to users to modify their travel requirements via automatic reasoning using the SMT solvers. We evaluate our framework with TravelPlanner and achieve a success rate of 97%. We also create a separate dataset that contains international travel benchmarks and use both datasets to evaluate the effectiveness of our interactive planning framework when the initial user queries cannot be satisfied. Our framework could generate valid plans with an average success rate of 78.6% for our dataset and 85.0% for TravelPlanner according to diverse human preferences.

4/22/2024

cs.AI cs.CL cs.HC