Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

2405.20625

Published 6/3/2024 by Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

cs.AI


Abstract

As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We are using the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized such as extraction of useful critics and reformulator for critics.


Overview

This paper applies the "LLM-Modulo" framework, proposed in earlier work by Kambhampati et al., to planning, using travel planning as a case study.
The framework pairs an LLM that generates candidate plans with external critics that verify those plans and feed their critiques back to the LLM, so that plans are revised over multiple iterations.
On the TravelPlanner benchmark, the authors report a 4.6x improvement over baseline for GPT-4-Turbo and an increase from 0% to 5% for GPT-3.5-Turbo. They also show that LLMs can help extract critics and reformulate critic feedback.

Plain English Explanation

The research paper explores how a framework called "LLM-Modulo" can make planning tasks, like travel planning, more robust. Instead of trusting an LLM's plan outright, the framework has the LLM propose a plan and then checks that plan with separate critics. Each critic tests whether a particular requirement, such as the trip budget, is met. Whenever a check fails, the LLM is told what went wrong and asked to try again.

The researchers found that this generate-and-check approach produces valid travel plans far more often than popular prompting strategies such as Chain of Thought, ReAct, and Reflexion, which almost never succeed on this benchmark with GPT-3.5-Turbo.

Overall, this work demonstrates how pairing LLMs with external verifiers can lead to more robust planning, which could have implications for a wide range of applications beyond travel planning.

Technical Explanation

The paper operationalizes the LLM-Modulo framework for the TravelPlanner benchmark. The framework pairs an LLM with external verifiers in a generate-test-critique loop. The key components of this instantiation are:

LLM as plan generator: Given a natural-language travel query, the LLM proposes a candidate itinerary.
Critics: A bank of external verifiers checks each candidate plan against the constraints in the query, such as staying within the user's budget, and reports any violations.
Reformulator and back-prompting: Critic feedback is converted into a new prompt for the LLM, which then revises its plan. The loop repeats until the critics are satisfied or an iteration budget runs out. The authors also show that LLMs can reliably take on supporting roles here, such as extracting useful critics and reformulating critic feedback.
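The following minimal Python sketch illustrates the generate-test-critique loop under stated assumptions. It is not the authors' implementation. The `Day` itinerary schema, the two critics (`budget_critic`, `accommodation_critic`), the `reformulate` back-prompt template, the `mock_generate` stand-in for an LLM call, and the iteration budget of 10 are all hypothetical simplifications of the framework described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical, simplified itinerary schema (NOT the TravelPlanner format).
@dataclass
class Day:
    city: str
    cost: float                   # total spend for the day
    accommodation: Optional[str]  # None means no place to stay booked

Plan = List[Day]
# A critic returns None if the plan passes, otherwise a natural-language critique.
Critic = Callable[[Plan], Optional[str]]

def budget_critic(budget: float) -> Critic:
    """Hard-constraint critic: total trip cost must not exceed the user's budget."""
    def check(plan: Plan) -> Optional[str]:
        total = sum(day.cost for day in plan)
        if total > budget:
            return f"Total cost {total:.0f} exceeds the budget of {budget:.0f}."
        return None
    return check

def accommodation_critic(plan: Plan) -> Optional[str]:
    """Every day except the last (departure day) needs somewhere to stay."""
    missing = [i + 1 for i, day in enumerate(plan[:-1]) if day.accommodation is None]
    if missing:
        return f"No accommodation booked for day(s) {missing}."
    return None

def reformulate(critiques: List[str]) -> str:
    """Turn the critics' output into a back-prompt for the generator."""
    lines = "\n".join(f"- {c}" for c in critiques)
    return f"Your previous plan violated these constraints:\n{lines}\nPlease revise it."

def llm_modulo(generate: Callable[[str, Optional[str]], Plan], query: str,
               critics: List[Critic], max_iters: int = 10) -> Tuple[Optional[Plan], int]:
    """Generate-test-critique loop: stop when every critic passes or the budget runs out."""
    feedback: Optional[str] = None
    for iteration in range(1, max_iters + 1):
        plan = generate(query, feedback)
        critiques = [msg for msg in (critic(plan) for critic in critics) if msg is not None]
        if not critiques:
            return plan, iteration      # all critics satisfied
        feedback = reformulate(critiques)
    return None, max_iters              # no valid plan within the iteration budget

def mock_generate(query: str, feedback: Optional[str]) -> Plan:
    """Stand-in for an LLM call: the first draft is flawed, the revision is valid."""
    if feedback is None:
        return [Day("Seattle", 900, None), Day("Portland", 400, None)]
    return [Day("Seattle", 500, "Hotel A"), Day("Portland", 300, None)]

plan, iterations = llm_modulo(mock_generate, "2-day trip, budget $1000",
                              [budget_critic(1000), accommodation_critic])
print(iterations)  # 2
```

Each critic returns only a critique string or `None`. Because of this, new constraints can be added to the bank without changing the loop, which mirrors the framework's separation between the generator and its verifiers.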

The authors evaluate this instantiation on the Travel Planning benchmark from the OSU NLP group, which asks LLMs to produce valid itineraries for user queries written in natural language. Chain of Thought, ReAct, and Reflexion achieve 0%, 0.6%, and 0% with GPT-3.5-Turbo. By contrast, the LLM-Modulo setup improves baseline performance by 4.6x for GPT-4-Turbo and raises GPT-3.5-Turbo from 0% to 5%.

Critical Analysis

The paper presents a compelling approach to integrating LLMs with traditional planning methods, but it also acknowledges several limitations and areas for future research:

The authors note that the performance of the LLM-Modulo framework is heavily dependent on the quality and coverage of the LLM used, which may be a significant constraint in real-world applications.
The framework currently relies on a relatively simple integration of the LLM and its critics, and the authors suggest that more sophisticated techniques for combining these components could lead to further improvements.
The evaluation in the paper is limited to travel planning tasks, and the researchers acknowledge the need to test the framework's generalizability to other planning domains.

Additionally, while the paper demonstrates the potential benefits of the LLM-Modulo approach, it would be valuable to see more extensive comparisons with other state-of-the-art planning techniques to better assess the framework's overall performance and advantages.

Conclusion

The "LLM-Modulo" framework, as operationalized in this paper, represents a promising approach to incorporating large language models into planning systems, with the goal of achieving more robust planning. The framework pairs an LLM plan generator with external critics and uses LLMs to help extract those critics and reformulate their feedback. In doing so, it shows how these models can contribute to planning in complex, real-world scenarios without being trusted to verify their own plans.

The travel planning case study shows the framework producing valid itineraries that satisfy the constraints in user queries far more often than prompting-only baselines. While the paper acknowledges some limitations, the research lays the groundwork for further exploration of how LLMs can be effectively integrated with symbolic verifiers to tackle a wide range of planning and decision-making challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

6/13/2024

cs.AI cs.LG


Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools

Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan

The recent advancements of Large Language Models (LLMs), with their abundant world knowledge and capabilities of tool-using and reasoning, fostered many LLM planning algorithms. However, LLMs have not been shown to be able to accurately solve complex combinatorial optimization problems. In Xie et al. (2024), the authors proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that LLMs themselves cannot make travel plans that satisfy user requirements, with a best success rate of 0.6%. In this work, we propose a framework that enables LLMs to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem and use SMT solvers to interactively and automatically solve the combinatorial search problem. The SMT solvers guarantee the satisfiability of input constraints and the LLMs can enable a language-based interaction with our framework. When the input constraints cannot be satisfied, our LLM-based framework will interactively offer suggestions to users to modify their travel requirements via automatic reasoning using the SMT solvers. We evaluate our framework with TravelPlanner and achieve a success rate of 97%. We also create a separate dataset that contains international travel benchmarks and use both datasets to evaluate the effectiveness of our interactive planning framework when the initial user queries cannot be satisfied. Our framework could generate valid plans with an average success rate of 78.6% for our dataset and 85.0% for TravelPlanner according to diverse human preferences.

4/22/2024

cs.AI cs.CL cs.HC

A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models

Chengxing Xie, Difan Zou

Recent studies have highlighted the proficiency of Large Language Models (LLMs) in some simple tasks like writing and coding through various reasoning strategies. However, LLM agents still struggle with tasks that require comprehensive planning, a process that challenges current models and remains a critical research issue. In this study, we concentrate on travel planning, a Multi-Phases planning problem that involves multiple interconnected stages, such as outlining, information gathering, and planning, often characterized by the need to manage various constraints and uncertainties. Existing reasoning approaches have struggled to effectively address this complex task. Our research aims to address this challenge by developing a human-like planning framework for LLM agents, i.e., guiding the LLM agent to simulate various steps that humans take when solving Multi-Phases problems. Specifically, we implement several strategies to enable LLM agents to generate a coherent outline for each travel query, mirroring human planning patterns. Additionally, we integrate Strategy Block and Knowledge Block into our framework: Strategy Block facilitates information collection, while Knowledge Block provides essential information for detailed planning. Through our extensive experiments, we demonstrate that our framework significantly improves the planning capabilities of LLM agents, enabling them to tackle the travel planning task with improved efficiency and effectiveness. Our experimental results showcase the exceptional performance of the proposed framework; when combined with GPT-4-Turbo, it attains 10x the performance gains in comparison to the baseline framework deployed on GPT-4-Turbo.

5/29/2024

cs.AI cs.CL cs.LG

Exploring and Benchmarking the Planning Capabilities of Large Language Models

Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

We seek to elevate the planning capabilities of Large Language Models (LLMs) by investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and natural language scenarios. This suite includes algorithms to generate instances with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Second, we investigate the use of in-context learning (ICL) to enhance LLM planning, exploring the direct relationship between increased context length and improved planning performance. Third, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths, as well as the effectiveness of incorporating model-driven search procedures. Finally, we investigate the performance of the proposed methods in out-of-distribution scenarios, assessing the ability to generalize to novel and unseen planning challenges.

6/21/2024

cs.CL cs.AI cs.LG