Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools

2404.11891

Published 4/22/2024 by Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan

💬

Abstract

The recent advancements of Large Language Models (LLMs), with their abundant world knowledge and capabilities of tool-using and reasoning, fostered many LLM planning algorithms. However, LLMs have not shown to be able to accurately solve complex combinatorial optimization problems. In Xie et al. (2024), the authors proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that LLMs themselves cannot make travel plans that satisfy user requirements with a best success rate of 0.6%. In this work, we propose a framework that enables LLMs to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem and use SMT solvers interactively and automatically solve the combinatorial search problem. The SMT solvers guarantee the satisfiable of input constraints and the LLMs can enable a language-based interaction with our framework. When the input constraints cannot be satisfiable, our LLM-based framework will interactively offer suggestions to users to modify their travel requirements via automatic reasoning using the SMT solvers. We evaluate our framework with TravelPlanner and achieve a success rate of 97%. We also create a separate dataset that contain international travel benchmarks and use both dataset to evaluate the effectiveness of our interactive planning framework when the initial user queries cannot be satisfied. Our framework could generate valid plans with an average success rate of 78.6% for our dataset and 85.0% for TravelPlanner according to diverse humans preferences.

Create account to get full access

Overview

Recent advancements in Large Language Models (LLMs) have led to the development of many LLM planning algorithms.
However, LLMs have struggled to accurately solve complex combinatorial optimization problems.
The paper proposes a framework that enables LLMs to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem.
The framework uses SMT solvers interactively and automatically to solve the combinatorial search problem, while allowing language-based interaction with LLMs.
The framework achieves a high success rate in generating valid travel plans that satisfy user requirements.

Plain English Explanation

Large language models (LLMs) like GPT-3 have become incredibly capable at understanding and generating human language. They have a vast knowledge of the world and can even perform reasoning and problem-solving tasks. However, when it comes to complex optimization problems, LLMs have struggled to find the best solutions.

The researchers in this paper tackled the problem of travel planning. They created a benchmark dataset called TravelPlanner, which tests an AI's ability to plan a domestic U.S. trip that meets a user's specific requirements. When the researchers tried using LLMs alone to solve this problem, the best success rate was only 60%.

To address this, the researchers developed a new framework that combines LLMs with something called "satisfiability modulo theory" (SMT) solvers. SMT solvers are specialized algorithms that can efficiently search through complex combinatorial problems and guarantee that the final solution satisfies all the given constraints.

In this framework, the LLM is used to interact with the user in natural language, understanding their travel requirements. The framework then translates these requirements into the formal language that the SMT solver can understand. The SMT solver then automatically searches for a valid travel plan that meets all the constraints. If no such plan can be found, the framework uses the SMT solver's reasoning to suggest ways the user could modify their requirements to make a plan possible.

This hybrid approach achieved a much higher success rate of 97% on the TravelPlanner benchmark. The researchers also tested it on a dataset of international travel planning problems, where it had an average success rate of 78.6%. The framework's ability to combine the language understanding of LLMs with the rigorous problem-solving of SMT solvers allows it to tackle complex, real-world planning challenges effectively.

Technical Explanation

The paper presents a framework that enables large language models (LLMs) to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem. The framework uses SMT solvers interactively and automatically to solve the combinatorial search problem, while allowing language-based interaction with LLMs.

The authors first introduced a new benchmark dataset called TravelPlanner, which tests an AI's ability to plan a domestic U.S. trip that meets specific user requirements. They found that LLMs alone could only achieve a best success rate of 60% on this task.

To address this, the researchers developed a framework that combines LLMs and SMT solvers. The LLM is used to understand the user's natural language travel requirements. These requirements are then translated into a formal SMT problem representation. The SMT solver is then used to automatically search for a valid travel plan that satisfies all the constraints.

If no valid plan can be found, the framework uses the SMT solver's reasoning to provide suggestions to the user on how they could modify their travel requirements to make a plan possible. This interactive approach allows the framework to achieve a much higher success rate of 97% on the TravelPlanner benchmark.

The researchers also evaluated their framework on a separate dataset of international travel planning problems. On this more challenging dataset, the framework maintained an average success rate of 78.6% in generating valid travel plans that satisfied diverse user preferences.

Critical Analysis

The paper presents a novel and effective approach to combining the language understanding capabilities of LLMs with the rigorous problem-solving abilities of SMT solvers. This hybrid framework is able to tackle complex, real-world travel planning challenges in a way that LLMs alone could not.

One limitation of the approach is that it requires translating the natural language travel requirements into a formal SMT problem representation. This translation process may not always be straightforward, and could introduce errors or lose important nuances in the user's intent. The paper does not provide a detailed discussion of how this translation is performed.

Additionally, the success rates, while impressive, are not 100%. There may still be cases where the framework is unable to find a valid travel plan or provide helpful suggestions to the user. The paper does not explore the types of travel requirements or constraints that are most challenging for the framework to handle.

Further research could investigate ways to tighten the integration between the LLM and SMT solver components, perhaps by having the LLM more directly participate in the formal reasoning process. Exploring the use of graph-based reasoning within the framework could also be a fruitful direction.

Overall, this paper demonstrates an important step forward in the quest to develop AI systems that can effectively tackle complex, real-world planning and optimization problems. The authors' approach of leveraging the complementary strengths of LLMs and SMT solvers is a promising direction for the field.

Conclusion

This paper presents a framework that enables large language models (LLMs) to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem. By combining the language understanding capabilities of LLMs with the rigorous problem-solving abilities of SMT solvers, the framework is able to generate valid travel plans that satisfy user requirements with a high success rate.

The key innovation is the use of SMT solvers to automatically search for solutions that meet all the constraints, while allowing the LLM to handle the natural language interaction with users. This hybrid approach outperforms using LLMs alone, which struggle with complex combinatorial optimization problems.

The framework's effectiveness is demonstrated on both a domestic U.S. travel planning benchmark as well as a more challenging international travel dataset. The ability to interactively provide suggestions when no valid plan can be found is also a notable capability.

Overall, this research represents an important step forward in developing AI systems that can tackle real-world planning and optimization challenges. By combining the complementary strengths of different AI techniques, the authors have created a framework that shows promise for a wide range of applications beyond just travel planning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We are using the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized such as extraction of useful critics and reformulator for critics.

6/3/2024

cs.AI

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of {bf LLM-Modulo Frameworks} that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

6/13/2024

cs.AI cs.LG

💬

TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners

Tomas de la Rosa, Sriram Gopalakrishnan, Alberto Pozanco, Zhen Zeng, Daniel Borrajo

Travel planning is a complex task that involves generating a sequence of actions related to visiting places subject to constraints and maximizing some user satisfaction criteria. Traditional approaches rely on problem formulation in a given formal language, extracting relevant travel information from web sources, and use an adequate problem solver to generate a valid solution. As an alternative, recent Large Language Model (LLM) based approaches directly output plans from user requests using language. Although LLMs possess extensive travel domain knowledge and provide high-level information like points of interest and potential routes, current state-of-the-art models often generate plans that lack coherence, fail to satisfy constraints fully, and do not guarantee the generation of high-quality solutions. We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners, where (i) LLMs get and translate travel information and user information into data structures that can be fed into planners; and (ii) automated planners generate travel plans that guarantee constraint satisfaction and optimize for users' utility. Our experiments across various travel scenarios show that TRIP-PAL outperforms an LLM when generating travel plans.

6/17/2024

cs.AI

💬

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI