A Framework for Neurosymbolic Robot Action Planning using Large Language Models

2303.00438

Published 6/5/2024 by Alessio Capitanelli, Fulvio Mastrogiovanni

Abstract

Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, human-robot collaboration scenarios because of the poor performance in complex planning domains or when frequent re-planning is needed. We present a framework, Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is training Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then leveraging its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability as the planning domain complexity increases, since LLMs' response time scales linearly with the combined length of the input and the output, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, making each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. Recently, significant efforts have been devoted by the research community to evaluate the cognitive capabilities of LLMs, with mixed success. Instead, with Teriyaki we aim to provide an overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLMs' capabilities to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for a plan availability by up to 61.4%.

Overview

Symbolic task planning is a widely used approach to enforce robot autonomy, but it struggles to scale in real-world, human-robot collaboration scenarios due to poor performance in complex planning domains or when frequent re-planning is needed.
The authors present a framework called Teriyaki, which aims to bridge the gap between symbolic task planning and machine learning approaches by training a Large Language Model (LLM), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL).
Potential benefits of this approach include better scalability as planning domain complexity increases, and the ability to synthesize a plan action-by-action instead of end-to-end, enabling concurrent planning and execution.

Plain English Explanation

The paper discusses a new approach to robot task planning called Teriyaki. Traditional symbolic task planning, where robots follow a pre-defined set of rules to plan their actions, can be hard to scale to complex real-world scenarios, especially when the robot needs to frequently re-plan its actions.

The researchers behind Teriyaki have developed a way to combine symbolic task planning with machine learning, using a Large Language Model (LLM) like GPT-3. The LLM is trained to become a "neurosymbolic task planner" that can understand and generate plans in the standard Planning Domain Definition Language (PDDL) format.

This hybrid approach has two potential advantages. First, the LLM-based planner is expected to scale better as the planning problem becomes more complex, since the LLM's response time grows linearly with the combined length of the input and the output. Second, the LLM can generate a plan action-by-action rather than all at once, so each action is available for execution as soon as it is generated. The robot can therefore start executing the plan while the rest is still being generated, which enables concurrent planning and execution.
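The waiting-time argument can be illustrated with a small cost model. This is a minimal sketch that assumes response time is exactly linear in the number of tokens and ignores any per-request overhead; the function names, the `seconds_per_token` constant, and the token counts are hypothetical and do not come from the paper.

```python
def response_time(input_tokens: int, output_tokens: int,
                  seconds_per_token: float = 0.02) -> float:
    """Hypothetical linear latency model: time grows with input + output length."""
    return seconds_per_token * (input_tokens + output_tokens)


def wait_end_to_end(input_tokens: int, tokens_per_action: int,
                    n_actions: int) -> float:
    """Time before execution can start if the whole plan must be generated first."""
    return response_time(input_tokens, tokens_per_action * n_actions)


def wait_first_action(input_tokens: int, tokens_per_action: int) -> float:
    """Time before execution can start if the first action is usable immediately."""
    return response_time(input_tokens, tokens_per_action)


# Illustrative numbers only: a 500-token problem and a 20-action plan
# with 10 tokens per action.
full_wait = wait_end_to_end(500, 10, 20)
first_wait = wait_first_action(500, 10)
print(f"whole plan: {full_wait:.2f} s, first action: {first_wait:.2f} s")
```

Under this model the gap between the two waits grows with the plan length, which is why action-by-action generation matters most for long plans.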

In preliminary experiments on selected planning domains, the researchers report that Teriyaki solves 95.5% of the problems in a test set of 1,000 samples, produces plans up to 13.5% shorter than a traditional symbolic planner, and reduces the average waiting time for a plan to become available by up to 61.4%.

Technical Explanation

The Teriyaki framework aims to address the limitations of traditional symbolic task planners by leveraging the capabilities of Large Language Models (LLMs). The authors train a GPT-3 model to become a "neurosymbolic task planner" that can understand and generate plans in the Planning Domain Definition Language (PDDL) format.

Key elements of the Teriyaki approach include:

PDDL Compatibility: The LLM is trained to accept PDDL-formatted inputs describing the planning domain and problem, and to output PDDL-compliant plans.
Action-by-Action Planning: Instead of generating entire plans at once, the LLM can synthesize plans action-by-action, allowing the robot to start executing parts of the plan while the rest is still being generated.
Scalability: The authors argue that the LLM-based planner can scale better than traditional symbolic planners as the planning domain complexity increases, since the LLM's response time scales linearly with the input and output size.
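The action-by-action idea in the list above can be sketched as a producer/consumer loop: one thread generates actions while the main thread executes them as they arrive. This is not the authors' implementation. `fake_planner` is a hypothetical stand-in for the fine-tuned model, the blocksworld-style action strings are illustrative, and `plan_and_execute` is a name introduced here.

```python
import queue
import threading
from typing import Callable, Iterator, List

_DONE = object()  # sentinel that marks the end of the plan


def fake_planner(problem: str) -> Iterator[str]:
    """Hypothetical stand-in for the LLM: yields one PDDL-style action at a time."""
    yield "(pick-up block-a)"
    yield "(stack block-a block-b)"
    yield "(pick-up block-c)"
    yield "(stack block-c block-a)"


def plan_and_execute(problem: str,
                     planner: Callable[[str], Iterator[str]],
                     execute: Callable[[str], None]) -> List[str]:
    """Execute each action as soon as it is generated, while planning continues."""
    actions: "queue.Queue[object]" = queue.Queue()
    executed: List[str] = []

    def produce() -> None:
        # Planning runs in its own thread and pushes actions as they appear.
        try:
            for action in planner(problem):
                actions.put(action)
        finally:
            actions.put(_DONE)  # always signal completion, even on error

    producer = threading.Thread(target=produce)
    producer.start()

    while True:
        action = actions.get()  # blocks until the next action is available
        if action is _DONE:
            break
        execute(action)  # the robot acts while later actions are still planned
        executed.append(action)

    producer.join()
    return executed


log: List[str] = []
plan_and_execute("(define (problem demo) ...)", fake_planner, log.append)
print(log)
```

A queue keeps the two sides decoupled: the executor never needs to know how many actions remain, and the planner never waits for execution to finish.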

In preliminary experiments on selected planning domains, Teriyaki solved 95.5% of the problems in a test data set of 1,000 samples, produced plans up to 13.5% shorter than a traditional symbolic planner, and reduced the average overall waiting time for plan availability by up to 61.4%.
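For readers who want to see how percentages of this kind are typically derived, here is a minimal sketch. The paper's exact metric definitions are not given in this summary, so the formulas below are an assumption (plain success rate and relative reduction against a baseline), and the sample values are made up for illustration.

```python
from typing import Sequence


def solve_rate(solved: Sequence[bool]) -> float:
    """Percentage of test problems for which a valid plan was produced."""
    return 100.0 * sum(solved) / len(solved)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is smaller than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Made-up values, not results from the paper.
print(solve_rate([True, True, False, True]))              # share of solved problems
print(relative_reduction(baseline=20.0, candidate=17.0))  # plan length or wait time
```

The same `relative_reduction` helper applies to both plan length (in actions) and waiting time (in seconds), as long as the baseline is the traditional planner's value.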

Critical Analysis

The Teriyaki framework presents a promising approach to combining the strengths of symbolic task planning and machine learning, but it also has some potential limitations and areas for further research:

Explainability: While the LLM-based planner may offer improved performance, the "black box" nature of large language models can make it difficult to understand and explain the reasoning behind the generated plans. Maintaining transparency and interpretability in the planning process is an important consideration for real-world applications.
Robustness: The paper does not extensively discuss the robustness of the Teriyaki framework to changes in the planning domain or to unexpected situations that may arise during execution. Further research is needed to assess the framework's ability to handle these types of challenges.
Deployment Challenges: Integrating an LLM-based planner into a real-world robot architecture may pose technical and computational challenges, such as managing the model's resource requirements and ensuring reliable performance in dynamic environments.

Overall, the Teriyaki framework represents an interesting and potentially valuable contribution to the field of robot task planning. However, as with any emerging technology, further research and validation will be necessary to fully understand its capabilities, limitations, and potential real-world applications.

Conclusion

The Teriyaki framework presented in this paper offers a novel approach to bridging the gap between symbolic task planning and machine learning, with the potential to improve the scalability and flexibility of robot task planning in complex, real-world scenarios. By training a Large Language Model to become a neurosymbolic task planner, the researchers have demonstrated promising results in terms of planning performance, plan quality, and execution efficiency.

While the framework has some limitations and areas for further research, the core ideas behind Teriyaki represent an important step forward in the integration of symbolic and machine learning techniques for robot autonomy. As the field of robotics continues to evolve, approaches like Teriyaki may play a significant role in enabling robots to operate more seamlessly and effectively in dynamic, human-centric environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

Sthithpragya Gupta, Kunpeng Yao, Loic Niederhauser, Aude Billard

Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of adaptability and error correction within robotic systems. This work aims to overcome this limitation by enabling robots to modify their motion strategies and select the most suitable task plans based on the context. We introduce a novel framework termed action contextualization, aimed at tailoring robot actions to the precise requirements of specific tasks, thereby enhancing adaptability through applying LLM-derived contextual insights. Our proposed motion metrics guarantee the feasibility and efficiency of adjusted motions, which evaluate robot performance and eliminate planning redundancies. Moreover, our framework supports online feedback between the robot and the LLM, enabling immediate modifications to the task plans and corrections of errors. Our framework has achieved an overall success rate of 81.25% through extensive validation. Finally, integrated with dynamic system (DS)-based robot controllers, the robotic arm-hand system demonstrates its proficiency in autonomously executing LLM-generated motion plans for sequential table-clearing tasks, rectifying errors without human intervention, and completing tasks, showcasing robustness against external disturbances. Our proposed framework features the potential to be integrated with modular control approaches, significantly enhancing robots' adaptability and autonomy in sequential task execution.

4/23/2024

cs.RO

Large Language Models as Planning Domain Generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.

5/14/2024

cs.CL cs.AI

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot, Ruslan Salakhutdinov

Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/

5/3/2024

cs.LG cs.AI cs.CV cs.RO

LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots

Ruoyu Wang, Zhipeng Yang, Zinan Zhao, Xinyan Tong, Zhi Hong, Kun Qian

The development of a general purpose service robot for daily life necessitates the robot's ability to deploy a myriad of fundamental behaviors judiciously. Recent advancements in training Large Language Models (LLMs) can be used to generate action sequences directly, given an instruction in natural language with no additional domain information. However, while the outputs of LLMs are semantically correct, the generated task plans may not accurately map to acceptable actions and might encompass various linguistic ambiguities. LLM hallucinations pose another challenge for robot task planning, which results in content that is inconsistent with real-world facts or user inputs. In this paper, we propose a task planning method based on a constrained LLM prompt scheme, which can generate an executable action sequence from a command. An exceptional handling module is further proposed to deal with LLM hallucinations problem. This module can ensure the LLM-generated results are admissible in the current environment. We evaluate our method on the commands generated by the RoboCup@Home Command Generator, observing that the robot demonstrates exceptional performance in both comprehending instructions and executing tasks.

5/27/2024

cs.RO