Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

2405.01534

Published 5/3/2024 by Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot, Ruslan Salakhutdinov

💬

Abstract

Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/

Create account to get full access

Overview

Large language models (LLMs) have shown promise in performing high-level planning for long-horizon robotics tasks, but existing methods require access to a pre-defined skill library.
The paper proposes a new approach, called Plan-Seq-Learn (PSL), that uses motion planning to bridge the gap between abstract language and learned low-level control, allowing robots to solve long-horizon tasks from scratch.
PSL achieves state-of-the-art results on over 25 challenging robotics tasks, outperforming language-based, classical, and end-to-end approaches.

Plain English Explanation

The paper explores how we can use the vast knowledge of large language models (LLMs) to help robots solve complex, long-term tasks. Traditionally, robots have been programmed with a set of pre-defined skills, like picking, placing, and navigating. But this can be limiting, especially for tasks that require fine-grained control or a combination of many different skills.

The researchers propose a new approach called Plan-Seq-Learn (PSL). PSL uses the high-level planning capabilities of LLMs to guide a reinforcement learning (RL) agent, which can then learn the low-level control needed to execute the plan. This allows the robot to solve complex tasks from scratch, without needing a pre-determined set of skills.

The key innovation is the use of motion planning to bridge the gap between the abstract language of the LLM and the actual robot controls. This allows the robot to adjust its behavior in a fine-grained way, tailoring its actions to the specific task at hand.

The researchers show that PSL outperforms other approaches, including language-based, classical, and end-to-end methods, on a wide range of long-horizon robotics tasks. This suggests that combining the strengths of LLMs and RL could be a powerful way to enable robots to tackle increasingly complex real-world challenges.

Technical Explanation

The paper introduces Plan-Seq-Learn (PSL), a modular approach that uses motion planning to connect the high-level planning capabilities of large language models (LLMs) with learned low-level control policies.

The key components of PSL are:

Language Model: An LLM is used to generate high-level plans for solving the given task, drawing on its vast knowledge of the world.
Motion Planner: A motion planning module translates the LLM's abstract plan into a sequence of concrete actions that the robot can execute.
Reinforcement Learning (RL) Agent: An RL agent is trained to learn the low-level control policies needed to carry out the motion plan, allowing for fine-grained adjustment of the robot's behavior.

The researchers evaluate PSL on over 25 challenging long-horizon robotics tasks, including manipulation and navigation scenarios. They find that PSL achieves state-of-the-art results, outperforming language-based, classical, and end-to-end approaches with success rates over 85%.

This work demonstrates the potential of combining the strengths of LLMs and RL to enable robots to tackle complex, long-horizon tasks from scratch, without requiring a pre-defined set of skills. The use of motion planning to bridge the gap between abstract language and low-level control is a key innovation that allows for fine-grained control and adaptation to the task at hand.

Critical Analysis

The paper presents a promising approach to using LLMs for high-level planning in long-horizon robotics tasks. However, the authors acknowledge some limitations and areas for further research:

The current implementation of PSL still relies on some pre-defined skills, such as navigation and object manipulation, which could limit its generalization to completely novel tasks.
The motion planning module in PSL is based on a classical algorithm, which could be less efficient or flexible than learning the motion planning itself.
The paper focuses on simulation experiments, and the performance of PSL in real-world settings with additional uncertainties and constraints remains to be explored.

Additionally, one could question whether the reliance on motion planning is truly necessary, or if the RL agent could learn to connect the LLM's high-level plans directly to low-level controls. This could potentially simplify the overall architecture and reduce the need for hand-engineered components.

Despite these caveats, the paper makes a significant contribution by demonstrating the potential of combining LLMs and RL for solving complex robotics tasks. As the field of large language models as generalizable policies continues to evolve, this work highlights the importance of bridging the gap between abstract language and grounded, embodied control.

Conclusion

The paper introduces Plan-Seq-Learn (PSL), a modular approach that leverages the high-level planning capabilities of large language models (LLMs) to guide reinforcement learning (RL) agents in solving long-horizon robotics tasks. By using motion planning to bridge the gap between abstract language and low-level control, PSL outperforms language-based, classical, and end-to-end approaches on a wide range of challenging scenarios.

This work demonstrates the potential of combining LLMs and RL to enable robots to tackle complex, long-horizon tasks from scratch, without relying on a pre-determined set of skills. As the field of large language models as generalizable policies continues to advance, the insights from this paper could help pave the way for more versatile and capable robotic systems that can adapt to a wide range of real-world challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Yutao Ouyang, Jinhan Li, Yunfei Li, Zhongyu Li, Chao Yu, Koushil Sreenath, Yi Wu

We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner for sketching a plan, a parameter calculator for predicting arguments in the plan, and a code generator to convert the plan into executable robot code. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help.

4/9/2024

cs.RO

💬

Large Language Models are Learnable Planners for Long-Term Recommendation

Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng

Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.

4/29/2024

cs.IR cs.AI cs.CL cs.LG

Language Models as Zero-Shot Trajectory Generators

Teyun Kwon, Norman Di Palo, Edward Johns

Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as open the bottle cap and wipe the plate with the sponge, and we investigated which design choices in this prompt are the most important. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. Videos, prompts, and code are available at: https://www.robot-learning.uk/language-models-trajectory-generators.

6/19/2024

cs.RO cs.AI cs.CL cs.HC cs.LG

🔮

Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction

Aidan Curtis, Nishanth Kumar, Jing Cao, Tom'as Lozano-P'erez, Leslie Pack Kaelbling

Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. We prompt the LLM to output code for a function with open parameters, which, together with environmental constraints, can be viewed as a Continuous Constraint Satisfaction Problem (CCSP). This CCSP can be solved through sampling or optimization to find a skill sequence and continuous parameter settings that achieve the goal while avoiding constraint violations. Additionally, we consider cases where the LLM proposes unsatisfiable CCSPs, such as those that are kinematically infeasible, dynamically unstable, or lead to collisions, and re-prompt the LLM to form a new CCSP accordingly. Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines.

6/11/2024

cs.RO cs.AI