MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

2403.18760

Published 4/3/2024 by Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song

Abstract

In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, the Multi-Level Decomposition Task planning method. This method innovatively decomposes tasks at the goal-level, task-level, and action-level to mitigate the challenge of complex long-horizon tasks. In order to enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method to create high-quality training data and conduct instruction tuning on the generated corpus. Since the complexity of the existing datasets is not high enough, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method using various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance enhancement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs as well as its practicality in complex, real-world scenarios.

Overview

This paper presents MLDT, a Multi-Level Decomposition Task planning method that uses open-source large language models (LLMs) to plan complex long-horizon robotic tasks.
The key innovation is decomposing a task at the goal level, the task level, and the action level, so that the model does not have to comprehend the full context and generate the entire action sequence in one pass.
The method is evaluated with various LLMs on four datasets in the VirtualHome simulator. The authors also construct LongTasks, a more challenging dataset aimed specifically at complex long-horizon tasks.

Plain English Explanation

Imagine you have a robot that needs to perform a complex task, like cleaning your entire house. This would involve many individual steps, such as picking up toys, dusting shelves, mopping floors, and so on. Typically, programming a robot to do all of this would be extremely challenging, as the robot would need to be told precisely what to do at every step.

The researchers in this paper have developed a new approach that makes it easier to program robots for these kinds of complex, long-term tasks. Their key insight is to break down the high-level task (cleaning the house) into smaller, more manageable sub-tasks (e.g., pick up toys, dust shelves, mop floors). By doing this, they can leverage the capabilities of a powerful language model to understand the overall task and plan out the sequence of actions needed to complete it.

This multi-level decomposition strategy allows the robot to tackle complex tasks in a more structured and organized way, without getting overwhelmed by the details. The language model can understand the high-level goal, reason about the necessary steps, and put those steps in the right order for the robot to carry out. This makes it much easier to program the robot to handle long, multi-step tasks.

Technical Explanation

The researchers propose MLDT (the Multi-Level Decomposition Task planning method), which uses open-source large language models for complex long-horizon task planning. The key innovation is a multi-level decomposition strategy that splits planning into goal-level, task-level, and action-level stages.

As the level names suggest, the language model first breaks the overall goal into smaller sub-goals (goal level), then works out which tasks achieve each sub-goal (task level), and finally turns each task into the concrete actions the robot carries out (action level). Each stage therefore works from a narrower context and produces a shorter output than planning the whole task in a single pass, which is where existing methods struggle.
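The three-level decomposition can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the prompt strings, the `LLM` type, and the functions `plan`, `flatten`, and `fake_llm` are all hypothetical names introduced here, and the canned responses stand in for a real open-source model.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def split_lines(text: str) -> List[str]:
    """Return the non-empty, stripped lines of a completion."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan(goal: str, llm: LLM) -> Dict[str, Dict[str, List[str]]]:
    """Decompose a goal into sub-goals, then tasks, then actions."""
    hierarchy: Dict[str, Dict[str, List[str]]] = {}
    # Goal level: one call that splits the overall goal into sub-goals.
    for sub_goal in split_lines(llm(f"GOAL-LEVEL: split into sub-goals: {goal}")):
        tasks: Dict[str, List[str]] = {}
        # Task level: one call per sub-goal, so each prompt stays short.
        for task in split_lines(llm(f"TASK-LEVEL: list tasks for: {sub_goal}")):
            # Action level: one call per task yields a short action list.
            tasks[task] = split_lines(llm(f"ACTION-LEVEL: list actions for: {task}"))
        hierarchy[sub_goal] = tasks
    return hierarchy


def flatten(hierarchy: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Concatenate the per-task action lists into one executable sequence."""
    return [a for tasks in hierarchy.values() for acts in tasks.values() for a in acts]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: returns canned text based on the level tag."""
    level = prompt.partition(":")[0]
    subject = prompt.rsplit(": ", 1)[-1]
    if level == "GOAL-LEVEL":
        return "put apple in fridge\nput plate in dishwasher"
    if level == "TASK-LEVEL":
        return f"fetch for [{subject}]\nstore for [{subject}]"
    return f"walk ({subject})\nact ({subject})"


if __name__ == "__main__":
    for action in flatten(plan("tidy the kitchen", fake_llm)):
        print(action)
```

Swapping `fake_llm` for a call into an actual model would leave `plan` unchanged. The point of the structure is that no single call needs the full task context or has to emit the full action sequence.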

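The researchers evaluate MLDT with various LLMs on four datasets in VirtualHome, a simulated household environment. Because existing datasets are not complex enough to test long-horizon planning, they also construct a more challenging dataset, LongTasks. To strengthen the planning ability of open-source models, they generate training data with a goal-sensitive corpus generation method and instruction-tune the models on it. The reported results show a significant improvement over existing methods based on open-source LLMs.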

Critical Analysis

The paper presents a promising approach to addressing the challenge of programming robots for complex, long-term tasks. By leveraging the power of large language models, the MLDT framework can tackle problems that were previously difficult for traditional planning methods.

However, the paper does not explore some potential limitations or areas for further research. For example, the performance of the language model may be sensitive to the quality and diversity of the training data, which could limit its generalization to novel scenarios. Additionally, the multi-level decomposition strategy relies on the language model's ability to accurately break down tasks and generate sub-plans, which could be error-prone in some cases.

Further research could also investigate how MLDT might be combined with other robotic planning techniques, such as reinforcement learning or hierarchical task networks, to potentially improve its robustness and flexibility. Exploring the computational and memory requirements of the approach, as well as its scalability to even more complex tasks, would also be valuable areas for future work.

Conclusion

The MLDT framework targets a known weakness of planners built on open-source large language models: complex, long-horizon tasks. By decomposing a task at the goal, task, and action levels, the approach lets the model plan long action sequences in smaller, more tractable pieces.

The evaluation in VirtualHome suggests that this approach could matter for applications ranging from household chores to industrial automation, although the reported results come from a simulated environment. As language models continue to improve and become more widely accessible, the potential for using these tools to enhance robotic capabilities is likely to grow, opening up new avenues for research and development in this rapidly evolving field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models

Yuchen Liu, Luigi Palmieri, Sebastian Koch, Ilche Georgievski, Marco Aiello

Recent advancements in Large Language Models (LLMs) have sparked a revolution across various research fields. In particular, the integration of common-sense knowledge from LLMs into robot task and motion planning has been proven to be a game-changer, elevating performance in terms of explainability and downstream task efficiency to unprecedented heights. However, managing the vast knowledge encapsulated within these large models has posed challenges, often resulting in infeasible plans generated by LLM-based planning systems due to hallucinations or missing domain information. To overcome these challenges and obtain even greater planning feasibility and computational efficiency, we propose a novel LLM-driven task planning approach called DELTA. For achieving better grounding from environmental topology into actionable knowledge, DELTA leverages the power of scene graphs as environment representations within LLMs, enabling the fast generation of precise planning problem descriptions. For obtaining higher planning performance, we use LLMs to decompose the long-term task goals into an autoregressive sequence of sub-goals for an automated task planner to solve. Our contribution enables a more efficient and fully automatic task planning pipeline, achieving higher planning success rates and significantly shorter planning times compared to the state of the art.

4/5/2024

cs.RO cs.AI

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Yutao Ouyang, Jinhan Li, Yunfei Li, Zhongyu Li, Chao Yu, Koushil Sreenath, Yi Wu

We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner for sketching a plan, a parameter calculator for predicting arguments in the plan, and a code generator to convert the plan into executable robot code. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help.

4/9/2024

cs.RO

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot, Ruslan Salakhutdinov

Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/

5/3/2024

cs.LG cs.AI cs.CV cs.RO

LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

Siwei Chen, Anxing Xiao, David Hsu

This work addresses the problem of long-horizon task planning with the Large Language Model (LLM) in an open-world household environment. Existing works fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which is not generalizable. We propose an open state representation that provides continuous expansion and updating of object attributes from the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of an object's attributes and changes, enabling robust retrospective summary of the sequence of actions leading to the current state. This allows continuously updating world model to enhance context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. (Video demonstration: https://youtu.be/QkN-8pxV3Mo)

4/23/2024

cs.RO cs.AI