Learning to Learn Faster from Human Feedback with Language Model Predictive Control

2402.11450

Published 6/3/2024 by Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed and 40 others

cs.RO

Abstract

Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs, to remember their in-context interactions and improve their teachability i.e., how efficiently they adapt to human inputs (measured by average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.

Overview

Large language models (LLMs) have shown the ability to write robot code from language commands, enabling non-experts to direct robot behaviors, modify them, and compose new tasks.
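However, these capabilities rely on in-context learning and are limited to short-term interactions: user feedback stays relevant only while it fits in the LLM's context window, and can be forgotten over longer interactions.
This work investigates fine-tuning robot code-writing LLMs to remember past interactions and improve their "teachability", i.e., how efficiently they adapt to human inputs, measured by the average number of corrections before the user considers the task successful.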

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. Researchers have found that LLMs can also write code to control robots, allowing non-experts to instruct the robots, make changes, and combine different tasks. This is related to the concept of policy improvement using language feedback models.

However, these LLM-powered robot control capabilities are limited to short interactions. The LLMs may forget what happened in previous interactions over time, making it difficult for humans to effectively teach and refine the robot's behavior. This is similar to the challenge of enabling large language models to provide adaptive, incremental learning of humanoid robot behavior from natural language.

The researchers in this study investigated a way to fine-tune the LLMs to remember past interactions and become more "teachable", meaning that humans can guide the robots to do what they want with fewer corrections. This relates to the idea of language models enabling automated formative feedback.

Technical Explanation

The key insight is that human-robot interactions can be viewed as a partially observable Markov decision process (POMDP), in which the human's language inputs are observations and the robot code outputs are actions. Training the LLM to complete previous interactions therefore amounts to training a transition dynamics model. This model can then be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success on new tasks.
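To make the MPC idea concrete, here is a minimal Python sketch of a receding-horizon loop over predicted chats. It is an illustration under stated assumptions, not the paper's implementation: `toy_dynamics_model` is a hypothetical random stand-in for the fine-tuned LLM, and the names `Rollout` and `mpc_next_action`, the number of samples, and the success probability are all invented for this example. The sketch samples several predicted continuations of the interaction, keeps the shortest one that is predicted to end in success, and returns only its first robot-code action.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One chat turn: (human language input = observation, robot code = action).
Turn = Tuple[str, str]


@dataclass
class Rollout:
    turns: List[Turn]  # predicted future turns, starting with the next action
    success: bool      # whether the predicted user ends up satisfied


def toy_dynamics_model(history: List[Turn], feedback: str,
                       rng: random.Random, max_turns: int = 5) -> Rollout:
    """Hypothetical stand-in for the fine-tuned LLM (an assumption).

    It "predicts" the rest of the chat: each step proposes placeholder code,
    then randomly guesses whether the user would accept it.
    """
    turns: List[Turn] = []
    obs = feedback
    for step in range(max_turns):
        action = f"robot_code_v{len(history) + step}_{rng.randint(0, 999)}"
        turns.append((obs, action))
        if rng.random() < 0.4:  # toy chance that the predicted user accepts
            return Rollout(turns, success=True)
        obs = "predicted correction"
    return Rollout(turns, success=False)


def mpc_next_action(history: List[Turn], feedback: str,
                    model: Callable[[List[Turn], str, random.Random], Rollout],
                    num_samples: int = 8, seed: int = 0) -> str:
    """Sample rollouts and return the first action of the shortest success."""
    rng = random.Random(seed)
    rollouts = [model(history, feedback, rng) for _ in range(num_samples)]
    successful = [r for r in rollouts if r.success]
    pool = successful or rollouts  # fall back if no rollout predicts success
    best = min(pool, key=lambda r: len(r.turns))
    return best.turns[0][1]  # execute only the first action, then re-plan


if __name__ == "__main__":
    print(mpc_next_action([], "make the robot wave", toy_dynamics_model))
```

As in standard MPC, only the first planned action is executed; once the real user responds, the new feedback is appended to the history and planning is repeated.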

The researchers developed a framework called Language Model Predictive Control (LMPC) that fine-tunes the PaLM 2 LLM to improve its teachability on 78 tasks across 5 different robot embodiments. This relates to the concept of action contextualization for adaptive task planning and action tuning. The results show that LMPC improves non-expert teaching success rates on unseen tasks by 26.9%, while reducing the average number of human corrections from 2.4 to 1.9. The LMPC approach also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. This is similar to the idea of a self-corrected multimodal large language model for end-to-end robotics.
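The two headline metrics, teaching success rate and average number of corrections, can be computed from logged teaching sessions roughly as follows. This is a sketch under assumptions: the `Session` record and its field names are hypothetical, the sample data is made up, and averaging corrections over successful sessions only is one reading of "corrections before the user considers the task successful". The last helper only restates the reported drop from 2.4 to 1.9 corrections as a relative change.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Session:
    corrections: int  # user corrections given during the chat (hypothetical field)
    success: bool     # whether the user marked the task successful


def success_rate(sessions: List[Session]) -> float:
    """Fraction of sessions that the user marked as successful."""
    return sum(s.success for s in sessions) / len(sessions)


def mean_corrections_to_success(sessions: List[Session]) -> float:
    """Average corrections over successful sessions only (an assumption)."""
    counts = [s.corrections for s in sessions if s.success]
    return mean(counts) if counts else float("nan")


def relative_change(before: float, after: float) -> float:
    """Signed relative change; negative values mean a reduction."""
    return (after - before) / before


if __name__ == "__main__":
    # Made-up sessions, for illustration only.
    logs = [Session(2, True), Session(1, True), Session(4, False)]
    print(success_rate(logs), mean_corrections_to_success(logs))
    # Reported averages from the summary: 2.4 corrections down to 1.9.
    print(f"{relative_change(2.4, 1.9):.1%}")
```

With the reported averages, the relative change works out to roughly a 21% reduction in corrections per task.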

Critical Analysis

The paper provides a compelling approach to improving the teachability of LLMs for robot control, but there are a few potential limitations and areas for further research:

The experiments were limited to 78 tasks across 5 robot embodiments, so the generalization to a wider range of tasks and robots is still an open question.
The paper does not discuss how the LMPC framework would scale to more complex robot behaviors or longer-term interactions, where the memory requirements may become more challenging.
It is unclear how the LMPC approach would perform in safety-critical applications, where the robot's actions must be highly reliable and predictable.

Overall, the research represents an interesting step forward in enabling non-experts to more effectively teach and refine robot behaviors using language-based interfaces. Further exploration of the limitations and real-world applicability of this approach would be valuable for advancing the field of human-robot interaction.

Conclusion

This study presents Language Model Predictive Control (LMPC), a framework that fine-tunes large language models to remember past interactions and become more "teachable" by human users. By treating human-robot interactions as a partially observable Markov decision process, the researchers train the LLM as a transition dynamics model that can be combined with robotics techniques like model predictive control.

The results show that LMPC can significantly improve the success rates of non-experts teaching new robot tasks, while reducing the average number of corrections needed. The approach also produces strong meta-learners that can adapt to new robot embodiments and APIs. This work represents an important step towards enabling more natural and effective human-robot collaboration powered by advanced language models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Policy Improvement using Language Feedback Models

Victor Zhong, Dipendra Misra, Xingdi Yuan, Marc-Alexandre Côté

We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFM can be modified to provide human-interpretable feedback without performance loss, allowing human verification of desirable behaviour for imitation learning.

4/22/2024

cs.LG cs.AI cs.CL

Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration

Haokun Liu, Yaonan Zhu, Kenji Kato, Atsushi Tsukahara, Izumi Kondo, Tadayoshi Aoyama, Yasuhisa Hasegawa

Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.

6/21/2024

cs.RO cs.AI cs.HC

Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

Sthithpragya Gupta, Kunpeng Yao, Loic Niederhauser, Aude Billard

Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of adaptability and error correction within robotic systems. This work aims to overcome this limitation by enabling robots to modify their motion strategies and select the most suitable task plans based on the context. We introduce a novel framework termed action contextualization, aimed at tailoring robot actions to the precise requirements of specific tasks, thereby enhancing adaptability through applying LLM-derived contextual insights. Our proposed motion metrics guarantee the feasibility and efficiency of adjusted motions, which evaluate robot performance and eliminate planning redundancies. Moreover, our framework supports online feedback between the robot and the LLM, enabling immediate modifications to the task plans and corrections of errors. Our framework has achieved an overall success rate of 81.25% through extensive validation. Finally, integrated with dynamic system (DS)-based robot controllers, the robotic arm-hand system demonstrates its proficiency in autonomously executing LLM-generated motion plans for sequential table-clearing tasks, rectifying errors without human intervention, and completing tasks, showcasing robustness against external disturbances. Our proposed framework features the potential to be integrated with modular control approaches, significantly enhancing robots' adaptability and autonomy in sequential task execution.

4/23/2024

cs.RO

Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks

Emily Jensen, Sriram Sankaranarayanan, Bradley Hayes

We claim that LLMs can be paired with formal analysis methods to provide accessible, relevant feedback for HRI tasks. While logic specifications are useful for defining and assessing a task, these representations are not easily interpreted by non-experts. Luckily, LLMs are adept at generating easy-to-understand text that explains difficult concepts. By integrating task assessment outcomes and other contextual information into an LLM prompt, we can effectively synthesize a useful set of recommendations for the learner to improve their performance.

5/28/2024

cs.RO