Programmatic Imitation Learning from Unlabeled and Noisy Demonstrations

2303.01440

Published 4/8/2024 by Jimmy Xin, Linus Zheng, Kia Rahmani, Jiayi Wei, Jarrett Holtz, Isil Dillig, Joydeep Biswas

Abstract

Imitation Learning (IL) is a promising paradigm for teaching robots to perform novel tasks using demonstrations. Most existing approaches for IL utilize neural networks (NN); however, these methods suffer from several well-known limitations: they 1) require large amounts of training data, 2) are hard to interpret, and 3) are hard to repair and adapt. There is an emerging interest in programmatic imitation learning (PIL), which offers significant promise in addressing the above limitations. In PIL, the learned policy is represented in a programming language, making it amenable to interpretation and repair. However, state-of-the-art PIL algorithms assume access to action labels and struggle to learn from noisy real-world demonstrations. In this paper, we propose PLUNDER, a novel PIL algorithm that integrates a probabilistic program synthesizer in an iterative Expectation-Maximization (EM) framework to address these shortcomings. Unlike existing PIL approaches, PLUNDER synthesizes probabilistic programmatic policies that are particularly well-suited for modeling the uncertainties inherent in real-world demonstrations. Our approach leverages an EM loop to simultaneously infer the missing action labels and the most likely probabilistic policy. We benchmark PLUNDER against several established IL techniques, and demonstrate its superiority across five challenging imitation learning tasks under noise. PLUNDER policies achieve 95% accuracy in matching the given demonstrations, outperforming the next best baseline by 19%. Additionally, policies generated by PLUNDER successfully complete the tasks 17% more frequently than the nearest baseline.

Overview

Imitation Learning (IL) is a promising way to teach robots new tasks by having them learn from demonstrations.
Many existing IL approaches use neural networks, which have some limitations:
1. They require large amounts of training data.
2. They are hard to interpret.
3. They are hard to repair and adapt.
There is growing interest in programmatic imitation learning (PIL), which can address these limitations.
In PIL, the learned policy is represented in a programming language, making it easier to understand and modify.
However, current PIL algorithms assume they have access to action labels and struggle to learn from noisy real-world demonstrations.

Plain English Explanation

Imitation Learning (IL) is a way for robots to learn new skills by watching demonstrations. Most current IL methods use neural networks, which are good at learning complex tasks. But neural networks have three well-known problems: they need a lot of training data, it is hard to understand how they work, and it is difficult to change or fix them once they are trained.

An alternative approach called programmatic imitation learning (PIL) tries to address these issues. In PIL, the robot learns a program that represents the skill, rather than a neural network. This makes the learned skill easier to understand and modify.
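To make the idea concrete, here is a minimal, hypothetical sketch of what a programmatic policy could look like in Python. The action names, observation fields, and threshold values are illustrative assumptions and do not come from the paper.

```python
# Hypothetical programmatic policy for a vehicle approaching a stopping point.
# All names and numbers below are assumptions chosen for illustration.

def policy(velocity, dist_to_target, max_velocity=10.0, brake_dist=20.0):
    """Return a high-level action label for the current observation."""
    if dist_to_target < brake_dist:
        # Close to the target: slow down.
        return "DECELERATE"
    if velocity >= max_velocity:
        # Already at top speed: hold it.
        return "CRUISE"
    # Otherwise speed up.
    return "ACCELERATE"


print(policy(velocity=5.0, dist_to_target=100.0))  # prints ACCELERATE
```

Because the policy is ordinary code, a behavior such as braking too late can be inspected and repaired by editing a single threshold rather than retraining a network.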

However, existing PIL methods assume that every demonstration comes labeled with the exact action the demonstrator took at each step. This is a problem when learning from real-world demonstrations, which usually record only noisy observations and not the actions behind them. The paper introduces a new PIL algorithm called PLUNDER that can deal with this uncertainty.

Technical Explanation

The paper proposes PLUNDER, a novel programmatic imitation learning algorithm that can learn skills from noisy real-world demonstrations. Unlike previous PIL approaches, PLUNDER does not require access to the exact action labels performed by the demonstrator.

PLUNDER works by integrating a probabilistic program synthesizer into an iterative Expectation-Maximization (EM) framework. This allows the algorithm to simultaneously infer the missing action labels and learn the most likely programmatic policy to match the demonstrations. The resulting policies are probabilistic, which helps them capture the inherent uncertainties present in real-world data.
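The alternation between inferring labels and fitting a policy can be illustrated with a toy EM loop. The sketch below is not PLUNDER's implementation: it assumes a two-action problem, a Gaussian noise model for the observed acceleration, and a policy with a single tunable threshold, and it swaps the probabilistic program synthesizer for a plain grid search over that threshold. Every name and constant is an assumption made for illustration.

```python
import math
import random

# Assumed noise model: each hidden action label yields a noisy acceleration.
MEAN_ACCEL = {"ACC": 1.0, "DEC": -1.0}
NOISE_STD = 0.8


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def p_dec(dist, threshold, sharpness=1.0):
    """Probabilistic policy: chance of choosing DEC at a given distance."""
    return 1.0 / (1.0 + math.exp(sharpness * (dist - threshold)))


def e_step(demos, threshold):
    """Posterior P(label = DEC | distance, acceleration) under the current policy."""
    posteriors = []
    for dist, accel in demos:
        prior = p_dec(dist, threshold)
        w_dec = prior * gaussian_pdf(accel, MEAN_ACCEL["DEC"], NOISE_STD)
        w_acc = (1.0 - prior) * gaussian_pdf(accel, MEAN_ACCEL["ACC"], NOISE_STD)
        posteriors.append(w_dec / (w_dec + w_acc))
    return posteriors


def m_step(demos, posteriors, candidates):
    """Stand-in for synthesis: pick the threshold with the best expected log-likelihood."""
    def score(threshold):
        total = 0.0
        for (dist, _), q in zip(demos, posteriors):
            p = min(max(p_dec(dist, threshold), 1e-9), 1.0 - 1e-9)
            total += q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
        return total
    return max(candidates, key=score)


def run_em(demos, init_threshold, candidates, iterations=25):
    threshold = init_threshold
    for _ in range(iterations):
        posteriors = e_step(demos, threshold)               # infer missing labels
        threshold = m_step(demos, posteriors, candidates)   # refit the policy
    return threshold


def make_demos(true_threshold, n=400, seed=0):
    """Synthetic unlabeled demonstrations: only (distance, noisy acceleration) is kept."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        dist = rng.uniform(0.0, 40.0)
        label = "DEC" if rng.random() < p_dec(dist, true_threshold) else "ACC"
        demos.append((dist, rng.gauss(MEAN_ACCEL[label], NOISE_STD)))
    return demos


demos = make_demos(true_threshold=20.0)
candidates = [float(t) for t in range(0, 41)]
learned = run_em(demos, init_threshold=5.0, candidates=candidates)
print(f"learned threshold: {learned:.1f}")
```

The learner never sees the action labels. Starting from a poor initial threshold, each round uses the current policy as a prior to guess the labels, then refits the policy to those guesses, which is the same alternation the paper describes at a much larger scale.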

The paper benchmarks PLUNDER against several established imitation learning techniques on five challenging tasks involving noisy demonstrations. PLUNDER policies match the given demonstrations with 95% accuracy, outperforming the next best baseline by 19%. They also complete the tasks successfully 17% more often than the nearest competing method.

Critical Analysis

The paper makes a strong case for the benefits of PLUNDER over existing IL approaches. By representing the learned skills as probabilistic programs rather than neural networks, PLUNDER is able to overcome many of the typical limitations of neural-network-based IL methods.

One potential limitation mentioned in the paper is that PLUNDER currently assumes the demonstrations come from a single expert. An interesting area for further research could be extending PLUNDER to handle demonstrations from multiple, potentially conflicting, experts.

Additionally, while the paper demonstrates PLUNDER's superior performance on the benchmark tasks, it would be valuable to see how the approach scales to more complex real-world scenarios. Applying PLUNDER in domains like robotics or autonomous driving could provide additional insights into its strengths and weaknesses.

Overall, the PLUNDER algorithm represents a promising step forward in programmatic imitation learning, and the paper provides a thoughtful technical contribution to the field of imitation learning.

Conclusion

The paper introduces PLUNDER, a novel programmatic imitation learning algorithm that can learn skills from noisy real-world demonstrations. By representing the learned policies as probabilistic programs, PLUNDER is able to address several key limitations of neural network-based imitation learning approaches.

PLUNDER's strong performance on benchmark tasks suggests it could be a valuable tool for teaching robots new skills through imitation. As the field of imitation learning continues to advance, approaches like PLUNDER that can handle the complexities of real-world data will likely become increasingly important.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Imitation Learning in Real World Unstructured Social Mini-Games in Pedestrian Crowds

Rohan Chandra, Haresh Karnan, Negar Mehr, Peter Stone, Joydeep Biswas

Imitation Learning (IL) strategies are used to generate policies for robot motion planning and navigation by learning from human trajectories. Recently, there has been a lot of excitement in applying IL in social interactions arising in urban environments such as university campuses, restaurants, grocery stores, and hospitals. However, obtaining numerous expert demonstrations in social settings might be expensive, risky, or even impossible. Current approaches therefore focus only on simulated social interaction scenarios. This raises the question: How can a robot learn to imitate an expert demonstrator from real world multi-agent social interaction scenarios? It remains unknown which, if any, IL methods perform well and what assumptions they require. We benchmark representative IL methods in real world social interaction scenarios on a motion planning task, using a novel pedestrian intersection dataset collected at the University of Texas at Austin campus. Our evaluation reveals two key findings: first, learning multi-agent cost functions is required for learning the diverse behavior modes of agents in tightly coupled interactions and second, conditioning the training of IL methods on partial state information or providing global information in simulation can improve imitation learning, especially in real world social interaction scenarios.

5/28/2024

cs.RO cs.AI cs.LG cs.MA

Imitation Bootstrapped Reinforcement Learning

Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh

Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations that enable IL to generalize to all possible scenarios, and any distribution shift would require recollecting data for finetuning. Therefore, RL is appealing if it can build upon IL as an efficient autonomous self-improvement procedure. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework for sample-efficient RL with demonstrations that first trains an IL policy on the provided demonstrations and then uses it to propose alternative actions for both online exploration and bootstrapping target values. Compared to prior works that oversample the demonstrations or regularize RL with an additional imitation loss, IBRL is able to utilize high quality actions from IL policies since the beginning of training, which greatly accelerates exploration and training efficiency. We evaluate IBRL on 6 simulation and 3 real-world tasks spanning various difficulty levels. IBRL significantly outperforms prior methods and the improvement is particularly prominent in harder tasks.

5/7/2024

cs.LG cs.AI

Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.

4/29/2024

cs.RO

Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning

Francisco Leiva, Javier Ruiz-del-Solar

This paper proposes a method to combine reinforcement learning (RL) and imitation learning (IL) using a dynamic, performance-based modulation over learning signals. The proposed method combines RL and behavioral cloning (IL), or corrective feedback in the action space (interactive IL/IIL), by dynamically weighting the losses to be optimized, taking into account the backpropagated gradients used to update the policy and the agent's estimated performance. In this manner, RL and IL/IIL losses are combined by equalizing their impact on the policy's updates, while modulating said impact such that IL signals are prioritized at the beginning of the learning process, and as the agent's performance improves, the RL signals become progressively more relevant, allowing for a smooth transition from pure IL/IIL to pure RL. The proposed method is used to learn local planning policies for mobile robots, synthesizing IL/IIL signals online by means of a scripted policy. An extensive evaluation of the application of the proposed method to this task is performed in simulations, and it is empirically shown that it outperforms pure RL in terms of sample efficiency (achieving the same level of performance in the training environment utilizing approximately 4 times fewer experiences), while consistently producing local planning policies with better performance metrics (achieving an average success rate of 0.959 in an evaluation environment, outperforming pure RL by 12.5% and pure IL by 13.9%). Furthermore, the obtained local planning policies are successfully deployed in the real world without performing any major fine tuning. The proposed method can extend existing RL algorithms, and is applicable to other problems for which generating IL/IIL signals online is feasible. A video summarizing some of the real world experiments that were conducted can be found at https://youtu.be/mZlaXn9WGzw.

5/17/2024

cs.RO