MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts

2303.00638

Published 5/3/2024 by Xiatao Sun, Shuo Yang, Mingyan Zhou, Kunpeng Liu, Rahul Mangharam

🏷️

Abstract

Imitation learning has been widely applied to various autonomous systems thanks to recent development in interactive algorithms that address covariate shift and compounding errors induced by traditional approaches like behavior cloning. However, existing interactive imitation learning methods assume access to one perfect expert. Whereas in reality, it is more likely to have multiple imperfect experts instead. In this paper, we propose MEGA-DAgger, a new DAgger variant that is suitable for interactive learning with multiple imperfect experts. First, unsafe demonstrations are filtered while aggregating the training data, so the imperfect demonstrations have little influence when training the novice policy. Next, experts are evaluated and compared on scenarios-specific metrics to resolve the conflicted labels among experts. Through experiments in autonomous racing scenarios, we demonstrate that policy learned using MEGA-DAgger can outperform both experts and policies learned using the state-of-the-art interactive imitation learning algorithms such as Human-Gated DAgger. The supplementary video can be found at url{https://youtu.be/wPCht31MHrw}.

Create account to get full access

Overview

The paper presents a new imitation learning algorithm called MEGA-DAgger that can learn from multiple imperfect experts.
Traditional imitation learning approaches like behavior cloning assume access to a single perfect expert, but in reality, multiple imperfect experts are more common.
MEGA-DAgger addresses challenges like covariate shift and compounding errors that arise when learning from imperfect demonstrations.

Plain English Explanation

Imitation learning is a powerful technique used in autonomous systems, where a machine learns by observing and imitating the behavior of an expert. Recent advances in interactive algorithms, such as DAgger and Human-Gated DAgger, have made imitation learning more effective by addressing issues like covariate shift and compounding errors.

However, these existing methods assume that the expert providing the demonstrations is perfect. In reality, it's more common to have access to multiple experts, and those experts may not always be perfectly accurate or consistent. This can create problems when trying to learn from their demonstrations.

The paper proposes a new algorithm called MEGA-DAgger, which is designed to work with multiple imperfect experts. MEGA-DAgger has two key features:

It filters out "unsafe" demonstrations from the imperfect experts, so that those poor-quality demonstrations don't overly influence the training of the novice policy.
It evaluates and compares the experts on scenario-specific metrics to resolve any conflicts or inconsistencies in their demonstrations.

Through experiments in autonomous racing scenarios, the researchers show that the policy learned using MEGA-DAgger can outperform both the individual experts and policies learned using other state-of-the-art interactive imitation learning algorithms.

Technical Explanation

The paper proposes MEGA-DAgger, a new variant of the DAgger algorithm that can handle learning from multiple imperfect experts. The key innovations are:

Unsafe Demonstration Filtering: When aggregating the training data from the multiple experts, MEGA-DAgger filters out "unsafe" demonstrations that are likely to be of poor quality. This ensures that the imperfect demonstrations have minimal influence on the final trained policy.
Expert Evaluation and Comparison: MEGA-DAgger evaluates the performance of each expert on scenario-specific metrics, and uses this information to resolve any conflicts or inconsistencies in their demonstrations. This allows the algorithm to learn an optimal policy by intelligently combining the strengths of the different experts.

The paper evaluates MEGA-DAgger in the context of autonomous racing, where multiple human drivers provide demonstrations. The results show that the policy learned using MEGA-DAgger outperforms both the individual expert drivers and policies learned using other state-of-the-art interactive imitation learning algorithms, such as Human-Gated DAgger and Imitation Game.

Critical Analysis

The paper presents a well-designed and thorough evaluation of the MEGA-DAgger algorithm, demonstrating its effectiveness in handling multiple imperfect experts. However, a few potential limitations and areas for further research are worth noting:

Generalization to Other Domains: The paper focuses on the autonomous racing domain, and it's unclear how well MEGA-DAgger would perform in other application areas with different types of experts and task characteristics. Further research is needed to assess the algorithm's broader applicability.
Handling Diverse Expert Skill Levels: The paper assumes that the experts have similar skill levels, but in real-world scenarios, the experts may have a wide range of abilities. Additional mechanisms may be required to effectively leverage diverse expert knowledge.
Explainability and Transparency: The paper does not provide much insight into how MEGA-DAgger resolves conflicts between the experts or determines which demonstrations to filter out. Increased explainability and transparency of the algorithm's decision-making process could be valuable for building trust and understanding its behavior.
Scalability with Large Numbers of Experts: As the number of experts increases, the complexity of evaluating and aggregating their demonstrations may become a challenge. Further research is needed to understand the algorithm's scalability and potential ways to address it.

Overall, the MEGA-DAgger algorithm represents a significant advancement in the field of imitation learning, and the paper provides a solid foundation for future research in this area.

Conclusion

The paper presents MEGA-DAgger, a novel imitation learning algorithm that can effectively learn from multiple imperfect experts. By filtering out unsafe demonstrations and intelligently combining the strengths of different experts, MEGA-DAgger is able to outperform both the individual experts and other state-of-the-art interactive imitation learning methods.

This research has important implications for the development of autonomous systems, where access to a single perfect expert is often unrealistic. MEGA-DAgger opens up new possibilities for leveraging the collective knowledge of multiple human experts, even if they are not individually perfect. As autonomous systems continue to become more prevalent in our lives, techniques like MEGA-DAgger will be crucial for ensuring that these systems can learn and perform at the highest levels.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning

Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta

A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method to reap the benefits of DAgger without the cost for eye-in-hand imitation learning problems. Instead of collecting new samples to cover out-of-distribution states, DMD uses recent advances in diffusion models to synthesize these samples. This leads to robust performance from few demonstrations. We compare DMD against behavior cloning baseline across four tasks: pushing, stacking, pouring, and shirt hanging. In pushing, DMD achieves 80% success rate with as few as 8 expert demonstrations, where naive behavior cloning reaches only 20%. In stacking, DMD succeeds on average 92% of the time across 5 cups, versus 40% for BC. When pouring coffee beans, DMD transfers to another cup successfully 80% of the time. Finally, DMD attains 90% success rate for hanging shirt on a clothing rack.

6/6/2024

cs.RO cs.AI cs.CV cs.LG

SAFE-GIL: SAFEty Guided Imitation Learning

Yusuf Umut Ciftci, Zeyuan Feng, Somil Bansal

Behavior Cloning is a popular approach to Imitation Learning, in which a robot observes an expert supervisor and learns a control policy. However, behavior cloning suffers from the compounding error problem - the policy errors compound as it deviates from the expert demonstrations and might lead to catastrophic system failures, limiting its use in safety-critical applications. On-policy data aggregation methods are able to address this issue at the cost of rolling out and repeated training of the imitation policy, which can be tedious and computationally prohibitive. We propose SAFE-GIL, an off-policy behavior cloning method that guides the expert via adversarial disturbance during data collection. The algorithm abstracts the imitation error as an adversarial disturbance in the system dynamics, injects it during data collection to expose the expert to safety critical states, and collects corrective actions. Our method biases training to more closely replicate expert behavior in safety-critical states and allows more variance in less critical states. We compare our method with several behavior cloning techniques and DAgger on autonomous navigation and autonomous taxiing tasks and show higher task success and safety, especially in low data regimes where the likelihood of error is higher, at a slight drop in the performance.

4/9/2024

cs.RO cs.LG cs.SY eess.SY

Towards Imitation Learning in Real World Unstructured Social Mini-Games in Pedestrian Crowds

Rohan Chandra, Haresh Karnan, Negar Mehr, Peter Stone, Joydeep Biswas

Imitation Learning (IL) strategies are used to generate policies for robot motion planning and navigation by learning from human trajectories. Recently, there has been a lot of excitement in applying IL in social interactions arising in urban environments such as university campuses, restaurants, grocery stores, and hospitals. However, obtaining numerous expert demonstrations in social settings might be expensive, risky, or even impossible. Current approaches therefore, focus only on simulated social interaction scenarios. This raises the question: textit{How can a robot learn to imitate an expert demonstrator from real world multi-agent social interaction scenarios}? It remains unknown which, if any, IL methods perform well and what assumptions they require. We benchmark representative IL methods in real world social interaction scenarios on a motion planning task, using a novel pedestrian intersection dataset collected at the University of Texas at Austin campus. Our evaluation reveals two key findings: first, learning multi-agent cost functions is required for learning the diverse behavior modes of agents in tightly coupled interactions and second, conditioning the training of IL methods on partial state information or providing global information in simulation can improve imitation learning, especially in real world social interaction scenarios.

5/28/2024

cs.RO cs.AI cs.LG cs.MA

Online Adaptation for Enhancing Imitation Learning Policies

Federico Malato, Ville Hautamaki

Imitation learning enables autonomous agents to learn from human examples, without the need for a reward signal. Still, if the provided dataset does not encapsulate the task correctly, or when the task is too complex to be modeled, such agents fail to reproduce the expert policy. We propose to recover from these failures through online adaptation. Our approach combines the action proposal coming from a pre-trained policy with relevant experience recorded by an expert. The combination results in an adapted action that closely follows the expert. Our experiments show that an adapted agent performs better than its pure imitation learning counterpart. Notably, adapted agents can achieve reasonable performance even when the base, non-adapted policy catastrophically fails.

6/10/2024

cs.AI cs.LG