PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving

2404.14327

Published 4/23/2024 by Jie Cheng, Yingbing Chen, Qifeng Chen


Abstract

We present PLUTO, a powerful framework that pushes the limit of imitation learning-based planning for autonomous driving. Our improvements stem from three pivotal aspects: a longitudinal-lateral aware model architecture that enables flexible and diverse driving behaviors; an innovative auxiliary loss computation method that is broadly applicable and efficient for batch-wise calculation; and a novel training framework that leverages contrastive learning, augmented by a suite of new data augmentations to regulate driving behaviors and facilitate the understanding of underlying interactions. We assessed our framework using the large-scale real-world nuPlan dataset and its associated standardized planning benchmark. Impressively, PLUTO achieves state-of-the-art closed-loop performance, beating other competing learning-based methods and surpassing the current top-performing rule-based planner for the first time. Results and code are available at https://jchengai.github.io/pluto.


Overview

This paper presents PLUTO, a framework that pushes the limits of imitation learning-based planning for autonomous driving.
PLUTO aims to address the challenges of designing robust and safe planning systems for autonomous vehicles in complex real-world environments.
The paper explores how imitation learning lets a planner learn from expert demonstrations and generalize to unseen scenarios.

Plain English Explanation

Autonomous driving is a complex challenge that requires vehicles to navigate safely and effectively in a wide range of real-world situations. The paper explores an approach called PLUTO that uses imitation learning to help autonomous vehicles learn from expert human drivers.

The key idea behind PLUTO is to train the vehicle's planning system on demonstrations of skilled human drivers handling challenging situations. By imitating the decisions and control inputs of expert drivers, the system can learn to plan and execute safe, effective maneuvers in complex environments.

This imitation learning approach aims to help autonomous vehicles generalize beyond the specific scenarios they were trained on, so that they can handle a wider range of real-world driving conditions. The paper investigates how far imitation learning-based planning can be pushed, potentially leading to more robust and capable self-driving systems.

Technical Explanation

PLUTO uses imitation learning to train an autonomous driving planner. The model learns from expert demonstrations of human driving in varied scenarios and is trained to reproduce the expert's decisions and control inputs.
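
The imitation step described above can be illustrated with a minimal behavior-cloning sketch. It is not PLUTO's model: the linear policy, the two-feature state (lateral offset and heading error), the hypothetical `expert_policy`, and all constants are assumptions chosen to keep the example small and runnable. It fits a policy to expert state-action pairs by gradient descent on the squared error.

```python
import random


def expert_policy(offset, heading):
    """Hypothetical expert (an assumption): steer back toward the lane centre."""
    return -0.8 * offset - 1.5 * heading


def collect_demonstrations(n, seed=0):
    """Sample synthetic (state, expert action) pairs standing in for driving logs."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        offset = rng.uniform(-1.0, 1.0)    # lateral offset from lane centre (m)
        heading = rng.uniform(-0.3, 0.3)   # heading error (rad)
        demos.append(((offset, heading), expert_policy(offset, heading)))
    return demos


def train_behavior_cloning(demos, epochs=1000, lr=1.0):
    """Fit action = w[0]*offset + w[1]*heading by gradient descent on the MSE."""
    w = [0.0, 0.0]
    n = len(demos)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for (offset, heading), action in demos:
            err = w[0] * offset + w[1] * heading - action
            grad[0] += 2.0 * err * offset / n
            grad[1] += 2.0 * err * heading / n
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
    return w


demos = collect_demonstrations(200)
weights = train_behavior_cloning(demos)
print("learned steering gains:", weights)
```

Because the synthetic expert is itself linear and noise-free, the fitted weights recover the expert's gains. Real driving logs are noisy and multi-modal, which is one reason learned planners need far richer models than this.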

According to the abstract, the key technical components of PLUTO are:

Longitudinal-Lateral Aware Architecture: A model architecture that is aware of both longitudinal and lateral behavior, which the authors say enables flexible and diverse driving behaviors.
Auxiliary Loss: An auxiliary loss computation method that is broadly applicable and efficient to compute batch-wise.
Contrastive Learning with Data Augmentation: A training framework that combines contrastive learning with a suite of new data augmentations to regulate driving behaviors and help the model understand the underlying interactions.
Benchmark Evaluation: The framework is assessed on the large-scale real-world nuPlan dataset and its associated standardized planning benchmark.
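
The abstract credits part of PLUTO's improvement to contrastive learning over augmented data. As a rough illustration of what a contrastive objective does, the sketch below computes an InfoNCE-style loss on toy embeddings. The choice of InfoNCE, the cosine similarity, the temperature, and the reading of "positive" as an augmentation that should not change the plan and "negative" as one that should are all assumptions made for this example, not details confirmed by the summary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive among all candidates."""
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, neg) / temperature for neg in negatives
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - pos_logit


# Toy scene embeddings (hypothetical values).
anchor = [1.0, 0.0]           # original scene
positive = [0.9, 0.1]         # augmentation assumed to preserve the behavior
negative = [0.0, 1.0]         # augmentation assumed to change the behavior

well_separated = info_nce_loss(anchor, positive, [negative])
confused = info_nce_loss(anchor, negative, [positive])
print(well_separated, confused)
```

The loss is small when the anchor sits close to its positive and far from the negatives, and large when the roles are confused, so minimizing it pushes an encoder to separate the two kinds of augmented scene.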

The paper reports closed-loop results on the nuPlan benchmark, where PLUTO beats competing learning-based methods and, for the first time, surpasses the top-performing rule-based planner. The authors also discuss the limitations of the current system and potential avenues for future research.
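
The abstract stresses closed-loop performance, and the distinction matters because small per-step errors compound once a planner's own outputs determine its next state. The toy comparison below is only a sketch of that idea: the constant-speed expert, the slightly biased planner, the step size, and both error functions are hypothetical, and none of this reproduces the nuPlan benchmark's actual metrics.

```python
DT = 0.1  # simulation step in seconds (assumed)


def expert_speed(t):
    """Logged expert speed in m/s (hypothetical constant)."""
    return 10.0


def planner_speed(t):
    """Planner output with a small bias relative to the expert (hypothetical)."""
    return 10.2


def open_loop_error(steps):
    """Worst one-step position error when every step restarts from the logged state."""
    position = 0.0
    worst = 0.0
    for t in range(steps):
        expert_next = position + expert_speed(t) * DT
        planner_next = position + planner_speed(t) * DT
        worst = max(worst, abs(planner_next - expert_next))
        position = expert_next  # reset to the expert's logged state
    return worst


def closed_loop_error(steps):
    """Final position error when the planner is rolled out on its own outputs."""
    expert_position = 0.0
    planner_position = 0.0
    for t in range(steps):
        expert_position += expert_speed(t) * DT
        planner_position += planner_speed(t) * DT
    return abs(planner_position - expert_position)


open_err = open_loop_error(100)
closed_err = closed_loop_error(100)
print("open-loop:", open_err, "closed-loop:", closed_err)
```

Over 100 steps the open-loop error stays at roughly 0.02 m per step, while the closed-loop rollout drifts by roughly 2 m, which is why a planner that looks accurate against logs can still behave poorly when it drives.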

Critical Analysis

The PLUTO approach is a promising step forward for autonomous driving systems because it draws on the knowledge and decision-making skills of expert human drivers. By imitating these drivers, the planner can potentially learn to navigate complex real-world scenarios more effectively than traditional rule-based or optimization-based planning approaches.

However, the paper also acknowledges some key limitations of the PLUTO system. For example, the high-quality expert demonstrations it relies on may be hard to obtain in practice, and its ability to generalize to completely novel situations may be limited. Additionally, the paper does not fully address how to ensure the safety and robustness of the planner in the face of unforeseen events or adversarial conditions.

Further research is needed to make imitation learning-based driving systems more robust and safe, for example by incorporating additional sources of information (such as environmental sensors, prior knowledge, or safety constraints) or by developing new training and validation techniques. Nonetheless, PLUTO is an important step toward more capable and reliable autonomous driving systems.

Conclusion

The PLUTO paper presents an imitation learning-based approach to autonomous driving that aims to push the limits of planning systems trained on expert demonstrations. By learning from the decisions of human drivers, PLUTO seeks to enable autonomous vehicles to navigate complex real-world scenarios more effectively.

While the paper highlights several promising aspects of PLUTO, it also acknowledges the need for further research on the safety and robustness of imitation learning-based autonomous driving. Nonetheless, the work is an important contribution to ongoing efforts to develop more capable and reliable self-driving technologies, with the potential to significantly affect the future of transportation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.

4/29/2024

cs.RO

Planning with Adaptive World Models for Autonomous Driving

Arun Balajee Vasudevan, Neehar Peri, Jeff Schneider, Deva Ramanan

Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive simulator. We analyze the characteristics of nuPlan's recorded logs and find that each city has its own unique driving behaviors, suggesting that robust planners must adapt to different environments. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) that predicts reactive agent behaviors using features derived from recently-observed agent histories; intuitively, some aggressive agents may tailgate lead vehicles, while others may not. To model such phenomena, BehaviorNet predicts parameters of an agent's motion controller rather than predicting its spacetime trajectory (as most forecasters do). Finally, we present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions. Our extensive experiments demonstrate that AdaptiveDriver achieves state-of-the-art results on the nuPlan closed-loop planning benchmark, reducing test error from 6.4% to 4.6%, even when applied to never-before-seen cities.

6/18/2024

cs.RO cs.LG

Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios?

Marcel Hallgarten, Julian Zapata, Martin Stoll, Katrin Renz, Andreas Zell

Real-world autonomous driving systems must make safe decisions in the face of rare and diverse traffic scenarios. Current state-of-the-art planners are mostly evaluated on real-world datasets like nuScenes (open-loop) or nuPlan (closed-loop). In particular, nuPlan seems to be an expressive evaluation method since it is based on real-world data and closed-loop, yet it mostly covers basic driving scenarios. This makes it difficult to judge a planner's capabilities to generalize to rarely-seen situations. Therefore, we propose a novel closed-loop benchmark interPlan containing several edge cases and challenging driving scenarios. We assess existing state-of-the-art planners on our benchmark and show that neither rule-based nor learning-based planners can safely navigate the interPlan scenarios. A recently evolving direction is the usage of foundation models like large language models (LLM) to handle generalization. We evaluate an LLM-only planner and introduce a novel hybrid planner that combines an LLM-based behavior planner with a rule-based motion planner that achieves state-of-the-art performance on our benchmark.

4/12/2024

cs.RO cs.AI cs.LG

Learning from Mistakes: a Weakly-supervised Method for Mitigating the Distribution Shift in Autonomous Vehicle Planning

Fazel Arasteh, Mohammed Elmahgiubi, Behzad Khamidehi, Hamidreza Mirkhani, Weize Zhang, Kasra Rezaee

The planning problem constitutes a fundamental aspect of the autonomous driving framework. Recent strides in representation learning have empowered vehicles to comprehend their surrounding environments, thereby facilitating the integration of learning-based planning strategies. Among these approaches, Imitation Learning stands out due to its notable training efficiency. However, traditional Imitation Learning methodologies encounter challenges associated with the covariate shift phenomenon. We propose Learn from Mistakes (LfM) as a remedy to address this issue. The essence of LfM lies in deploying a pre-trained planner across diverse scenarios. Instances where the planner deviates from its immediate objectives, such as maintaining a safe distance from obstacles or adhering to traffic rules, are flagged as mistakes. The environments corresponding to these mistakes are categorized as out-of-distribution states and compiled into a new dataset termed closed-loop mistakes dataset. Notably, the absence of expert annotations for the closed-loop data precludes the applicability of standard imitation learning approaches. To facilitate learning from the closed-loop mistakes, we introduce Validity Learning, a weakly supervised method, which aims to discern valid trajectories within the current environmental context. Experimental evaluations conducted on the InD and nuPlan datasets reveal substantial enhancements in closed-loop metrics such as Progress and Collision Rate, underscoring the effectiveness of the proposed methodology.

6/4/2024

cs.RO cs.AI cs.LG