GenAD: Generative End-to-End Autonomous Driving

2402.11502

Published 4/9/2024 by Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen

Abstract

Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic evolution process, e.g., the future interaction between the ego car and other traffic participants and the structural trajectory prior. In this paper, we explore a new paradigm for end-to-end autonomous driving, where the key is to predict how the ego car and the surroundings evolve given past scenes. We propose GenAD, a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling. We further adopt a temporal model to capture the agent and ego movements in the latent space to generate more effective future trajectories. GenAD finally simultaneously performs motion prediction and planning by sampling distributions in the learned structural latent space conditioned on the instance tokens and using the learned temporal model to generate futures. Extensive experiments on the widely used nuScenes benchmark show that the proposed GenAD achieves state-of-the-art performance on vision-centric end-to-end autonomous driving with high efficiency. Code: https://github.com/wzzheng/GenAD.

Overview

This paper proposes GenAD, an end-to-end autonomous driving framework that casts driving as a generative modeling problem.
Rather than treating perception, motion prediction, and planning as a strictly progressive pipeline, GenAD predicts how the ego car and its surroundings evolve together given past scenes.
Motion prediction and planning are performed simultaneously by sampling from a learned latent space of future trajectories.

Plain English Explanation

The paper presents a new autonomous driving system called GenAD that takes a different approach from most existing end-to-end systems. Those systems typically work in stages: perceive the scene, predict how other road users will move, then plan the ego car's path. GenAD instead treats driving as a generative problem: given what has happened so far, it generates how the whole scene, including the ego car, is likely to unfold.

The key idea is to train a deep learning model that learns a distribution over realistic future trajectories, so that it can sample plausible futures instead of committing to a single stage-by-stage prediction. The authors argue that this better captures how the ego car and other traffic participants interact, and what realistic trajectories look like.

GenAD handles perception, motion prediction, and planning within a single framework. It predicts the other agents' motion and plans the ego car's trajectory at the same time, rather than treating these as separate steps.

The researchers believe this end-to-end approach can lead to more robust and capable autonomous driving systems that can navigate the real-world more safely and effectively.

Technical Explanation

The paper proposes GenAD (Generative End-to-End Autonomous Driving), a framework that combines generative modeling with perception, motion prediction, and planning in a single end-to-end architecture.

According to the abstract, the key components of GenAD are:

Instance-Centric Scene Tokenizer: Transforms the surrounding scene into map-aware instance tokens.
Trajectory Prior (Variational Autoencoder): Learns the distribution of future trajectories in a structural latent space.
Temporal Model: Captures agent and ego movements in the latent space to generate more effective future trajectories.
Joint Motion Prediction and Planning: Samples from the learned latent distributions, conditioned on the instance tokens, and uses the temporal model to generate futures for the ego car and other agents simultaneously.

Related work on these sub-problems includes GraphAD: Interaction-Aware Scene Graph for End-to-End Autonomous Driving (interaction-aware prediction), Quad: Query-Based Interpretable Neural Motion Planning (motion planning), and Hierarchical Generative Adversarial Imitation Learning for Mid-Level Behavioral Cloning (behavior generation).

Related directions for handling complex traffic scenarios include scenario generation for data augmentation (Versatile, Scene-Consistent Traffic Scenario Generation as Augmentation for Autonomous Driving) and language-guided closed-loop driving (LEGO-Drive: Language-Enhanced Goal-Oriented Closed-Loop Autonomous Driving).
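The inference procedure described in the abstract (map each instance token to a latent distribution, sample from it, then unroll a temporal model to produce a future trajectory) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LATENT_DIM`, `HORIZON`, `sample_latent`, `rollout`, and `generate_futures` are hypothetical, the weights are random and untrained, and a simple tanh recurrence is swapped in for the learned temporal model, whose actual architecture is not specified in this summary.

```python
import math
import random

LATENT_DIM = 4  # hypothetical latent size, chosen only for illustration
HORIZON = 6     # hypothetical number of future time steps


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_layer(rng, out_dim, in_dim, scale=0.1):
    """Create a small random (untrained) linear layer as (weights, bias)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def sample_latent(rng, token, mu_layer, logvar_layer):
    """Map a token to a diagonal Gaussian and draw z = mu + sigma * eps."""
    mu = linear(*mu_layer, token)
    logvar = linear(*logvar_layer, token)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def rollout(z, step_layer, decode_layer, horizon=HORIZON):
    """Evolve the latent state step by step and decode a 2D offset each
    step; waypoints are the running sum of the decoded offsets."""
    state, x, y, waypoints = list(z), 0.0, 0.0, []
    for _ in range(horizon):
        state = [math.tanh(v) for v in linear(*step_layer, state)]
        dx, dy = linear(*decode_layer, state)
        x, y = x + dx, y + dy
        waypoints.append((x, y))
    return waypoints


def generate_futures(tokens, weight_seed=0, sample_seed=0):
    """Generate one sampled future trajectory per instance token."""
    w_rng = random.Random(weight_seed)   # stands in for learned weights
    s_rng = random.Random(sample_seed)   # controls the latent sampling
    token_dim = len(tokens[0])
    mu_layer = make_layer(w_rng, LATENT_DIM, token_dim)
    logvar_layer = make_layer(w_rng, LATENT_DIM, token_dim)
    step_layer = make_layer(w_rng, LATENT_DIM, LATENT_DIM)
    decode_layer = make_layer(w_rng, 2, LATENT_DIM)
    return [
        rollout(sample_latent(s_rng, t, mu_layer, logvar_layer),
                step_layer, decode_layer)
        for t in tokens
    ]


if __name__ == "__main__":
    # Two made-up instance tokens: one for the ego car, one for another agent.
    tokens = [[0.5, -0.2, 0.1], [1.0, 0.3, -0.4]]
    for trajectory in generate_futures(tokens, sample_seed=1):
        print([(round(px, 3), round(py, 3)) for px, py in trajectory])
```

Calling `generate_futures` with the same tokens but a different `sample_seed` yields a different set of trajectories from the same weights. That is the sense in which the approach is generative: the ego plan and the agents' predicted motions are samples from a distribution rather than a single deterministic output.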

Critical Analysis

The paper presents a promising approach to autonomous driving that aims to address limitations of the conventional progressive pipeline, which the authors argue cannot comprehensively model future interactions between the ego car and other traffic participants. By modeling the distribution of future trajectories generatively, GenAD has the potential to capture a wider range of plausible traffic evolutions.

However, the evaluation described in the abstract is limited to the nuScenes benchmark, where the authors report state-of-the-art performance among vision-centric end-to-end methods with high efficiency. Because nuScenes consists of recorded driving data, these results do not by themselves establish robustness in real-world, closed-loop conditions, and more comprehensive validation would be needed to assess the system's practical viability.
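To make the benchmark-evaluation point concrete, the sketch below computes an L2 displacement error between a planned trajectory and a recorded ground-truth trajectory, a metric commonly reported for planning on recorded datasets. Whether it matches the exact protocol used in the paper is an assumption, and the function names `l2_displacement_errors` and `average_l2` are hypothetical.

```python
import math


def l2_displacement_errors(planned, ground_truth):
    """Per-step Euclidean distance between planned and recorded waypoints."""
    if len(planned) != len(ground_truth):
        raise ValueError("trajectories must have the same number of steps")
    return [math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(planned, ground_truth)]


def average_l2(planned, ground_truth):
    """Mean displacement error over the planning horizon."""
    errors = l2_displacement_errors(planned, ground_truth)
    if not errors:
        raise ValueError("trajectories must not be empty")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up waypoints in metres, for illustration only.
    planned = [(0.0, 0.0), (3.0, 4.0)]
    recorded = [(0.0, 0.0), (0.0, 0.0)]
    print(l2_displacement_errors(planned, recorded))  # [0.0, 5.0]
    print(average_l2(planned, recorded))              # 2.5
```

A metric of this kind compares the plan against what a human driver actually did in the recording. It does not show how the scene would have reacted to the planned trajectory, which is why benchmark scores alone say little about closed-loop behavior.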

Additionally, the reliance on deep learning models raises concerns about interpretability and potential biases. Related work such as Quad: Query-Based Interpretable Neural Motion Planning targets interpretability directly, and further investigation into GenAD's transparency and accountability would be valuable.

Conclusion

The GenAD system proposed in this paper represents an innovative approach to autonomous driving that uses generative modeling to predict how the ego car and its surroundings evolve. By integrating perception, motion prediction, and planning into a unified framework, the system aims to provide a comprehensive solution for navigating complex traffic scenarios.

While the paper presents promising results, further research and validation are needed to assess the practical feasibility and safety of the GenAD system. Nonetheless, this work contributes to the ongoing efforts to develop more capable and reliable autonomous driving systems that can ultimately improve transportation safety and efficiency.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

GAD-Generative Learning for HD Map-Free Autonomous Driving

Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li

Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This results in a performance bottleneck for autonomous driving systems in that corner cases simply cannot be solved by enumerating hand-crafted rules. We present a deep-learning-based approach that brings prediction, decision, and planning modules together with the attempt to overcome the rule-based methods' deficiency in real-world applications of autonomous driving, especially for urban scenes. The DNN model we proposed is solely trained with 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. This method is deployed onto a Jiyue test car with no modification to its factory-ready sensor set and compute platform. The feasibility, usability, and commercial potential are demonstrated in this article.

6/3/2024

cs.RO cs.CV

DualAD: Disentangling the Dynamic and Static World for End-to-End Driving

Simon Doll, Niklas Hanselmann, Lukas Schneider, Richard Schulz, Marius Cordts, Markus Enzweiler, Hendrik P. A. Lensch

State-of-the-art approaches for autonomous driving integrate multiple sub-tasks of the overall driving task into a single pipeline that can be trained in an end-to-end fashion by passing latent representations between the different modules. In contrast to previous approaches that rely on a unified grid to represent the belief state of the scene, we propose dedicated representations to disentangle dynamic agents and static scene elements. This allows us to explicitly compensate for the effect of both ego and object motion between consecutive time steps and to flexibly propagate the belief state through time. Furthermore, dynamic objects can not only attend to the input camera images, but also directly benefit from the inferred static scene structure via a novel dynamic-static cross-attention. Extensive experiments on the challenging nuScenes benchmark demonstrate the benefits of the proposed dual-stream design, especially for modelling highly dynamic agents in the scene, and highlight the improved temporal consistency of our approach. Our method titled DualAD not only outperforms independently trained single-task networks, but also improves over previous state-of-the-art end-to-end models by a large margin on all tasks along the functional chain of driving.

6/11/2024

cs.CV

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.

4/23/2024

cs.RO cs.AI cs.CV cs.LG

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across the multi-views to increase spatial consistency, and a feature-aligned module to achieve both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency, which is about 5 times longer compared with state-of-the-art methods. Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency. This is achieved by building a failure-case driven framework with the help of pre-trained visual language models. Our extensive experiment demonstrates that our Delphi generates a higher quality of long videos surpassing previous state-of-the-art methods. Consequently, by generating only 4% of the training dataset size, our framework is able to go beyond perception and prediction tasks and, for the first time to the best of our knowledge, boost the planning performance of the end-to-end autonomous driving model by a margin of 25%.

6/7/2024

cs.CV