Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

arXiv:2406.03877

Published 6/12/2024 by Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, Junchi Yan


Abstract

In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.


Overview

This paper introduces Bench2Drive, a benchmark for evaluating the performance of closed-loop end-to-end autonomous driving systems.
The benchmark aims to assess a wide range of driving abilities, going beyond the open-loop L2 error and collision rate metrics and the single closed-loop driving score that existing evaluations rely on.
The authors argue that existing evaluations are neither comprehensive nor fair, in part because methods are usually trained on their own privately collected data.

Plain English Explanation

The paper describes a new way to test and compare self-driving car systems, called Bench2Drive. Most current tests either replay recorded drives and measure how far the system's planned path deviates from the recorded one, or run the system on a handful of long fixed routes in a simulator and report one overall score. The authors believe neither approach captures the full range of skills needed for truly capable autonomous driving.

Bench2Drive is designed to assess a much broader set of abilities, like how well the car can navigate busy intersections, handle unexpected events, and interact with other vehicles and pedestrians. The goal is to create a more comprehensive benchmark that can drive the development of more advanced self-driving technology.

By measuring a wider variety of skills, the authors hope Bench2Drive will lead to self-driving cars that are better able to handle the complexities of real-world driving. This could help bring us closer to the promise of safe, reliable self-driving cars that can operate in a wide range of scenarios.

Technical Explanation

The paper introduces the Bench2Drive benchmark, which aims to comprehensively evaluate closed-loop end-to-end autonomous driving systems. Existing evaluations are mostly open-loop log replay scored with L2 error and collision rate, or closed-loop runs on a few long fixed routes scored with a single driving score. Bench2Drive instead assesses a broader range of driving abilities, including navigation, interaction with other road users, and handling of unexpected events (see "End-to-end Autonomous Driving: Challenges and Frontiers" under Related Papers for a survey of these challenges).

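The benchmark runs in closed loop in the CARLA v2 simulator. Its official training set contains 2 million fully annotated frames from 10000 short clips spread over 44 interactive scenarios, 23 weather conditions, and 12 towns, covering a variety of road layouts, traffic conditions, and environmental factors to test the robustness and versatility of autonomous driving systems (the limits of open-loop evaluation are discussed in "Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?" under Related Papers). For evaluation, a model must pass the 44 scenarios at different locations and under different weather, 220 routes in total, which the authors describe as a comprehensive and disentangled assessment of driving capability. "NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving", also under Related Papers, takes a related closed-loop approach to safety testing.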
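The idea of a "disentangled" assessment can be illustrated with a small sketch. According to the abstract, the 220 evaluation routes cover 44 scenarios, which works out to 5 routes per scenario on average. The code below groups route outcomes by scenario and reports a success rate for each one next to the single overall rate. The function names, the pass/fail criterion, and the example outcomes are hypothetical and chosen only for illustration; this is not the benchmark's official metric or tooling.

```python
from collections import defaultdict

# Figures taken from the abstract: 220 routes over 44 scenarios.
NUM_ROUTES = 220
NUM_SCENARIOS = 44
AVG_ROUTES_PER_SCENARIO = NUM_ROUTES / NUM_SCENARIOS  # 5.0 on average


def per_scenario_success(results):
    """Return {scenario: fraction of routes passed}.

    `results` is an iterable of (scenario_name, succeeded) pairs,
    one pair per evaluated route (a hypothetical input format).
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for scenario, succeeded in results:
        totals[scenario] += 1
        passed[scenario] += int(succeeded)
    return {name: passed[name] / totals[name] for name in totals}


def overall_success(results):
    """Return the fraction of all routes passed, ignoring scenario labels."""
    results = list(results)
    return sum(int(ok) for _, ok in results) / len(results)


# Made-up outcomes for three of the scenario types named in the abstract.
example = [
    ("cut-in", True), ("cut-in", True), ("cut-in", False),
    ("overtaking", False), ("overtaking", False), ("overtaking", True),
    ("detour", True), ("detour", True), ("detour", True),
]

rates = per_scenario_success(example)
overall = overall_success(example)

for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f}")
print(f"overall: {overall:.2f}")
```

In this made-up example the overall rate hides the fact that the model passes every detour route but fails most overtaking routes, which is the kind of difference a per-scenario report is meant to expose.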

To demonstrate the capabilities of Bench2Drive, the authors present results from experiments with several state-of-the-art end-to-end autonomous driving models. The findings highlight the need for a more holistic approach to autonomous driving benchmarking, as the models exhibited varied performance across the different driving abilities tested. The authors argue that Bench2Drive can serve as a valuable tool to drive progress in the field of closed-loop end-to-end autonomous driving (see also the D2E autonomous decision-making dataset under Related Papers).

Critical Analysis

The Bench2Drive benchmark proposed in this paper represents an important step forward in autonomous driving evaluation. By assessing a broader range of driving abilities, it addresses a key limitation of existing evaluations, which rely on open-loop metrics or on a single driving score over a few fixed routes.

However, the authors acknowledge that Bench2Drive is still a simulation-based environment and may not fully capture the complexity of real-world driving conditions. Additionally, the benchmark currently relies on pre-defined scenarios, which could limit the diversity of situations tested. Further research is needed to investigate how the benchmark could be extended to incorporate more dynamic and unpredictable events (see the linked "unified end-to-end V2X cooperative autonomous" paper).

Another potential area for improvement is the evaluation metrics used by Bench2Drive. While the proposed metrics aim to capture higher-level driving abilities, there may be challenges in translating these measures to real-world safety and performance. Continued collaboration with domain experts and regulators could help refine the metrics to ensure they are well-aligned with practical deployment concerns.

Overall, the Bench2Drive benchmark represents a valuable contribution to the field of autonomous driving research. By pushing the boundaries of what can be measured, it has the potential to accelerate the development of more capable and robust self-driving systems. As the authors suggest, further research and refinement of the benchmark will be important to realize its full potential.

Conclusion

The Bench2Drive benchmark introduced in this paper is a significant advancement in the evaluation of closed-loop end-to-end autonomous driving systems. By assessing a wide range of driving abilities beyond traditional metrics, Bench2Drive aims to drive progress in the development of more capable and versatile self-driving technologies.

The authors' emphasis on comprehensive benchmarking is important, as it recognizes the need to move beyond narrow performance measures and address the full complexity of real-world driving. While Bench2Drive is not without limitations, it represents a valuable step forward in autonomous driving research and has the potential to accelerate the progress towards safer and more reliable self-driving cars.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.

4/23/2024

cs.RO cs.AI cs.CV cs.LG

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity. These models tend to rely predominantly on the ego vehicle's status for future path planning. Beyond the limitations of the dataset, we also note that current metrics do not comprehensively assess the planning quality, leading to potentially biased conclusions drawn from existing benchmarks. To address this issue, we introduce a new metric to evaluate whether the predicted trajectories adhere to the road. We further propose a simple baseline able to achieve competitive results without relying on perception annotations. Given the current limitations on the benchmark and metrics, we suggest the community reassess relevant prevailing research and be cautious whether the continued pursuit of state-of-the-art would yield convincing and universal conclusions. Code and models are available at https://github.com/NVlabs/BEV-Planner

6/4/2024

cs.CV

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

William Ljungbergh, Adam Tonderski, Joakim Johnander, Holger Caesar, Kalle Åström, Michael Felsberg, Christoffer Petersson

We present a versatile NeRF-based simulator for testing autonomous driving (AD) software systems, designed with a focus on sensor-realistic closed-loop evaluation and the creation of safety-critical scenarios. The simulator learns from sequences of real-world driving sensor data and enables reconfigurations and renderings of new, unseen scenarios. In this work, we use our simulator to test the responses of AD models to safety-critical scenarios inspired by the European New Car Assessment Programme (Euro NCAP). Our evaluation reveals that, while state-of-the-art end-to-end planners excel in nominal driving scenarios in an open-loop setting, they exhibit critical flaws when navigating our safety-critical scenarios in a closed-loop setting. This highlights the need for advancements in the safety and real-world usability of end-to-end planners. By publicly releasing our simulator and scenarios as an easy-to-run evaluation suite, we invite the research community to explore, refine, and validate their AD models in controlled, yet highly configurable and challenging sensor-realistic environments. Code and instructions can be found at https://github.com/atonderski/neuro-ncap

4/24/2024

cs.CV


D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation

Zehong Ke, Yanbo Jiang, Yuning Wang, Hao Cheng, Jinhao Li, Jianqiang Wang

With the advancement of deep learning technology, data-driven methods are increasingly used in the decision-making of autonomous driving, and the quality of datasets greatly influenced the model performance. Although current datasets have made significant progress in the collection of vehicle and environment data, emphasis on human-end data including the driver states and human evaluation is not sufficient. In addition, existing datasets consist mostly of simple scenarios such as car following, resulting in low interaction levels. In this paper, we introduce the Driver to Evaluation dataset (D2E), an autonomous decision-making dataset that contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers, covering a comprehensive process of vehicle decision-making. Apart from regular agents and surrounding environment information, we not only collect driver factor data including first-person view videos, physiological signals, and eye attention data, but also provide subjective rating scores from 40 human volunteers. The dataset is mixed of driving simulator scenes and real-road ones. High-interaction situations are designed and filtered to ensure behavior diversity. Through data organization, analysis, and preprocessing, D2E contains over 1100 segments of interactive driving case data covering from human driver factor to evaluation results, supporting the development of data-driven decision-making related algorithms.

6/5/2024

cs.CV cs.DB cs.RO