NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

2404.07762

Published 4/24/2024 by William Ljungbergh, Adam Tonderski, Joakim Johnander, Holger Caesar, Kalle Åström, Michael Felsberg, Christoffer Petersson

cs.CV

Abstract

We present a versatile NeRF-based simulator for testing autonomous driving (AD) software systems, designed with a focus on sensor-realistic closed-loop evaluation and the creation of safety-critical scenarios. The simulator learns from sequences of real-world driving sensor data and enables reconfigurations and renderings of new, unseen scenarios. In this work, we use our simulator to test the responses of AD models to safety-critical scenarios inspired by the European New Car Assessment Programme (Euro NCAP). Our evaluation reveals that, while state-of-the-art end-to-end planners excel in nominal driving scenarios in an open-loop setting, they exhibit critical flaws when navigating our safety-critical scenarios in a closed-loop setting. This highlights the need for advancements in the safety and real-world usability of end-to-end planners. By publicly releasing our simulator and scenarios as an easy-to-run evaluation suite, we invite the research community to explore, refine, and validate their AD models in controlled, yet highly configurable and challenging sensor-realistic environments. Code and instructions can be found at https://github.com/atonderski/neuro-ncap

Overview

This paper presents NeuroNCAP, a NeRF-based simulator for closed-loop safety testing of autonomous driving (AD) systems.
The simulator aims to expose safety flaws in AD models by placing them in safety-critical scenarios inspired by the European New Car Assessment Programme (Euro NCAP).
Key aspects include neural rendering (NeRFs) learned from real-world driving sensor data, which makes it possible to render new, unseen scenarios, and a closed-loop testing process in which the driving system's own actions determine what it observes next.

Plain English Explanation

The paper describes a new testing framework for autonomous driving systems called NeuroNCAP. The goal is to make these systems safer and more reliable by testing them rigorously in a wide variety of challenging situations.

The framework uses neural radiance fields (NeRFs), a type of neural network trained here on real driving sensor data, to render realistic driving scenarios that the autonomous system must navigate. This is important because real-world driving can involve complex and unpredictable situations, and testing needs to account for that complexity. Because a learned scene can be reconfigured, the simulator can produce new, unseen scenarios that go beyond what was originally recorded.

Crucially, NeuroNCAP uses a "closed-loop" approach. This means the autonomous driving system is tested in a dynamic, interactive environment where each of its decisions changes what it sees next, so it has to continuously adapt its behavior. This is closer to how the system would operate in the real world than replaying pre-recorded test cases (an "open-loop" setting).
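The difference between open-loop and closed-loop evaluation can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions, not the NeuroNCAP implementation: the planners, the 20 m braking threshold, the kinematic update, and the logged data are all hypothetical, and no sensor rendering is involved.

```python
from dataclasses import dataclass


@dataclass
class EgoState:
    position: float  # metres along the lane
    speed: float     # metres per second


def braking_planner(gap: float, speed: float) -> float:
    """Hypothetical planner: brake hard once the gap drops below 20 m."""
    return -6.0 if gap < 20.0 else 0.0


def oblivious_planner(gap: float, speed: float) -> float:
    """Hypothetical planner that never reacts to the obstacle."""
    return 0.0


def open_loop_error(planner, log):
    """Mean absolute acceleration error against logged expert actions.

    The planner's outputs are compared to the log but never executed,
    so mistakes cannot change the states the planner is shown.
    """
    errors = [abs(planner(gap, speed) - expert) for gap, speed, expert in log]
    return sum(errors) / len(errors)


def run_closed_loop(planner, obstacle_position=50.0, initial_speed=10.0,
                    dt=0.1, max_steps=200):
    """Roll the scene forward, feeding each planner output back into the state.

    Returns a tuple (collided, impact_speed).
    """
    ego = EgoState(position=0.0, speed=initial_speed)
    for _ in range(max_steps):
        gap = obstacle_position - ego.position
        if gap <= 0.0:
            return True, ego.speed   # reached the obstacle while moving
        if ego.speed <= 0.0:
            return False, 0.0        # came to a full stop in time
        accel = planner(gap, ego.speed)
        ego.speed = max(0.0, ego.speed + accel * dt)
        ego.position += ego.speed * dt
    return False, 0.0


# Made-up nominal log: (gap, speed, expert acceleration); no braking is needed.
nominal_log = [(80.0, 10.0, 0.0), (60.0, 10.0, 0.0), (40.0, 10.0, 0.0)]

for planner in (braking_planner, oblivious_planner):
    print(planner.__name__,
          "open-loop error:", open_loop_error(planner, nominal_log),
          "closed-loop (collided, impact speed):", run_closed_loop(planner))
```

In this toy setup both planners match the nominal log exactly, yet only one of them avoids the stationary obstacle once its own actions drive the simulation. That is the kind of gap the paper's abstract reports between open-loop and closed-loop results.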

By subjecting autonomous driving algorithms to this kind of comprehensive, dynamic testing, the hope is to uncover issues and edge cases that would be missed by more conventional testing methods. This can lead to significant improvements in the safety and robustness of these critical systems before they are deployed in the real world.

Technical Explanation

The key technical aspects of the NeuroNCAP framework include:

Scenario Generation: The simulator uses NeRFs learned from sequences of real-world driving sensor data to reconfigure and render new, unseen driving scenarios for testing autonomous driving systems. This allows for a much more diverse set of test cases than manually crafted scenarios.
Closed-Loop Testing: NeuroNCAP implements a closed-loop testing process in which the autonomous driving system is embedded in the simulated environment and receives continuous feedback: its actions change the vehicle's state, which in turn changes the sensor data it sees next. This mimics the real-world interaction between the vehicle and its surroundings.
Comprehensive Evaluation: The framework evaluates the autonomous driving system's performance across a wide range of metrics, including vehicle control, safety, and task completion. This holistic assessment helps identify issues that may be missed by more narrowly-focused tests.
Iterative Refinement: The closed-loop nature of the testing allows the autonomous driving system to be iteratively refined and improved based on the feedback received during the testing process.

By combining these elements, NeuroNCAP aims to provide a more rigorous and realistic testing framework for autonomous driving systems, helping to uncover issues and edge cases that may not be detected through traditional testing approaches.
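As a rough illustration of the evaluation step, closed-loop runs could be aggregated per scenario type as sketched below. The scenario labels, the two metrics (collision rate and mean impact speed), and all numbers are hypothetical placeholders chosen for this example; they are not the metrics or results reported in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    scenario_type: str   # hypothetical label, e.g. "stationary"
    collided: bool
    impact_speed: float  # m/s, 0.0 when no collision occurred


def summarize(results):
    """Group runs by scenario type; report collision rate and mean impact speed."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.scenario_type].append(result)

    summary = {}
    for name, runs in grouped.items():
        crashes = [r for r in runs if r.collided]
        mean_impact = (
            sum(r.impact_speed for r in crashes) / len(crashes) if crashes else 0.0
        )
        summary[name] = {
            "runs": len(runs),
            "collision_rate": len(crashes) / len(runs),
            "mean_impact_speed": mean_impact,
        }
    return summary


# Made-up run results, for illustration only.
results = [
    RunResult("stationary", False, 0.0),
    RunResult("stationary", True, 4.0),
    RunResult("crossing", True, 8.0),
    RunResult("crossing", True, 6.0),
]
report = summarize(results)
for name, stats in report.items():
    print(name, stats)
```

Reporting results per scenario type rather than as a single average keeps a failure mode in one category from being hidden by good performance in another.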

Critical Analysis

The paper acknowledges several key limitations and areas for further research:

The scenario generation process, while more diverse than manual methods, is still limited by the training data and architecture of the neural networks used. More work is needed to ensure the scenarios truly capture the full complexity of real-world driving.
The closed-loop testing is conducted in a simulation environment, which may not fully capture all the nuances and edge cases of the physical world. Bridging the gap between simulation and real-world performance is an ongoing challenge in the field of autonomous driving.
The paper does not provide a comprehensive evaluation of how NeuroNCAP compares to other testing frameworks in terms of the issues it can uncover or the improvements it can drive in autonomous driving systems. Further comparative studies would help establish the framework's relative strengths and weaknesses.
While the framework is designed to be general, its efficacy may depend on the specific autonomous driving algorithms and sensors being tested. More research is needed to understand how well NeuroNCAP performs across a diverse range of autonomous driving technologies.

Overall, the NeuroNCAP framework represents a promising step forward in improving the testing and development of autonomous driving systems. However, as with any new technology, there are still important challenges and open questions that require further research and exploration.

Conclusion

The NeuroNCAP framework proposed in this paper aims to enhance the safety and robustness of autonomous driving systems through a sensor-realistic, closed-loop testing approach. By using NeRFs to render diverse driving scenarios and embedding the autonomous system in a dynamic, interactive simulation, the framework can uncover issues and edge cases that may be missed by traditional testing methods.

While the paper acknowledges several limitations and areas for further research, the NeuroNCAP approach represents a significant advancement in the field of autonomous driving testing. As autonomous vehicles continue to become more prevalent, the development of rigorous, realistic testing frameworks like this will be crucial for ensuring these systems can operate safely and reliably in the real world.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Controllable Adversaries

Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka, Wei Zhan, Manmohan Chandraker

Evaluating the performance of autonomous vehicle planning algorithms necessitates simulating long-tail safety-critical traffic scenarios. However, traditional methods for generating such scenarios often fall short in terms of controllability and realism and neglect the dynamics of agent interactions. To mitigate these limitations, we introduce SAFE-SIM, a novel diffusion-based controllable closed-loop safety-critical simulation framework. Our approach yields two distinct advantages: 1) the generation of realistic long-tail safety-critical scenarios that closely emulate real-world conditions, and 2) enhanced controllability, enabling more comprehensive and interactive evaluations. We develop a novel approach to simulate safety-critical scenarios through an adversarial term in the denoising process, which allows an adversarial agent to challenge a planner with plausible maneuvers while all agents in the scene exhibit reactive and realistic behaviors. Furthermore, we propose novel guidance objectives and a partial diffusion process that enables a user to control key aspects of the generated scenarios, such as the collision type and aggressiveness of the adversarial driver, while maintaining the realism of the behavior. We validate our framework empirically using the NuScenes dataset, demonstrating improvements in both realism and controllability. These findings affirm that diffusion models provide a robust and versatile foundation for safety-critical, interactive traffic simulation, extending their utility across the broader landscape of autonomous driving. For supplementary videos, visit our project at https://safe-sim.github.io/.

6/18/2024

cs.RO cs.AI cs.CV cs.LG

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta

Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. Specifically, we gather simulation-based metrics, such as progress and time to collision, by unrolling bird's eye view abstractions of the test scenes for a short simulation horizon. Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other. As we demonstrate empirically, this decoupling allows open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors. NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights. On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements such as TransFuser can match recent large-scale end-to-end driving architectures such as UniAD. Our modular framework can potentially be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges. Our code is available at https://github.com/autonomousvision/navsim.

6/24/2024

cs.CV cs.AI cs.LG cs.RO

Planning with Adaptive World Models for Autonomous Driving

Arun Balajee Vasudevan, Neehar Peri, Jeff Schneider, Deva Ramanan

Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive simulator. We analyze the characteristics of nuPlan's recorded logs and find that each city has its own unique driving behaviors, suggesting that robust planners must adapt to different environments. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) that predicts reactive agent behaviors using features derived from recently-observed agent histories; intuitively, some aggressive agents may tailgate lead vehicles, while others may not. To model such phenomena, BehaviorNet predicts parameters of an agent's motion controller rather than predicting its spacetime trajectory (as most forecasters do). Finally, we present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions. Our extensive experiments demonstrate that AdaptiveDriver achieves state-of-the-art results on the nuPlan closed-loop planning benchmark, reducing test error from 6.4% to 4.6%, even when applied to never-before-seen cities.

6/18/2024

cs.RO cs.LG

Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, Junchi Yan

In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.

6/12/2024

cs.RO cs.CV