Coverage Metrics for a Scenario Database for the Scenario-Based Assessment of Automated Driving Systems

Read original: arXiv:2409.01139 - Published 9/4/2024 by Erwin de Gelder, Maren Buermann, Olaf Op den Camp

Overview

This paper discusses coverage metrics for a scenario database used in the scenario-based assessment of automated driving systems.
The researchers propose several metrics to quantify the diversity and coverage of scenarios in the database.
The goal is to ensure the scenario database comprehensively tests the capabilities of automated driving systems.

Plain English Explanation

The paper focuses on evaluating a database of driving scenarios used to test self-driving car systems. These scenarios are designed to represent the wide range of situations a self-driving car might encounter in the real world. The researchers developed several metrics to measure how well the scenario database covers different driving situations.

For example, one metric looks at how diverse the scenarios are - does the database include a wide variety of road types, weather conditions, traffic patterns, and other factors? Another metric examines how comprehensively the scenarios test the self-driving system's capabilities - does it cover edge cases and rare events as well as common driving situations?

The goal is to ensure the scenario database thoroughly challenges the self-driving system and identifies any weaknesses or blind spots before it is deployed on public roads. Comprehensive testing in simulation can help catch issues that might not be discovered until too late in real-world trials, improving the safety and reliability of self-driving cars.

Technical Explanation

The paper introduces several coverage metrics to quantify the diversity and completeness of a scenario database used for the assessment of automated driving systems:

Scenario diversity: This metric measures the variety of factors (e.g. road types, weather, traffic) represented in the scenarios. It aims to ensure the database covers a wide range of driving contexts.
Scenario coverage: This metric evaluates how comprehensively the scenarios test the capabilities of the automated driving system. It looks at whether the database includes edge cases, rare events, and other challenging situations.
Scenario frequency: This metric tracks how often each scenario occurs in the database. It helps identify potential over- or under-representation of certain driving situations.
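
The three ideas above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the paper's definition of its metrics: the `ODD` dictionary, the `SCENARIOS` records, and the function names `combination_coverage` and `category_frequency` are all hypothetical. Coverage is taken here to be the fraction of factor combinations that appear in at least one scenario.

```python
from collections import Counter
from itertools import product

# Hypothetical ODD description: each factor maps to the values to be covered.
ODD = {
    "road_type": ["highway", "urban"],
    "weather": ["clear", "rain"],
}

# Hypothetical scenario records; a real database would supply these.
SCENARIOS = [
    {"category": "cut-in", "road_type": "highway", "weather": "clear"},
    {"category": "cut-in", "road_type": "highway", "weather": "rain"},
    {"category": "lead braking", "road_type": "highway", "weather": "clear"},
    {"category": "lead braking", "road_type": "urban", "weather": "clear"},
]


def combination_coverage(scenarios, odd):
    """Return (fraction of ODD factor combinations seen, sorted missing ones)."""
    factors = sorted(odd)
    required = set(product(*(odd[f] for f in factors)))
    seen = {tuple(s[f] for f in factors) for s in scenarios}
    missing = sorted(required - seen)
    return len(required & seen) / len(required), missing


def category_frequency(scenarios):
    """Return the relative frequency of each scenario category."""
    counts = Counter(s["category"] for s in scenarios)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


coverage, missing = combination_coverage(SCENARIOS, ODD)
frequency = category_frequency(SCENARIOS)
print(coverage, missing, frequency)
```

With this toy data, three of the four road type and weather combinations occur, and the list of missing combinations points directly at what would need to be collected next. A real ODD has many more factors, so an exhaustive product like this would likely need to be replaced by pairwise or per-factor coverage.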

The researchers illustrate the metrics with an experiment in which over 200,000 scenarios from 10 scenario categories are collected from the HighD data set. The experiment shows that a coverage of 100% can be achieved under certain conditions. Where full coverage is not reached, the metrics identify which data and scenarios could be added, which can guide the ongoing curation and expansion of the database.
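
To make the idea of a coverage gap concrete, here is a minimal Python sketch that asks how much of a recording is accounted for by at least one identified scenario. It assumes, purely for illustration, that a recording and each scenario are represented as `(start, end)` time intervals in seconds. The helper names `merge_intervals` and `time_coverage` are hypothetical, and the paper's actual metrics may be defined differently.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals and return them sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def time_coverage(recording, scenario_intervals):
    """Return (covered fraction of the recording, list of uncovered gaps)."""
    rec_start, rec_end = recording
    # Clip each scenario to the recording and drop those that do not overlap it.
    clipped = [
        (max(s, rec_start), min(e, rec_end))
        for s, e in scenario_intervals
        if min(e, rec_end) > max(s, rec_start)
    ]
    merged = merge_intervals(clipped)
    covered = sum(e - s for s, e in merged)

    # Walk through the merged intervals and collect the stretches in between.
    gaps = []
    cursor = rec_start
    for s, e in merged:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = e
    if cursor < rec_end:
        gaps.append((cursor, rec_end))
    return covered / (rec_end - rec_start), gaps


# A 10 s recording with three scenarios, two of which overlap.
fraction, gaps = time_coverage((0.0, 10.0), [(0.0, 4.0), (3.0, 6.0), (8.0, 12.0)])
print(fraction, gaps)
```

The returned gaps are the stretches of data that no scenario explains. Inspecting them is one way to decide whether a new scenario category is needed or whether the uncovered driving is simply uneventful.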

Critical Analysis

The coverage metrics proposed in this paper provide a systematic way to evaluate the suitability of a scenario database for assessing automated driving systems. However, the authors acknowledge several limitations:

The metrics focus on the statistical properties of the scenario database, but do not directly measure the real-world relevance or criticality of the scenarios.
Defining the appropriate thresholds or target values for each metric is subjective and context-dependent.
The database may need to be periodically re-evaluated and expanded as automated driving technology and real-world driving patterns evolve over time.

Additionally, this paper does not address how the scenario database should be used in conjunction with other testing methods, such as closed-course trials or on-road testing. A more holistic testing approach that integrates multiple assessment techniques may be necessary to truly validate the safety and performance of automated driving systems.

Conclusion

This paper presents a set of coverage metrics to evaluate the scenario databases used in the scenario-based assessment of automated driving systems. By quantifying the diversity, coverage, and frequency of the scenarios, these metrics can help ensure the databases comprehensively test the capabilities of self-driving car technology.

As automated driving systems become more advanced and complex, thorough simulation-based testing will be crucial to identify and address potential safety issues before real-world deployment. The proposed metrics provide a systematic framework for continually improving and expanding these scenario databases to keep pace with the evolving needs of the industry.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Coverage Metrics for a Scenario Database for the Scenario-Based Assessment of Automated Driving Systems

Erwin de Gelder, Maren Buermann, Olaf Op den Camp

Automated Driving Systems (ADSs) have the potential to make mobility services available and safe for all. A multi-pillar Safety Assessment Framework (SAF) has been proposed for the type-approval process of ADSs. The SAF requires that the test scenarios for the ADS adequately cover the Operational Design Domain (ODD) of the ADS. A common method for generating test scenarios involves basing them on scenarios identified and characterized from driving data. This work addresses two questions when collecting scenarios from driving data. First, do the collected scenarios cover all relevant aspects of the ADS' ODD? Second, do the collected scenarios cover all relevant aspects that are in the driving data, such that no potentially important situations are missed? This work proposes coverage metrics that provide a quantitative answer to these questions. The proposed coverage metrics are illustrated by means of an experiment in which over 200,000 scenarios from 10 different scenario categories are collected from the HighD data set. The experiment demonstrates that a coverage of 100% can be achieved under certain conditions, and it also identifies which data and scenarios could be added to enhance the coverage outcomes in case a 100% coverage has not been achieved. Whereas this work presents metrics for the quantification of the coverage of driving data and the identified scenarios, this paper concludes with future research directions, including the quantification of the completeness of driving data and the identified scenarios.

9/4/2024

Towards a Completeness Argumentation for Scenario Concepts

Christoph Glasmacher, Hendrik Weber, Lutz Eckstein

Scenario-based testing has become a promising approach to overcome the complexity of real-world traffic for safety assurance of automated vehicles. Within scenario-based testing, a system under test is confronted with a set of predefined scenarios. This set shall ensure more efficient testing of an automated vehicle operating in an open context compared to real-world testing. However, the question arises if a scenario catalog can cover the open context sufficiently to allow an argumentation for sufficiently safe driving functions and how this can be proven. Within this paper, a methodology is proposed to argue a sufficient completeness of a scenario concept using a goal structured notation. Thereby, the distinction between completeness and coverage is discussed. For both, methods are proposed for a streamlined argumentation and regarding evidence. These methods are applied to a scenario concept and the inD dataset to prove the usability.

4/3/2024

Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data

John M. Scanlon, Kristofer D. Kusano, Laura A. Fraade-Blanar, Timothy L. McMurry, Yin-Hsiu Chen, Trent Victor

With fully automated driving systems (ADS; SAE level 4) ride-hailing services expanding in the US, we are now approaching an inflection point, where the process of retrospectively evaluating ADS safety impact can start to yield statistically credible conclusions. An ADS safety impact measurement requires a comparison to a benchmark crash rate. This study aims to address, update, and extend the existing literature by leveraging police-reported crashes to generate human crash rates for multiple geographic areas with current ADS deployments. All of the data leveraged is publicly accessible, and the benchmark determination methodology is intended to be repeatable and transparent. Generating a benchmark that is comparable to ADS crash data is associated with certain challenges, including data selection, handling underreporting and reporting thresholds, identifying the population of drivers and vehicles to compare against, choosing an appropriate severity level to assess, and matching crash and mileage exposure data. Consequently, we identify essential steps when generating benchmarks, and present our analyses amongst a backdrop of existing ADS benchmark literature. One analysis presented is the usage of established underreporting correction methodology to publicly available human driver police-reported data to improve comparability to publicly available ADS crash data. We also identify important dependencies in controlling for geographic region, road type, and vehicle type, and show how failing to control for these features can bias results. This body of work aims to contribute to the ability of the community - researchers, regulators, industry, and experts - to reach consensus on how to estimate accurate benchmarks.

7/25/2024

Dance of the ADS: Orchestrating Failures through Historically-Informed Scenario Fuzzing

Tong Wang, Taotao Gu, Huan Deng, Hu Li, Xiaohui Kuang, Gang Zhao

As autonomous driving systems (ADS) advance towards higher levels of autonomy, orchestrating their safety verification becomes increasingly intricate. This paper unveils ScenarioFuzz, a pioneering scenario-based fuzz testing methodology. Designed like a choreographer who understands the past performances, it uncovers vulnerabilities in ADS without the crutch of predefined scenarios. Leveraging map road networks, such as OPENDRIVE, we extract essential data to form a foundational scenario seed corpus. This corpus, enriched with pertinent information, provides the necessary boundaries for fuzz testing in the absence of starting scenarios. Our approach integrates specialized mutators and mutation techniques, combined with a graph neural network model, to predict and filter out high-risk scenario seeds, optimizing the fuzzing process using historical test data. Compared to other methods, our approach reduces the time cost by an average of 60.3%, while the number of error scenarios discovered per unit of time increases by 103%. Furthermore, we propose a self-supervised collision trajectory clustering method, which aids in identifying and summarizing 54 high-risk scenario categories prone to inducing ADS faults. Our experiments have successfully uncovered 58 bugs across six tested systems, emphasizing the critical safety concerns of ADS.

7/8/2024