A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook

2401.01454

Published 4/24/2024 by Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, Alois C. Knoll

cs.CV

Abstract

Autonomous driving has rapidly developed and shown promising performance due to recent advances in hardware and deep learning techniques. High-quality datasets are fundamental for developing reliable autonomous driving algorithms. Previous dataset surveys either focused on a limited number or lacked detailed investigation of dataset characteristics. To this end, we present an exhaustive study of 265 autonomous driving datasets from multiple perspectives, including sensor modalities, data size, tasks, and contextual conditions. We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets. Besides, we analyze the annotation processes, existing labeling tools, and the annotation quality of datasets, showing the importance of establishing a standard annotation pipeline. On the other hand, we thoroughly analyze the impact of geographical and adversarial environmental conditions on the performance of autonomous driving systems. Moreover, we exhibit the data distribution of several vital datasets and discuss their pros and cons accordingly. Finally, we discuss the current challenges and the development trend of the future autonomous driving datasets.

Overview

This paper provides a comprehensive survey of autonomous driving datasets, covering aspects such as data statistics, annotation, and future outlook.
The survey examines 265 autonomous driving datasets by sensor modality, data size, task, and contextual conditions, and touches on topics covered by related work, including Collaborative Perception Datasets for Autonomous Driving, End-to-End Autonomous Driving: Challenges and Frontiers, Towards Autonomous Driving on Small-Scale Cars: A Survey, A Survey on the Robustness of Trajectory Prediction for Autonomous Vehicles, and AI Competitions and Benchmarks for Dataset Development.
The paper aims to provide researchers and practitioners with a comprehensive understanding of the current state of autonomous driving datasets and their potential future directions.

Plain English Explanation

This research paper looks at the different datasets that are used in the field of autonomous driving. Autonomous driving is the technology that allows cars to drive themselves without a human driver. The paper examines the statistics and details of these datasets, how the data is labeled and annotated, and what the future might hold for this type of data.

The researchers surveyed several popular datasets that are commonly used in autonomous driving research. These include datasets that focus on collaborative perception, where multiple vehicles share information to understand their environment. Other datasets look at the challenges and frontiers of building end-to-end autonomous driving systems, which means systems that can handle the entire driving process from start to finish without human intervention.

The paper also examines datasets that use small-scale cars, rather than full-size vehicles, to study autonomous driving. Additionally, it reviews datasets that look at how accurately autonomous vehicles can predict the trajectories, or paths, of other vehicles and obstacles around them. Finally, the paper discusses how AI competitions and benchmarks are being used to drive the development of new and improved autonomous driving datasets.

The goal of this comprehensive survey is to give researchers and engineers working on autonomous driving a better understanding of the current state of the datasets available to them, as well as where this field might be headed in the future. By having a clear picture of the data landscape, they can make more informed decisions about which datasets to use and how to advance the state of the art in autonomous driving technology.

Technical Explanation

The paper presents a thorough survey of autonomous driving datasets, covering aspects such as data statistics, annotation, and future outlook. The researchers examined a variety of popular datasets used in autonomous driving research, including:

Collaborative Perception Datasets for Autonomous Driving: These datasets focus on the sharing of information between multiple vehicles to improve their understanding of the driving environment.
End-to-End Autonomous Driving: Challenges and Frontiers: These datasets address the complex challenges of developing autonomous driving systems that can handle the entire driving process without human intervention.
Towards Autonomous Driving on Small-Scale Cars: A Survey: These datasets use small-scale vehicles rather than full-size cars to study autonomous driving, which can be more cost-effective and efficient for certain research purposes.
A Survey on the Robustness of Trajectory Prediction for Autonomous Vehicles: These datasets focus on the ability of autonomous vehicles to accurately predict the trajectories, or paths, of other vehicles and obstacles in the driving environment.
AI Competitions and Benchmarks for Dataset Development: This research examines how AI competitions and benchmarks are being used to drive the development of new and improved autonomous driving datasets.

By providing a comprehensive overview of these datasets, the paper aims to equip researchers and practitioners with a deeper understanding of the current state of autonomous driving data and its potential future directions. This knowledge can help inform their decision-making and guide their efforts in advancing the field of autonomous driving.

Critical Analysis

The paper offers a thorough, well-structured survey of autonomous driving datasets. A few potential limitations and areas for further research are worth noting:

The paper focuses primarily on publicly available datasets, so it may not capture the full landscape of autonomous driving data: proprietary datasets held by companies and other organizations are not included in the survey.
The evaluation of the datasets is limited to high-level characteristics, such as data statistics and annotation methods. A more in-depth analysis of the quality, diversity, and suitability of the datasets for specific research tasks could provide even deeper insights.
The paper does not delve into the potential biases or limitations inherent in the datasets, which could influence the performance and generalization of autonomous driving models trained on them. Addressing these biases could be an important area for future research.
The survey does not explore the ethical and privacy implications of the data collection and usage in autonomous driving, which is a crucial consideration as this technology becomes more widespread.

Despite these potential areas for improvement, the paper offers a comprehensive and valuable resource for researchers and practitioners working in the field of autonomous driving. By providing a clear overview of the current dataset landscape, the paper sets the stage for further discussions and advancements in this rapidly evolving field.

Conclusion

This survey paper provides a thorough examination of the current state of autonomous driving datasets, covering key aspects such as data statistics, annotation, and future outlook. By surveying a range of popular datasets used in autonomous driving research, the paper offers researchers and practitioners a comprehensive understanding of the available resources and their potential applications.

The findings of this study can serve as a valuable reference point for those working to advance the state of autonomous driving technology. By understanding the strengths, limitations, and future directions of these datasets, researchers can make more informed decisions about the data they use and identify areas for further exploration and improvement. Additionally, the insights gained from this survey can inform the development of new datasets and benchmarks that address the evolving needs of the autonomous driving community.

Overall, the paper gives ongoing research and development efforts in autonomous driving a solid foundation. As the technology progresses, its insights and recommendations should remain a useful guide for the field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Collaborative Perception Datasets in Autonomous Driving: A Survey

Melih Yazgan, Mythra Varun Akkanapragada, J. Marius Zoellner

This survey offers a comprehensive examination of collaborative perception datasets in the context of Vehicle-to-Infrastructure (V2I), Vehicle-to-Vehicle (V2V), and Vehicle-to-Everything (V2X). It highlights the latest developments in large-scale benchmarks that accelerate advancements in perception tasks for autonomous vehicles. The paper systematically analyzes a variety of datasets, comparing them based on aspects such as diversity, sensor setup, quality, public availability, and their applicability to downstream tasks. It also highlights the key challenges such as domain shift, sensor setup limitations, and gaps in dataset diversity and availability. The importance of addressing privacy and security concerns in the development of datasets is emphasized, regarding data sharing and dataset creation. The conclusion underscores the necessity for comprehensive, globally accessible datasets and collaborative efforts from both technological and research communities to overcome these challenges and fully harness the potential of autonomous driving.

4/23/2024

cs.CV cs.RO

Towards Scenario- and Capability-Driven Dataset Development and Evaluation: An Approach in the Context of Mapless Automated Driving

Felix Grun, Marcus Nolte, Markus Maurer

The foundational role of datasets in defining the capabilities of deep learning models has led to their rapid proliferation. At the same time, published research focusing on the process of dataset development for environment perception in automated driving has been scarce, thereby reducing the applicability of openly available datasets and impeding the development of effective environment perception systems. Sensor-based, mapless automated driving is one of the contexts where this limitation is evident. While leveraging real-time sensor data instead of pre-defined HD maps promises enhanced adaptability and safety by effectively navigating unexpected environmental changes, it also increases the demands on the scope and complexity of the information provided by the perception system. To address these challenges, we propose a scenario- and capability-based approach to dataset development. Grounded in the principles of ISO 21448 (safety of the intended functionality, SOTIF), extended by ISO/TR 4804, our approach facilitates the structured derivation of dataset requirements. This not only aids in the development of meaningful new datasets but also enables the effective comparison of existing ones. Applying this methodology to a broad range of existing lane detection datasets, we identify significant limitations in current datasets, particularly in terms of real-world applicability, a lack of labeling of critical features, and an absence of comprehensive information for complex driving maneuvers.

5/1/2024

cs.CV

Collective Perception Datasets for Autonomous Driving: A Comprehensive Review

Sven Teufel, Jorg Gamerdinger, Jan-Patrick Kirchner, Georg Volk, Oliver Bringmann

To ensure safe operation of autonomous vehicles in complex urban environments, complete perception of the environment is necessary. However, due to environmental conditions, sensor limitations, and occlusions, this is not always possible from a single point of view. To address this issue, collective perception is an effective method. Realistic and large-scale datasets are essential for training and evaluating collective perception methods. This paper provides the first comprehensive technical review of collective perception datasets in the context of autonomous driving. The survey analyzes existing V2V and V2X datasets, categorizing them based on different criteria such as sensor modalities, environmental conditions, and scenario variety. The focus is on their applicability for the development of connected automated vehicles. This study aims to identify the key criteria of all datasets and to present their strengths, weaknesses, and anomalies. Finally, this survey concludes by making recommendations regarding which dataset is most suitable for collective 3D object detection, tracking, and semantic segmentation.

5/28/2024

cs.CV

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.

4/23/2024

cs.RO cs.AI cs.CV cs.LG