Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars

Read original: arXiv:2012.12437 - Published 5/2/2024 by Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei B^arsan, Shenlong Wang, Gell'ert M'attyus, Raquel Urtasun

🔗

Overview

The researchers introduce a new large-scale dataset called Pit30M, which contains over 30 million frames of image and LiDAR data captured under diverse conditions.
They benchmark existing methods for image and LiDAR retrieval and introduce a new convolutional network-based LiDAR retrieval method.
The goal is to understand whether retrieval-based localization approaches are good enough for self-driving vehicles at a city scale.

Plain English Explanation

The researchers are interested in understanding how well retrieval-based localization techniques can work for self-driving cars. Retrieval-based localization means using a database of pre-recorded sensor data (like images or LiDAR scans) to figure out where a vehicle is located. This is an alternative to methods that rely on expensive sensors or detailed 3D maps.

To study this, the researchers created a new dataset called Pit30M. This dataset has over 30 million frames of image and LiDAR data, which is 10 to 100 times larger than what has been used in previous research. The data was captured in diverse conditions, like different seasons, weather, times of day, and traffic levels. The dataset also includes accurate ground truth information about the vehicle's location.

The researchers annotated the dataset with additional information, like historical weather data and semantic segmentation of the images and LiDAR scans. Semantic segmentation can be used as a proxy for measuring occlusion, which is important for localization.

Using this new dataset, the researchers benchmarked several existing methods for retrieving matching images and LiDAR scans. They also introduced a new convolutional neural network-based LiDAR retrieval method that performs well compared to the state of the art.

Overall, this work provides the first large-scale benchmark for sub-meter retrieval-based localization at a city scale, which is an important step towards understanding the potential of these techniques for self-driving vehicles.

Technical Explanation

The researchers introduce a new large-scale dataset called Pit30M, which contains over 30 million frames of synchronized image and LiDAR data captured under diverse conditions. This is significantly larger than the datasets used in previous work, which were typically on the order of hundreds of thousands to a few million frames.

The Pit30M dataset includes data collected across different seasons, weather conditions, times of day, and traffic levels. It also provides accurate ground truth localization information, as well as additional annotations like historical weather data and semantic segmentation of the image and LiDAR data.

Using this dataset, the researchers benchmark multiple existing methods for image and LiDAR retrieval. They also introduce a new convolutional network-based LiDAR retrieval method that is competitive with the state of the art. This serves as an important baseline for future work on retrieval-based localization at a city scale.

The researchers' key insight is that large-scale, diverse datasets like Pit30M are necessary to properly evaluate the performance of retrieval-based localization approaches, which could be an attractive alternative to expensive sensor suites or detailed 3D maps for self-driving vehicles.

Critical Analysis

The researchers acknowledge several limitations and areas for further research in their work. First, while Pit30M is a significant step forward in terms of dataset size and diversity, it still only represents a small portion of a typical city-scale environment. Extending the dataset to larger geographic areas and longer time periods would be valuable.

Additionally, the researchers note that their semantic segmentation annotations, while useful as a proxy for occlusion, do not directly measure the impact of occlusion on retrieval-based localization. More direct occlusion measurements would provide deeper insights.

Finally, the benchmark focuses on retrieval-based approaches, but does not compare these to other localization techniques, such as those that rely on detailed 3D maps or high-precision sensors. A more comprehensive benchmark across different localization approaches would help better understand the strengths and weaknesses of retrieval-based methods.

Overall, the Pit30M dataset and the researchers' benchmarking efforts represent an important step forward, but there is still significant room for further research and improvement in this area.

Conclusion

This work introduces the Pit30M dataset, a large-scale, diverse dataset for evaluating retrieval-based localization approaches for self-driving vehicles. The dataset provides over 30 million frames of synchronized image and LiDAR data, along with accurate ground truth localization information and additional annotations.

By benchmarking existing retrieval methods and introducing a new convolutional network-based LiDAR retrieval approach, the researchers have established an important baseline for studying the potential of retrieval-based localization at a city scale. This is a crucial step towards understanding whether these techniques can be viable alternatives to expensive sensor suites or detailed 3D maps for self-driving cars.

While the Pit30M dataset and the researchers' findings represent significant progress, there are still opportunities for further improvements and extensions, such as expanding the geographic and temporal coverage of the dataset and exploring more direct measurements of occlusion. Nevertheless, this work provides a valuable foundation for future research in this important area of self-driving vehicle technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔗

Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars

Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei B^arsan, Shenlong Wang, Gell'ert M'attyus, Raquel Urtasun

We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles. Towards this goal, we introduce Pit30M, a new image and LiDAR dataset with over 30 million frames, which is 10 to 100 times larger than those used in previous work. Pit30M is captured under diverse conditions (i.e., season, weather, time of the day, traffic), and provides accurate localization ground truth. We also automatically annotate our dataset with historical weather and astronomical data, as well as with image and LiDAR semantic segmentation as a proxy measure for occlusion. We benchmark multiple existing methods for image and LiDAR retrieval and, in the process, introduce a simple, yet effective convolutional network-based LiDAR retrieval method that is competitive with the state of the art. Our work provides, for the first time, a benchmark for sub-metre retrieval-based localization at city scale. The dataset, its Python SDK, as well as more information about the sensors, calibration, and metadata, are available on the project website: https://pit30m.github.io/

5/2/2024

DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications

Li Li, Khalid N. Ismail, Hubert P. H. Shum, Toby P. Breckon

We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is equipped with a high resolution 128 channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made available along with the LiDAR point clouds to facilitate multi-modal use of concurrent ambient and reflectivity scene information. Leveraging DurLAR, with a resolution exceeding that of prior benchmarks, we consider the task of monocular depth estimation and use this increased availability of higher resolution, yet sparse ground truth scene depth information to propose a novel joint supervised/self-supervised loss formulation. We compare performance over both our new DurLAR dataset, the established KITTI benchmark and the Cityscapes dataset. Our evaluation shows our joint use supervised and self-supervised loss terms, enabled via the superior ground truth resolution and availability within DurLAR improves the quantitative and qualitative performance of leading contemporary monocular depth estimation approaches (RMSE=3.639, Sq Rel=0.936).

6/17/2024

👨‍🏫

ParisLuco3D: A high-quality target dataset for domain generalization of LiDAR perception

Jules Sanchez, Louis Soum-Fontez, Jean-Emmanuel Deschaud, Francois Goulette

LiDAR is an essential sensor for autonomous driving by collecting precise geometric information regarding a scene. %Exploiting this information for perception is interesting as the amount of available data increases. As the performance of various LiDAR perception tasks has improved, generalizations to new environments and sensors has emerged to test these optimized models in real-world conditions. This paper provides a novel dataset, ParisLuco3D, specifically designed for cross-domain evaluation to make it easier to evaluate the performance utilizing various source datasets. Alongside the dataset, online benchmarks for LiDAR semantic segmentation, LiDAR object detection, and LiDAR tracking are provided to ensure a fair comparison across methods. The ParisLuco3D dataset, evaluation scripts, and links to benchmarks can be found at the following website:https://npm3d.fr/parisluco3d

6/5/2024

DIDLM:A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping

WeiSheng Gong, Chen He, KaiJie Su, QingYong Li

This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. Slam comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities.Our dataset is available at https://github.com/GongWeiSheng/DIDLM.

4/16/2024