Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory

Read original: arXiv:2405.10575 - Published 5/20/2024 by Jonas Kalble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg

Overview

This paper explores the importance of accurate scene geometry representation for automated driving systems.
Current approaches use camera images to predict occupancy maps that represent the surrounding geometry.
The authors identify issues with the methods used to convert LiDAR scans into occupancy grid maps for training these systems.
They present a novel approach using evidence theory that yields more accurate occupancy map reconstructions.

Plain English Explanation

Self-driving cars need to understand the 3D geometry of their surroundings to navigate safely. Current approaches use camera images to predict occupancy maps - digital representations of the empty and occupied space in the environment. These occupancy maps are key inputs for predicting future occupancy and planning safe paths.

Training these occupancy prediction models requires accurate ground truth data. This data is typically generated by converting 3D point cloud scans from LiDAR sensors into occupancy grid maps. However, the authors found that the conversion techniques used for current benchmarks and training datasets produce very low-quality occupancy maps.

To address this, the researchers developed a new approach using evidence theory that can reconstruct occupancy maps more accurately from LiDAR data. Their method outperforms existing techniques both qualitatively and quantitatively, and also provides meaningful uncertainty estimates. This improved occupancy data can then be used to train better occupancy prediction models for self-driving cars.

Technical Explanation

The key technical contribution of this paper is a novel approach for converting LiDAR point cloud data into high-quality occupancy grid maps. Existing methods struggle to accurately reconstruct the underlying geometry, resulting in noisy and distorted occupancy representations.
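To make the conversion problem concrete, here is a minimal sketch of a naive baseline: marking every voxel that contains at least one LiDAR return as occupied. This is an illustrative assumption, not the procedure used by any specific benchmark or by the authors; the function name `voxelize` and all parameters are hypothetical.

```python
import math

def voxelize(points, origin, voxel_size, dims):
    """Mark every voxel containing at least one LiDAR point as occupied.

    points:     iterable of (x, y, z) coordinates in metres
    origin:     (x, y, z) of the grid's minimum corner
    voxel_size: edge length of a cubic voxel in metres
    dims:       (nx, ny, nz) number of voxels per axis
    Returns a set of occupied (i, j, k) voxel indices.
    """
    occupied = set()
    for p in points:
        # Map the metric coordinate to an integer voxel index.
        idx = tuple(math.floor((c - o) / voxel_size) for c, o in zip(p, origin))
        # Silently drop points that fall outside the grid.
        if all(0 <= i < n for i, n in zip(idx, dims)):
            occupied.add(idx)
    return occupied

# Hypothetical points: two fall in the same voxel, one lies outside the grid.
points = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.05), (1.3, 0.4, 0.2), (9.0, 9.0, 9.0)]
grid = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.5, dims=(4, 4, 4))
print(sorted(grid))
```

A hit-only grid like this stores no free-space information and no confidence per cell: a voxel with one return looks the same as a voxel with a hundred, and unobserved space looks the same as empty space.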

The authors propose using Dempster-Shafer theory of evidence to fuse the LiDAR measurements and reason about the occupancy state of each grid cell. This allows them to handle the inherent uncertainty in the LiDAR data more effectively than previous techniques.
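The fusion step can be illustrated with Dempster's rule of combination on a two-hypothesis frame (free, occupied), where any mass not committed to either hypothesis stays on "unknown". This is a sketch of the general rule only: the summary does not describe the paper's sensor model, so the mass values and the names `Mass`, `combine`, `hit`, and `pass_through` below are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mass:
    """Belief masses for one grid cell; the three values should sum to 1."""
    free: float
    occupied: float
    unknown: float  # mass on the whole frame {free, occupied}

def combine(a, b):
    """Fuse two independent pieces of evidence with Dempster's rule."""
    # Conflict: one source says free while the other says occupied.
    conflict = a.free * b.occupied + a.occupied * b.free
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    free = (a.free * b.free + a.free * b.unknown + a.unknown * b.free) / norm
    occupied = (a.occupied * b.occupied + a.occupied * b.unknown
                + a.unknown * b.occupied) / norm
    unknown = (a.unknown * b.unknown) / norm
    return Mass(free, occupied, unknown)

# Hypothetical evidence values, not taken from the paper.
cell = Mass(free=0.0, occupied=0.0, unknown=1.0)          # nothing observed yet
hit = Mass(free=0.0, occupied=0.7, unknown=0.3)           # a ray ended in the cell
pass_through = Mass(free=0.6, occupied=0.0, unknown=0.4)  # a ray crossed the cell

after_hits = combine(combine(cell, hit), hit)
after_conflict = combine(after_hits, pass_through)
print(after_hits)
print(after_conflict)
```

In this sketch, repeated agreeing observations shrink the unknown mass, while a contradicting observation pulls mass away from "occupied". The residual unknown mass is one plausible way to read a per-cell uncertainty estimate out of such a representation.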

Qualitative and quantitative evaluations on the nuScenes and Waymo datasets show that this evidence-based occupancy mapping approach outperforms existing occupancy ground-truth data by a large margin. When the occupancy maps are converted back to depth estimates and compared with the raw LiDAR measurements, the authors report a mean absolute error (MAE) improvement of 30% to 52% on nuScenes and 53% on Waymo.
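The depth-based evaluation can be sketched as follows: cast a ray through the occupancy grid, take the distance to the first occupied voxel as the rendered depth, and compare it with the measured LiDAR range. The summary does not describe the authors' rendering procedure, so the fixed-step ray marching, the grid anchored at the world origin, and the names `render_depth`, `mean_absolute_error`, and `relative_improvement` are assumptions. All numbers are made up.

```python
import math

def render_depth(occupied, sensor, direction, voxel_size, max_range, step=0.01):
    """Return the distance along the ray to the first occupied voxel, or None.

    Assumes the grid's minimum corner sits at the world origin (0, 0, 0).
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    for n in range(int(max_range / step) + 1):
        t = n * step  # multiply instead of accumulating to limit float drift
        point = [s + t * u for s, u in zip(sensor, unit)]
        idx = tuple(math.floor(c / voxel_size) for c in point)
        if idx in occupied:
            return t
    return None

def mean_absolute_error(rendered, measured):
    """MAE over the rays that actually hit something in the grid."""
    pairs = [(r, m) for r, m in zip(rendered, measured) if r is not None]
    if not pairs:
        raise ValueError("no ray hit an occupied voxel")
    return sum(abs(r - m) for r, m in pairs) / len(pairs)

def relative_improvement(baseline_mae, new_mae):
    """Fraction by which new_mae is lower than baseline_mae."""
    return (baseline_mae - new_mae) / baseline_mae

# One occupied 0.5 m voxel spanning x in [2.0, 2.5); the true return was at 2.2 m.
occupied = {(4, 0, 0)}
sensor = (0.0, 0.25, 0.25)
depth_hit = render_depth(occupied, sensor, (1.0, 0.0, 0.0), 0.5, max_range=10.0)
depth_miss = render_depth(occupied, sensor, (0.0, 1.0, 0.0), 0.5, max_range=10.0)
mae = mean_absolute_error([depth_hit, depth_miss], [2.2, 5.0])
print(depth_hit, depth_miss, mae)
```

The example shows why this metric is sensitive to map quality: the rendered depth snaps to the near face of the voxel, so a coarse or misplaced occupied cell shows up directly as depth error against the raw measurement.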

The authors further demonstrate the practical value of their improved occupancy maps by using them to train a state-of-the-art occupancy prediction model. Training on the improved maps improves the model's MAE by 25% on nuScenes compared with training on the standard occupancy ground truth.

Critical Analysis

The paper presents a compelling technical solution to a critical problem in autonomous driving perception. However, a few potential limitations and areas for further research are worth noting:

The evidence theory-based occupancy mapping approach introduces additional computational complexity compared to simpler techniques. The authors do not provide performance benchmarks, so the real-world feasibility for deployment in self-driving cars is unclear.
The evaluation is limited to just two datasets - nuScenes and Waymo. Further testing on a broader range of real-world driving scenarios would help validate the generalizability of the approach.
The paper does not address how the occupancy map uncertainty estimates produced by the method could be integrated into and benefit downstream planning and decision-making components of a self-driving system.
While the results demonstrate substantial improvements in occupancy map accuracy, it is unclear how this translates to tangible safety and performance gains for the overall self-driving system. More end-to-end testing would help quantify the real-world impact.

Overall, this research represents an important advancement in the critical area of scene geometry representation for automated driving. The novel evidence-based occupancy mapping approach is a promising step forward, though further development and validation are needed to fully realize its potential.

Conclusion

This paper tackles a fundamental challenge in autonomous driving - accurately representing the 3D geometry of a vehicle's surroundings. The authors identify shortcomings in how LiDAR data is currently converted into occupancy maps, which serve as crucial inputs for self-driving systems.

By applying Dempster-Shafer evidence theory, the researchers developed a new occupancy mapping technique that produces higher-quality reconstructions of the environment. These improved occupancy maps can then be used to train better occupancy prediction models - a key component for enabling robust, safe, and reliable autonomous driving.

While further validation and optimization are needed, this work represents an important advancement in the field of autonomous perception. Accurately modeling the geometric structure of the environment is a critical prerequisite for self-driving cars to navigate the real world effectively and safely.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory

Jonas Kalble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg

Automated driving fundamentally requires knowledge about the surrounding geometry of the scene. Modern approaches use only captured images to predict occupancy maps that represent the geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used for current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield very low quality, and subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields a MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that it improves the MAE by 25% on nuScenes.

5/20/2024

AdaOcc: Adaptive-Resolution Occupancy Prediction

Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computational demands, challenging the balance between efficiency and resolution. High-resolution occupancy grids offer accuracy but demand substantial computational resources, while low-resolution grids are efficient but lack detail. To address this dilemma, we introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach. Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs). These high-detailed 3D surfaces are represented in point clouds, thus their precision is not constrained by the predefined grid resolution of the occupancy map. We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods. In close-range scenarios, we surpass previous baselines by over 13% in IOU, and over 40% in Hausdorff distance. In summary, AdaOcc offers a more versatile and effective framework for delivering accurate 3D semantic occupancy prediction across diverse driving scenarios.

8/27/2024

OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments

Chubin Zhang, Juncheng Yan, Yi Wei, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu

Occupancy prediction reconstructs 3D structures of surrounding environments. It provides detailed information for autonomous driving planning and navigation. However, most existing methods heavily rely on the LiDAR point clouds to generate occupancy ground truth, which is not available in the vision-based system. In this paper, we propose an OccNeRF method for training occupancy networks without 3D supervision. Different from previous works which consider a bounded scene, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. The neural rendering is adopted to convert occupancy fields to multi-camera depth maps, supervised by multi-frame photometric consistency. Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model. Extensive experiments for both self-supervised depth estimation and 3D occupancy prediction tasks on nuScenes and SemanticKITTI datasets demonstrate the effectiveness of our method.

8/22/2024

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas, Thomas Gilles, Raquel Urtasun

Perceiving the world and forecasting its future state is a critical task for self-driving. Supervised approaches leverage annotated object labels to learn a model of the world -- traditionally with object detections and trajectory predictions, or temporal bird's-eye-view (BEV) occupancy fields. However, these annotations are expensive and typically limited to a set of predefined categories that do not cover everything we might encounter on the road. Instead, we learn to perceive and forecast a continuous 4D (spatio-temporal) occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to downstream tasks. We tackle point cloud forecasting by adding a lightweight learned renderer and achieve state-of-the-art performance in Argoverse 2, nuScenes, and KITTI. To further showcase its transferability, we fine-tune our model for BEV semantic occupancy forecasting and show that it outperforms the fully supervised state-of-the-art, especially when labeled data is scarce. Finally, when compared to prior state-of-the-art on spatio-temporal geometric occupancy prediction, our 4D world model achieves a much higher recall of objects from classes relevant to self-driving.

6/14/2024