A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

2405.05173

Published 5/21/2024 by Huaiyuan Xu, Junliang Chen, Shiyu Meng, Yi Wang, Lap-Pui Chau

Abstract

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception has the nature of multi-source input and the necessity for information fusion. However, the difference is that it captures vertical structures that are ignored by 2D BEV. In this survey, we review the most recent works on 3D occupancy perception, and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of the state-of-the-art on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research work on 3D occupancy perception. A comprehensive list of studies in this survey is publicly available in an active repository that continuously collects the latest work: https://github.com/HuaiyuanXu/3D-Occupancy-Perception.

Overview

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles
It is a comprehensive perception capability that is emerging as a trend in autonomous driving perception systems
Unlike traditional bird's-eye view (BEV) perception, 3D occupancy perception captures vertical structures that are ignored by 2D BEV

Plain English Explanation

3D occupancy perception is a technology used in autonomous vehicles to create a detailed understanding of the 3D environment around the vehicle. Unlike traditional 2D bird's-eye view perception, 3D occupancy perception can detect and map vertical structures like buildings and bridges that may be important for safe navigation. This comprehensive 3D understanding is becoming increasingly important in the development of autonomous driving systems, as it allows the vehicle to better perceive and navigate its surroundings. The technology relies on fusing data from multiple sensors to create a complete 3D model of the environment.

Technical Explanation

This paper provides a survey of the latest research on 3D occupancy perception for autonomous vehicles. Like traditional bird's-eye view perception, 3D occupancy perception involves processing data from multiple sensors and fusing that information to create a detailed 3D map of the environment. However, the key difference is that 3D occupancy perception is able to capture vertical structures that are missed by 2D bird's-eye view approaches.
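To make the difference between the two representations concrete, here is a minimal sketch in plain Python. It is not taken from the survey: the helper names `voxelize` and `to_bev`, the 1 m voxel size, and the toy scene (an overpass deck and a wall) are all assumptions chosen for illustration. It shows how collapsing a 3D occupancy grid along the height axis makes a drivable cell under an overpass look identical to a cell blocked by a wall.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def voxelize(points: Iterable[Point], voxel_size: float = 1.0) -> Set[Voxel]:
    """Map 3D points (x, y, z), in metres, to the set of occupied voxel indices."""
    return {
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    }


def to_bev(occupied: Set[Voxel]) -> Set[Tuple[int, int]]:
    """Collapse 3D occupancy to a 2D BEV grid by discarding the height index."""
    return {(i, j) for i, j, _ in occupied}


# Hypothetical toy scene: an overpass deck about 5 m above the road in
# column (2, 0), and a solid wall from the ground up in column (4, 0).
overpass = [(2.5, 0.5, 5.2)]
wall = [(4.5, 0.5, z + 0.5) for z in range(6)]

occ3d = voxelize(overpass + wall)
bev = to_bev(occ3d)

# In BEV, both columns are simply "occupied".
print((2, 0) in bev, (4, 0) in bev)

# In 3D, the ground-level voxel under the overpass is free, the wall's is not.
print((2, 0, 0) in occ3d, (4, 0, 0) in occ3d)
```

In this toy setup the BEV grid cannot tell the overpass from the wall, while the 3D grid keeps the free space at road level. That is the kind of vertical information the survey says 2D BEV ignores.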

The paper summarizes common network architectures and information fusion techniques used in 3D occupancy perception systems. It also discusses effective training methods for these models. The performance of state-of-the-art 3D occupancy perception systems is evaluated on popular datasets, and the paper outlines important challenges and future research directions in this area.
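This summary does not name a specific fusion operator, so the following is only an illustrative sketch of what fusing multi-source occupancy estimates can look like. It uses classical Bayesian log-odds fusion from occupancy-grid mapping, not a learned fusion module from the survey. The function name `fuse_log_odds`, the camera and LiDAR probability dictionaries, and the 0.5 prior are assumptions made for the example. Probabilities must lie strictly between 0 and 1.

```python
import math
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def fuse_log_odds(
    camera: Dict[Voxel, float],
    lidar: Dict[Voxel, float],
    prior: float = 0.5,
) -> Dict[Voxel, float]:
    """Fuse per-voxel occupancy probabilities from two sources.

    Each source is treated as independent evidence; a voxel seen by only
    one source keeps that source's estimate.
    """
    fused: Dict[Voxel, float] = {}
    for voxel in camera.keys() | lidar.keys():
        log_odds = logit(prior)
        for source in (camera, lidar):
            if voxel in source:
                log_odds += logit(source[voxel]) - logit(prior)
        fused[voxel] = sigmoid(log_odds)
    return fused


# Hypothetical per-voxel occupancy probabilities from two sensor branches.
camera_probs = {(0, 0, 0): 0.7, (1, 0, 0): 0.6, (2, 0, 0): 0.8}
lidar_probs = {(0, 0, 0): 0.9, (2, 0, 0): 0.2}

fused = fuse_log_odds(camera_probs, lidar_probs)
for voxel in sorted(fused):
    print(voxel, round(fused[voxel], 3))
```

Agreeing sources reinforce each other, conflicting ones pull the estimate back toward the prior, and a voxel observed by a single sensor is left unchanged. Learned fusion modules in occupancy networks typically operate on feature maps instead of probabilities, which this sketch does not attempt to model.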

Critical Analysis

The paper provides a comprehensive overview of the current state of 3D occupancy perception research, highlighting the key technical approaches and the benefits of this technology for autonomous driving. However, the discussion of limitations and challenges is relatively brief. For example, the paper does not delve into potential issues around sensor reliability, data sparsity, or computational efficiency, which could be important practical concerns for deploying these systems in the real world.

Additionally, while the paper cites several relevant studies ([1](https://aimodels.fyi/papers/arxiv/collaborative-semantic-occupancy-prediction-hybrid-feature-fusion), [2](https://aimodels.fyi/papers/arxiv/real-time-3d-semantic-occupancy-prediction-autonomous), [3](https://aimodels.fyi/papers/arxiv/occfusion-straightforward-effective-multi-sensor-fusion-framework), [4](https://aimodels.fyi/papers/arxiv/unified-spatio-temporal-tri-perspective-view-representation)), it would be helpful to have a more in-depth critical analysis of the strengths and weaknesses of these specific approaches. This could provide readers with a clearer understanding of the current state-of-the-art and help guide future research directions.

Conclusion

3D occupancy perception is an important emerging technology for autonomous driving that goes beyond traditional 2D bird's-eye view perception by capturing critical vertical structures in the environment. This survey paper provides a comprehensive overview of the latest research in this area, highlighting common network architectures, data fusion techniques, and performance on benchmark datasets. While the paper outlines some of the key challenges, a more detailed critical analysis could further strengthen the insights provided. Overall, this technology represents a significant step forward in the development of autonomous vehicles that can safely navigate complex 3D environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Vision-based 3D occupancy prediction in autonomous driving: a review and outlook

Yanan Zhang, Jinqing Zhang, Zengran Wang, Junhao Xu, Di Huang

In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for cost-effective perception system of autonomous driving. Although numerous studies have demonstrated the greater advantages of 3D occupancy prediction over object-centric perception tasks, there is still a lack of a dedicated review focusing on this rapidly developing field. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Secondly, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness and label efficiency, and provide an in-depth analysis of the potentials and challenges of each category of methods. Finally, we present a summary of prevailing research trends and propose some inspiring future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and codes is organized at https://github.com/zya3d/Awesome-3D-Occupancy-Prediction.

5/7/2024

cs.CV

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll

Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods typically employ 3D bounding boxes or bird's eye views as representations of the environment. However, these approaches fall short in offering a comprehensive 3D environmental prediction. To bridge this gap, we introduce the first method for collaborative 3D semantic occupancy prediction. Particularly, it improves local 3D semantic occupancy predictions by hybrid fusion of (i) semantic and occupancy task features, and (ii) compressed orthogonal attention features shared between vehicles. Additionally, due to the lack of a collaborative perception dataset designed for semantic occupancy prediction, we augment a current collaborative perception dataset to include 3D collaborative semantic occupancy labels for a more robust evaluation. The experimental findings highlight that: (i) our collaborative semantic occupancy predictions excel above the results from single vehicles by over 30%, and (ii) models anchored on semantic occupancy outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications, showcasing enhanced accuracy and enriched semantic-awareness in road environments.

4/26/2024

cs.CV

Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review

Yining Shi, Kun Jiang, Jiusi Li, Zelin Qian, Junze Wen, Mengmeng Yang, Ke Wang, Diange Yang

Grid-centric perception is a crucial field for mobile robot perception and navigation. Nonetheless, grid-centric perception is less prevalent than object-centric perception, as autonomous vehicles need to accurately perceive highly dynamic, large-scale traffic scenarios and the complexity and computational costs of grid-centric perception are high. In recent years, the rapid development of deep learning techniques and hardware has provided fresh insights into the evolution of grid-centric perception. The fundamental difference between grid-centric and object-centric pipelines is that grid-centric perception follows a geometry-first paradigm, which is more robust to open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the great advantages of grid-centric perception, such as comprehensive fine-grained environmental representation, greater robustness to occlusion and irregularly shaped objects, better ground estimation, and safer planning policies. There is also a growing trend in which the capacity of occupancy networks is greatly expanded to 4D scene perception and prediction, and the latest techniques are highly related to new research topics such as 4D occupancy forecasting, generative AI and world models in the field of autonomous driving. Given the lack of current surveys for this rapidly expanding field, we present a hierarchically-structured review of grid-centric perception for autonomous vehicles. We organize previous and current knowledge of occupancy grid techniques along the main vein from 2D BEV grids to 3D occupancy to 4D occupancy forecasting. We additionally summarize label-efficient occupancy learning and the role of grid-centric perception in driving systems. Lastly, we present a summary of the current research trend and provide future outlooks.

6/11/2024

cs.CV cs.RO

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas, Thomas Gilles, Raquel Urtasun

Perceiving the world and forecasting its future state is a critical task for self-driving. Supervised approaches leverage annotated object labels to learn a model of the world -- traditionally with object detections and trajectory predictions, or temporal bird's-eye-view (BEV) occupancy fields. However, these annotations are expensive and typically limited to a set of predefined categories that do not cover everything we might encounter on the road. Instead, we learn to perceive and forecast a continuous 4D (spatio-temporal) occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to downstream tasks. We tackle point cloud forecasting by adding a lightweight learned renderer and achieve state-of-the-art performance in Argoverse 2, nuScenes, and KITTI. To further showcase its transferability, we fine-tune our model for BEV semantic occupancy forecasting and show that it outperforms the fully supervised state-of-the-art, especially when labeled data is scarce. Finally, when compared to prior state-of-the-art on spatio-temporal geometric occupancy prediction, our 4D world model achieves a much higher recall of objects from classes relevant to self-driving.

6/14/2024

cs.CV cs.AI cs.LG cs.RO