A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving

arXiv:2406.00714

Published 6/4/2024 by Di Wu, Feng Yang, Benlian Xu, Pan Liao, Bo Liu

Abstract

With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and facilitating the achievement of both robust performance and cost-effectiveness. This paper focuses on a comprehensive survey of radar-vision (RV) fusion based on deep learning methods for 3D object detection in autonomous driving. We offer a comprehensive overview of each RV fusion category, specifically those employing region of interest (ROI) fusion and end-to-end fusion strategies. As the most promising fusion strategy at present, we provide a deeper classification of end-to-end fusion methods, including those 3D bounding box prediction based and BEV based approaches. Moreover, aligning with recent advancements, we delineate the latest information on 4D radar and its cutting-edge applications in autonomous vehicles (AVs). Finally, we present the possible future trends of RV fusion and summarize this paper.

Overview

This paper is a comprehensive survey of deep learning-based methods that fuse radar and vision data for 3D object detection in autonomous driving.
The authors review a range of techniques that combine information from radar and camera sensors to improve the accuracy and robustness of 3D object detection, a critical task for self-driving cars.
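The survey covers both region of interest (ROI) fusion and end-to-end fusion, divides end-to-end methods into 3D bounding box prediction and BEV based approaches, and closes with 4D radar applications and possible future trends.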

Plain English Explanation

Self-driving cars rely on accurate 3D object detection to perceive their surroundings and navigate safely. Radar and camera sensors provide complementary information: radar measures the distance and velocity of objects, while cameras capture detailed visual information. By fusing the data from these two sensor modalities, deep learning models can achieve more reliable 3D object detection than either sensor achieves alone.

This paper summarizes the state-of-the-art in deep learning techniques for radar-vision fusion in autonomous driving. The authors examine how researchers are tackling challenges like aligning the data from the different sensors, fusing features at multiple levels, and handling the limitations of each sensor to produce accurate 3D bounding boxes around detected objects.

The survey also covers how these fusion models can be used for tasks beyond just object detection, such as human detection from radar data. Overall, this paper provides a comprehensive overview of an important research area that is helping to make self-driving cars safer and more capable.

Technical Explanation

The paper begins by discussing the key challenges in 3D object detection for autonomous driving, including occlusion, varying object sizes, and the limitations of individual sensors like radar and cameras. It then reviews a range of deep learning-based sensor fusion approaches to address these challenges.

One class of methods focuses on aligning the data from radar and cameras to enable effective fusion. This includes techniques for spatial and temporal synchronization, as well as learning domain translation models to map features between the different sensor modalities.
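
As a rough illustration of the two alignment steps named above, the sketch below pairs a radar sweep with the nearest camera frame by timestamp and projects a radar point into image pixels with a pinhole camera model. It is a minimal sketch under stated assumptions, not a method taken from the survey: the function names, the 0.05 s tolerance, and the calibration values (R, t, fx, fy, cx, cy) are all hypothetical, and a real system would read them from its own calibration files.

```python
from bisect import bisect_left


def nearest_frame(radar_ts, camera_ts_sorted, max_gap=0.05):
    """Temporal alignment: index of the camera frame closest in time to a
    radar sweep, or None if the gap exceeds max_gap seconds (assumed value)."""
    i = bisect_left(camera_ts_sorted, radar_ts)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(camera_ts_sorted)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(camera_ts_sorted[j] - radar_ts))
    return best if abs(camera_ts_sorted[best] - radar_ts) <= max_gap else None


def project_radar_point(p_radar, R, t, fx, fy, cx, cy):
    """Spatial alignment: map a radar point (x, y, z) to a pixel (u, v).

    R (3x3 nested lists) and t (length-3 list) are hypothetical extrinsics
    that take radar coordinates to camera coordinates (camera z points
    forward). fx, fy, cx, cy are pinhole intrinsics.
    """
    xc, yc, zc = (
        sum(R[r][c] * p_radar[c] for c in range(3)) + t[r] for r in range(3)
    )
    if zc <= 0:
        return None  # the point is behind the image plane
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Usage with made-up calibration: identity rotation, zero translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame = nearest_frame(0.12, [0.00, 0.10, 0.20])
pixel = project_radar_point((1.0, 0.5, 10.0), IDENTITY, [0.0, 0.0, 0.0],
                            1000.0, 1000.0, 640.0, 360.0)
```

Nearest-timestamp pairing is only one option; interpolation or motion compensation between frames may be preferable when the sensors run at very different rates.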

The survey also covers multi-level fusion strategies that combine radar and vision features at different stages of the detection pipeline, from low-level input features to high-level semantic representations. This allows the model to leverage the complementary strengths of each sensor.
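
To make feature-level fusion concrete, the sketch below rasterizes radar points into a bird's-eye-view (BEV) grid and concatenates each cell with a camera BEV feature of the same shape. This is only a toy illustration: the grid extents, the 1 m cell size, the two radar channels (hit count and mean radial velocity), and the function names are assumptions, and the methods the survey reviews use learned encoders rather than plain concatenation.

```python
def radar_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=1.0):
    """Rasterize radar points (x forward, y left, radial velocity) into a grid.

    Each cell stores [hit_count, mean_velocity]. Ranges and cell size are
    assumed values in metres.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = [[[0.0, 0.0] for _ in range(ny)] for _ in range(nx)]
    for x, y, v in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the grid
        i = int((x - x_range[0]) / cell)
        j = int((y - y_range[0]) / cell)
        count, mean_v = grid[i][j]
        # Running mean of radial velocity for the cell.
        grid[i][j] = [count + 1.0, (mean_v * count + v) / (count + 1.0)]
    return grid


def fuse_bev(camera_bev, radar_bev):
    """Feature-level fusion: concatenate channels of matching BEV cells."""
    return [
        [cam + rad for cam, rad in zip(cam_row, rad_row)]
        for cam_row, rad_row in zip(camera_bev, radar_bev)
    ]


# Usage: two radar hits fall in the same cell, one lies outside the grid.
radar_bev = radar_to_bev([(10.2, 0.5, -3.0), (10.7, 0.9, -5.0), (100.0, 0.0, 1.0)])
camera_bev = [[[0.5] for _ in range(40)] for _ in range(40)]  # placeholder feature
fused = fuse_bev(camera_bev, radar_bev)
```

Working in a shared BEV grid sidesteps the perspective mismatch between image pixels and radar returns, at the cost of first lifting the camera features into that grid.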

To handle the limitations of individual sensors, the authors discuss approaches that fuse radar and lidar data or use radar to supplement long-range camera-based detection. The survey also covers how radar data can be used for tasks beyond just object detection, such as human activity recognition.
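
One simple way radar could supplement camera detections at range is to attach radar measurements to the camera's 2D boxes. The sketch below assumes the radar hits have already been projected into pixel coordinates and takes the median range and velocity of the hits inside each box. The function name, the box and hit layouts, and the use of the median are illustrative assumptions, not a procedure described in the survey.

```python
from statistics import median


def attach_radar_to_boxes(boxes, radar_hits):
    """Attach radar range and velocity to camera boxes.

    boxes: list of (u_min, v_min, u_max, v_max) in pixels.
    radar_hits: list of (u, v, range_m, velocity_mps), already projected.
    Returns one entry per box: a dict with median values, or None when no
    radar hit falls inside the box.
    """
    results = []
    for u0, v0, u1, v1 in boxes:
        inside = [
            (rng, vel)
            for u, v, rng, vel in radar_hits
            if u0 <= u <= u1 and v0 <= v <= v1
        ]
        if not inside:
            results.append(None)
            continue
        results.append({
            "range_m": median(rng for rng, _ in inside),
            "velocity_mps": median(vel for _, vel in inside),
        })
    return results


# Usage with made-up detections: three hits land in the first box.
boxes = [(100, 100, 200, 200), (300, 300, 400, 400)]
hits = [(150, 150, 80.0, -2.0), (160, 140, 82.0, -2.5),
        (170, 180, 81.0, -1.5), (500, 500, 10.0, 0.0)]
matched = attach_radar_to_boxes(boxes, hits)
```

The median is used here because sparse radar returns often include clutter from the background behind an object; a learned association would likely do better.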

Throughout the paper, the authors highlight key insights, benchmark performance, and outline directions for future research in this rapidly evolving field.

Critical Analysis

The paper provides a comprehensive and well-structured overview of the state-of-the-art in deep learning-based radar-vision fusion for 3D object detection. The authors thoroughly cover the technical approaches, performance, and research challenges in this domain.

One limitation is that the survey focuses primarily on fusion for 3D object detection, while radar-vision fusion may also be promising for scene understanding, trajectory prediction, and multi-agent interaction modeling. Exploring these broader use cases could be an interesting avenue for future work.

Additionally, the paper does not delve too deeply into the practical challenges of deploying these fusion models in real-world autonomous driving scenarios. Issues like sensor calibration, data synchronization, computational efficiency, and robustness to environmental conditions could be important considerations that warrant further investigation.

Overall, this survey serves as an excellent reference for researchers and engineers working on sensor fusion for self-driving cars. By highlighting the key advances and open problems in this field, it provides a solid foundation for driving future progress.

Conclusion

This paper presents a comprehensive survey of deep learning-based radar and vision fusion techniques for 3D object detection in autonomous driving. The authors examine a range of approaches that leverage the complementary strengths of radar and camera sensors to achieve more reliable and robust 3D object perception.

The survey covers critical topics such as sensor data alignment, multi-level fusion strategies, and handling sensor limitations. It also discusses how radar data can be used for broader perception tasks beyond just object detection. By outlining the state-of-the-art, key challenges, and future research directions, this work serves as a valuable resource for the autonomous driving research community.

As self-driving car technology continues to evolve, the fusion of radar and vision data will likely play an increasingly important role in enabling safe and reliable navigation. This survey provides a solid foundation for understanding the current progress and future opportunities in this rapidly advancing field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework

Eliraz Orfaig, Inna Stainvas, Igal Bilik

Vision-based autonomous driving requires reliable and efficient object detection. This work proposes a DiffusionDet-based framework that exploits data fusion from the monocular camera and depth sensor to provide the RGB and depth (RGB-D) data. Within this framework, ground truth bounding boxes are randomly reshaped as part of the training phase, allowing the model to learn the reverse diffusion process of noise addition. The system methodically enhances a randomly generated set of boxes at the inference stage, guiding them toward accurate final detections. By integrating the textural and color features from RGB images with the spatial depth information from the LiDAR sensors, the proposed framework employs a feature fusion that substantially enhances object detection of automotive targets. The 2.3 AP gain in detecting automotive targets is achieved through comprehensive experiments using the KITTI dataset. Specifically, the improved performance of the proposed approach in detecting small objects is demonstrated.

6/6/2024

cs.CV

Timely Fusion of Surround Radar/Lidar for Object Detection in Autonomous Driving Systems

Wenjing Xie, Tao Hu, Neiwen Ling, Guoliang Xing, Chun Jason Xue, Nan Guan

Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surrounding for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling with the minimal cost, which are promising sensing hardware solutions for autonomous driving systems. However, due to the intrinsic physical constraints, the rotating speed of surround Radar, and thus the frequency to generate Radar data frames, is much lower than surround Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems. This paper develops techniques to fuse surround Radar/Lidar with working frequency only limited by the faster surround Lidar instead of the slower surround Radar, based on the state-of-the-art object detection model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place at any time when a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key information revealed in this paper is that we can achieve high output frequency with little accuracy loss by enhancing the training procedure to explore the temporal redundancy in MVDNet so that it can tolerate the temporal unalignment of input data. We explore several different ways of training enhancement and compare them quantitatively with experiments.

5/28/2024

cs.CV cs.AI

Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System

Daniel Dworak, Mateusz Komorkiewicz, Paweł Skruch, Jerzy Baranowski

In this paper, we propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Precisely, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation method to convert these features into 3D space. We then fuse them with extracted radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the NuScenes dataset. We compare our approach to both single-sensor performance and current state-of-the-art fusion methods. Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.

4/26/2024

cs.CV

RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection

Jisong Kim, Minjae Seong, Geonho Bang, Dongsuk Kum, Jun Won Choi

While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models were unable to fully utilize the potential of radar information. In this paper, we propose Radar-Camera Multi-level fusion (RCM-Fusion), which attempts to fuse both modalities at both feature and instance levels. For feature-level fusion, we propose a Radar Guided BEV Encoder which transforms camera features into precise BEV representations using the guidance of radar Bird's-Eye-View (BEV) features and combines the radar and camera BEV features. For instance-level fusion, we propose a Radar Grid Point Refinement module that reduces localization error by accounting for the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion achieves state-of-the-art performances among single frame-based radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available.

5/17/2024

cs.CV