SDGE: Stereo Guided Depth Estimation for 360° Camera Sets

arXiv:2402.11791

Published 4/3/2024 by Jialei Xu, Wei Yin, Dong Gong, Junjun Jiang, Xianming Liu

Abstract

Depth estimation is a critical technology in autonomous driving, and multi-camera systems are often used to achieve a 360° perception. These 360° camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image. Alternatively, monocular methods may not produce consistent cross-view predictions. To address these issues, we propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap. We suggest building virtual pinhole cameras to resolve the distortion problem of fisheye cameras and unify the processing for the two types of 360° cameras. For handling the varying noise on camera poses caused by unstable movement, the approach employs a self-calibration method to obtain highly accurate relative poses of the adjacent cameras with minor overlap. These enable the use of robust stereo methods to obtain high-quality depth prior in the overlap region. This prior serves not only as an additional input but also as pseudo-labels that enhance the accuracy of depth estimation methods and improve cross-view prediction consistency. The effectiveness of SGDE is evaluated on one fisheye camera dataset, Synthetic Urban, and two pinhole camera datasets, DDAD and nuScenes. Our experiments demonstrate that SGDE is effective for both supervised and self-supervised depth estimation, and highlight the potential of our method for advancing downstream autonomous driving technologies, such as 3D object detection and occupancy prediction.

Overview

Depth estimation is crucial for autonomous driving, which often uses 360-degree camera systems.
These 360-degree camera setups can have low-quality or limited overlap between views, making traditional multi-view stereo methods ineffective.
Monocular depth estimation methods may also struggle to maintain consistent predictions across different views.
To address these challenges, the researchers propose the Stereo Guided Depth Estimation (SGDE) method.

Plain English Explanation

Autonomous vehicles need to understand the 3D world around them, which is where depth estimation comes in. These vehicles often use 360-degree camera systems to get a full view of their surroundings. However, the regions where neighboring cameras overlap can be small or low-quality, making it difficult to use traditional stereo vision techniques to estimate depth for the whole image. On the other hand, estimating depth from each camera independently (monocular estimation) can give predictions that disagree where neighboring views meet.

The SGDE method aims to solve these problems by explicitly using the stereo information from the overlap regions to enhance the overall depth estimation. The approach first builds virtual pinhole cameras from the fisheye images, so that fisheye and pinhole rigs can be processed the same way. It then employs a self-calibration technique to accurately determine the relative poses (position and orientation) of adjacent cameras, even when unsteady vehicle motion makes the camera poses noisy. This enables the use of robust stereo methods to obtain high-quality depth information in the overlap areas.
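To make the stereo step concrete, the sketch below applies the textbook pinhole relation Z = f * B / d, which turns a disparity map into depth. It assumes the overlap crops have already been rectified using the estimated relative pose; the function name and all numbers are hypothetical, and this is not the stereo method used in the paper.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) to depth (metres) for a rectified pair.

    Pixels whose disparity is <= min_disp are marked invalid and get depth 0.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth, valid

# Hypothetical values: 800 px focal length, 0.5 m baseline between cameras.
disp = np.array([[8.0, 4.0], [0.0, 16.0]])
depth, valid = disparity_to_depth(disp, focal_px=800.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, small disparity errors on distant objects translate into large depth errors, which is one reason accurate relative poses matter for the prior.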

This depth information from the overlap regions is then used in two ways: as an additional input to the depth estimation model, and as pseudo-labels to help improve the model's accuracy and consistency across different views. By leveraging the strengths of both stereo and monocular depth estimation, SGDE is able to produce more reliable depth maps for the entire 360-degree field of view.
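One way to picture "consistency across different views" is to lift one camera's depth map to 3D, move the points into the neighboring camera's frame, and compare against that camera's own depth. The sketch below does this with NumPy; the function names, the nearest-pixel lookup, and the error definition are illustrative assumptions rather than the evaluation protocol of the paper.

```python
import numpy as np

def backproject(depth_hw, K):
    """Lift a depth map to 3D points of shape (N, 3) in the camera frame."""
    h, w = depth_hw.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # rays with unit z component
    return rays * depth_hw.reshape(-1, 1)

def cross_view_depth_error(depth_a, depth_b, K_a, K_b, T_b_from_a):
    """Mean |z - depth_b| over points of camera A that project inside camera B."""
    pts_a = backproject(depth_a, K_a)
    pts_b = pts_a @ T_b_from_a[:3, :3].T + T_b_from_a[:3, 3]
    z = pts_b[:, 2]
    front = z > 1e-6                          # keep points in front of camera B
    proj = pts_b[front] @ K_b.T
    ui = np.rint(proj[:, 0] / proj[:, 2]).astype(int)
    vi = np.rint(proj[:, 1] / proj[:, 2]).astype(int)
    h, w = depth_b.shape
    inside = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    if not inside.any():
        return float("nan")                   # no overlap between the views
    return float(np.mean(np.abs(z[front][inside] - depth_b[vi[inside], ui[inside]])))

# Toy check: two identical cameras at the same pose, hypothetical intrinsics.
K = np.array([[2.0, 0.0, 1.5], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]])
depth_a = np.full((4, 4), 5.0)
err_same = cross_view_depth_error(depth_a, depth_a, K, K, np.eye(4))
err_off = cross_view_depth_error(depth_a, depth_a + 1.0, K, K, np.eye(4))
```

In the toy check the two views agree exactly in the first call and differ by one metre everywhere in the second, so the error reflects only the disagreement between the depth maps.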

Technical Explanation

The SGDE method first addresses the distortion issues of fisheye cameras by building virtual pinhole cameras, which unifies the processing for both fisheye and standard pinhole 360-degree camera setups. To handle the varying noise in camera poses caused by unstable vehicle movement, the approach employs a self-calibration technique to accurately determine the relative poses of adjacent cameras, even when they overlap only slightly.
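The virtual pinhole idea can be sketched as a lookup table: for every pixel of the virtual camera, compute its viewing ray and find where that ray lands in the fisheye image. The example below assumes an equidistant fisheye model (r = f * theta), which is a simplification; the paper's actual camera model is not described here, and all names and parameters are hypothetical.

```python
import numpy as np

def virtual_pinhole_lookup(K_virtual, R_virt_to_fish, size_hw, f_fish, c_fish):
    """For each virtual-pinhole pixel, return the fisheye pixel to sample.

    Assumes an equidistant fisheye (r = f * theta); real lenses usually
    need a calibrated distortion polynomial instead.
    """
    h, w = size_hw
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to rays in the virtual camera frame, then rotate
    # the rays into the fisheye camera frame.
    rays = pix @ np.linalg.inv(K_virtual).T @ R_virt_to_fish.T
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth around the axis
    r = f_fish * theta                      # equidistant projection
    return c_fish[0] + r * np.cos(phi), c_fish[1] + r * np.sin(phi)

def remap_nearest(image, map_u, map_v):
    """Nearest-neighbour resampling; out-of-bounds pixels become 0."""
    ui = np.rint(map_u).astype(int)
    vi = np.rint(map_v).astype(int)
    inside = (ui >= 0) & (ui < image.shape[1]) & (vi >= 0) & (vi < image.shape[0])
    out = np.zeros(map_u.shape + image.shape[2:], dtype=image.dtype)
    out[inside] = image[vi[inside], ui[inside]]
    return out

# Toy usage: a 5x5 virtual camera looking along the fisheye optical axis.
K_v = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
map_u, map_v = virtual_pinhole_lookup(K_v, np.eye(3), (5, 5),
                                      f_fish=100.0, c_fish=(50.0, 50.0))
fisheye = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
virtual = remap_nearest(fisheye, map_u, map_v)
```

Rotating `R_virt_to_fish` points the virtual camera toward the region shared with a neighboring camera, so two fisheye views can each be resampled into undistorted pinhole crops of their overlap.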

With the camera geometry resolved, SGDE is able to leverage robust stereo matching methods to obtain high-quality depth priors in the overlap regions between adjacent camera views. This depth information is then used in two ways: 1) as an additional input to the depth estimation network, and 2) as pseudo-labels to improve the network's depth prediction accuracy and consistency across views.
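The two roles of the prior can be illustrated with a small NumPy sketch: the stereo depth and its validity mask are stacked onto the RGB image as extra input channels, and the same depth supervises the prediction on overlap pixels only. The channel layout, the L1 loss, and the function names are assumptions made for illustration; the summary does not specify how the paper fuses the prior or which loss it uses.

```python
import numpy as np

def build_network_input(image_chw, prior_depth_hw, prior_valid_hw):
    """Stack RGB with the stereo prior and its validity mask (5 channels)."""
    prior = np.where(prior_valid_hw, prior_depth_hw, 0.0)[None]
    mask = prior_valid_hw.astype(image_chw.dtype)[None]
    return np.concatenate([image_chw, prior.astype(image_chw.dtype), mask], axis=0)

def pseudo_label_l1(pred_depth_hw, prior_depth_hw, prior_valid_hw):
    """Mean absolute error against the prior, over valid overlap pixels only."""
    n = prior_valid_hw.sum()
    if n == 0:
        return 0.0                      # no overlap pixels, no pseudo-label signal
    diff = np.abs(pred_depth_hw - prior_depth_hw)
    return float(diff[prior_valid_hw].sum() / n)

# Toy example: a 2x4 image whose leftmost column lies in the overlap region.
image = np.zeros((3, 2, 4), dtype=np.float32)
prior_depth = np.zeros((2, 4))
prior_depth[:, 0] = [12.0, 9.0]
prior_valid = np.zeros((2, 4), dtype=bool)
prior_valid[:, 0] = True
pred = np.full((2, 4), 10.0)

net_input = build_network_input(image, prior_depth, prior_valid)
loss = pseudo_label_l1(pred, prior_depth, prior_valid)
```

Passing the mask as its own channel lets a network distinguish "no prior here" from "the prior says depth is zero", which matters because the prior covers only a small part of each image.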

The effectiveness of SGDE is evaluated on three datasets: one using fisheye cameras (Synthetic Urban) and two using standard pinhole cameras (DDAD and nuScenes). The results demonstrate that SGDE can enhance both supervised and self-supervised depth estimation, and highlight its potential to advance downstream autonomous driving technologies, such as 3D object detection and occupancy prediction.

Critical Analysis

The paper offers a well-motivated approach to depth estimation for the 360-degree camera systems used in autonomous driving. By explicitly leveraging the stereo information in the overlap regions, SGDE targets the limitations of both traditional multi-view stereo (too little overlap to cover the full image) and monocular depth estimation (inconsistent predictions across views).

However, the paper does not explore the impact of SGDE on real-world deployment scenarios, such as how the method would handle varying environmental conditions (e.g., lighting, weather) or the presence of dynamic objects. Additionally, the authors do not discuss the computational complexity of the approach and how it might scale with the number of cameras in the 360-degree system.

Further research could investigate the robustness of SGDE in more realistic driving environments, as well as explore ways to optimize the computational efficiency of the method to enable real-time performance on embedded platforms commonly used in autonomous vehicles.

Conclusion

The Stereo Guided Depth Estimation (SGDE) method presented in this paper offers a promising solution for improving depth estimation in 360-degree camera systems for autonomous driving. By unifying the processing of fisheye and pinhole cameras, employing self-calibration techniques, and leveraging stereo information from overlap regions, SGDE is able to produce more accurate and consistent depth predictions across the entire field of view.

The researchers have demonstrated the effectiveness of SGDE on several benchmark datasets, highlighting its potential to enhance downstream autonomous driving technologies, such as 3D object detection and occupancy prediction. While the paper does not address real-world deployment challenges, the core ideas presented could pave the way for further advancements in 360-degree depth estimation for autonomous vehicles.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Cross-spectral Gated-RGB Stereo Depth Estimation

Samuel Brucker, Stefanie Walz, Mario Bijelic, Felix Heide

Gated cameras flood-illuminate a scene and capture the time-gated impulse response of a scene. By employing nanosecond-scale gates, existing sensors are capable of capturing mega-pixel gated images, delivering dense depth improving on today's LiDAR sensors in spatial resolution and depth precision. Although gated depth estimation methods deliver a million of depth estimates per frame, their resolution is still an order below existing RGB imaging methods. In this work, we combine high-resolution stereo HDR RCCB cameras with gated imaging, allowing us to exploit depth cues from active gating, multi-view RGB and multi-view NIR sensing -- multi-view and gated cues across the entire spectrum. The resulting capture system consists only of low-cost CMOS sensors and flood-illumination. We propose a novel stereo-depth estimation method that is capable of exploiting these multi-modal multi-view depth cues, including the active illumination that is measured by the RCCB camera when removing the IR-cut filter. The proposed method achieves accurate depth at long ranges, outperforming the next best existing method by 39% for ranges of 100 to 220m in MAE on accumulated LiDAR ground-truth. Our code, models and datasets are available at https://light.princeton.edu/gatedrccbstereo/ .

5/22/2024

cs.CV

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Ning-Hsu Wang, Yu-Lun Liu

Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images due to different camera projections and distortions, whereas 360-degree methods perform inferior due to the lack of labeled data pairs. We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This method leverages the increasing availability of large datasets. Our approach includes two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360 monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types. See our project page for results: https://albert100121.github.io/Depth-Anywhere/

6/19/2024

cs.CV

DoubleTake: Geometry Guided Depth Estimation

Mohamed Sayed, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Guillermo Garcia-Hernando, Gabriel Brostow, Sara Vicente, Michael Firman

Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning etc. Prior work typically makes use of previous frames in a multi view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input to our network. This self-generated geometric hint can encode information from areas of the scene not covered by the keyframes and it is more regularized when compared to individual predicted depth maps for previous frames. We introduce a Hint MLP which combines cost volume features with a hint of the prior geometry, rendered as a depth map from the current camera location, together with a measure of the confidence in the prior geometry. We demonstrate that our method, which can run at interactive speeds, achieves state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.

6/27/2024

cs.CV cs.LG

Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion

Ang Li, Anning Hu, Wei Xi, Wenxian Yu, Danping Zou

Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for automatic driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network with Semi-Dense hint Guidance, named SDG-Depth. Our network includes a deformable propagation module for generating a semi-dense hint map and a confidence map by propagating sparse hints using a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. The experimental results on benchmark tests show its superior performance. Our code is available at https://github.com/SJTU-ViSYS/SDG-Depth.

4/12/2024

cs.CV