DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Read original: arXiv:2403.18274 - Published 7/18/2024 by Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang

Overview

Presents DVLO, a deep visual-LiDAR odometry network that fuses camera images and LiDAR point clouds in a local-to-global manner
Uses bi-directional structure alignment (image-to-point and point-to-image) to bridge the gap between dense, regular image pixels and sparse, unordered LiDAR points
Reports state-of-the-art results on the KITTI odometry and FlyingThings3D scene flow datasets, compared with both single-modal and multi-modal methods

Plain English Explanation

DVLO is a system that combines information from cameras (visual) and laser scanners (LiDAR) to estimate the 6-degree-of-freedom (6-DoF) motion of a robot or vehicle, that is, its 3D translation and 3D rotation, as it moves through an environment. Images carry fine-grained texture, while LiDAR point clouds carry rich geometric information, so the two sensors complement each other. The difficulty is that they are structured differently: image pixels form a dense, regular grid, whereas LiDAR points are sparse and unordered. DVLO addresses this by converting each modality into a form that resembles the other (bi-directional structure alignment) and then fusing features first locally and then globally. According to the paper's abstract, this approach outperforms both single-modal and multi-modal methods on the KITTI odometry benchmark.

Technical Explanation

The DVLO system consists of several key components:

Local-to-Global Feature Fusion: DVLO fuses the two modalities in two stages. For local fusion, LiDAR points are projected onto the image plane, where they serve as cluster centers, and image pixels are clustered around each center. For global fusion, the point features are adaptively fused with these locally fused features.
Bi-Directional Structure Alignment: Image pixels are regular and dense, whereas LiDAR points are unordered and sparse, so DVLO aligns the two data structures in both directions. Image-to-point alignment pre-organizes image pixels as pseudo points; point-to-image alignment converts the point cloud into a pseudo image by cylindrical projection.
Odometry Estimation: The fused visual-LiDAR features are then used to estimate the 6-DoF odometry of the robot or vehicle, that is, its translation and rotation between frames.
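The two projections behind the structure alignment can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names, the intrinsic matrix K, the 64 x 1800 pseudo-image size, and the vertical field-of-view values are all assumptions chosen for illustration, and the paper's exact cylindrical projection formula may differ from the azimuth/elevation mapping used here.

```python
import numpy as np


def project_to_image(points, K, height, width):
    """Pinhole-project 3D points (N, 3), given in the camera frame, to pixels.

    Returns integer (u, v) coordinates and a mask of points that lie in front
    of the camera and inside the image. In a local fusion scheme, the valid
    projections could act as cluster centers for nearby pixels.
    """
    z = points[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)      # avoid dividing by zero or negatives
    uvw = points @ K.T                       # rows: (fx*x + cx*z, fy*y + cy*z, z)
    u = np.floor(uvw[:, 0] / safe_z).astype(int)
    v = np.floor(uvw[:, 1] / safe_z).astype(int)
    valid = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return np.stack([u, v], axis=1), valid


def cylindrical_projection(points, height=64, width=1800,
                           fov_up_deg=2.0, fov_down_deg=-24.8):
    """Map 3D points (N, 3) in the LiDAR frame to a (height, width) pseudo image.

    Columns index the azimuth angle, rows index the elevation angle, and each
    cell stores the range of the nearest point that falls into it (0 = empty).
    The default size and field of view are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                               # in [-pi, pi]
    elevation = np.arcsin(z / np.maximum(rng, 1e-6))
    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)

    col = 0.5 * (1.0 - azimuth / np.pi) * width
    row = (fov_up - elevation) / (fov_up - fov_down) * height
    col = np.clip(np.floor(col), 0, width - 1).astype(int)
    row = np.clip(np.floor(row), 0, height - 1).astype(int)

    image = np.full((height, width), np.inf, dtype=np.float32)
    # Keep the nearest range when several points share a cell.
    np.minimum.at(image, (row, col), rng.astype(np.float32))
    image[~np.isfinite(image)] = 0.0
    return image, row, col
```

The returned row and column indices record where each point lands, so per-point features can be scattered into, or gathered from, the pseudo image.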

The authors evaluate DVLO on the KITTI odometry dataset and the FlyingThings3D scene flow dataset, and report state-of-the-art performance compared with both single-modal and multi-modal methods.
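To compare an odometry estimate against a ground-truth trajectory, per-frame relative motions are typically chained into global poses. The sketch below shows that step under an assumption not confirmed by this summary: that each relative motion is given as a unit quaternion (w, x, y, z) plus a translation vector, a common parameterization for learned odometry. All function names are hypothetical.

```python
import numpy as np


def quat_to_rot(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)       # normalize to be safe
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def relative_to_matrix(q, t):
    """Build a 4x4 homogeneous transform from a quaternion and a translation."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(q)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def accumulate_trajectory(relative_poses):
    """Chain relative transforms into global poses.

    relative_poses[k] is assumed to map coordinates of frame k+1 into frame k,
    so the global pose of frame k+1 is pose_k @ relative_poses[k].
    The first pose is the identity (the trajectory starts at the origin).
    """
    poses = [np.eye(4)]
    for T in relative_poses:
        poses.append(poses[-1] @ T)
    return poses


if __name__ == "__main__":
    # Two identical steps: move 1 m forward, then turn 90 degrees about z.
    half = np.pi / 4
    step = relative_to_matrix([np.cos(half), 0.0, 0.0, np.sin(half)], [1.0, 0.0, 0.0])
    trajectory = accumulate_trajectory([step, step])
    print(np.round(trajectory[-1][:3, 3], 3))
```

Because each pose is the product of all earlier relative transforms, small per-frame errors accumulate as drift, which is why odometry benchmarks report errors over trajectory segments.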

Critical Analysis

The paper provides a comprehensive and technically sound approach to visual-LiDAR odometry. However, some potential limitations and areas for further research are:

The performance of DVLO may be sensitive to the quality and alignment of the visual and LiDAR data, which can be challenging to obtain in real-world scenarios.
The computational complexity of the system, particularly the bi-directional structure alignment module, may limit its deployment on resource-constrained platforms.
The authors do not extensively discuss the robustness of DVLO to challenging environmental conditions, such as dynamic scenes or adverse weather, which could be an important consideration for real-world applications.

Further research could explore ways to improve the efficiency and robustness of the DVLO system, as well as investigate its performance in more diverse and challenging environments.

Conclusion

The DVLO system presented in this paper advances visual-LiDAR odometry by directly addressing the structural mismatch between images and point clouds. By fusing visual and LiDAR features from local to global scale, and by aligning the two data structures in both directions, DVLO achieves state-of-the-art results on the KITTI odometry benchmark and on FlyingThings3D scene flow. While the system may have some limitations, the core technical ideas and empirical results suggest that DVLO could be a valuable tool for robot and vehicle localization in applications such as autonomous navigation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang

The information in visual and LiDAR data is highly complementary, deriving from the fine-grained texture of images and the rich geometric information in point clouds. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between the two modalities: image pixels are regular and dense, but LiDAR points are unordered and sparse. To address the problem, we propose a local-to-global fusion network (DVLO) with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster image pixels around each center. Image pixels are pre-organized as pseudo points for image-to-point structure alignment. Then, we convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between point features and locally fused features. Our method achieves state-of-the-art performance on KITTI odometry and FlyingThings3D scene flow datasets compared to both single-modal and multi-modal methods. Code is released at https://github.com/IRMVLab/DVLO.

7/18/2024

LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning

Zhuozhu Jian, Qixuan Li, Shengtao Zheng, Xueqian Wang, Xinlei Chen

In air-ground collaboration scenarios without GPS and prior maps, the relative positioning of drones and unmanned ground vehicles (UGVs) has always been a challenge. For a drone equipped with a monocular camera and a UGV equipped with LiDAR as an external sensor, we propose a robust and real-time relative pose estimation method (LVCP) based on the tight coupling of vision and LiDAR point cloud information, which does not require prior information such as maps or precise initial poses. Given that large-scale point clouds generated by 3D sensors have more accurate spatial geometric information than the feature point clouds generated from images, we utilize LiDAR point clouds to correct the drift in visual-inertial odometry (VIO) when the camera undergoes significant shaking or the IMU has a low signal-to-noise ratio. To achieve this, we propose a novel coarse-to-fine framework for LiDAR-vision collaborative localization. In this framework, we construct point-plane associations based on spatial geometric information, and construct a point-aided Bundle Adjustment (BA) problem as the backend to simultaneously estimate the relative pose of the camera and LiDAR and correct the VIO drift. We propose a particle swarm optimization (PSO) based sampling algorithm to complete the coarse estimation of the current camera-LiDAR pose. The initial pose of the camera used for sampling is obtained from VIO propagation, and the valid feature-plane association number (VFPN) is used to trigger the PSO sampling process. Additionally, we propose a method that combines Structure from Motion (SFM) and multi-level sampling to initialize the algorithm, addressing the challenge of lacking initial values.

7/16/2024

Panoramic Direct LiDAR-assisted Visual Odometry

Zikang Yuan, Tianle Xu, Xiaoxiang Wang, Jinni Geng, Xin Yang

Enhancing visual odometry by exploiting sparse depth measurements from LiDAR is a promising way to improve tracking accuracy. Most existing works utilize a monocular pinhole camera, yet can suffer from poor robustness because a limited field of view (FOV) provides less information. This paper proposes a panoramic direct LiDAR-assisted visual odometry, which fully associates the 360-degree FOV LiDAR points with the 360-degree FOV panoramic image data. 360-degree FOV panoramic images provide more information, which can compensate for inaccurate pose estimation caused by insufficient texture or motion blur in a single view. In addition to constraints between a specific view at different times, constraints can also be built between different views at the same moment. Experimental results on public datasets demonstrate the benefit of the large FOV of our panoramic direct LiDAR-assisted visual odometry over state-of-the-art approaches.

9/17/2024

Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to address OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, at both the object level and the scene level, to generate trustworthy detection results. However, previous lidar-based OVD methods focus only on object-level features, ignoring scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection results and a global branch to obtain scene-level global features. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection results can be refined accordingly. We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our methods. Code is released at https://github.com/GradiusTwinbee/GLIS.

7/15/2024