DVI-SLAM: A Dual Visual Inertial SLAM Network

Read original: arXiv:2309.13814 - Published 5/28/2024 by Xiongfeng Peng, Zhihua Liu, Weiming Li, Ping Tan, SoonYong Cho, Qiang Wang


Overview

This paper proposes a novel deep learning-based visual SLAM (Simultaneous Localization and Mapping) network that integrates both photometric and re-projection factors to improve performance.
The network dynamically learns and adjusts the confidence maps of these visual factors, and can be further extended to include inertial measurement unit (IMU) factors as well.
The proposed method significantly outperforms state-of-the-art methods on several public datasets, including TartanAir, EuRoC, and ETH3D-SLAM.

Plain English Explanation

Visual SLAM is a technology that allows devices like robots or augmented reality headsets to understand their surroundings and location by analyzing camera images. Recent deep learning-based visual SLAM methods have made significant progress, but there is still room for improvement in how they use visual information and integrate with inertial sensors.

This paper introduces a new deep learning-based visual SLAM network that combines two key visual factors - photometric (how pixels change over time) and re-projection (how 3D points project onto the camera image) - in a novel way. The network dynamically adjusts the confidence, or importance, of these two visual factors, which helps it better estimate the device's location and the 3D map of the environment.

The researchers also show that this network can be expanded to include data from inertial sensors, which measure things like acceleration and rotation. By fusing all of these different sensor inputs, the SLAM system can achieve even higher accuracy in tracking the device's movement and building the 3D map.

The paper demonstrates that this new SLAM network significantly outperforms previous state-of-the-art methods on several widely used benchmark datasets. For example, on the EuRoC dataset, dynamically fusing the two visual factors with the IMU factor reduced the absolute trajectory error by 45.3% for monocular (single camera) configurations and 36.2% for stereo (dual camera) configurations.

Technical Explanation

The core innovation in this paper is the proposed Dual Visual Inertial SLAM (DVI-SLAM) network, which integrates both photometric and re-projection visual factors into an end-to-end differentiable structure.

The photometric factor captures how pixel intensities change over time as the camera moves, while the re-projection factor models how 3D points in the environment project onto the 2D camera image. The network uses a Multi-Factor Data Association (MFDA) module to dynamically learn and adjust the confidence maps for these two visual factors.
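To make the two visual factors concrete, here is a minimal Python (NumPy) sketch of a classical re-projection residual, a classical photometric residual, and a confidence-weighted cost that combines them. It is not the paper's implementation: the function names, the pinhole camera model, the nearest-pixel lookup, and the scalar weights `w_photo` and `w_reproj` are all illustrative assumptions, whereas the network described above learns dense confidence maps.

```python
import numpy as np

def project(K, T_cw, p_world):
    """Project a 3D world point to pixel coordinates (u, v) with a pinhole model.

    K is a 3x3 intrinsics matrix, T_cw a 4x4 world-to-camera transform.
    """
    p_cam = T_cw[:3, :3] @ p_world + T_cw[:3, 3]
    uv1 = K @ (p_cam / p_cam[2])  # normalise by depth, then apply intrinsics
    return uv1[:2]

def reprojection_residual(K, T_cw, p_world, uv_observed):
    """Re-projection factor: predicted pixel position minus observed position."""
    return project(K, T_cw, p_world) - uv_observed

def photometric_residual(img_ref, img_tgt, uv_ref, uv_tgt):
    """Photometric factor: intensity difference between corresponding pixels.

    Uses a nearest-pixel lookup and assumes both pixels lie inside the images.
    """
    r_ref, c_ref = int(round(uv_ref[1])), int(round(uv_ref[0]))
    r_tgt, c_tgt = int(round(uv_tgt[1])), int(round(uv_tgt[0]))
    return float(img_tgt[r_tgt, c_tgt]) - float(img_ref[r_ref, c_ref])

def weighted_cost(r_photo, r_reproj, w_photo, w_reproj):
    """Confidence-weighted sum of squared residuals (hypothetical scalar weights)."""
    return w_photo * r_photo ** 2 + w_reproj * float(r_reproj @ r_reproj)
```

Raising one weight relative to the other makes the optimizer trust that cue more, which is the intuition behind adjusting per-factor confidence.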

This allows the network to adaptively emphasize the more reliable visual cues during SLAM, which leads to improved trajectory estimation and 3D mapping performance. The researchers also show how the DVI-SLAM network can be extended to incorporate inertial measurement unit (IMU) factors as an additional information source.
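A common way to turn raw IMU samples into a constraint between two camera frames is preintegration, which accumulates the relative rotation, velocity change, and position change over the interval. This summary does not state how the paper builds its IMU factor, so the sketch below is only a generic illustration in Python (NumPy). The function names are hypothetical, and sensor biases, noise propagation, and gravity compensation are deliberately left out.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: map a rotation vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    k = phi / theta
    k_hat = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * k_hat + (1.0 - np.cos(theta)) * (k_hat @ k_hat)

def preintegrate(gyro, accel, dt):
    """Accumulate relative motion (dR, dv, dp) from IMU samples.

    gyro and accel are (N, 3) arrays of angular velocity and acceleration,
    dt is the fixed sample period. Results are expressed in the frame of the
    first sample; bias and gravity handling are omitted in this sketch.
    """
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        a_rot = dR @ a                       # acceleration in the start frame
        dp = dp + dv * dt + 0.5 * a_rot * dt ** 2
        dv = dv + a_rot * dt
        dR = dR @ so3_exp(w * dt)            # compose the incremental rotation
    return dR, dv, dp
```

In a factor graph, the difference between these accumulated quantities and the motion implied by the two estimated poses would form the IMU residual.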

Extensive experiments on the TartanAir, EuRoC, and ETH3D-SLAM datasets demonstrate the superior performance of the proposed DVI-SLAM network compared to state-of-the-art visual SLAM methods. Notably, the addition of IMU factors further boosts the accuracy, reducing the absolute trajectory error by 45.3% and 36.2% for monocular and stereo configurations on the EuRoC dataset, respectively.
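For readers unfamiliar with the metric, absolute trajectory error is usually reported as the root-mean-square distance between estimated and ground-truth camera positions, and a percentage reduction compares two such errors. The Python (NumPy) sketch below shows both calculations. It assumes the two trajectories are already time-associated and aligned to a common frame, a step that real evaluations perform first, and the function names are hypothetical.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of per-pose position error between two aligned (N, 3) trajectories."""
    diff = np.asarray(est_xyz, dtype=float) - np.asarray(gt_xyz, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err
```

As a made-up illustration that is not taken from the paper, an error that falls from 2.0 to 1.0 gives `relative_reduction(2.0, 1.0)`, a 50% reduction.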

Critical Analysis

The paper presents a well-designed and thoroughly evaluated deep learning-based visual SLAM system that effectively integrates photometric and re-projection visual factors. The authors acknowledge that while their method outperforms existing approaches, there is still room for improvement, particularly in handling more challenging environments and in making fuller use of additional sensor modalities.

One potential limitation is the reliance on relatively large and high-quality datasets like TartanAir and EuRoC for training and evaluation. It would be interesting to see how the DVI-SLAM network performs on more diverse and realistic datasets that may include factors like varying illumination, dynamic objects, and sensor noise.

Additionally, while the paper demonstrates the benefits of incorporating IMU data, the authors do not provide a detailed analysis of the individual contributions of the photometric, re-projection, and IMU factors. Further research could investigate the optimal weighting and fusion of these different information sources for different application scenarios.

Overall, this paper represents a significant advancement in deep learning-based visual SLAM and provides a strong foundation for future work in this area, particularly in the direction of robust multi-sensor fusion for accurate localization and mapping.

Conclusion

This paper introduces a novel deep learning-based visual SLAM network, called DVI-SLAM, that dynamically integrates photometric and re-projection visual factors to improve tracking and mapping performance. The authors show that this approach significantly outperforms state-of-the-art methods on several benchmark datasets, and can be further enhanced by incorporating inertial measurement unit (IMU) data.

The research advances the field of visual SLAM by demonstrating the benefits of adaptively fusing different visual cues, and provides a framework for seamlessly integrating multiple sensor modalities. This work has important implications for a wide range of applications, such as autonomous navigation, augmented reality, and robotics, where accurate and reliable localization and mapping are critical.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DVI-SLAM: A Dual Visual Inertial SLAM Network

Xiongfeng Peng, Zhihua Liu, Weiming Li, Ping Tan, SoonYong Cho, Qiang Wang

Recent deep learning based visual simultaneous localization and mapping (SLAM) methods have made significant progress. However, how to make full use of visual information as well as better integrate with inertial measurement unit (IMU) in visual SLAM has potential research value. This paper proposes a novel deep SLAM network with dual visual factors. The basic idea is to integrate both photometric factor and re-projection factor into the end-to-end differentiable structure through multi-factor data association module. We show that the proposed network dynamically learns and adjusts the confidence maps of both visual factors and it can be further extended to include the IMU factors as well. Extensive experiments validate that our proposed method significantly outperforms the state-of-the-art methods on several public datasets, including TartanAir, EuRoC and ETH3D-SLAM. Specifically, when dynamically fusing the three factors together, the absolute trajectory error for both monocular and stereo configurations on EuRoC dataset has reduced by 45.3% and 36.2% respectively.

5/28/2024

SL-SLAM: A robust visual-inertial SLAM based deep feature extraction and matching

Zhang Xiao, Shuaixin Li

This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods to inform other research. Through extensive experiments on both public datasets and self-sampled data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make public the source code at https://github.com/zzzzxxxx111/SLslam.

6/5/2024

Fusion LiDAR-Inertial-Encoder data for High-Accuracy SLAM

Manh Do Duc, Thanh Nguyen Canh, Minh DoNgoc, Xiem HoangVan

In the realm of robotics, achieving simultaneous localization and mapping (SLAM) is paramount for autonomous navigation, especially in challenging environments like texture-less structures. This paper proposes a factor-graph-based model that tightly integrates IMU and encoder sensors to enhance positioning in such environments. The system operates by meticulously evaluating the data from each sensor. Based on these evaluations, weights are dynamically adjusted to prioritize the more reliable source of information at any given moment. The robot's state is initialized using IMU data, while the encoder aids motion estimation in long corridors. Discrepancies between the two states are used to correct IMU drift. The effectiveness of this method is validated through experimentation. Compared to Karto SLAM, a widely used SLAM algorithm, this approach achieves an improvement of 26.98% in rotation angle error and a 67.68% reduction in position error. These results demonstrate the method's superior accuracy and robustness in texture-less environments.

7/18/2024

MAVIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3) Based Exact IMU Pre-integration

Yifu Wang, Yonhon Ng, Inkyu Sa, Alvaro Parra, Cristian Rodriguez, Tao Jun Lin, Hongdong Li

We present a novel optimization-based Visual-Inertial SLAM system designed for multiple partially overlapped camera systems, named MAVIS. Our framework fully exploits the benefits of wide field-of-view from multi-camera systems, and the metric scale measurements provided by an inertial measurement unit (IMU). We introduce an improved IMU pre-integration formulation based on the exponential function of an automorphism of SE_2(3), which can effectively enhance tracking performance under fast rotational motion and extended integration time. Furthermore, we extend conventional front-end tracking and back-end optimization module designed for monocular or stereo setup towards multi-camera systems, and introduce implementation details that contribute to the performance of our system in challenging scenarios. The practical validity of our approach is supported by our experiments on public datasets. Our MAVIS won the first place in all the vision-IMU tracks (single and multi-session SLAM) on Hilti SLAM Challenge 2023 with 1.7 times the score compared to the second place.

7/17/2024