BundledSLAM: An Accurate Visual SLAM System Using Multiple Cameras

2403.19886

Published 4/1/2024 by Han Song, Cong Liu, Huafeng Dai


Abstract

Multi-camera SLAM systems offer a plethora of advantages, primarily stemming from their capacity to amalgamate information from a broader field of view, thereby resulting in heightened robustness and improved localization accuracy. In this research, we present a significant extension and refinement of the state-of-the-art stereo SLAM system, known as ORB-SLAM2, with the objective of attaining even higher precision. To accomplish this objective, we commence by mapping measurements from all cameras onto a virtual camera termed BundledFrame. This virtual camera is meticulously engineered to seamlessly adapt to multi-camera configurations, facilitating the effective fusion of data captured from multiple cameras. Additionally, we harness extrinsic parameters in the bundle adjustment (BA) process to achieve precise trajectory estimation. Furthermore, we conduct an extensive analysis of the role of bundle adjustment (BA) in the context of multi-camera scenarios, delving into its impact on tracking, local mapping, and global optimization. Our experimental evaluation entails comprehensive comparisons between ground truth data and the state-of-the-art SLAM system. To rigorously assess the system's performance, we utilize the EuRoC datasets. The consistent results of our evaluations demonstrate the superior accuracy of our system in comparison to existing approaches.


Introduction

The introduction sets out the advantages of multi-camera visual Simultaneous Localization and Mapping (SLAM) systems over traditional monocular, stereo, and RGB-D camera setups. Despite extensive research on monocular SLAM systems, there are relatively few Visual-Inertial Odometry (VIO) solutions designed for multi-camera SLAM systems.

The authors highlight the importance of a wide field of view (FoV) for effective perception in many robotic applications, including Micro Aerial Vehicles (MAVs). However, current research on visual SLAM focuses primarily on monocular, stereo, and RGB-D cameras, whose limited FoV and single orientation restrict the visual data they collect and can reduce robustness and accuracy.

The significant advantage of multi-camera SLAM systems lies in their wide FoV. This characteristic not only addresses the robustness and accuracy issues found in previous SLAM systems but also enhances the efficiency of map construction. Additionally, if certain cameras are obstructed or malfunctioning, the remaining cameras can continue to function normally and provide an ample supply of 3D data points for map generation.


Figure 1: Pipeline of BundledSLAM

The paper then reviews previous work on multi-camera SLAM systems, starting with the early research by Pless on using multiple cameras for Structure from Motion. Subsequent works by Frahm et al., Sola et al., and Harmat explored representing multiple cameras as a virtual camera, adapting single-camera SLAM algorithms for multi-camera setups, and improving camera models.

Harmat's work on MCPTAM introduced concepts such as multiple keyframes and spherical coordinate updates. Other researchers, including Tribou and Yang et al., extended multi-camera approaches for robust pose estimation and mapping on drone and micro aerial vehicle platforms.

The paper aims to enhance the accuracy of ORB-SLAM2 by incorporating pose estimation and map reuse from multiple cameras. It proposes amalgamating image features from all cameras for feature matching, tracking, and place recognition during loop closure. The system achieves pose updates and optimization by minimizing a cost function involving multiple cameras.

Inspired by Wang et al.'s work on treating SLAM systems as virtual sensors, the authors introduce a virtual camera called "BundledFrame" to map measurements from all cameras. This allows efficient combination of data from multiple cameras and the application of bundle adjustment with extrinsic parameters for pose optimization in a multi-camera SLAM system.

The key contributions include a comprehensive multi-camera SLAM system with loop closure and map reuse, and an extensible approach using the "Bundled" data structure to consolidate data from multiple cameras into "BundledFrames" or "BundledKeyframes" for tracking, place recognition, and optimization.

BundledSLAM

The paper describes a multi-camera SLAM system called BundledSLAM. The system pipeline has three main threads: tracking, local mapping, and loop closing.

The tracking module estimates incremental motion by identifying feature matches in the local map and minimizing reprojection errors using motion-only bundle adjustment. It determines if the current frame qualifies as a new keyframe to integrate into the local mapping thread.

The local mapping thread manages new keyframes, involving updates to data associations, new map point creation, and removal of redundant data. It optimizes the local map through local bundle adjustment.

The loop closing thread detects significant loops and performs pose-graph optimization. It also initiates a global bundle adjustment thread to correct accumulated errors.

BundledSLAM extracts features from input images and discards the original images, operating only on the features. It implements feature matching across cameras to assign unique feature IDs. The data structure "Bundled" stores associations between feature IDs and observed cameras.
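The association between feature IDs and observing cameras can be pictured with a small sketch. This is only an illustration under stated assumptions: the class layout, the method names (`add_observation`, `cameras_observing`, `shared_features`), and the use of keypoint indices are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bundled:
    """Hypothetical sketch: maps each feature ID to the cameras that observed it."""

    # feature_id -> {camera_index: keypoint index in that camera's feature list}
    observations: dict = field(default_factory=lambda: defaultdict(dict))

    def add_observation(self, feature_id, camera_index, keypoint_index):
        """Record that a camera observed the feature at a given keypoint index."""
        self.observations[feature_id][camera_index] = keypoint_index

    def cameras_observing(self, feature_id):
        """Return the sorted camera indices that observed this feature."""
        return sorted(self.observations.get(feature_id, {}))

    def shared_features(self):
        """Return the sorted feature IDs seen by more than one camera."""
        return sorted(fid for fid, cams in self.observations.items() if len(cams) > 1)
```

A feature matched across two cameras then carries a single ID with two entries, while a feature seen by one camera has one entry.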

The system uses a monocular camera projection model instead of a rectified stereo model. It defines cost functions to optimize camera poses and map points through bundle adjustment during tracking, local mapping, and loop closing.
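As a rough illustration of a multi-camera reprojection cost, the sketch below chains a body pose with each camera's fixed extrinsic transform and projects through a monocular pinhole model. It is a minimal sketch, not the paper's formulation: the function names, the `(R, t)` pose representation, and the plain squared-error cost (no robust kernel or per-feature weighting) are assumptions made for brevity.

```python
def transform(R, t, p):
    """Apply a rigid transform p' = R p + t (R as three rows, t and p as 3-vectors)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project_pinhole(p_cam, fx, fy, cx, cy):
    """Monocular pinhole projection of a 3D point given in the camera frame."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_residual(p_world, body_pose, extrinsic, intrinsics, observed_uv):
    """Pixel residual for one observation in one camera of the rig.

    body_pose: (R, t) mapping world to body coordinates (the pose being optimized).
    extrinsic: (R, t) mapping body to this camera's coordinates (assumed fixed).
    """
    p_body = transform(*body_pose, p_world)
    p_cam = transform(*extrinsic, p_body)
    u, v = project_pinhole(p_cam, *intrinsics)
    return (observed_uv[0] - u, observed_uv[1] - v)


def total_cost(points_world, body_pose, extrinsics, intrinsics, observations):
    """Sum of squared residuals over all (camera index, point index, pixel) observations."""
    cost = 0.0
    for cam_idx, pt_idx, uv in observations:
        du, dv = reprojection_residual(points_world[pt_idx], body_pose,
                                       extrinsics[cam_idx], intrinsics[cam_idx], uv)
        cost += du * du + dv * dv
    return cost
```

Because every camera's residual depends on the same body pose, one optimization variable is constrained by observations from the whole rig; an optimizer would minimize `total_cost` over the pose (and, in local or global bundle adjustment, over the map points as well).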

For loop detection, BundledSLAM queries a database using bag-of-words place recognition and an inverted index built from keyframes, providing a broader search scope than ORB-SLAM2.
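The inverted-index lookup can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, and candidates are ranked by a plain count of shared visual words, whereas bag-of-words libraries typically score weighted word vectors.

```python
from collections import Counter, defaultdict


class KeyframeDatabase:
    """Hypothetical inverted index: visual word ID -> keyframes containing it."""

    def __init__(self):
        self.inverted_index = defaultdict(set)

    def add(self, keyframe_id, word_ids):
        """Register a keyframe under every visual word it contains."""
        for word in set(word_ids):
            self.inverted_index[word].add(keyframe_id)

    def query(self, word_ids, min_shared_words=1, exclude=()):
        """Return candidate keyframes ranked by number of shared visual words."""
        shared = Counter()
        for word in set(word_ids):
            for kf in self.inverted_index.get(word, ()):
                if kf not in exclude:
                    shared[kf] += 1
        # Most shared words first; ties broken by keyframe ID for determinism.
        ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
        return [kf for kf, count in ranked if count >= min_shared_words]
```

In a multi-camera setting, the words passed to `add` and `query` would come from the features of all cameras in a bundled keyframe, which is one way to read the broader search scope described above. The `exclude` argument stands in for skipping keyframes already connected to the query.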

The paper provides trajectory and error comparisons against ORB-SLAM2 and VINS-Stereo on the EuRoC dataset, demonstrating the performance of BundledSLAM.

Evaluation

The paper evaluates BundledSLAM against the state-of-the-art methods ORB-SLAM2 and VINS-Stereo on the EuRoC dataset. The EuRoC dataset comprises sequences captured by a micro aerial vehicle equipped with stereo cameras and an IMU, with varying difficulty levels based on lighting conditions, scene texture, and motion speed.

The authors report the root mean square error (RMSE) of the absolute translation errors for the evaluated methods across different sequences. BundledSLAM demonstrates superior accuracy compared to ORB-SLAM2 and VINS-Stereo, with the best results highlighted in bold for each sequence. The loop closing module with global bundle adjustment is activated for specific sequences.
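For reference, the RMSE of absolute translation errors can be computed as in the sketch below. It assumes the estimated and ground-truth positions are already associated by timestamp and expressed in a common frame; trajectory alignment, which evaluation tools normally perform first, is omitted, and the function name is illustrative.

```python
import math


def ate_rmse(estimated_xyz, ground_truth_xyz):
    """RMSE of per-pose translation errors between two aligned trajectories."""
    if len(estimated_xyz) != len(ground_truth_xyz) or not estimated_xyz:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between each estimated and true position.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated_xyz, ground_truth_xyz)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.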

The paper includes a figure comparing the estimated trajectories of BundledSLAM, ORB-SLAM2, and VINS-Stereo against the ground truth, visually showcasing the higher accuracy of BundledSLAM. A second figure plots the absolute pose error (APE), which reinforces the consistently better performance of BundledSLAM over the other methods across all sequences.

The authors acknowledge the difficulty faced by both BundledSLAM and ORB-SLAM2 in handling high-motion-speed sequences like V2_03_difficult, suggesting the potential benefit of incorporating additional sensors like an inertial measurement unit (IMU).

Conclusion

The paper introduces BundledSLAM, a visual simultaneous localization and mapping (SLAM) system designed to leverage multiple cameras. BundledSLAM integrates data from various cameras into a unified "bundled frame" structure, enabling real-time pose tracking, local mapping for pose and map point optimization, and loop closing to ensure global consistency.

The evaluation, conducted using the EuRoC dataset, demonstrates that BundledSLAM consistently outperforms the original system, ORB-SLAM2, achieving higher accuracy in both the best and average results.

To enhance system robustness, particularly in scenarios with motion blur or limited texture features, the authors plan to explore sensor fusion by incorporating Inertial Measurement Units (IMUs). However, they acknowledge the potential computational complexity introduced by additional sensors.

As part of future research, strategies to reduce this complexity while maintaining or improving system performance will be prioritized.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Design and Evaluation of a Generic Visual SLAM Framework for Multi-Camera Systems

Pushyami Kaveti, Shankara Narayanan Vaidyanathan, Arvind Thamilchelvan, Hanumant Singh

Multi-camera systems have been shown to improve the accuracy and robustness of SLAM estimates, yet state-of-the-art SLAM systems predominantly support monocular or stereo setups. This paper presents a generic sparse visual SLAM framework capable of running on any number of cameras and in any arrangement. Our SLAM system uses the generalized camera model, which allows us to represent an arbitrary multi-camera system as a single imaging device. Additionally, it takes advantage of the overlapping fields of view (FoV) by extracting cross-matched features across cameras in the rig. This limits the linear rise in the number of features with the number of cameras and keeps the computational load in check while enabling an accurate representation of the scene. We evaluate our method in terms of accuracy, robustness, and run time on indoor and outdoor datasets that include challenging real-world scenarios such as narrow corridors, featureless spaces, and dynamic objects. We show that our system can adapt to different camera configurations and allows real-time execution for typical robotic applications. Finally, we benchmark the impact of the critical design parameters - the number of cameras and the overlap between their FoV that define the camera configuration for SLAM. All our software and datasets are freely available for further research.

5/10/2024

cs.RO

Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation

Shenghao Li, Luchao Pang, Xianglong Hu

This paper presents a novel approach to visual simultaneous localization and mapping (SLAM) using multiple RGB-D cameras. The proposed method, Multicam-SLAM, significantly enhances the robustness and accuracy of SLAM systems by capturing more comprehensive spatial information from various perspectives. This method enables the accurate determination of pose relationships among multiple cameras without the need for overlapping fields of view. The proposed Multicam-SLAM includes a unique multi-camera model, a multi-keyframes structure, and several parallel SLAM threads. The multi-camera model allows for the integration of data from multiple cameras, while the multi-keyframes and parallel SLAM threads ensure efficient and accurate pose estimation and mapping. Extensive experiments in various environments demonstrate the superior accuracy and robustness of the proposed method compared to conventional single-camera SLAM systems. The results highlight the potential of the proposed Multicam-SLAM for more complex and challenging applications. Code is available at https://github.com/AlterPang/Multi_ORB_SLAM.

6/26/2024

cs.RO cs.CV

Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung

The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications.

4/9/2024

cs.CV


SL-SLAM: A robust visual-inertial SLAM based deep feature extraction and matching

Zhang Xiao, Shuaixin Li

This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods to inform other research. Through extensive experiments on both public datasets and self-sampled data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at https://github.com/zzzzxxxx111/SLslam.

6/5/2024

cs.RO