SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

arXiv: 2312.02126

Published 4/17/2024 by Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten

cs.CV cs.AI cs.RO


Abstract

Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera, surpassing the capabilities of existing methods. SplaTAM employs a simple online tracking and mapping system tailored to the underlying Gaussian representation. It utilizes a silhouette mask to elegantly capture the presence of scene density. This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.


Overview

Introduces SplaTAM, a new approach to dense simultaneous localization and mapping (SLAM) that uses explicit volumetric representations (3D Gaussians) to enable high-fidelity scene reconstruction from a single unposed RGB-D camera
Outperforms existing SLAM methods in camera pose estimation, map construction, and novel-view synthesis
Enables benefits like fast rendering, dense optimization, and structured map expansion

Plain English Explanation

SplaTAM is a new SLAM technique that aims to create detailed 3D maps of a scene using a single RGB-D camera (one that records both color and per-pixel depth) whose positions are not known in advance. Unlike previous methods that represent the scene in a non-volumetric or implicit way, SplaTAM uses an explicit volumetric representation: a collection of 3D Gaussians.

This Gaussian-based approach allows SplaTAM to reconstruct scenes with higher fidelity than existing SLAM systems. It uses a simple online tracking and mapping system tailored to the Gaussian representation, plus a silhouette mask that shows, for each pixel, how much of it is already covered by the map.

The key benefits of SplaTAM's Gaussian representation include fast rendering, dense optimization, quickly determining which areas have already been mapped, and structured map expansion. These capabilities let SplaTAM outperform other SLAM methods on camera pose estimation, map construction, and novel-view synthesis (rendering the scene from viewpoints the camera never visited).

Technical Explanation

SplaTAM is a SLAM system that leverages explicit 3D Gaussian representations to enable high-fidelity scene reconstruction from a single unposed RGB-D camera. This contrasts with previous SLAM approaches that used non-volumetric or implicit scene representations.
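To make the representation concrete, here is a minimal Python sketch of one map primitive. It assumes the simplified isotropic Gaussians described in the SplaTAM paper (a center, an RGB color, a single radius and an opacity) with influence f(x) = o * exp(-||x - mu||^2 / (2 r^2)). The class and helper names are hypothetical, and the seeding rule (radius = depth / focal length, so a new Gaussian spans about one pixel when projected, with opacity 0.5) is an assumption based on the paper's description rather than its code.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class IsotropicGaussian:
    """One map primitive (hypothetical class): center, RGB color, radius, opacity."""
    mean: Vec3      # 3D center in world coordinates
    color: Vec3     # view-independent RGB in [0, 1]
    radius: float   # isotropic standard deviation (same in every direction)
    opacity: float  # peak opacity in [0, 1]

    def influence(self, x: Vec3) -> float:
        """Opacity-weighted falloff: o * exp(-||x - mu||^2 / (2 r^2))."""
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean))
        return self.opacity * math.exp(-sq_dist / (2.0 * self.radius ** 2))


def init_gaussian_from_pixel(point_world: Vec3, rgb: Vec3, depth: float,
                             focal_length: float,
                             opacity: float = 0.5) -> IsotropicGaussian:
    """Hypothetical helper: seed a Gaussian at a back-projected pixel.

    Assumption: radius = depth / focal_length makes the Gaussian roughly one
    pixel wide when projected back into the image it came from.
    """
    return IsotropicGaussian(mean=point_world, color=rgb,
                             radius=depth / focal_length, opacity=opacity)
```

Forcing the Gaussians to be isotropic means each one needs a single radius instead of a rotation and three scales, which keeps the map compact and the optimization simple.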

The system employs a simple online tracking and mapping pipeline tailored to the Gaussian representation. It also renders a silhouette mask, the accumulated opacity of the Gaussians at each pixel, which indicates where the current map already contains scene density. This combination of a Gaussian representation and silhouette mask enables several benefits, including:

Fast rendering and dense optimization
Quickly determining if areas have been previously mapped
Structured map expansion by adding more Gaussians
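The silhouette is what lets the pipeline tell mapped from unmapped regions. The plain-Python sketch below, over flat per-pixel lists, shows two ways it could be used: restricting the tracking loss to pixels the map already explains (silhouette above 0.99, with the color L1 term weighted by 0.5), and flagging pixels for new Gaussians where the silhouette is below 0.5 or observed geometry sits well in front of the rendered surface (depth error above 50 times the median error). The thresholds mirror values described in the paper but should be treated as assumptions, and the function names are hypothetical.

```python
import statistics


def tracking_loss(rendered_depth, rendered_rgb, silhouette, gt_depth, gt_rgb,
                  sil_thresh=0.99, color_weight=0.5):
    """Silhouette-masked L1 loss for scoring a candidate camera pose (sketch).

    Only pixels the current map already explains (silhouette > sil_thresh)
    contribute, so unmapped regions cannot pull the pose estimate off.
    RGB entries are 3-tuples; all other inputs are flat per-pixel lists.
    """
    loss = 0.0
    for d, c, s, d_gt, c_gt in zip(rendered_depth, rendered_rgb, silhouette,
                                   gt_depth, gt_rgb):
        if s > sil_thresh:
            loss += abs(d - d_gt)
            loss += color_weight * sum(abs(a - b) for a, b in zip(c, c_gt))
    return loss


def densification_mask(rendered_depth, silhouette, gt_depth,
                       sil_thresh=0.5, error_factor=50.0):
    """Per-pixel flags marking where new Gaussians should be added (sketch).

    A pixel qualifies if the map barely covers it (low silhouette), or if the
    observed depth lies in front of the rendered depth by far more than the
    median depth error (new geometry appeared in front of the map).
    """
    errors = [abs(d - d_gt) for d, d_gt in zip(rendered_depth, gt_depth)]
    median_err = statistics.median(errors)
    return [
        s < sil_thresh or (d_gt < d and err > error_factor * median_err)
        for d, s, d_gt, err in zip(rendered_depth, silhouette, gt_depth, errors)
    ]
```

Because both functions only read the rendered silhouette, checking whether an area was mapped before costs nothing beyond the render that tracking and mapping already perform.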

Extensive experiments show that SplaTAM outperforms existing SLAM methods by up to 2x in camera pose estimation, map construction, and novel-view synthesis. This paves the way for more immersive and high-fidelity SLAM applications, such as in robotics and augmented reality.

The technical approach builds on prior work on 3D Gaussian Splatting, which renders a scene by projecting ("splatting") 3D Gaussians onto the image plane and alpha-compositing them front to back, and on related uses of 3D Gaussian representations for rendering.
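Color, depth and silhouette can all come from the same front-to-back alpha compositing. A minimal per-pixel sketch, assuming each Gaussian has already been projected into the image and its 2D influence f_i(p) at this pixel computed (that projection step is omitted): C(p) = sum_i c_i f_i(p) prod_{j<i} (1 - f_j(p)), with depth and silhouette formed the same way using the Gaussian's depth d_i and the constant 1 in place of c_i. The function name `composite_pixel` is hypothetical.

```python
def composite_pixel(splats):
    """Front-to-back alpha compositing for one pixel (sketch).

    splats: iterable of (rgb, depth, alpha), where alpha is the projected
    Gaussian's influence f_i(p) at this pixel.
    Returns (color, depth, silhouette).
    """
    color = [0.0, 0.0, 0.0]
    depth = 0.0
    silhouette = 0.0
    transmittance = 1.0  # product of (1 - alpha_j) over splats already in front
    for rgb, d, alpha in sorted(splats, key=lambda s: s[1]):  # nearest first
        weight = alpha * transmittance
        for k in range(3):
            color[k] += weight * rgb[k]
        depth += weight * d
        silhouette += weight  # accumulated opacity = how "covered" the pixel is
        transmittance *= 1.0 - alpha
    return tuple(color), depth, silhouette
```

The silhouette is just the total compositing weight, so it comes with every render. A practical renderer would process tiles of pixels in parallel on the GPU and stop once transmittance is close to zero.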

Critical Analysis

The paper provides a compelling demonstration of the benefits of using an explicit volumetric Gaussian representation for SLAM, which outperforms existing implicit or non-volumetric approaches. However, the authors do note some limitations, such as the potential for drift in the mapping process and the need for further optimization to achieve real-time performance.

Additionally, the paper does not address the computational complexity of the Gaussian representation or the memory requirements for storing and processing the 3D Gaussian fields. These factors could be important considerations for deploying SplaTAM in resource-constrained environments like mobile devices.
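The memory question can at least be bounded with simple arithmetic. The rough Python estimate below assumes isotropic Gaussians stored as 8 float32 values each (3 for the center, 3 for RGB, 1 radius, 1 opacity) and an Adam-style optimizer that keeps two extra buffers per parameter. It ignores gradients, rendering buffers and stored keyframes, so treat the result as a lower bound. The function name is hypothetical.

```python
def gaussian_map_bytes(num_gaussians, floats_per_gaussian=8, bytes_per_float=4,
                       optimizer_copies=2):
    """Back-of-envelope memory for a Gaussian map (sketch, lower bound).

    floats_per_gaussian: 3 (center) + 3 (RGB) + 1 (radius) + 1 (opacity).
    optimizer_copies: extra per-parameter buffers, e.g. Adam's first and
    second moment estimates (assumed optimizer).
    """
    param_bytes = num_gaussians * floats_per_gaussian * bytes_per_float
    return param_bytes * (1 + optimizer_copies)
```

For one million Gaussians this gives 32 MB (decimal) of parameters and 96 MB including optimizer state. For comparison, an original 3D Gaussian Splatting primitive with degree-3 spherical harmonics stores 59 floats (3 position, 4 rotation, 3 scale, 1 opacity, 48 SH coefficients), roughly 7 times as many per Gaussian.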

Further research could explore ways to address these limitations, such as investigating techniques for loop closure or drift correction, or developing more efficient Gaussian encoding and processing algorithms. Evaluating the system's performance in a broader range of real-world scenarios would also help validate its practical applicability.

Conclusion

SplaTAM represents an exciting advancement in SLAM technology, leveraging explicit 3D Gaussian representations to enable high-fidelity scene reconstruction from a single unposed RGB-D camera. Its strong results in camera pose estimation, map construction, and novel-view synthesis suggest it could have a significant impact on robotics, augmented reality, and other applications that rely on accurate and immersive 3D scene understanding.

While the paper identifies some areas for further research and optimization, the core ideas behind SplaTAM demonstrate the power of explicit volumetric representations in SLAM and pave the way for more advanced and capable real-time 3D mapping systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

Erik Sandstrom, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. Oswald, Federico Tombari

3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neural point clouds, since they either do not employ global map and pose optimization or make use of monocular depth. In response, we propose the first RGB-only SLAM system with a dense 3D Gaussian map representation that utilizes all benefits of globally optimized tracking by adapting dynamically to keyframe pose and depth updates by actively deforming the 3D Gaussian map. Moreover, we find that refining the depth updates in inaccurate areas with a monocular depth estimator further improves the accuracy of the 3D reconstruction. Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians, as the approach achieves superior or on par performance with existing RGB-only SLAM methods in tracking, mapping and rendering accuracy while yielding small map sizes and fast runtimes. The source code is available at https://github.com/eriksandstroem/Splat-SLAM.

5/28/2024

cs.CV


GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li

In this paper, we introduce GS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct new observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets. Project page: https://gs-slam.github.io/.

4/9/2024

cs.CV

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, Andrew J. Davison

We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstruction of tiny and even transparent objects.

4/16/2024

cs.CV cs.RO

RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.

5/10/2024

cs.CV