PO-VINS: An Efficient and Robust Pose-Only Visual-Inertial State Estimator With LiDAR Enhancement

Read original: arXiv:2305.12644 - Published 9/12/2024 by Hailiang Tang, Tisheng Zhang, Liqiang Wang, Guan Wang, Xiaoji Niu

Overview

Pose adjustment (PA) with a pose-only visual representation is as effective as bundle adjustment (BA) but more computationally efficient.
The pose-only solution had not yet been properly integrated into a tightly-coupled visual-inertial state estimator (VISE) for real-time navigation.
This study proposes a tightly-coupled LiDAR-enhanced VISE called PO-VINS that uses a full pose-only form for visual and LiDAR-depth measurements to improve efficiency.
PO-VINS derives analytical depth uncertainty to cull LiDAR depth outliers and uses a multi-state constraint (MSC)-based LiDAR-depth measurement model to balance efficiency and robustness.

Plain English Explanation

The researchers developed a new system for real-time navigation that combines visual, inertial, and LiDAR sensor data. Previous methods used a computationally expensive approach called bundle adjustment to estimate the camera's position and orientation.

The researchers found that a simpler "pose-only" approach can achieve similar accuracy with much less computational power. They integrated this pose-only method into a tightly-coupled visual-inertial system, creating a new system called PO-VINS.

PO-VINS also incorporates LiDAR depth data to further improve accuracy and robustness. It uses an analytical model of depth uncertainty to identify and remove outliers in the LiDAR measurements. The visual, inertial, and LiDAR data are all tightly integrated using factor graph optimization to efficiently estimate the system's state.

The researchers extensively tested PO-VINS and found that it matched or exceeded the accuracy of state-of-the-art methods, while improving state-estimation efficiency over the baseline LE-VINS by 33% on a laptop PC and 56% on an onboard ARM computer. The improved efficiency and robustness of PO-VINS make it well-suited for real-time navigation applications on resource-constrained platforms like mobile robots.

Technical Explanation

The core innovation of this work is the development of PO-VINS, a tightly-coupled LiDAR-enhanced visual-inertial state estimator that leverages a pose-only visual representation. This pose-only approach has been shown to be equivalent to the computationally expensive bundle adjustment (BA) method, while significantly improving efficiency.
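To make the pose-only idea concrete, the following minimal sketch recovers a feature's depth in one camera frame using only the relative pose and the feature's two image observations, so the 3D point never has to be kept as a state variable. It relies on the standard two-view relation d2·p2 = d1·R·p1 + t. The function name `pose_only_depth`, the pose convention, and the synthetic data are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def pose_only_depth(R, t, p1, p2):
    """Depth of a feature in frame 1 from the relative pose and two observations.

    Convention (an assumption of this sketch): a point X1 in frame 1 maps to
    frame 2 as X2 = R @ X1 + t, and p1, p2 are normalized image coordinates
    [x, y, 1] of the same feature, so "depth" is the z-coordinate.

    From d2 * p2 = d1 * R @ p1 + t, crossing both sides with p2 removes d2:
        d1 * (p2 x R @ p1) = -(p2 x t)
    which is solved for d1 in the least-squares sense.
    """
    a = np.cross(p2, R @ p1)
    b = np.cross(p2, t)
    denom = a @ a
    if denom < 1e-12:
        raise ValueError("insufficient parallax to recover depth")
    return -(a @ b) / denom


# Synthetic check: a point at depth 4.0 seen from two nearby poses.
angle = np.deg2rad(5.0)
c, s = np.cos(angle), np.sin(angle)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about y
t = np.array([-0.5, 0.0, 0.0])
X1 = np.array([1.0, 0.5, 4.0])   # point in frame 1
X2 = R @ X1 + t                  # same point in frame 2
p1 = X1 / X1[2]
p2 = X2 / X2[2]
depth = pose_only_depth(R, t, p1, p2)
```

Because the depth is a function of the poses and the raw observations alone, a reprojection residual built on it depends only on pose states, which is where the efficiency gain over carrying explicit landmarks would come from.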

To integrate the pose-only visual solution into a VISE, the researchers derived the analytical depth uncertainty from the pose-only visual representation. This allows them to employ an outlier-culling method to remove erroneous LiDAR depth measurements. They also propose a multi-state constraint (MSC)-based LiDAR-depth measurement model that balances efficiency and robustness.
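As a rough illustration of uncertainty-based culling, the sketch below accepts a LiDAR depth only if it agrees with the visually derived depth to within a multiple of their combined standard deviation. This is a generic k-sigma gate written under stated assumptions: the function name, the default LiDAR noise `sigma_lidar`, and the threshold `k` are hypothetical, and the paper's actual test and its analytical uncertainty formula may differ.

```python
import math


def accept_lidar_depth(d_lidar, d_visual, sigma_visual, sigma_lidar=0.05, k=3.0):
    """Return True if the LiDAR depth is consistent with the visual depth.

    d_lidar, d_visual: depths in meters for the same visual feature.
    sigma_visual: standard deviation of the visual depth (in PO-VINS this
        would come from the analytical depth uncertainty; here it is an input).
    sigma_lidar, k: assumed LiDAR noise and gate width (hypothetical values).
    """
    combined_sigma = math.hypot(sigma_visual, sigma_lidar)
    return abs(d_lidar - d_visual) <= k * combined_sigma


# A 0.2 m disagreement passes the gate; a 2.0 m disagreement is culled.
inlier = accept_lidar_depth(10.2, 10.0, sigma_visual=0.1)
outlier = accept_lidar_depth(12.0, 10.0, sigma_visual=0.1)
```

A gate of this kind is cheap to evaluate per feature, and it becomes stricter automatically when the visual depth is well constrained.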

The pose-only visual and LiDAR-depth measurements, along with the IMU-preintegration measurements, are then tightly integrated under the factor graph optimization framework to perform efficient and accurate state estimation.
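The idea of fusing heterogeneous measurements as factors can be sketched with a deliberately tiny linear example. Here 1D positions stand in for poses, relative factors between consecutive states stand in for IMU preintegration, and one constraint spanning several states stands in for a pose-only visual or LiDAR-depth factor. All names and numbers are assumptions for illustration; a real estimator works on nonlinear 6-DoF states with iterative relinearization.

```python
import numpy as np


def solve_linear_factor_graph(num_states, factors):
    """Solve a linear factor graph by weighted least squares.

    Each factor is (coeffs, measurement, sigma), where coeffs maps a state
    index to its coefficient in the linear measurement model. Every row is
    whitened by 1/sigma so that more certain factors carry more weight.
    """
    A = np.zeros((len(factors), num_states))
    b = np.zeros(len(factors))
    for row, (coeffs, z, sigma) in enumerate(factors):
        for idx, weight in coeffs.items():
            A[row, idx] = weight / sigma
        b[row] = z / sigma
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


factors = [
    ({0: 1.0}, 0.0, 1e-3),            # prior anchoring the first state
    ({0: -1.0, 1: 1.0}, 1.0, 0.1),    # relative motion (IMU stand-in)
    ({1: -1.0, 2: 1.0}, 1.0, 0.1),
    ({2: -1.0, 3: 1.0}, 1.0, 0.1),
    ({0: -1.0, 3: 1.0}, 3.3, 0.1),    # constraint spanning states 0 and 3
]
estimate = solve_linear_factor_graph(4, factors)
```

In this toy problem the spanning constraint disagrees with the summed relative motions (3.3 versus 3.0), and the solver spreads the discrepancy across all factors according to their weights rather than trusting any single sensor.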

Extensive experiments on private and public datasets demonstrate that PO-VINS achieves improved or comparable accuracy to state-of-the-art methods. Importantly, the state-estimation efficiency of PO-VINS is improved by 33% and 56% on a laptop PC and an onboard ARM computer, respectively, compared to the baseline LE-VINS method. PO-VINS also exhibits improved robustness through the proposed outlier-culling method and the MSC-based LiDAR-depth measurement model.

Critical Analysis

The researchers have thoroughly evaluated PO-VINS and demonstrated its advantages in terms of efficiency and robustness without compromising accuracy. However, the paper does not discuss any potential limitations or caveats of the proposed system.

One area that could benefit from further investigation is the impact of the pose-only representation on the overall system's local observability. The researchers should explore how this approach affects whether all relevant states remain observable, particularly in challenging scenarios.

Additionally, while the experiments on public and private datasets provide a comprehensive evaluation, it would be valuable to see how PO-VINS performs in a wider range of real-world scenarios, such as environments with varying lighting conditions or sensor degradation.

Overall, the researchers have made a significant contribution by demonstrating the viability of a pose-only visual representation in a tightly-coupled visual-inertial-LiDAR state estimator. Further refinement and broader validation could solidify PO-VINS as a highly efficient and robust solution for real-time navigation applications.

Conclusion

This study proposes PO-VINS, a tightly-coupled LiDAR-enhanced visual-inertial state estimator that leverages a pose-only visual representation to achieve improved computational efficiency without sacrificing accuracy. The researchers derive an analytical depth uncertainty model to cull LiDAR depth outliers and introduce a multi-state constraint-based LiDAR-depth measurement model to balance efficiency and robustness.

Extensive experimental results demonstrate that PO-VINS outperforms or matches the performance of state-of-the-art methods while improving state-estimation efficiency over the baseline LE-VINS by 33% on a laptop PC and 56% on an onboard ARM computer. The improved efficiency and robustness of PO-VINS make it a promising solution for real-time navigation tasks on resource-constrained platforms, such as mobile robots and autonomous vehicles.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PO-VINS: An Efficient and Robust Pose-Only Visual-Inertial State Estimator With LiDAR Enhancement

Hailiang Tang, Tisheng Zhang, Liqiang Wang, Guan Wang, Xiaoji Niu

The pose adjustment (PA) with a pose-only visual representation has been proven equivalent to the bundle adjustment (BA), while significantly improving the computational efficiency. However, the pose-only solution has not yet been properly considered in a tightly-coupled visual-inertial state estimator (VISE) with a normal configuration for real-time navigation. In this study, we propose a tightly-coupled LiDAR-enhanced VISE, named PO-VINS, with a full pose-only form for visual and LiDAR-depth measurements. Based on the pose-only visual representation, we derive the analytical depth uncertainty, which is then employed for rejecting LiDAR depth outliers. Besides, we propose a multi-state constraint (MSC)-based LiDAR-depth measurement model with a pose-only form, to balance efficiency and robustness. The pose-only visual and LiDAR-depth measurements and the IMU-preintegration measurements are tightly integrated under the factor graph optimization framework to perform efficient and accurate state estimation. Exhaustive experimental results on private and public datasets indicate that the proposed PO-VINS yields improved or comparable accuracy to state-of-the-art methods. Compared to the baseline method LE-VINS, the state-estimation efficiency of PO-VINS is improved by 33% and 56% on the laptop PC and the onboard ARM computer, respectively. Besides, PO-VINS yields higher accuracy and robustness than LE-VINS by employing the proposed uncertainty-based outlier-culling method and the MSC-based measurement model for LiDAR depth.

9/12/2024

Visual-Inertial SLAM as Simple as A, B, VINS

Nathaniel Merrill, Guoquan Huang

We present AB-VINS, a different kind of visual-inertial SLAM system. Unlike most VINS systems which only use hand-crafted techniques, AB-VINS makes use of three different deep networks. Instead of estimating sparse feature positions, AB-VINS only estimates the scale and bias parameters (a and b) of monocular depth maps, as well as other terms to correct the depth using multi-view information, which results in a compressed feature state. Despite being an optimization-based system, the main VIO thread of AB-VINS surpasses the efficiency of a state-of-the-art filter-based method while also providing dense depth. While state-of-the-art loop-closing SLAM systems have to relinearize a number of variables linear in the number of keyframes, AB-VINS can perform loop closures while only affecting a constant number of variables. This is due to a novel data structure called the memory tree, in which the keyframe poses are defined relative to each other rather than all in one global frame, allowing for all but a few states to be fixed. AB-VINS is not as accurate as state-of-the-art VINS systems, but it is shown through careful experimentation to be more robust.

6/18/2024

VINS-Multi: A Robust Asynchronous Multi-camera-IMU State Estimator

Luqi Wang, Yang Xu, Shaojie Shen

State estimation is a critical foundational module in robotics applications, where robustness and performance are paramount. Although in recent years, many works have been focusing on improving one of the most widely adopted state estimation methods, visual inertial odometry (VIO), by incorporating multiple cameras, these efforts predominantly address synchronous camera systems. Asynchronous cameras, which offer simpler hardware configurations and enhanced resilience, have been largely overlooked. To fill this gap, this paper presents VINS-Multi, a novel multi-camera-IMU state estimator for asynchronous cameras. The estimator comprises parallel front ends, a front end coordinator, and a back end optimization module capable of handling asynchronous input frames. It utilizes the frames effectively through a dynamic feature number allocation and a frame priority coordination strategy. The proposed estimator is integrated into a customized quadrotor platform and tested in multiple realistic and challenging scenarios to validate its practicality. Additionally, comprehensive benchmark results are provided to showcase the robustness and superior performance of the proposed estimator.

5/24/2024

Local Observability of VINS and LINS

Xinran Li

This work analyzes the unobservable directions of the nonlinear models of the Vision-aided Inertial Navigation System (VINS) and the Lidar-aided Inertial Navigation System (LINS). Under the assumption that two features are observed by the camera without occlusion, the unobservable directions of VINS are uniformly the global translation and the global rotation about the gravity vector. The unobservable directions of LINS are the same as those of VINS, but only one feature needs to be observed. A constraint in Observability-Constrained VINS (OC-VINS) is also proved.

4/12/2024