PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression

Read original: arXiv:2407.17709 - Published 7/26/2024 by Yidi Zhang, Fulin Tang, Zewen Xu, Yihong Wu, Pengju Ma

Overview

PGD-VIO is a visual-inertial odometry (VIO) system that uses plane features and graph-based drift suppression to improve localization accuracy.
It fuses data from an RGBD camera and an inertial measurement unit (IMU) to estimate the 6-degree-of-freedom pose of a mobile device.
The key innovations are the use of planes detected in the environment as direct observations for pose estimation and a graph-based drift detection strategy that suppresses cumulative drift over time.

Plain English Explanation

PGD-VIO is a technology that helps devices like phones or robots keep track of where they are moving. It does this by combining information from cameras and motion sensors.

The cameras are used to detect flat surfaces, like walls or floors, in the environment. Recognizing these planar features provides additional clues about the device's position and orientation that can improve the accuracy of the motion tracking.

Over time, small errors in the motion tracking can build up, causing the device's estimated position to drift away from the true location. To fix this, PGD-VIO uses a graph-based technique that recognizes when the flat surfaces it sees now match surfaces it has already mapped. This lets it correct the motion estimates and keep the tracking accurate even over long distances.

Technical Explanation

PGD-VIO fuses visual and inertial data, from an RGBD camera and an IMU, in an extended Kalman filter (EKF) framework to estimate the 6-degree-of-freedom (6-DoF) pose of a mobile device. The key innovations are the integration of plane features alongside point features and a graph-based drift detection strategy.

The system extracts planar regions from the camera data and adds the resulting plane features to the filter's state vector as direct observations, which further constrains the pose estimates. Depth measurements for point features are also used to improve the accuracy of point triangulation.
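
As a rough illustration of how a detected plane can constrain an estimate, the sketch below represents a plane in Hesse normal form (unit normal `n` and offset `d`, with `n . x = d`) and computes the signed point-to-plane distance that a plane-based constraint would drive toward zero. The function names and the three-point construction are illustrative assumptions, not the paper's extraction method or measurement model.

```python
import math


def plane_from_points(p0, p1, p2):
    """Return (n, d) with unit normal n and offset d such that n . x = d.

    Hypothetical helper: builds the plane through three non-collinear 3D points.
    """
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    # The cross product u x v is perpendicular to both in-plane directions.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        raise ValueError("points are collinear; no unique plane")
    n = [c / norm for c in n]
    d = sum(n[i] * p0[i] for i in range(3))
    return n, d


def point_to_plane_residual(point, n, d):
    """Signed distance from a 3D point to the plane n . x = d."""
    return sum(n[i] * point[i] for i in range(3)) - d


# A floor plane through the origin, and a point measured 5 cm above it.
floor_n, floor_d = plane_from_points((0, 0, 0), (1, 0, 0), (0, 1, 0))
residual = point_to_plane_residual((0.3, 0.7, 0.05), floor_n, floor_d)
print(floor_n, floor_d, residual)
```

A real system would typically fit a plane to many depth points and weight the residual by measurement noise; the three-point version is used here only to keep the geometry visible.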

Additionally, PGD-VIO maintains a map of planes and applies a graph-based drift detection strategy that searches this map for overlapping and identical structures. Matches found this way are used to suppress cumulative drift, without relying on expensive global bundle adjustment or loop closing.
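
One way to picture a graph-based search for repeated planar structure is sketched below: planes in a map become graph nodes, an edge links two planes whose parameters nearly coincide, and each connected component is a candidate set of duplicates that drift has split apart. The thresholds, the similarity test, and all names are assumptions made for illustration; the paper's actual strategy may differ substantially.

```python
import math
from collections import defaultdict


def nearly_same_plane(a, b, max_angle_deg=5.0, max_offset=0.10):
    """Compare two planes given as (unit normal, offset) pairs.

    Assumed criterion: small angle between normals and small offset gap.
    Normals are assumed to be consistently oriented.
    """
    (na, da), (nb, db) = a, b
    cos_angle = sum(x * y for x, y in zip(na, nb))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard acos against rounding
    angle = math.degrees(math.acos(cos_angle))
    return angle <= max_angle_deg and abs(da - db) <= max_offset


def duplicate_plane_groups(planes):
    """Return sorted groups of plane ids that form connected components."""
    ids = list(planes)
    adjacency = defaultdict(set)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if nearly_same_plane(planes[a], planes[b]):
                adjacency[a].add(b)
                adjacency[b].add(a)

    groups, seen = [], set()
    for start in ids:
        if start in seen or start not in adjacency:
            continue  # already grouped, or has no similar plane
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Hypothetical plane map: the floor was mapped twice, slightly apart due to drift.
plane_map = {
    "floor_early": ((0.0, 0.0, 1.0), 0.00),
    "wall": ((1.0, 0.0, 0.0), 2.00),
    "floor_late": ((0.0, 0.02, 0.9998), 0.06),
}
groups = duplicate_plane_groups(plane_map)
print(groups)
```

In this toy map the two floor entries are grouped together while the wall stays on its own; the gap between grouped planes gives a rough indication of how much drift has accumulated.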

The authors evaluate PGD-VIO on two public datasets, reporting higher localization accuracy than state-of-the-art methods along with a compact and consistent plane map.
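
Trajectory accuracy in odometry evaluations is commonly summarized as a root-mean-square position error against ground truth. The sketch below shows that calculation for two trajectories that are assumed to be time-synchronized and already expressed in the same frame. The metric choice, function name, and sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def position_rmse(estimated, ground_truth):
    """Root-mean-square Euclidean position error between two trajectories."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(pe, pg))
        for pe, pg in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up positions in metres, for demonstration only.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimate = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
rmse = position_rmse(estimate, truth)
print(rmse)
```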

Critical Analysis

The paper provides a thorough technical description of the PGD-VIO system and validates its performance on standard benchmarks. However, it does not explore some potential limitations or areas for future work.

For example, the plane-aided approach may gain little in environments with few planar surfaces or complex geometries. Additionally, matching structures in a plane map implicitly assumes that the planes are static, which may not hold in dynamic real-world settings.

Further research could investigate the robustness of PGD-VIO to challenging conditions, such as rapid motions, lighting changes, or occlusions. Integrating semantic understanding of the environment beyond just geometric planes could also be a promising direction to explore.

Conclusion

PGD-VIO presents an approach to visual-inertial odometry that leverages plane features and graph-based drift detection to achieve high-accuracy pose estimation. By incorporating planar constraints from the environment and suppressing cumulative drift, the system demonstrates improved localization accuracy over prior methods.

While the paper does not address all potential limitations, the core ideas behind PGD-VIO represent an important advancement in the field of mobile robot localization and navigation. Further development and real-world deployment of such techniques could enable more robust and reliable autonomous systems in the future.

Related Papers

PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression

Yidi Zhang, Fulin Tang, Zewen Xu, Yihong Wu, Pengju Ma

Generally, high-level features provide more geometrical information compared to point features, which can be exploited to further constrain motions. Planes are commonplace in man-made environments, offering an active means to reduce drift, due to their extensive spatial and temporal observability. To make full use of planar information, we propose a novel visual-inertial odometry (VIO) using an RGBD camera and an inertial measurement unit (IMU), effectively integrating point and plane features in an extended Kalman filter (EKF) framework. Depth information of point features is leveraged to improve the accuracy of point triangulation, while plane features serve as direct observations added into the state vector. Notably, to benefit long-term navigation, a novel graph-based drift detection strategy is proposed to search overlapping and identical structures in the plane map so that the cumulative drift is suppressed subsequently. The experimental results on two public datasets demonstrate that our system outperforms state-of-the-art methods in localization accuracy and meanwhile generates a compact and consistent plane map, free of expensive global bundle adjustment and loop closing techniques.

7/26/2024

Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation

Li Liu, Ruijie Zhu, Jiacheng Deng, Ziyang Song, Wenfei Yang, Tianzhu Zhang

Monocular depth estimation aims to infer a dense depth map from a single image, which is a fundamental and prevalent task in computer vision. Many previous works have shown impressive depth estimation results through carefully designed network structures, but they usually ignore the planar information and therefore perform poorly in low-texture areas of indoor scenes. In this paper, we propose Plane2Depth, which adaptively utilizes plane information to improve depth prediction within a hierarchical framework. Specifically, in the proposed plane guided depth generator (PGDG), we design a set of plane queries as prototypes to softly model planes in the scene and predict per-pixel plane coefficients. Then the predicted plane coefficients can be converted into metric depth values with the pinhole camera model. In the proposed adaptive plane query aggregation (APGA) module, we introduce a novel feature interaction approach to improve the aggregation of multi-scale plane features in a top-down manner. Extensive experiments show that our method can achieve outstanding performance, especially in low-texture or repetitive areas. Furthermore, under the same backbone network, our method outperforms the state-of-the-art methods on the NYU-Depth-v2 dataset, achieves competitive results with state-of-the-art methods on the KITTI dataset, and can be generalized to unseen scenes effectively.

9/5/2024

DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang

The information in visual and LiDAR data is highly complementary, deriving from the fine-grained texture of images and the massive geometric information in point clouds. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between two modalities: Image pixels are regular and dense, but LiDAR points are unordered and sparse. To address the problem, we propose a local-to-global fusion network (DVLO) with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster image pixels around each center. Image pixels are pre-organized as pseudo points for image-to-point structure alignment. Then, we convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between point features and local fused features. Our method achieves state-of-the-art performance on KITTI odometry and FlyingThings3D scene flow datasets compared to both single-modal and multi-modal methods. Codes are released at https://github.com/IRMVLab/DVLO.

7/18/2024

IMU-Aided Event-based Stereo Visual Odometry

Junkai Niu, Sheng Zhong, Yi Zhou

Direct methods for event-based visual odometry solve the mapping and camera pose tracking sub-problems by establishing implicit data association in a way that the generative model of events is exploited. The main bottlenecks faced by state-of-the-art work in this field include the high computational complexity of mapping and the limited accuracy of tracking. In this paper, we improve our previous direct pipeline Event-based Stereo Visual Odometry in terms of accuracy and efficiency. To speed up the mapping operation, we propose an efficient strategy of edge-pixel sampling according to the local dynamics of events. The mapping performance in terms of completeness and local smoothness is also improved by combining the temporal stereo results and the static stereo results. To circumvent the degeneracy issue of camera pose tracking in recovering the yaw component of general 6-DoF motion, we introduce as a prior the gyroscope measurements via pre-integration. Experiments on publicly available datasets justify our improvement. We release our pipeline as open-source software for future research in this field.

5/8/2024