Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

arXiv:2405.16754

Published 5/28/2024 by Youqi Pan, Wugen Zhou, Yingdian Cao, Hongbin Zha

Abstract

Visual-inertial odometry (VIO) has demonstrated remarkable success due to its low-cost and complementary sensors. However, existing VIO methods lack the generalization ability to adjust to different environments and sensor attributes. In this paper, we propose Adaptive VIO, a new monocular visual-inertial odometry that combines online continual learning with traditional nonlinear optimization. Adaptive VIO comprises two networks to predict visual correspondence and IMU bias. Unlike end-to-end approaches that use networks to fuse the features from two modalities (camera and IMU) and predict poses directly, we combine neural networks with visual-inertial bundle adjustment in our VIO system. The optimized estimates will be fed back to the visual and IMU bias networks, refining the networks in a self-supervised manner. Such a learning-optimization-combined framework and feedback mechanism enable the system to perform online continual learning. Experiments demonstrate that our Adaptive VIO manifests adaptive capability on EuRoC and TUM-VI datasets. The overall performance exceeds the currently known learning-based VIO methods and is comparable to the state-of-the-art optimization-based methods.

Overview

Presents Adaptive VIO, a monocular visual-inertial odometry (VIO) system that combines deep networks with traditional nonlinear optimization and learns online
Aims to improve robustness and adaptability of VIO systems to changing environments and sensor characteristics
Uses two networks, one for visual correspondence and one for IMU bias, and refines them continually with feedback from visual-inertial bundle adjustment

Plain English Explanation

The paper describes a new learning-based system called Adaptive VIO for visual-inertial odometry (VIO): the task of estimating the 6-DoF pose (position and orientation) of a moving camera from both visual and inertial (accelerometer and gyroscope) measurements. Adaptive VIO is monocular, meaning it uses a single camera together with the IMU.
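To make the role of the inertial sensors concrete, the sketch below integrates readings from a stationary one-axis IMU. It is an illustrative toy in Python, not code from the paper: the sample rate, the bias values, and the function names are all assumptions. It shows why a small constant bias turns into steadily growing drift, which is the kind of error the IMU bias network mentioned in the abstract is meant to reduce.

```python
def integrate_gyro(rates, dt, bias=0.0):
    """Integrate angular-rate samples (rad/s) into a heading angle (rad)."""
    heading = 0.0
    for w in rates:
        heading += (w - bias) * dt
    return heading


def integrate_accel(accels, dt, bias=0.0):
    """Double-integrate acceleration samples (m/s^2) into a 1-D position (m)."""
    velocity = 0.0
    position = 0.0
    for a in accels:
        velocity += (a - bias) * dt
        position += velocity * dt
    return position


# Hypothetical stationary sensor: the true rate and acceleration are zero,
# but every reading carries a small constant bias.
dt = 0.005          # assumed 200 Hz IMU
n = 2000            # 10 seconds of samples
gyro_bias = 0.01    # rad/s, assumed
accel_bias = 0.05   # m/s^2, assumed

gyro_readings = [0.0 + gyro_bias] * n
accel_readings = [0.0 + accel_bias] * n

heading_drift = integrate_gyro(gyro_readings, dt)
position_drift = integrate_accel(accel_readings, dt)
heading_corrected = integrate_gyro(gyro_readings, dt, bias=gyro_bias)

print(f"heading drift after 10 s:  {heading_drift:.3f} rad")
print(f"position drift after 10 s: {position_drift:.3f} m")
print(f"heading with known bias:   {heading_corrected:.3f} rad")
```

In this toy the heading error grows linearly with time and the position error grows quadratically, so inertial data alone cannot hold a trajectory for long. Camera measurements and a good bias estimate are what keep the drift bounded.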

The key innovation is the ability for the Adaptive VIO system to continuously learn and adapt over time, rather than being limited to a fixed set of training data. This allows the system to maintain high performance as the environment or sensor characteristics change, which is important for real-world applications like autonomous navigation.

The authors achieve this through online continual learning, which here means combining learned components with a classical optimizer. Two networks supply visual correspondences and IMU bias estimates, the optimizer computes the trajectory, and the optimized result is fed back to refine the networks without ground-truth labels. The model is designed to update its parameters as new data becomes available, without catastrophically forgetting past knowledge. This enables the system to continuously refine and improve its VIO estimates over an extended period of usage.

Technical Explanation

Adaptive VIO does not regress poses end to end. As the abstract states, it comprises two networks, one predicting visual correspondence and one predicting IMU bias, and passes their outputs to a visual-inertial bundle adjustment that estimates the 6-DoF camera poses by nonlinear optimization. The optimized estimates are then fed back to both networks as a self-supervised training signal.
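The abstract describes a pipeline in which network predictions feed an optimizer rather than replacing it. The sketch below reduces that idea to a single rotation axis. It is a hedged illustration, not the authors' implementation: a closed-form weighted least-squares solve stands in for visual-inertial bundle adjustment, the visual rotation and the predicted bias are hard-coded stand-ins for network outputs, and every name and number is hypothetical.

```python
def preintegrate_gyro(rates, dt, predicted_bias):
    """Sum bias-corrected angular rates (rad/s) into a relative rotation (rad)."""
    return sum((w - predicted_bias) * dt for w in rates)


def fuse_rotation(visual_angle, inertial_angle, visual_var, inertial_var):
    """Closed-form minimizer of
        (x - visual_angle)^2 / visual_var + (x - inertial_angle)^2 / inertial_var.
    A one-variable stand-in for visual-inertial bundle adjustment."""
    w_v = 1.0 / visual_var
    w_i = 1.0 / inertial_var
    return (w_v * visual_angle + w_i * inertial_angle) / (w_v + w_i)


# Hypothetical scenario: the camera turns 0.2 rad between two frames.
dt = 0.005                              # assumed 200 Hz gyroscope
true_rate = 2.0                         # rad/s
true_bias = 0.05                        # rad/s, unknown to the system
gyro = [true_rate + true_bias] * 20     # 0.1 s of biased readings

# Stand-in for the rotation recovered from predicted visual correspondences.
visual_angle = 0.2

# Case 1: the bias prediction is still zero (no adaptation yet).
inertial_raw = preintegrate_gyro(gyro, dt, predicted_bias=0.0)
fused_raw = fuse_rotation(visual_angle, inertial_raw, 1e-4, 1e-4)

# Case 2: the bias prediction matches the true bias.
inertial_ok = preintegrate_gyro(gyro, dt, predicted_bias=true_bias)
fused_ok = fuse_rotation(visual_angle, inertial_ok, 1e-4, 1e-4)

# The gap between the inertial-only and the optimized rotation, divided by the
# elapsed time, is a label-free hint about the bias that can be fed back.
implied_bias = (inertial_raw - fused_raw) / (len(gyro) * dt)

print(f"fused rotation, bias ignored:   {fused_raw:.4f} rad")
print(f"fused rotation, bias corrected: {fused_ok:.4f} rad")
print(f"bias hinted by the optimizer:   {implied_bias:.4f} rad/s")
```

With equal weights the optimizer splits the disagreement between the two sensors, so the hinted bias is only half the true value in this toy. It still points in the right direction, which is the property a feedback loop of this kind relies on.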

To enable online continual learning, the authors introduce several key innovations:

Dual-Pathway Network: Visual and inertial data are handled by separate networks, one predicting visual correspondence and one predicting IMU bias, so each pathway can specialize and be refined independently.
Online Adaptation Module: The estimates optimized by bundle adjustment are fed back to the networks, which are refined in a self-supervised manner during operation to account for changing conditions, without requiring full retraining.
Auxiliary Prediction Tasks: The network is trained not only on the main VIO task, but also on auxiliary tasks like depth and ego-motion prediction. This promotes the learning of more general, transferable representations.
Distillation-based Training: When updating the network with new data, a distillation loss is used to preserve performance on previous tasks and avoid catastrophic forgetting.
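The last item in the list above, a distillation loss that guards against forgetting during online updates, can be sketched generically as follows. This is a minimal Python illustration of the general technique, not the paper's loss or training code: the linear bias model, its input feature `x`, the anchor inputs, the learning rate, and the weight `lam` are all assumptions. The update fits a new self-supervised target while penalizing changes in the model's outputs on anchor inputs relative to a frozen copy of the earlier model (the teacher).

```python
def predict(params, x):
    """Tiny linear model: bias estimate = w * x + c (x is a hypothetical feature)."""
    w, c = params
    return w * x + c


def online_step(params, teacher, x_new, target, anchors, lr=0.1, lam=1.0):
    """One gradient step on
        (f(x_new) - target)^2 + lam * mean_a (f(a) - f_teacher(a))^2.
    The second term is the distillation penalty."""
    w, c = params
    err = predict(params, x_new) - target
    grad_w = 2.0 * err * x_new
    grad_c = 2.0 * err
    for a in anchors:
        diff = predict(params, a) - predict(teacher, a)
        grad_w += lam * 2.0 * diff * a / len(anchors)
        grad_c += lam * 2.0 * diff / len(anchors)
    return (w - lr * grad_w, c - lr * grad_c)


def adapt(lam, steps=200):
    """Adapt to one new sample and report the fit error and the output drift."""
    teacher = (0.0, 0.02)      # frozen copy of the model before adaptation
    params = teacher
    anchors = [-1.0, 0.0]      # inputs representing earlier conditions
    x_new, target = 1.0, 0.05  # new condition and its self-supervised target
    for _ in range(steps):
        params = online_step(params, teacher, x_new, target, anchors, lam=lam)
    fit_error = abs(predict(params, x_new) - target)
    drift = sum((predict(params, a) - predict(teacher, a)) ** 2
                for a in anchors) / len(anchors)
    return fit_error, drift


fit_plain, drift_plain = adapt(lam=0.0)          # no distillation
fit_distilled, drift_distilled = adapt(lam=1.0)  # with distillation

print(f"no distillation:   fit error {fit_plain:.5f}, drift {drift_plain:.6f}")
print(f"with distillation: fit error {fit_distilled:.5f}, drift {drift_distilled:.6f}")
```

The trade-off is visible in the two runs: without the penalty the model fits the new target exactly but its outputs on the anchor inputs move further from the teacher, while with the penalty it accepts a small fit error in exchange for less drift.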

In experiments on the EuRoC and TUM-VI datasets, the authors report that Adaptive VIO adapts to new conditions, exceeds the currently known learning-based VIO methods in overall performance, and is comparable to state-of-the-art optimization-based methods.

Critical Analysis

The Adaptive VIO approach presents a promising step towards more robust and adaptable visual-inertial odometry systems. The online continual learning capabilities are a significant advancement, as they address a key limitation of many existing deep learning-based VIO methods.

However, the paper does not discuss some potential limitations or areas for further research:

The computational overhead of the online adaptation module and its impact on real-time performance is not thoroughly evaluated.
The adaptability of the system is demonstrated on a limited set of environments and sensor configurations. Its generalization to a wider range of real-world scenarios is not yet clear.
The memory and storage requirements of the continual learning approach are not quantified, which could be an important practical consideration.

Additionally, it would be valuable to see comparisons to other recently proposed VIO methods that also aim to improve robustness, such as DVI-SLAM, Salient Sparse VO, and Attention-based Deep Learning Architecture for Real-Time VIO.

Conclusion

The Adaptive VIO system presented in this paper represents an important advancement in the field of visual-inertial odometry. By incorporating online continual learning capabilities, the system can adapt to changing environments and sensor characteristics, making it more robust and reliable for real-world applications.

The technical innovations, such as the dual-pathway network and distillation-based training, enable the model to continuously refine its VIO estimates over time. This is a significant step forward compared to traditional deep learning-based VIO methods, which are typically limited to a fixed set of training data.

While the paper highlights the promising performance of Adaptive VIO, further research is needed to fully understand its practical limitations and how it compares to other state-of-the-art approaches. Nonetheless, this work represents an important contribution to the ongoing efforts to develop more adaptable and resilient visual-inertial odometry systems for autonomous navigation and robotics.

Related Papers

VIO-DualProNet: Visual-Inertial Odometry with Learning Based Process Noise Covariance

Dan Solodar, Itzik Klein

Visual-inertial odometry (VIO) is a vital technique used in robotics, augmented reality, and autonomous vehicles. It combines visual and inertial measurements to accurately estimate position and orientation. Existing VIO methods assume a fixed noise covariance for the inertial uncertainty. However, accurately determining in real-time the noise variance of the inertial sensors presents a significant challenge as the uncertainty changes throughout the operation leading to suboptimal performance and reduced accuracy. To circumvent this, we propose VIO-DualProNet, a novel approach that utilizes deep learning methods to dynamically estimate the inertial noise uncertainty in real-time. By designing and training a deep neural network to predict inertial noise uncertainty using only inertial sensor measurements, and integrating it into the VINS-Mono algorithm, we demonstrate a substantial improvement in accuracy and robustness, enhancing VIO performance and potentially benefiting other VIO-based systems for precise localization and mapping across diverse conditions.

4/30/2024

cs.RO cs.SY eess.SY

Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs

Jonas Kuhne, Michele Magno, Luca Benini

Visual Inertial Odometry (VIO) is the task of estimating the movement trajectory of an agent from an onboard camera stream fused with additional Inertial Measurement Unit (IMU) measurements. A crucial subtask within VIO is the tracking of features, which can be achieved through Optical Flow (OF). As the calculation of OF is a resource-demanding task in terms of computational load and memory footprint, which needs to be executed at low latency, especially in robotic applications, OF estimation is today performed on powerful CPUs or GPUs. This restricts its use in a broad spectrum of applications where the deployment of such powerful, power-hungry processors is unfeasible due to constraints related to cost, size, and power consumption. On-sensor hardware acceleration is a promising approach to enable low latency VIO even on resource-constrained devices such as nano drones. This paper assesses the speed-up in a VIO sensor system exploiting a compact OF sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC). By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction of compute load of the VIO pipeline over the original VINS-Mono implementation, allowing VINS-Mono operation up to 50 FPS instead of 20 FPS on the quad-core ARM Cortex-A72 processor of a Raspberry Pi Compute Module 4.

6/21/2024

cs.CV eess.IV

Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry

Haolong Li, Joerg Stueckler

Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.

5/29/2024

cs.RO cs.CV

Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry

Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Kazuhiro Shintani

Monocular visual odometry is a key technology in a wide variety of autonomous systems. Relative to traditional feature-based methods, that suffer from failures due to poor lighting, insufficient texture, large motions, etc., recent learning-based SLAM methods exploit iterative dense bundle adjustment to address such failure cases and achieve robust accurate localization in a wide variety of real environments, without depending on domain-specific training data. However, despite its potential, learning-based SLAM still struggles with scenarios involving large motion and object dynamics. In this paper, we diagnose key weaknesses in a popular learning-based SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark. Code and pre-trained models will be released upon publication.

6/4/2024

cs.CV cs.RO