OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering

Read original: arXiv:2401.16719 - Published 4/30/2024 by Alexander Schperberg, Yusuke Tanaka, Saviz Mowlavi, Feng Xu, Bharathan Balaji, Dennis Hong

Overview

This paper proposes OptiState, a state estimation framework for legged robots that combines a Kalman filter, Gated Recurrent Units (GRUs), and a Vision Transformer autoencoder.
The approach targets accurate state estimation for legged robots, which is difficult because of their highly dynamic motion and limited sensor accuracy, and which is crucial for control and navigation.
OptiState fuses joint encoder and IMU measurements with features extracted from depth images to estimate the state of the robot's trunk, together with an uncertainty evaluation.

Plain English Explanation

The researchers developed a system called OptiState that helps legged robots, such as quadrupeds, keep track of the state of their own body (the trunk), for example its position and orientation. Robots need this information to navigate and control their movements effectively.

OptiState combines several types of sensor data to build a more accurate picture of the robot's state. It uses a Vision Transformer to analyze depth images and combines the result with data from the robot's onboard sensors, namely the IMU and the joint encoders. It also uses the ground reaction forces computed by the robot's controller.

By combining these sources with a Kalman filter and a gated recurrent network, OptiState estimates the state of the robot's trunk more accurately than the visual-inertial SLAM baseline the authors compare against. This can help legged robots move around and interact with their environment more effectively.

Technical Explanation

The paper presents OptiState, a hybrid framework that combines Kalman filtering, optimization, and learning to estimate the state of a legged robot's trunk from proprioceptive measurements (joint encoders, IMU) and exteroceptive measurements (depth images).

The core components of OptiState include:

A Kalman filter that uses joint encoder and IMU measurements, enhanced by a single-rigid-body model that incorporates ground reaction forces from a convex Model Predictive Control (MPC) optimization
A Vision Transformer autoencoder that extracts semantic information and robot height from depth images
A Gated Recurrent Unit (GRU) network that refines the Kalman filter estimate using the vision features
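
The gating idea behind the GRU component can be sketched as a single recurrent step. This is a minimal illustration of the standard GRU equations (update gate, reset gate, candidate state), not the authors' network: the input size, hidden size, random weights, and the meaning of the three input features are all assumptions made for this example, and biases are omitted for brevity.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x, h, params):
    """One standard GRU update: h_new = (1 - z) * h + z * h_candidate."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    # Update gate z decides how much of the candidate state to admit.
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    # Reset gate r decides how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 3 input features per time step, 2 hidden units.
rng = random.Random(0)
n_in, n_hid = 3, 2
# Input-to-hidden matrices are n_hid x n_in, hidden-to-hidden are n_hid x n_hid.
params = tuple(random_matrix(n_hid, cols, rng) for cols in [n_in, n_hid] * 3)

# Hypothetical inputs: [filter estimate, vision-derived height, extra feature].
sequence = [[0.30, 0.31, 0.0], [0.31, 0.30, 0.1], [0.29, 0.30, 0.0]]
h = [0.0] * n_hid
for x in sequence:
    h = gru_step(x, h, params)
print(h)
```

Some libraries swap the roles of `z` and `1 - z` in the last line of `gru_step`; both conventions describe the same kind of gated blend between the old state and the candidate.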

The vision module is a Vision Transformer autoencoder applied to depth images. According to the abstract, it provides semantic information about the scene and an estimate of the robot's height, both of which are passed to the GRU network.
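
A Vision Transformer does not consume an image pixel by pixel. It first cuts the image into fixed-size patches and treats each flattened patch as one input token. The sketch below shows only that tokenization step on a toy depth image; the 4 x 4 image, the patch size of 2, and the function name `patchify` are assumptions for illustration, since this summary does not state the resolution or patch size the authors use.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tokens.

    Tokens are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch)
                           for j in range(patch)])
    return tokens

# Toy 4 x 4 "depth image" whose pixel values are simply 0..15.
depth = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(depth, 2)
for t in tokens:
    print(t)
```

In a full model, each token would then be projected by a learned linear layer and combined with a position embedding before entering the transformer layers.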

The Kalman filter produces the initial state estimate from the proprioceptive measurements and the single-rigid-body model. The GRU network then refines that estimate, learning to reduce the nonlinear errors that arise from sensor measurements and model simplifications, and the framework provides an uncertainty evaluation alongside the estimate.
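
The recursive predict and update cycle of a Kalman filter can be sketched with a scalar state. This is a deliberately simplified example, assuming a single quantity (think of trunk height) with a trivial motion model; the filter in the paper uses a single-rigid-body model with ground reaction forces, which is not reproduced here. All noise values and measurements below are made up.

```python
def kf_predict(x, P, u, Q):
    """Propagate the estimate x and its variance P one step forward.

    u is a known change in the state (0.0 if the state is assumed constant),
    Q is the process noise variance.
    """
    return x + u, P + Q

def kf_update(x, P, z, R):
    """Correct the prediction with a measurement z of variance R."""
    K = P / (P + R)            # Kalman gain: how much to trust the measurement
    x_new = x + K * (z - x)    # move the estimate toward the measurement
    P_new = (1.0 - K) * P      # the variance shrinks after every update
    return x_new, P_new

# Start with a poor guess (0.0) and a large variance (1.0).
x, P = 0.0, 1.0
for z in [0.29, 0.31, 0.30, 0.32]:   # made-up noisy height readings in meters
    x, P = kf_predict(x, P, u=0.0, Q=0.001)
    x, P = kf_update(x, P, z, R=0.01)
print(x, P)
```

The variance `P` is what lets a filter of this kind report an uncertainty together with each estimate: it grows in the predict step and shrinks whenever a measurement arrives.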

The researchers evaluate OptiState in hardware on a quadruped robot across various terrains and report a 65% improvement in Root Mean Squared Error (RMSE) over their VIO SLAM baseline.
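
The abstract reports its headline result as a relative improvement in Root Mean Squared Error. The sketch below shows how such a figure is typically computed. The trajectories are invented for illustration and are not data from the paper, and the helper names `rmse` and `percent_improvement` are assumptions.

```python
import math

def rmse(estimates, ground_truth):
    """Root Mean Squared Error between two equal-length sequences."""
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative error reduction of a new estimator with respect to a baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Invented example values (meters), not measurements from the paper.
truth    = [0.30, 0.31, 0.29, 0.30]
baseline = [0.34, 0.27, 0.33, 0.26]
proposed = [0.31, 0.30, 0.30, 0.29]

base_err = rmse(baseline, truth)
new_err = rmse(proposed, truth)
print(f"baseline RMSE: {base_err:.4f}")
print(f"proposed RMSE: {new_err:.4f}")
print(f"improvement:   {percent_improvement(base_err, new_err):.1f}%")
```

A relative figure like this depends on the baseline as much as on the new method, so it is best read together with the absolute RMSE values.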

Critical Analysis

The paper presents a complete state estimation framework for legged robots and validates it on hardware. Its main contribution is the combination of a model-based Kalman filter with learned components, a GRU network and a Vision Transformer, so that learning corrects the errors the model-based filter cannot capture.

However, the paper does not delve deeply into the limitations of the proposed system. For example, the performance of OptiState may degrade in scenarios with significant occlusions or rapidly changing lighting conditions, which could impact the reliability of the vision-based state estimates. Additionally, the computational complexity of the transformer-based vision module and the gated network may pose challenges for real-time deployment on resource-constrained robot platforms.

Further research is needed to explore the robustness and resilience of the system to different environmental conditions and to optimize the system's performance and efficiency for real-world applications.

Conclusion

The OptiState framework presented in this paper offers a promising approach to state estimation for legged robots, combining proprioceptive sensing (joint encoders, IMU) with depth-based vision. By integrating a Kalman filter, a GRU network, and a Vision Transformer autoencoder, the system estimates the state of the robot's trunk along with its uncertainty, which is crucial for robust control and navigation.

While the paper demonstrates the effectiveness of the proposed approach, further research is needed to address potential limitations and optimize the system for real-world deployment. Nonetheless, the core ideas and methodologies presented in this work pave the way for advancements in the field of legged robot state estimation, with promising implications for the future of autonomous mobile robotics.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering

Alexander Schperberg, Yusuke Tanaka, Saviz Mowlavi, Feng Xu, Bharathan Balaji, Dennis Hong

State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also considers semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState

4/30/2024

Simultaneous State Estimation and Contact Detection for Legged Robots by Multiple-Model Kalman Filtering

Marcel Menner, Karl Berntorp

This paper proposes an algorithm for combined contact detection and state estimation for legged robots. The proposed algorithm models the robot's movement as a switched system, in which different modes relate to different feet being in contact with the ground. The key element in the proposed algorithm is an interacting multiple-model Kalman filter, which identifies the currently-active mode defining contacts, while estimating the state. The rationale for the proposed estimation framework is that contacts (and contact forces) impact the robot's state and vice versa. This paper presents validation studies with a quadruped using (i) the high-fidelity simulator Gazebo for a comparison with ground truth values and a baseline estimator, and (ii) hardware experiments with the Unitree A1 robot. The simulation study shows that the proposed algorithm outperforms the baseline estimator, which does not simultaneously detect contacts. The hardware experiments showcase the applicability of the proposed algorithm and highlight its ability to detect contacts.

4/5/2024

The Kinetics Observer: A Tightly Coupled Estimator for Legged Robots

Arnaud Demont (CNRS-AIST JRL, LISV), Mehdi Benallegue (CNRS-AIST JRL), Abdelaziz Benallegue (LISV, UVSQ), Pierre Gergondet (CNRS-AIST JRL), Antonin Dallard (LIRMM), Rafael Cisneros (CNRS-AIST JRL), Masaki Murooka (CNRS-AIST JRL), Fumio Kanehiro (CNRS-AIST JRL)

In this paper, we propose the Kinetics Observer, a novel estimator addressing the challenge of state estimation for legged robots using proprioceptive sensors (encoders, IMU and force/torque sensors). Based on a Multiplicative Extended Kalman Filter, the Kinetics Observer allows the real-time simultaneous estimation of contact and perturbation forces, and of the robot's kinematics, which are accurate enough to perform proprioceptive odometry. Thanks to a visco-elastic model of the contacts linking their kinematics to the ones of the centroid of the robot, the Kinetics Observer ensures a tight coupling between the whole-body kinematics and dynamics of the robot. This coupling entails a redundancy of the measurements that enhances the robustness and the accuracy of the estimation. This estimator was tested on two humanoid robots performing long distance walking on even terrain and non-coplanar multi-contact locomotion.

6/21/2024

Optimized Kalman Filter based State Estimation and Height Control in Hopping Robots

Samuel Burns, Matthew Woodward

Quadrotor-based multimodal hopping and flying locomotion significantly improves efficiency and operation time as compared to purely flying systems. However, effective control necessitates continuous estimation of the vertical states. A single hopping state estimator has been shown (Kang 2024), in which two vertical states (position, acceleration) are measured and only velocity is estimated using a moving horizon estimation and visual inertial odometry at 200 Hz. This technique requires complex sensors (IMU, lidar, depth camera, contact force sensor), and computationally intensive calculations (12-core, 5 GHz processor), for a maximum hop height of $\sim$0.6 m at 3.65 kg. Here we show a trained Kalman filter based hopping vertical state estimator (HVSE), requiring only vertical acceleration measurements. Our results show the HVSE can estimate more states (position, velocity) with a mean-absolute-error in the hop apex ratio (height error/ground truth) of 12.5%, running $\sim$4.2x faster (840 Hz) on a substantially less powerful processor (dual-core 240 MHz) with over $\sim$6.7x the hopping height (4.02 m) at 20% of the mass (672 g). The presented general HVSE, and training procedure are broadly applicable to jumping, hopping, and legged robots across a wide range of sizes and hopping heights.

8/23/2024