Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

2404.19541

Published 5/1/2024 by Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz

Abstract

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from 13.62 cm to 10.65 cm (22% better) and lowering jitter from 1.56 km/s³ to 0.055 km/s³ (a reduction of 97%).
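The two headline percentages in the abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the published figures; the helper name `relative_reduction` is made up for this illustration and does not come from the paper.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Return how much smaller `ours` is than `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Figures quoted in the abstract.
position = relative_reduction(13.62, 10.65)  # position error, in cm
jitter = relative_reduction(1.56, 0.055)     # jitter, in km/s^3

print(f"position error reduction: {position:.1f}%")
print(f"jitter reduction: {jitter:.1f}%")
```

The position figure comes out at about 21.8%, matching the reported 22%. The jitter figure comes out at about 96.5% from the rounded numbers, slightly below the reported 97%; a plausible explanation (an assumption here, not something the abstract states) is that the authors computed it from unrounded values.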

Overview

This paper presents "Ultra Inertial Poser", a scalable system for motion capture and tracking using sparse inertial sensors and ultra-wideband (UWB) ranging.
The system aims to provide a cost-effective and versatile alternative to traditional marker-based motion capture systems, which can be expensive and require complex setups.
By leveraging the combination of inertial measurement units (IMUs) and UWB technology, the authors demonstrate how they can achieve accurate human pose estimation and tracking in a variety of environments.

Plain English Explanation

The paper describes a new technology called "Ultra Inertial Poser" that can track human movement and posture using a small number of sensors. Traditionally, motion capture systems used specialized cameras and reflective markers to track body movements, but these systems can be costly and require a lot of setup.

The Ultra Inertial Poser system instead uses a combination of two technologies: inertial measurement units (IMUs) and ultra-wideband (UWB) ranging. IMUs are small sensors that can measure things like acceleration and rotation, while UWB is a radio technology that can precisely measure the distance between devices.
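To make the idea of radio-based distance measurement concrete, here is a minimal sketch of single-sided two-way ranging, a common way for two UWB devices to measure distance from signal travel time. This is a generic textbook scheme, not necessarily the one used by the paper's tracker, and the timestamps in the example are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def twr_distance(t_round_s: float, t_reply_s: float) -> float:
    """Distance in metres from single-sided two-way ranging.

    t_round_s: time between the initiator sending a poll and receiving the reply.
    t_reply_s: time the responder waited between receiving the poll and replying.
    """
    # The signal crosses the gap twice, so halve the time left after
    # subtracting the responder's known processing delay.
    time_of_flight_s = (t_round_s - t_reply_s) / 2.0
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s


# Hypothetical example: 300 microseconds of reply delay plus 3.34 ns of flight each way.
d = twr_distance(t_round_s=300e-6 + 2 * 3.34e-9, t_reply_s=300e-6)
print(f"{d:.3f} m")  # roughly one metre
```

Because light covers about 30 cm per nanosecond, the timestamps must be extremely precise, which is why a wideband radio is used for this kind of measurement.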

By using these two technologies together, the researchers were able to create a motion tracking system that is more affordable and easier to set up than traditional systems. The IMUs provide information about the movement of different body parts, while the UWB ranging measures the distances between the sensors themselves, which keeps the drift and jitter of the IMU-based estimates in check.

The researchers tested their system in various scenarios, including both indoor and outdoor environments, and found that it was able to accurately capture the movements and poses of people. This kind of technology could be useful for a wide range of applications, such as virtual reality, sports training, and rehabilitation.

Technical Explanation

The core of the Ultra Inertial Poser system is the fusion of inertial measurement units (IMUs) and ultra-wideband (UWB) ranging. IMUs provide information about the acceleration, angular velocity, and orientation of different body parts, while UWB ranging measures the distances between the body-worn sensors, which the method uses to constrain drift and jitter in the inertial estimates.
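The abstract mentions that training data is synthesized from motion capture recordings. A minimal sketch of how that could look is given below, assuming sensor positions are already available as an array: accelerations come from a second-order finite difference and inter-sensor distances from pairwise Euclidean norms. The function name, array layout, and frame rate are assumptions for illustration, and the authors' actual pipeline (including any noise model) may differ.

```python
import numpy as np


def synthesize_measurements(positions: np.ndarray, fps: float):
    """Derive synthetic sensor readings from mocap positions.

    positions: (T, S, 3) array with the position of S sensors over T frames, in metres.
    Returns accelerations of shape (T - 2, S, 3) and distances of shape (T, S, S).
    """
    dt = 1.0 / fps
    # Second-order central difference approximates linear acceleration.
    accel = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # Euclidean distance between every pair of sensors in each frame.
    diff = positions[:, :, None, :] - positions[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return accel, dist


rng = np.random.default_rng(0)
positions = rng.normal(size=(100, 6, 3))  # random stand-in for 6 tracker positions
accel, dist = synthesize_measurements(positions, fps=60.0)
print(accel.shape, dist.shape)  # (98, 6, 3) (100, 6, 6)
```

With 6 sensors, each frame yields 15 distinct distances (the upper triangle of the symmetric 6 x 6 matrix).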

According to the abstract, the IMU and UWB data are fused by a graph-based machine learning model, which takes the 3D state estimated from each sensor together with the inter-sensor distances. The sensor setup is sparse (the evaluation uses 6 wearable IMU+UWB trackers), and ranging happens between the trackers themselves, so no stationary UWB anchors are needed in the environment.

The model estimates the person's full-body 3D pose together with their global translation. It is trained on inertial measurements and distance estimates synthesized from the AMASS motion capture database.
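The abstract describes the model as graph-based, processing per-sensor 3D states together with inter-sensor distances. One way to picture such an input is a graph whose nodes are sensors and whose edges carry the measured distances. The sketch below builds that structure and applies a single hand-written aggregation step; the feature size, the inverse-distance weighting, and both function names are illustrative assumptions and are not taken from the paper's architecture.

```python
import numpy as np


def build_graph(node_features: np.ndarray, distances: np.ndarray):
    """node_features: (S, F) per-sensor state; distances: (S, S) symmetric, in metres.

    Returns (E, 2) sensor index pairs and the (E,) distances on those edges.
    """
    num_sensors = node_features.shape[0]
    src, dst = np.triu_indices(num_sensors, k=1)  # each unordered pair once
    edges = np.stack([src, dst], axis=1)
    return edges, distances[src, dst]


def aggregate(node_features: np.ndarray, edges: np.ndarray, edge_dist: np.ndarray):
    """One message-passing step: every node takes a weighted mean of its neighbours'
    features, with closer sensors weighted more (an illustrative choice)."""
    weights = 1.0 / (1.0 + edge_dist)
    out = np.zeros_like(node_features, dtype=float)
    norm = np.zeros(node_features.shape[0])
    for (i, j), w in zip(edges, weights):
        out[i] += w * node_features[j]
        out[j] += w * node_features[i]
        norm[i] += w
        norm[j] += w
    return out / norm[:, None]


rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 3))   # stand-in sensor positions
states = rng.normal(size=(6, 12))     # hypothetical 12-value state per sensor
distances = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
edges, edge_dist = build_graph(states, distances)
mixed = aggregate(states, edges, edge_dist)
print(edges.shape, mixed.shape)  # (15, 2) (6, 12)
```

A learned model would replace the fixed weighting with trainable layers; the point here is only how distances can enter the computation as edge attributes.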

The authors also introduce several techniques to improve the robustness and scalability of the system, such as:

Adaptive sensor selection to prioritize the most reliable IMU and UWB measurements
Recursive optimization to efficiently update the pose estimate as new sensor data arrives
Inertial-based initialization to quickly bootstrap the pose estimation process

The researchers evaluated their system on UIP-DB, a new dataset of 10 participants performing 25 motion types, captured by 6 wearable IMU+UWB trackers alongside an optical motion capture system that serves as the reference (200 minutes of synchronized data in total). Compared with the prior methods PIP and TIP, the abstract reports a reduction in position error from 13.62 cm to 10.65 cm and in jitter from 1.56 km/s³ to 0.055 km/s³.
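The abstract reports results as a position error in centimetres and a jitter in km/s³. A minimal sketch of how metrics of this kind are commonly computed is shown below: position error as the mean Euclidean distance to the reference joints, and jitter as the mean magnitude of jerk (the third time derivative of position). The exact definitions used in the paper, such as which joints are included or how poses are aligned, may differ; the function names and array layout are assumptions.

```python
import numpy as np


def mean_position_error_cm(pred: np.ndarray, truth: np.ndarray) -> float:
    """pred, truth: (T, J, 3) joint positions in metres. Returns the mean error in cm."""
    return float(np.linalg.norm(pred - truth, axis=-1).mean() * 100.0)


def mean_jitter_km_s3(pred: np.ndarray, fps: float) -> float:
    """Mean jerk magnitude of predicted joints, in km/s^3."""
    dt = 1.0 / fps
    # Third-order finite difference approximates the third time derivative.
    jerk = (pred[3:] - 3.0 * pred[2:-1] + 3.0 * pred[1:-2] - pred[:-3]) / dt**3
    return float(np.linalg.norm(jerk, axis=-1).mean() / 1000.0)


# Toy trajectory: one joint moving along x as t^3, so its jerk is 6 m/s^3 everywhere.
t = np.arange(20) / 60.0
truth = np.zeros((20, 1, 3))
truth[:, 0, 0] = t**3
pred = truth.copy()
pred[..., 0] += 0.1  # constant 10 cm offset

print(mean_position_error_cm(pred, truth))
print(mean_jitter_km_s3(pred, fps=60.0))
```

Note that a constant offset raises the position error but leaves jitter unchanged, which is why the two metrics are reported separately: one captures accuracy, the other smoothness.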

Critical Analysis

The paper presents a well-designed and comprehensive approach to motion capture and tracking using sparse inertial sensors and UWB ranging. The authors have addressed several key challenges, such as handling the limited number of sensors, optimizing the pose estimation, and improving the system's robustness and scalability.

One potential limitation of the system is its reliance on UWB technology, which may not be readily available or accessible in all environments. The authors acknowledge this and suggest, as a future research direction, complementing it with other sources of information, such as monocular video or simulated avatars.

Additionally, the paper does not provide a detailed analysis of the system's performance in more complex or challenging scenarios, such as rapid movements, occlusions, or large groups of people. Further research and evaluation in these areas would help to better understand the limitations and potential use cases of the Ultra Inertial Poser system.

Conclusion

The Ultra Inertial Poser presented in this paper offers a promising solution for scalable and cost-effective motion capture and tracking, leveraging the combination of inertial sensors and UWB ranging. The system's ability to accurately capture human pose and movement in a variety of environments, while requiring a relatively sparse sensor setup, makes it a compelling alternative to traditional motion capture systems.

As the authors suggest, the integration of other sensing modalities, such as monocular video or simulated avatars, could further enhance the system's capabilities and broaden its potential applications in areas like virtual reality, sports analytics, and healthcare. Overall, this research represents an important step forward in developing scalable and accessible motion capture solutions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DragPoser: Motion Reconstruction from Variable Sparse Tracking Signals via Latent Space Optimization

Jose Luis Ponton, Eduard Pujol, Andreas Aristidou, Carlos Andujar, Nuria Pelechano

High-quality motion reconstruction that follows the user's movements can be achieved by high-end mocap systems with many sensors. However, obtaining such animation quality with fewer input devices is gaining popularity as it brings mocap closer to the general public. The main challenges include the loss of end-effector accuracy in learning-based approaches, or the lack of naturalness and smoothness in IK-based solutions. In addition, such systems are often finely tuned to a specific number of trackers and are highly sensitive to missing data e.g., in scenarios where a sensor is occluded or malfunctions. In response to these challenges, we introduce DragPoser, a novel deep-learning-based motion reconstruction system that accurately represents hard and dynamic on-the-fly constraints, attaining real-time high end-effectors position accuracy. This is achieved through a pose optimization process within a structured latent space. Our system requires only one-time training on a large human motion dataset, and then constraints can be dynamically defined as losses, while the pose is iteratively refined by computing the gradients of these losses within the latent space. To further enhance our approach, we incorporate a Temporal Predictor network, which employs a Transformer architecture to directly encode temporality within the latent space. This network ensures the pose optimization is confined to the manifold of valid poses and also leverages past pose data to predict temporally coherent poses. Results demonstrate that DragPoser surpasses both IK-based and the latest data-driven methods in achieving precise end-effector positioning, while it produces natural poses and temporally coherent motion. In addition, our system showcases robustness against on-the-fly constraint modifications, and exhibits exceptional adaptability to various input configurations and changes.

6/24/2024

cs.GR cs.AI cs.CV

Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs

Yiming Bao, Xu Zhao, Dahong Qian

Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision due to the depth ambiguity of 2D-to-3D lifting. To improve accuracy and address occlusion issues, inertial sensors have been introduced to provide a complementary source of information. However, it remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses. In this paper, we propose a novel framework, Real-time Optimization and Fusion (RTOF), to address this issue. We first incorporate sparse inertial orientations into a parametric human skeleton to refine 3D poses in kinematics. The poses are then optimized by energy functions built on both visual and inertial observations to reduce temporal jitter. Our framework outputs smooth and biomechanically plausible human motion. Comprehensive experiments with ablation studies demonstrate its rationality and efficiency. On the Total Capture dataset, the pose estimation error is significantly decreased compared to the baseline method.

4/30/2024

cs.CV cs.HC

Reconstructing Human Pose from Inertial Measurements: A Generative Model-based Compressive Sensing Approach

Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen, Mohammad Abu Alsheikh

The ability to sense, localize, and estimate the 3D position and orientation of the human body is critical in virtual reality (VR) and extended reality (XR) applications. This becomes more important and challenging with the deployment of VR/XR applications over the next generation of wireless systems such as 5G and beyond. In this paper, we propose a novel framework that can reconstruct the 3D human body pose of the user given sparse measurements from Inertial Measurement Unit (IMU) sensors over a noisy wireless environment. Specifically, our framework enables reliable transmission of compressed IMU signals through noisy wireless channels and effective recovery of such signals at the receiver, e.g., an edge server. This task is very challenging due to the constraints of transmit power, recovery accuracy, and recovery latency. To address these challenges, we first develop a deep generative model at the receiver to recover the data from linear measurements of IMU signals. The linear measurements of the IMU signals are obtained by a linear projection with a measurement matrix based on the compressive sensing theory. The key to the success of our framework lies in the novel design of the measurement matrix at the transmitter, which can not only satisfy power constraints for the IMU devices but also obtain a highly accurate recovery for the IMU signals at the receiver. This can be achieved by extending the set-restricted eigenvalue condition of the measurement matrix and combining it with an upper bound for the power transmission constraint. Our framework can achieve robust performance for recovering 3D human poses from noisy compressed IMU signals. Additionally, our pre-trained deep generative model achieves signal reconstruction accuracy comparable to an optimization-based approach, i.e., Lasso, but is an order of magnitude faster.

5/14/2024

cs.HC

Utilizing acceleration measurements to improve TDOA based localization

Marcin Kolakowski

This paper considers localization using a UWB positioning system and an inertial unit containing a single accelerometer. The main part of the paper describes a novel algorithm for person localization. The algorithm is based on a modified Extended Kalman Filter and uses TDOA (Time Difference of Arrival) results obtained from the UWB system together with acceleration measurements performed by the localized tag device. The proposed algorithm has been investigated through simulation and experiments. The results are included in the paper.

4/1/2024

eess.SP