Reconstructing Human Pose from Inertial Measurements: A Generative Model-based Compressive Sensing Approach

arXiv:2310.20228

Published 5/14/2024 by Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen, Mohammad Abu Alsheikh

Abstract

The ability to sense, localize, and estimate the 3D position and orientation of the human body is critical in virtual reality (VR) and extended reality (XR) applications. This becomes more important and challenging with the deployment of VR/XR applications over the next generation of wireless systems such as 5G and beyond. In this paper, we propose a novel framework that can reconstruct the 3D human body pose of the user given sparse measurements from Inertial Measurement Unit (IMU) sensors over a noisy wireless environment. Specifically, our framework enables reliable transmission of compressed IMU signals through noisy wireless channels and effective recovery of such signals at the receiver, e.g., an edge server. This task is very challenging due to the constraints of transmit power, recovery accuracy, and recovery latency. To address these challenges, we first develop a deep generative model at the receiver to recover the data from linear measurements of IMU signals. The linear measurements of the IMU signals are obtained by a linear projection with a measurement matrix based on the compressive sensing theory. The key to the success of our framework lies in the novel design of the measurement matrix at the transmitter, which can not only satisfy power constraints for the IMU devices but also obtain a highly accurate recovery for the IMU signals at the receiver. This can be achieved by extending the set-restricted eigenvalue condition of the measurement matrix and combining it with an upper bound for the power transmission constraint. Our framework can achieve robust performance for recovering 3D human poses from noisy compressed IMU signals. Additionally, our pre-trained deep generative model achieves signal reconstruction accuracy comparable to an optimization-based approach, i.e., Lasso, but is an order of magnitude faster.

Overview

This paper proposes a novel framework for reconstructing 3D human body pose from sparse measurements of Inertial Measurement Unit (IMU) sensors in a noisy wireless environment.
The key innovations include a deep generative model for accurate and efficient signal recovery, and a novel design of the measurement matrix to satisfy power constraints while enabling high-accuracy recovery.
The proposed approach can reliably transmit compressed IMU signals over noisy wireless channels and effectively recover the original signals, enabling robust 3D human pose estimation.

Plain English Explanation

In virtual reality (VR) and extended reality (XR) applications, it is critical to be able to accurately sense, locate, and estimate the 3D position and orientation of the human body. This becomes even more important and challenging as these applications are deployed over the next generation of wireless systems like 5G and beyond.

The researchers in this paper have developed a new framework to address this challenge. Their idea is to use small, wearable IMU sensors to measure the user's body movements, and then transmit these measurements wirelessly to a receiving device, like an edge server. However, transmitting the sensor data wirelessly can be tricky due to noise and signal loss in the wireless channel.

To solve this problem, the researchers first compress the IMU sensor data using a technique called compressive sensing. This allows them to transmit a smaller amount of data while still preserving the important information about the user's body pose. At the receiving end, they use a deep learning model to efficiently reconstruct the original 3D body pose from the compressed sensor data, even in the presence of wireless noise.

The key innovation is the design of the compression step itself, specifically the measurement matrix that maps the raw sensor readings to the values that are transmitted. The researchers design this matrix so that it not only meets the power constraints of the IMU sensors but also enables highly accurate reconstruction of the 3D body pose at the receiver. This allows their framework to achieve robust performance for recovering 3D human poses from the noisy, compressed IMU sensor data.

Technical Explanation

The proposed framework leverages compressive sensing to efficiently transmit IMU sensor data over noisy wireless channels. At the transmitter, the IMU signals are multiplied by a carefully designed measurement matrix (a linear projection) to obtain a smaller set of compressed measurements.
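As a rough illustration of the transmitter side described above, the following Python sketch compresses a stand-in IMU vector with a random Gaussian measurement matrix and passes the result through additive white Gaussian noise. The dimensions, the Gaussian matrix, the 10 dB SNR, the noise model, and all variable names are assumptions made for this example; the paper's designed measurement matrix is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 120   # assumed length of one flattened IMU frame
m = 30    # assumed number of compressed measurements (m < n)

# Stand-in IMU signal; a real system would use sensor readings here.
x = rng.normal(size=n)

# Random Gaussian measurement matrix, scaled so that E[||A x||^2] = ||x||^2.
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

# Transmitter: linear projection from n values down to m measurements.
y_clean = A @ x

# Channel (assumed): additive white Gaussian noise at an SNR of 10 dB.
snr_db = 10.0
signal_power = np.mean(y_clean ** 2)
noise_power = signal_power / (10 ** (snr_db / 10))
y = y_clean + rng.normal(scale=np.sqrt(noise_power), size=m)

print(f"compression ratio: {n / m:.1f}x, received {y.shape[0]} measurements")
```

The receiver only ever sees `y`, so everything it learns about `x` must come from these `m` noisy values plus prior knowledge of what IMU signals look like.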

At the receiver, a deep generative model is used to recover the original IMU signals from these linear measurements. The key innovation is the novel design of the measurement matrix, which satisfies the power constraints of the IMU devices while also enabling highly accurate signal recovery.
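One common way to recover a signal with a generative model, used in the generative compressive sensing literature, is to search the model's latent space for the input whose output best explains the received measurements. The sketch below shows that idea under strong simplifying assumptions: the generator is a hypothetical random linear map `G(z) = W z` rather than a trained deep network, and the sizes, noise level, and iteration count are chosen only for illustration. Whether the paper's receiver uses this exact procedure is not stated in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 120, 30, 5   # signal length, measurements, latent size (all assumed)

# Hypothetical toy generator G(z) = W z; a trained deep network would replace this.
W = rng.normal(size=(n, k))

def generator(z):
    return W @ z

A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

# Ground-truth signal drawn from the generator's range, then measured with noise.
z_true = rng.normal(size=k)
x_true = generator(z_true)
y = A @ x_true + 0.01 * rng.normal(size=m)

# Gradient descent on f(z) = ||A G(z) - y||^2.
M = A @ W                                        # shortcut valid only because G is linear
step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)   # 1 / Lipschitz constant of the gradient
z = np.zeros(k)
for _ in range(2000):
    grad = 2.0 * M.T @ (M @ z - y)
    z -= step * grad

x_hat = generator(z)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.4f}")
```

With a nonlinear deep generator the gradient would come from automatic differentiation and the problem would no longer be convex, but the structure stays the same: only `k` latent values are estimated, which is why far fewer than `n` measurements can suffice.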

Specifically, the researchers extend the set-restricted eigenvalue condition of the measurement matrix and combine it with an upper bound for the power transmission constraint. This allows them to optimize the measurement matrix to balance the competing objectives of power efficiency and reconstruction accuracy.
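In the generative compressive sensing literature, a matrix A satisfies the set-restricted eigenvalue condition on a set S with parameters gamma and delta if ||A(x1 - x2)|| >= gamma * ||x1 - x2|| - delta for every pair x1, x2 in S. The Python sketch below estimates gamma empirically (taking delta = 0) on sampled pairs and shows how rescaling the matrix to respect a transmit-power budget changes it. The linear generator, the power budget of 1.0, the simple rescaling rule, and the helper names are all assumptions for illustration; the paper's extended condition and its power bound are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 120, 30, 5                      # assumed dimensions
W = rng.normal(size=(n, k))               # hypothetical linear generator G(z) = W z
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

# Sample signals from the generator's range S once, so comparisons are like-for-like.
X = (W @ rng.normal(size=(k, 400))).T     # 400 signals, each of length n
X1, X2 = X[:200], X[200:]

def empirical_gamma(A):
    """Smallest observed ||A(x1 - x2)|| / ||x1 - x2|| over the sampled pairs."""
    D = X1 - X2
    return np.min(np.linalg.norm(D @ A.T, axis=1) / np.linalg.norm(D, axis=1))

def mean_power(A):
    """Average per-measurement power of the transmitted vectors A x."""
    return np.mean((X @ A.T) ** 2)

power_budget = 1.0                        # assumed upper bound on transmit power
c = min(1.0, np.sqrt(power_budget / mean_power(A)))
A_scaled = c * A                          # shrink A only if it exceeds the budget

print(f"unscaled: gamma ~ {empirical_gamma(A):.3f}, power = {mean_power(A):.3f}")
print(f"scaled:   gamma ~ {empirical_gamma(A_scaled):.3f}, power = {mean_power(A_scaled):.3f}")
```

Scaling the matrix by `c` scales the transmit power by `c**2` and the empirical gamma by `c`, so at a fixed channel noise level a tighter power budget leaves less margin for recovery. The minimum over sampled pairs is only an optimistic estimate of the true constant, since unsampled pairs could give a smaller ratio.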

The deep generative model used for signal recovery achieves reconstruction accuracy comparable to an optimization-based approach (Lasso) while running about an order of magnitude faster. This lower recovery latency is what makes the approach attractive for time-sensitive 3D human pose estimation from the recovered IMU signals, even in the presence of wireless noise.
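For context on the Lasso baseline, the sketch below solves a Lasso problem with ISTA (iterative soft-thresholding), one standard solver for it. The toy signal is sparse by construction, and the sizes, penalty weight `lam`, and iteration count are assumptions for this example; the solver and settings used in the paper are not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s = 120, 60, 5                       # assumed sizes; s = number of nonzeros

# Toy sparse signal; real IMU data would first need a sparsifying basis.
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)

A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
y = A @ x_true

lam = 0.01                                 # assumed L1 penalty weight

def objective(x):
    """Lasso objective: 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth term's gradient
x = np.zeros(n)
for _ in range(3000):                      # many iterations are needed for every new signal
    x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"objective: {objective(x):.4f}, relative error: {rel_err:.4f}")
```

An iterative solver like this repeats its full loop for each received frame, whereas a pre-trained model moves most of the computation into training. That difference is one plausible source of the speed gap reported in the abstract.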

Critical Analysis

The proposed framework addresses an important challenge in VR/XR applications, where accurate 3D human pose estimation is crucial but can be hindered by the constraints of wireless transmission. The researchers have made several innovative contributions, including the novel measurement matrix design and the use of a deep generative model for efficient signal recovery.

However, the paper does not explore the performance of the framework in realistic, end-to-end VR/XR scenarios. The experiments are conducted in a simulated wireless environment, and the impact of factors like sensor placement, user movement, and system integration is not addressed. Additional real-world validation would be important to fully assess the practical viability of the approach.

Furthermore, the paper does not discuss potential privacy and security implications of its proposed framework. Transmitting sensitive human motion data over wireless channels raises concerns about data privacy and the risk of unauthorized access or misuse. Addressing these considerations would be an important area for future research.

Overall, this paper presents a valuable contribution to the field of 3D human pose estimation for VR/XR applications. The technical innovations, particularly the measurement matrix design and deep generative model, demonstrate the potential for robust and efficient wireless-based pose reconstruction. However, further research is needed to fully validate the approach and address the broader implications of such technology.

Conclusion

This paper introduces a novel framework for reconstructing 3D human body pose from sparse, noisy IMU sensor data transmitted over wireless channels. The key innovations include a deep generative model for efficient signal recovery and a novel measurement matrix design that balances power constraints and reconstruction accuracy.

The proposed approach enables reliable transmission and effective recovery of compressed IMU signals, allowing for robust 3D human pose estimation in VR/XR applications. This is an important advancement, as accurate 3D pose sensing is critical for providing immersive and responsive experiences in these emerging technologies.

While the paper demonstrates promising technical results, further research is needed to validate the framework in realistic end-to-end VR/XR scenarios and address potential privacy and security concerns. Nonetheless, this work represents a significant step forward in addressing the challenges of wireless-based 3D human pose estimation, with the potential to enhance the user experience and capabilities of future VR/XR systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs

Yiming Bao, Xu Zhao, Dahong Qian

Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision due to the depth ambiguity of 2D-to-3D lifting. To improve accuracy and address occlusion issues, inertial sensors have been introduced to provide a complementary source of information. However, it remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses. In this paper, we propose a novel framework, Real-time Optimization and Fusion (RTOF), to address this issue. We first incorporate sparse inertial orientations into a parametric human skeleton to refine 3D poses in kinematics. The poses are then optimized by energy functions built on both visual and inertial observations to reduce the temporal jitters. Our framework outputs smooth and biomechanically plausible human motion. Comprehensive experiments with ablation studies demonstrate its rationality and efficiency. On the Total Capture dataset, the pose estimation error is significantly decreased compared to the baseline method.

4/30/2024

cs.CV cs.HC

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from 13.62 to 10.65 cm (22% better) and lowering jitter from 1.56 to 0.055 km/s^3 (a reduction of 97%).

5/1/2024

cs.CV cs.AI cs.GR eess.SP

Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation

Xianzhou Zeng, Hao Qin, Ming Kong, Luyuan Chen, Qiang Zhu

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and 2D to 3D ill-posed challenges, which have drawn great attention to Multi-Hypothesis HPE research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE model and then reverse-map the distribution to the 2D pose input through an adaptive noise sampling strategy to generate reasonable multi-hypothesis samples effectively. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.

5/6/2024

cs.CV

Multimodal Active Measurement for Human Mesh Recovery in Close Proximity

Takahiro Maeda, Keisuke Takeshita, Kazuhito Tanaka

For physical human-robot interactions (pHRI), a robot needs to estimate the accurate body pose of a target person. However, in these pHRI scenarios, the robot cannot fully observe the target person's body with equipped cameras because the target person must be close to the robot for physical interaction. This closeness leads to severe truncation and occlusions and thus results in poor accuracy of human pose estimation. For better accuracy in this challenging environment, we propose an active measurement and sensor fusion framework of the equipped cameras with touch and ranging sensors such as 2D LiDAR. Touch and ranging sensor measurements are sparse, but reliable and informative cues for localizing human body parts. In our active measurement process, camera viewpoints and sensor placements are dynamically optimized to measure body parts with higher estimation uncertainty, which is closely related to truncation or occlusion. In our sensor fusion process, assuming that the measurements of touch and ranging sensors are more reliable than the camera-based estimations, we fuse the sensor measurements to the camera-based estimated pose by aligning the estimated pose towards the measured points. Our proposed method outperformed previous methods on the standard occlusion benchmark with simulated active measurement. Furthermore, our method reliably estimated human poses using a real robot even with practical constraints such as occlusion by blankets.

5/13/2024

cs.RO cs.CV