EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

Read original: arXiv:2409.00343 - Published 9/6/2024 by Bonan Liu, Handi Yin, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui

Overview

  • The paper "EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System" presents a system that captures human motion, localizes the wearer, and builds a dense 3D map of the environment using six body-worn inertial measurement units (IMUs) and a commodity head-mounted RGB camera.
  • The system runs online, in near real-time, and fuses the camera and IMU data in a tightly coupled, optimization-based framework.
  • The key idea is to integrate camera localization and mapping with inertial motion capture bidirectionally: the map constrains the estimated human motion, and the motion estimate in turn informs the 3D scene reconstruction.

Plain English Explanation

The EgoHDM system is a way to track a person's movements, figure out where they are located, and create a detailed 3D map of their surroundings - all using only sensors worn on the body: six small motion sensors (IMUs) and an ordinary head-mounted color camera. The system combines the two data sources so that it can estimate the person's movements and position in the environment at the same time as it builds a 3D model of that environment.

The key advantage of EgoHDM is that it does all of this in near real-time, without external infrastructure such as GPS or a fixed motion tracking system. This makes it useful for applications where the user needs to move freely while their motion and the environment are being captured, such as virtual and augmented reality, robotics, and sports training.

Technical Explanation

The EgoHDM system combines six inertial measurement units (IMUs) with a commodity head-mounted RGB camera to capture the user's motion and build a dense 3D map of the environment. It fuses the two data streams in a tightly coupled, mocap-aware dense bundle adjustment, which estimates the camera pose and the 3D structure of the surroundings while taking the motion-capture estimate into account.
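The idea of optimization-based sensor fusion can be illustrated with a deliberately small example. The sketch below is not the paper's bundle adjustment; it assumes two independent position estimates (one inertial, one camera-based) with known scalar variances, and returns the position that minimizes the sum of variance-weighted squared residuals. The function name `fuse_estimates` and all parameters are hypothetical.

```python
from typing import Sequence, Tuple


def fuse_estimates(
    imu_pos: Sequence[float],
    cam_pos: Sequence[float],
    imu_var: float,
    cam_var: float,
) -> Tuple[float, ...]:
    """Fuse two position estimates by weighted least squares.

    Per axis, this minimizes
        (x - imu)^2 / imu_var + (x - cam)^2 / cam_var,
    whose closed-form solution is the inverse-variance weighted mean.
    Illustrative assumption only, not the method used in EgoHDM.
    """
    if imu_var <= 0 or cam_var <= 0:
        raise ValueError("variances must be positive")
    if len(imu_pos) != len(cam_pos):
        raise ValueError("estimates must have the same dimension")
    w_imu = 1.0 / imu_var
    w_cam = 1.0 / cam_var
    return tuple(
        (w_imu * a + w_cam * b) / (w_imu + w_cam)
        for a, b in zip(imu_pos, cam_pos)
    )


if __name__ == "__main__":
    # The camera estimate is trusted three times more than the IMU estimate,
    # so the fused position lands closer to the camera estimate.
    print(fuse_estimates([0.0, 0.0, 0.0], [4.0, 4.0, 4.0], imu_var=3.0, cam_var=1.0))
```

The less noisy source pulls the fused estimate toward itself, which is the basic reason combining drifting inertial data with visual localization can help.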

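The key idea is the bidirectional integration of inertial human motion capture with camera localization and mapping. In one direction, the motion-capture estimate informs the dense bundle adjustment. In the other, a physics-based body pose correction module uses a local body-centric elevation map and a terrain-aware contact PD controller to keep the character in physical contact with the terrain, which reduces floating and ground penetration. Closing this loop lets each task benefit from the constraints and complementary information of the others.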
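The paper's abstract describes a terrain-aware contact PD controller that keeps the character in contact with a local elevation map. The sketch below shows only the generic ingredients under simplifying assumptions: a nearest-cell lookup in a grid elevation map and a one-dimensional proportional-derivative (PD) controller that drives a foot height toward the terrain height. The function names, gains, grid layout, and unit-mass point model are all hypothetical and are not taken from the paper.

```python
from typing import List


def terrain_height(
    elevation: List[List[float]], x: float, y: float, cell: float
) -> float:
    """Nearest-cell lookup in a grid elevation map.

    Rows index y and columns index x; out-of-range queries are clamped
    to the border cell. The grid layout is an assumption for illustration.
    """
    row = min(max(int(y / cell), 0), len(elevation) - 1)
    col = min(max(int(x / cell), 0), len(elevation[0]) - 1)
    return elevation[row][col]


def pd_correct_height(
    z: float,
    vz: float,
    target: float,
    kp: float = 200.0,
    kd: float = 30.0,
    dt: float = 0.01,
    steps: int = 300,
) -> float:
    """Drive a unit-mass point at height z toward `target` with a PD law.

    acceleration = kp * (target - z) - kd * vz, integrated with
    semi-implicit Euler. Gains are illustrative, not tuned values.
    """
    for _ in range(steps):
        accel = kp * (target - z) - kd * vz
        vz += accel * dt
        z += vz * dt
    return z


if __name__ == "__main__":
    # Toy map: flat ground on the left, a 0.2 m step on the right.
    elevation = [[0.0, 0.2], [0.0, 0.2]]
    ground = terrain_height(elevation, x=0.7, y=0.1, cell=0.5)
    # A foot floating above the step and one penetrating it both settle on it.
    print(pd_correct_height(z=0.35, vz=0.0, target=ground))
    print(pd_correct_height(z=0.10, vz=0.0, target=ground))
```

With these gains the continuous-time system is overdamped, so the foot settles on the terrain without oscillating; a real controller would act on a full articulated body rather than a single point.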

The system was evaluated on established synthetic and real-world benchmarks. The authors report that it reduces human localization, camera pose, and mapping error by 41%, 71%, and 46%, respectively, compared to the state of the art. Qualitative evaluations on newly captured data further show that it handles non-flat terrain, including stairs and outdoor scenes.
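Comparisons against prior methods are commonly expressed as a relative error reduction. As a small worked example, the hypothetical helper below computes that percentage from a baseline error and a new error. The numbers in the usage line are made up for illustration and are not results from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction relative to the baseline, in percent.

    reduction = 100 * (baseline - new) / baseline
    A negative result means the new method has a larger error.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Made-up errors in meters: 0.20 for a baseline, 0.15 for a new method.
    print(f"{relative_error_reduction(0.20, 0.15):.1f}% lower error")
```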

Critical Analysis

The paper presents a compelling approach to integrating human motion capture, localization, and 3D mapping using only body-worn sensors. The results are promising, but the setup still requires six IMUs in addition to the head-mounted camera, and drift in the 3D map over long-term use remains a potential concern.

It would be interesting to see how the system performs in more diverse environments and with different types of users, as well as to explore ways to further improve the accuracy and robustness of the 3D mapping. Additionally, the authors could investigate ways to reduce the computational and power requirements of the system, which could be important for real-world deployments.

Overall, the EgoHDM system represents an important step forward in enabling highly accurate and practical human motion capture, localization, and 3D mapping using only wearable sensors. Further research and development in this area could lead to significant advances in applications such as virtual and augmented reality, robotics, and sports training.

Conclusion

The "EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System" paper presents a novel approach to tracking human motion, determining the user's location, and creating a dense 3D model of the environment using only a head-mounted RGB camera and six inertial measurement units. This integrated system operates in near real-time and could have important applications in virtual and augmented reality, robotics, and sports training. While the system shows promising results, open questions such as long-term robustness and computational cost could be addressed through further research and development.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

Bonan Liu, Handi Yin, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui

We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses 6 inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers dense scene mapping in near real-time. Further, it is fast and robust to initialize and fully closes the loop between physically plausible map-aware global human motion estimation and mocap-aware 3D scene reconstruction. Our key idea is integrating camera localization and mapping information with inertial human motion capture bidirectionally in our system. To achieve this, we design a tightly coupled mocap-aware dense bundle adjustment and physics-based body pose correction module leveraging a local body-centric elevation map. The latter introduces a novel terrain-aware contact PD controller, which enables characters to physically contact the given local elevation map thereby reducing human floating or penetration. We demonstrate the performance of our system on established synthetic and real-world benchmarks. The results show that our method reduces human localization, camera pose, and mapping accuracy error by 41%, 71%, 46%, respectively, compared to the state of the art. Our qualitative evaluations on newly captured data further demonstrate that EgoHDM can cover challenging scenarios in non-flat terrain including stepping over stairs and outdoor scenes in the wild.

9/6/2024

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik

Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods use synchronously operating visual sensors (e.g. RGB cameras) and often fail under low lighting and fast motions, which can be restricting in many applications involving head-mounted devices. In response to the existing limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it called EventEgo3D (EE3D). Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination. The proposed EE3D framework is specifically tailored for learning with event streams in the LNES representation, enabling high 3D reconstruction accuracy. We also design a prototype of a mobile head-mounted device with an event camera and record a real dataset with event observations and the ground-truth 3D human poses (in addition to the synthetic dataset). Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140Hz.

4/15/2024

EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications. Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals, leading to inaccuracies due to self-occlusion in images or the sparseness and drift of inertial sensors. Most importantly, the lack of real-world datasets containing both modalities is a major obstacle to progress in this field. To overcome the barrier, we propose EMHI, a multimodal Egocentric human Motion dataset with Head-Mounted Display (HMD) and body-worn IMUs, with all data collected under the real VR product suite. Specifically, EMHI provides synchronized stereo images from downward-sloping cameras on the headset and IMU data from body-worn sensors, along with pose annotations in SMPL format. This dataset consists of 885 sequences captured by 58 subjects performing 39 actions, totaling about 28.5 hours of recording. We evaluate the annotations by comparing them with optical marker-based SMPL fitting results. To substantiate the reliability of our dataset, we introduce MEPoser, a new baseline method for multimodal egocentric HPE, which employs a multimodal fusion encoder, temporal feature encoder, and MLP-based regression heads. The experiments on EMHI show that MEPoser outperforms existing single-modal methods and demonstrates the value of our dataset in solving the problem of egocentric HPE. We believe the release of EMHI and the method could advance the research of egocentric HPE and expedite the practical implementation of this technology in VR/AR products.

9/2/2024

HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes

Zhiming Hu, Zheming Yin, Daniel Haeufle, Syn Schmitt, Andreas Bulling

We present HOIMotion - a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria digital twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.

7/4/2024