Real-Time Simulated Avatar from Head-Mounted Sensors

2403.06862

Published 4/26/2024 by Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

Abstract

We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.

Overview

  • The paper presents SimXR, a method for controlling a physically simulated avatar in real time from the sensors on an AR/VR headset: the headset pose and the headset's cameras.
  • The headset pose provides the overall body motion, while the camera images guide the hands and feet whenever they are in view. When they are out of view, physics simulation keeps the generated motion plausible.
  • This allows for more natural and immersive virtual interactions, with a full-body avatar that reflects the user's movements.

Plain English Explanation

The paper describes a way to drive a digital avatar that moves like a real person, in real time. The avatar is a simulated humanoid controlled by the sensors already built into an AR/VR headset: the headset tracks where the user's head is and how it is oriented, and its cameras catch glimpses of the user's hands and feet. Because the cameras sit on the head, much of the body is often out of their view. When a hand or foot is visible, the images steer its movement; when it is not, the laws of physics keep the avatar's motion plausible.

This could be used to create more realistic and engaging virtual experiences, where the user's avatar feels like a natural extension of themselves. Instead of seeing a static avatar, other people in the virtual environment would see an avatar that moves along with the real user. This could have applications in areas like remote collaboration, virtual events, and immersive gaming.

Technical Explanation

SimXR controls a humanoid in a physics simulation. Its inputs are the headset pose and the images from the headset's cameras, and its outputs are the humanoid's control signals. The humanoid is trained to track the headset's movement while the controller analyzes the images to decide how the rest of the body should move. The method is end-to-end: it does not compute any intermediate representation, such as an estimated body pose, but instead learns a direct mapping from images and headset poses to control signals.
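As the abstract describes, SimXR learns a direct mapping from images and headset poses to humanoid control signals. The sketch below only illustrates the shape of that interface, under loudly stated assumptions. The observation layout (`HeadsetObservation`), the random linear `LinearPolicy` standing in for a trained network, and the choice of proportional-derivative (PD) joint targets as the control signal (common in physics-based humanoid control, but not confirmed by the abstract) are all hypothetical, as are the gains `kp` and `kd`.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class HeadsetObservation:
    """Hypothetical per-frame input; not SimXR's actual data format."""
    position: List[float]        # headset position (x, y, z), metres
    rotation: List[float]        # headset orientation as a unit quaternion (w, x, y, z)
    image_features: List[float]  # stand-in for features computed from the headset cameras


def flatten(obs: HeadsetObservation) -> List[float]:
    """Concatenate headset pose and image features into a single input vector."""
    return list(obs.position) + list(obs.rotation) + list(obs.image_features)


class LinearPolicy:
    """Stand-in for the learned network: maps the input vector straight to
    per-joint PD targets, with no intermediate body-pose estimate."""

    def __init__(self, obs_dim: int, num_joints: int, seed: int = 0):
        rng = random.Random(seed)
        # Random weights in place of trained ones (assumption for illustration).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(num_joints)]

    def act(self, obs_vec: List[float]) -> List[float]:
        # tanh bounds each joint target to the range [-1, 1] (radians).
        return [math.tanh(sum(w * x for w, x in zip(row, obs_vec)))
                for row in self.weights]


def pd_torques(targets: List[float], joint_pos: List[float],
               joint_vel: List[float], kp: float = 300.0,
               kd: float = 30.0) -> List[float]:
    """Turn joint targets into torques with a proportional-derivative controller."""
    return [kp * (t - q) - kd * v for t, q, v in zip(targets, joint_pos, joint_vel)]


if __name__ == "__main__":
    obs = HeadsetObservation([0.0, 1.6, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0] * 8)
    policy = LinearPolicy(obs_dim=len(flatten(obs)), num_joints=4)
    targets = policy.act(flatten(obs))
    # A physics simulator (omitted here) would apply these torques for one step.
    print(pd_torques(targets, joint_pos=[0.0] * 4, joint_vel=[0.0] * 4))
```

The point of the structure is that nothing between `flatten` and `act` estimates a body pose. The policy's output is consumed directly by the controller, mirroring the end-to-end design the abstract describes.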

To train the controller, the authors build a large-scale synthetic dataset rendered with camera configurations compatible with a commercially available VR headset (Quest 2), and they report promising results on real-world captures. To show that the framework is not tied to one device, they also test it on an AR headset with a forward-facing camera.
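The core difficulty named in the abstract is that head-mounted cameras often see little of the body. One simple way to reason about which body parts a given camera can see, for example when inspecting synthetic training frames, is an ideal pinhole projection test. This is an illustration of the visibility problem, not part of SimXR: the abstract states that the method learns end-to-end without intermediate representations. The function names, the camera-frame convention (z forward, y down), and the intrinsics below are hypothetical, and real headset cameras are typically wide-angle, so a pinhole model is a simplification.

```python
from typing import Dict, Optional, Sequence, Tuple


def project_point(p_cam: Sequence[float], fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point (z forward, y down) with an ideal pinhole model."""
    x, y, z = p_cam
    if z <= 0.0:
        return None  # the point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)


def is_visible(p_cam: Sequence[float], fx: float, fy: float, cx: float,
               cy: float, width: int, height: int) -> bool:
    """True if the point projects inside the image bounds."""
    uv = project_point(p_cam, fx, fy, cx, cy)
    if uv is None:
        return False
    u, v = uv
    return 0.0 <= u < width and 0.0 <= v < height


def visibility_mask(joints: Dict[str, Sequence[float]], **intrinsics) -> Dict[str, bool]:
    """Per-joint visibility for one camera; joint positions are already in its frame."""
    return {name: is_visible(p, **intrinsics) for name, p in joints.items()}


if __name__ == "__main__":
    # Hypothetical intrinsics for a 400x400 image.
    cam = dict(fx=200.0, fy=200.0, cx=200.0, cy=200.0, width=400, height=400)
    joints = {
        "right_hand": (0.1, 0.2, 0.5),  # held up in front of the face
        "left_foot": (0.0, 1.5, 0.3),   # far below the image's lower edge
    }
    print(visibility_mask(joints, **cam))
    # {'right_hand': True, 'left_foot': False}
```

In this toy setup the foot projects far below the image, which is exactly the situation in which, per the abstract, SimXR relies on physics rather than images to produce plausible motion.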

Critical Analysis

The paper demonstrates promising results in driving a full-body simulated avatar from head-mounted sensors alone. However, because the head-mounted cameras often cannot see parts of the body, the motion of an unseen hand or foot is generated to be physically plausible rather than observed, so it may not match what the user is actually doing. The method is also trained on synthetic data, so how well it transfers to a wider range of real headsets and users remains an important question for further evaluation.

Additionally, the paper does not address potential privacy and ethical concerns around the use of head-mounted sensors to capture user data. Careful consideration would be needed to ensure users have control over how their biometric data is collected and used.

Conclusion

Overall, the paper presents a compelling approach for creating real-time, simulated avatars that mirror a user's full-body movement using only the pose and cameras of an AR/VR headset. This technology has the potential to enhance virtual interactions and experiences, though further research is needed to address technical limitations and ethical considerations. As head-mounted devices and virtual/augmented reality continue to evolve, systems like the one described in this paper may become increasingly important for enabling more natural and immersive digital interactions.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings

Ruijie Tang, Beilei Cui, Hongliang Ren

As the significance of simulation in medical care and intervention continues to grow, it is anticipated that a simplified and low-cost platform can be set up to execute personalized diagnoses and treatments. 3D Slicer can not only perform medical image analysis and visualization but can also provide surgical navigation and surgical planning functions. In this paper, we have chosen 3D Slicer as our base platform and use monocular cameras as sensors. We then used the neural radiance fields (NeRF) algorithm to complete the 3D model reconstruction of the human head. We compared the accuracy of the NeRF algorithm in generating 3D human head scenes and utilized the Marching Cubes algorithm to generate corresponding 3D mesh models. The individual's head pose, obtained through single-camera vision, is transmitted in real time to the scene created within 3D Slicer. The demonstrations presented in this paper include real-time synchronization of transformations between the human head model in the 3D Slicer scene and the detected head posture. Additionally, we tested a scene where a tool, marked with an ArUco marker tracked by a single camera, synchronously points to the real-time transformation of the head posture. These demos indicate that our methodology can provide a feasible real-time simulation platform for nasopharyngeal swab collection or intubation.

6/21/2024

Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction

Antoine Maiorca, Seyed Abolfazl Ghasemzadeh, Thierry Ravet, François Cresson, Thierry Dutoit, Christophe De Vleeschouwer

Virtual Reality (VR) applications have revolutionized user experiences by immersing individuals in interactive 3D environments. These environments find applications in numerous fields, including healthcare, education, and architecture. A significant aspect of VR is the inclusion of self-avatars, representing users within the virtual world, which enhances interaction and embodiment. However, generating lifelike full-body self-avatar animations remains challenging, particularly in consumer-grade VR systems, where lower-body tracking is often absent. One way to tackle this problem is to provide an external source of motion information that includes lower-body information, such as full Cartesian positions estimated from RGB(D) cameras. Nevertheless, these systems have multiple limitations: the desynchronization between the two motion sources and occlusions are examples of significant issues that hinder the implementation of such systems. In this paper, we aim to measure the impact on the reconstruction of the articulated self-avatar's full-body pose of (1) the latency between the VR motion features and estimated positions, (2) the data acquisition rate, (3) occlusions, and (4) the inaccuracy of the position estimation algorithm. In addition, we analyze the motion reconstruction errors using ground truth and 3D Cartesian coordinates estimated from YOLOv8 pose estimation. These analyses show that the studied methods are significantly sensitive to every degradation tested, especially regarding the velocity reconstruction error.

4/30/2024

3D Human Pose Perception from Egocentric Stereo Videos

Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt

While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal context of egocentric stereo videos. Specifically, we utilize 1) depth features from our 3D scene reconstruction module with uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs. Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting. Furthermore, we introduce two new benchmark datasets, i.e., UnrealEgo2 and UnrealEgo-RW (RealWorld). The proposed datasets offer a much larger number of egocentric stereo views with a wider variety of human motions than the existing datasets, allowing comprehensive evaluation of existing and upcoming methods. Our extensive experiments show that the proposed approach significantly outperforms previous methods. We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.

5/16/2024

Virtual avatar generation models as world navigators

Sai Mandava

We introduce SABR-CLIMB, a novel video model simulating human movement in rock climbing environments using a virtual avatar. Our diffusion transformer predicts the sample instead of noise in each diffusion step and ingests entire videos to output complete motion sequences. By leveraging a large proprietary dataset, NAV-22M, and substantial computational resources, we showcase a proof of concept for a system to train general-purpose virtual avatars for complex tasks in robotics, sports, and healthcare.

6/4/2024