Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses

Read original: arXiv:2312.12042 - Published 6/11/2024 by Zhiming Hu, Jiahui Xu, Syn Schmitt, Andreas Bulling

Overview

This paper presents Pose2Gaze, a model that predicts human eye gaze from full-body poses and head direction.
The model builds on eye-body coordination: it learns how body movements relate to where the eyes look, informed by the authors' analysis of that relationship across four public datasets.
The generated gaze behavior can be used to enhance the realism and interactivity of virtual characters and avatars.

Plain English Explanation

Pose2Gaze: Generating Realistic Human Gaze Behaviour from Full-body Poses using an Eye-body Coordination Model is a research paper that describes a new way to make virtual characters and avatars look and behave more realistically. The key idea is to use the way a person's body moves to predict where their eyes will look.

Humans constantly shift their gaze as they move around and interact with the world. This eye movement, called "gaze behavior," is an important part of how we communicate and express ourselves. When creating virtual characters or avatars, it's important to capture this gaze behavior realistically to make them feel more lifelike and engaging.

The researchers developed a model called Pose2Gaze that can learn the relationship between a person's body movements and their eye gaze. By observing how a person's body moves and where their eyes look in the real world, the model can then generate realistic gaze behavior for virtual characters based on their body poses.

This approach is useful for applications like virtual reality, interactive games, and human-robot interaction, where natural-looking virtual characters or avatars make the experience more immersive and engaging. It could also help estimate where a person is looking in real time, which has applications in areas like assistive technology and cognitive science.

Technical Explanation

Pose2Gaze is a deep learning model that can generate realistic human gaze behavior from full-body poses. The key innovation is the use of an eye-body coordination mechanism that learns the relationship between body movements and eye gaze.

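The model takes as input a sequence of 3D full-body poses together with head direction and outputs a corresponding sequence of gaze directions. According to the paper's abstract, a convolutional neural network extracts features from head direction, a spatio-temporal graph convolutional network extracts features from the full-body poses, and a further convolutional neural network predicts eye gaze from these features.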
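
The paper's abstract mentions a spatio-temporal graph convolutional network for the full-body poses. As a rough illustration of the spatial part only, the sketch below applies one graph-convolution step (neighbour averaging, a linear map, then ReLU) to a toy skeleton. The four-joint skeleton, the edge list, and the identity weight matrix are all assumptions made for the example; this is not the authors' implementation.

```python
# Illustrative sketch only: one spatial graph-convolution step over a toy
# skeleton. Joints, edges, and weights are hypothetical and are not taken
# from the Pose2Gaze implementation.

def normalized_adjacency(num_joints, edges):
    """Row-normalized adjacency matrix with self-loops (mean aggregation)."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0  # self-loop so a joint keeps its own features
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0  # bones are treated as undirected edges
    for row in adj:
        total = sum(row)
        for j in range(num_joints):
            row[j] /= total
    return adj


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(features, adj, weight):
    """Average each joint with its neighbours, apply a linear map, then ReLU."""
    mixed = matmul(matmul(adj, features), weight)
    return [[max(0.0, v) for v in row] for row in mixed]


# Toy skeleton (assumed): pelvis(0) - spine(1) - neck(2) - head(3)
EDGES = [(0, 1), (1, 2), (2, 3)]
ADJ = normalized_adjacency(4, EDGES)

# One frame of 3D joint positions, in arbitrary units.
pose = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 1.2, 0.1]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

features = graph_conv(pose, ADJ, IDENTITY)
```

With an identity weight matrix, each output row is simply the mean of a joint's own position and those of its skeletal neighbours. A trained network would learn the weights and stack such layers with temporal convolutions across frames.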

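The data behind the model consists of synchronized recordings of body movements and eye gaze from human participants. The abstract names four public datasets used for analysis and evaluation: MoGaze (real-world), ADT (VR), and GIMO and EgoBody (AR). This pairing allows the model to learn the natural correlation between how a person's body moves and where their eyes look.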
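
To make the idea of synchronized recordings concrete, here is a minimal sketch of how frame-aligned pose and gaze streams could be cut into fixed-length training samples. The function name `make_windows`, the window length, and the stride are hypothetical choices for illustration; the summary does not state how the authors segment their data.

```python
# Illustrative sketch only: slice frame-synchronized pose and gaze streams
# into fixed-length windows. Window length and stride are assumed values.

def make_windows(poses, gazes, window, stride=1):
    """Return (pose_window, gaze_window) pairs covering the same frames."""
    if len(poses) != len(gazes):
        raise ValueError("poses and gazes must have one entry per frame")
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    samples = []
    for start in range(0, len(poses) - window + 1, stride):
        end = start + window
        samples.append((poses[start:end], gazes[start:end]))
    return samples


# Five frames of placeholder data: frame index stands in for a real pose,
# and a fixed forward-pointing unit vector stands in for a real gaze.
toy_poses = [f"pose_{t}" for t in range(5)]
toy_gazes = [(0.0, 0.0, 1.0) for _ in range(5)]

toy_samples = make_windows(toy_poses, toy_gazes, window=3, stride=1)
```

Each sample keeps pose and gaze frames aligned, so a model trained on these pairs sees the body motion and the eye gaze recorded over the same time span.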

During inference, the model can take a new sequence of body poses as input and generate a plausible sequence of gaze directions that are coherent with the body movements. This allows the model to be used to enhance the realism and interactivity of virtual characters and avatars in a variety of applications.
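
The paper's abstract reports results as mean angular error, which is the average angle between predicted and ground-truth gaze directions. The sketch below shows one straightforward way to compute it; the function names are hypothetical and the code is not taken from the authors' evaluation scripts.

```python
# Illustrative sketch only: mean angular error between predicted and
# ground-truth gaze directions, in degrees.
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = (math.sqrt(sum(p * p for p in pred))
            * math.sqrt(sum(t * t for t in target)))
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, targets):
    """Average angular error over a sequence of gaze directions."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


# Toy example: one perfect prediction and one that is off by a right angle.
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
toy_error = mean_angular_error_deg(predicted, ground_truth)
```

Because the vectors are normalized inside the function, the inputs do not need to be unit length. A lower value means the predicted gaze points closer to where the person actually looked.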

Critical Analysis

The Pose2Gaze model represents a promising approach to generating realistic gaze behavior for virtual characters, but there are some potential limitations and areas for further research.

One potential limitation is that the datasets involved may cover a relatively small number of individuals, which could limit the model's ability to generalize to a broader population. Additionally, although the abstract describes the four datasets as spanning real-world, VR, and AR environments, they cover a fixed set of activities (for example pick and place, chat, and teach), and it's unclear how well the model would perform on more unconstrained data.

The paper also does not provide a thorough analysis of the model's sensitivity to different types of body poses or movements. It's possible that the model may struggle to accurately predict gaze behavior in more complex or unusual body poses that were not well represented in the training data.

Further research could explore ways to improve the model's generalization, such as by incorporating larger and more diverse training datasets or developing techniques to better handle variations in body pose and movement. Additionally, evaluating the model's performance in more realistic, interactive scenarios could help identify areas for improvement and highlight the practical benefits of the approach.

Overall, the Pose2Gaze model is a promising step forward in the quest to create more lifelike and engaging virtual characters. With further refinement and evaluation, this type of eye-body coordination modeling could have significant implications for a wide range of applications, from virtual reality and gaming to assistive technologies and human-robot interaction.

Conclusion

Pose2Gaze is a novel deep learning model that can generate realistic human gaze behavior from full-body poses. By learning the inherent relationship between body movements and eye gaze, the model can enhance the realism and interactivity of virtual characters and avatars in a variety of applications.

While the model shows promise, there are still areas for improvement, such as enhancing its generalization capabilities and evaluating its performance in more realistic, interactive scenarios. Nevertheless, this research represents an important step forward in the quest to create more lifelike and engaging virtual worlds and human-machine interactions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses

Zhiming Hu, Jiahui Xu, Syn Schmitt, Andreas Bulling

Human eye gaze plays a significant role in many virtual and augmented reality (VR/AR) applications, such as gaze-contingent rendering, gaze-based interaction, or eye-based activity recognition. However, prior works on gaze analysis and prediction have only explored eye-head coordination and were limited to human-object interactions. We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities based on four public datasets collected in real-world (MoGaze), VR (ADT), as well as AR (GIMO and EgoBody) environments. We show that in human-object interactions, e.g. pick and place, eye gaze exhibits strong correlations with full-body motion while in human-human interactions, e.g. chat and teach, a person's gaze direction is correlated with the body orientation towards the interaction partner. Informed by these analyses we then present Pose2Gaze, a novel eye-body coordination model that uses a convolutional neural network and a spatio-temporal graph convolutional neural network to extract features from head direction and full-body poses, respectively, and then uses a convolutional neural network to predict eye gaze. We compare our method with state-of-the-art methods that predict eye gaze only from head movements and show that Pose2Gaze outperforms these baselines with an average improvement of 24.0% on MoGaze, 10.1% on ADT, 21.3% on GIMO, and 28.6% on EgoBody in mean angular error, respectively. We also show that our method significantly outperforms prior methods in the sample downstream task of eye-based activity recognition. These results underline the significant information content available in eye-body coordination during daily activities and open up a new direction for gaze prediction.

6/11/2024

GazeMotion: Gaze-guided Human Motion Forecasting

Zhiming Hu, Syn Schmitt, Daniel Haeufle, Andreas Bulling

We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmark datasets and show that it outperforms state-of-the-art methods by up to 7.4% improvement in mean per joint position error. Using head direction as a proxy to gaze, our method still achieves an average improvement of 5.5%. We finally report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.

7/12/2024

Human Gaze and Head Rotation during Navigation, Exploration and Object Manipulation in Shared Environments with Robots

Tim Schreiter, Andrey Rudenko, Martin Magnusson, Achim J. Lilienthal

The human gaze is an important cue to signal intention, attention, distraction, and the regions of interest in the immediate surroundings. Gaze tracking can transform how robots perceive, understand, and react to people, enabling new modes of robot control, interaction, and collaboration. In this paper, we use gaze tracking data from a rich dataset of human motion (THOR-MAGNI) to investigate the coordination between gaze direction and head rotation of humans engaged in various indoor activities involving navigation, interaction with objects, and collaboration with a mobile robot. In particular, we study the spread and central bias of fixations in diverse activities and examine the correlation between gaze direction and head rotation. We introduce various human motion metrics to enhance the understanding of gaze behavior in dynamic interactions. Finally, we apply semantic object labeling to decompose the gaze distribution into activity-relevant regions.

6/11/2024

Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method

Jie Tian, Ran Ji, Lingxiao Yang, Yuexin Ma, Lan Xu, Jingyi Yu, Ye Shi, Jingya Wang

Gaze plays a crucial role in revealing human attention and intention, particularly in hand-object interaction scenarios, where it guides and synchronizes complex tasks that require precise coordination between the brain, hand, and object. Motivated by this, we introduce a novel task: Gaze-Guided Hand-Object Interaction Synthesis, with potential applications in augmented reality, virtual reality, and assistive technologies. To support this task, we present GazeHOI, the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions. This task poses significant challenges due to the inherent sparsity and noise in gaze data, as well as the need for high consistency and physical plausibility in generating hand and object motions. To tackle these issues, we propose a stacked gaze-guided hand-object interaction diffusion model, named GHO-Diffusion. The stacked design effectively reduces the complexity of motion generation. We also introduce HOI-Manifold Guidance during the sampling stage of GHO-Diffusion, enabling fine-grained control over generated motions while maintaining the data manifold. Additionally, we propose a spatial-temporal gaze feature encoding for the diffusion condition and select diffusion results based on consistency scores between gaze-contact maps and gaze-interaction trajectories. Extensive experiments highlight the effectiveness of our method and the unique contributions of our dataset.

8/23/2024