Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors

2405.01112

Published 5/3/2024 by Wenxuan Guo, Zhiyu Pan, Ziheng Xi, Alapati Tuerxun, Jianjiang Feng, Jie Zhou


Abstract

Sports analysis and viewing play a pivotal role in the current sports domain, offering significant value not only to coaches and athletes but also to fans and the media. In recent years, the rapid development of virtual reality (VR) and augmented reality (AR) technologies has introduced a new platform for watching games. Visualization of sports competitions in VR/AR represents a revolutionary technology, providing audiences with a novel immersive viewing experience. However, there is still a lack of related research in this area. In this work, we present for the first time a comprehensive system for sports competition analysis and real-time visualization on VR/AR platforms. First, we utilize multiview LiDARs and cameras to collect multimodal game data. Subsequently, we propose a framework for multi-player tracking and pose estimation based on a limited amount of supervised data, which extracts precise player positions and movements from point clouds and images. Moreover, we perform avatar modeling of players to obtain their 3D models. Ultimately, using these 3D player data, we conduct competition analysis and real-time visualization on VR/AR. Extensive quantitative experiments demonstrate the accuracy and robustness of our multi-player tracking and pose estimation framework. The visualization results showcase the immense potential of our sports visualization system for watching games on VR/AR devices. The multimodal competition dataset we collected and all related code will be released soon.


Overview

This paper presents a sports analysis and virtual reality (VR) viewing system that uses player tracking and pose estimation with multimodal and multiview sensors.
The system aims to provide enhanced sports viewing experiences by capturing detailed player movements and incorporating them into immersive VR environments.
The research explores techniques for accurately tracking players and estimating their body poses using a combination of different sensor modalities and camera viewpoints.

Plain English Explanation

This research paper describes a system that can enhance the way we experience and analyze sports events. The key idea is to use advanced technology to capture detailed information about the players' movements and then use that data to create immersive virtual reality (VR) experiences.

The system uses several sensors, specifically LiDARs and cameras placed at multiple viewpoints, to track the position and pose (the configuration of the body's joints) of the players on the field or court. [This links to the research on <a href="https://aimodels.fyi/papers/arxiv/3d-human-scan-moving-event-camera">3D human pose estimation from event cameras</a> and <a href="https://aimodels.fyi/papers/arxiv/multi-person-3d-pose-estimation-from-unlabelled">multi-person 3D pose estimation</a>.]

By combining data from multiple sensors and camera angles, the researchers are able to build an accurate, detailed 3D record of the players' positions and movements. This information can then drive VR/AR experiences that put the viewer right in the middle of the action and let them watch the game from new perspectives. [This relates to the research on <a href="https://aimodels.fyi/papers/arxiv/self-avatar-animation-virtual-reality-impact-motion">self-avatar animation in VR</a> and <a href="https://aimodels.fyi/papers/arxiv/real-time-simulated-avatar-from-head-mounted">real-time simulated avatars from head-mounted sensors</a>.]

The goal is to provide sports fans with a more immersive and engaging viewing experience, while also giving coaches and analysts new tools for studying player performance and strategy.

Technical Explanation

The researchers' approach combines two sensor modalities, multiview LiDARs and cameras, to collect multimodal game data and track the position and pose of players on the field or court. [This relates to the research on <a href="https://aimodels.fyi/papers/arxiv/i-did-not-notice-comparison-immersive-analytics">multimodal sensor fusion for immersive analytics</a>.]
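One common way to link LiDAR point clouds with camera images is to project each 3D point into the image using calibrated extrinsics and intrinsics, so that points can be matched to 2D player detections. The summary does not say how the authors fuse the two modalities, so this is only an illustrative sketch under stated assumptions: the rotation `R`, translation `t`, and pinhole intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical calibration values, and lens distortion is ignored.

```python
# Illustrative sketch (not the paper's method): project LiDAR points into one
# camera image with a pinhole model. Assumes known, hypothetical calibration:
# extrinsics (R, t) mapping LiDAR frame -> camera frame, intrinsics fx, fy, cx, cy.
# Lens distortion is ignored.
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def lidar_to_camera(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Map a LiDAR-frame point into the camera frame: X_cam = R @ X_lidar + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project_point(p_cam: Vec3, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection to pixel (u, v); returns None for points behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def project_cloud(points: List[Vec3], R: Mat3, t: Vec3,
                  fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> List[Tuple[int, float, float]]:
    """Return (point index, u, v) for every point that lands inside the image."""
    hits = []
    for i, p in enumerate(points):
        uv = project_point(lidar_to_camera(p, R, t), fx, fy, cx, cy)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            hits.append((i, uv[0], uv[1]))
    return hits
```

For example, with an identity rotation and zero translation, a point 10 m straight ahead lands on the principal point `(cx, cy)`, while points behind the camera are discarded. Points whose pixels fall inside a player's 2D bounding box could then be grouped as that player's part of the point cloud.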

By fusing point clouds and images, the system extracts precise positions and movements for each player. The researchers propose a multi-player tracking and pose estimation framework that requires only a limited amount of supervised data, and they perform avatar modeling of the players to obtain their 3D models.
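Tracking several players over time requires linking each frame's detections to existing player identities. The summary does not describe the authors' tracking algorithm, so the following is a generic, simplified sketch rather than their method: it assumes one ground-plane position per player per frame (in metres) and uses greedy nearest-neighbour matching with a distance gate, where `gate` is a hypothetical parameter. Practical trackers usually add motion models and optimal assignment such as the Hungarian algorithm.

```python
# Generic sketch (not the paper's tracker): frame-to-frame multi-player
# association by greedy nearest-neighbour matching on ground-plane positions.
# Assumptions: one (x, y) detection per player per frame, a fixed hypothetical
# gating distance, and no motion model.
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def associate(tracks: Dict[int, Point], detections: List[Point],
              next_id: int, gate: float = 1.0) -> Tuple[Dict[int, Point], int]:
    """Match detections to existing tracks; unmatched detections start new tracks.

    Returns the updated tracks (id -> latest position) and the next free id.
    Tracks with no detection inside the gate are dropped in this sketch.
    """
    # Every (distance, track id, detection index) pair that passes the gate.
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
        if math.dist(pos, det) <= gate
    )
    updated: Dict[int, Point] = {}
    used = set()
    for _, tid, j in pairs:  # closest pairs are matched first
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    for j, det in enumerate(detections):
        if j not in used:  # e.g. a player entering the sensors' field of view
            updated[next_id] = det
            next_id += 1
    return updated, next_id
```

Greedy matching on sorted distances keeps the sketch short and dependency-free, but it can make suboptimal choices when players stand close together, which is exactly where optimal assignment and motion prediction help.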

The tracked player data is then used to power the VR viewing experience, allowing users to see the game from different perspectives and even follow individual players as they move around the field. The system also provides analytical tools for coaches and analysts to review player performance and strategy in detail.
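Once players are tracked in 3D, simple performance statistics follow directly from their trajectories. As an illustration of the kind of analysis described above (not a metric the paper is confirmed to report), this sketch computes total distance covered and peak speed, assuming ground-plane positions in metres sampled at a fixed frame rate `fps`, a hypothetical parameter.

```python
# Illustrative sketch: two basic statistics from one player's tracked trajectory.
# Assumptions: ground-plane positions in metres, sampled at a fixed frame rate
# `fps` (hypothetical); no smoothing of tracking noise is applied.
import math
from typing import List, Tuple


def distance_and_peak_speed(traj: List[Tuple[float, float]],
                            fps: float) -> Tuple[float, float]:
    """Return (total distance in m, peak frame-to-frame speed in m/s)."""
    # Distance travelled between each pair of consecutive frames.
    steps = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    total = sum(steps)
    # Metres per frame times frames per second gives metres per second.
    peak = max(steps, default=0.0) * fps
    return total, peak
```

Frame-to-frame differences amplify tracking noise, so in practice positions would usually be smoothed before speeds are computed.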

Critical Analysis

One potential limitation of the research is the reliance on specialized sensor hardware, which may limit the scalability and accessibility of the system. The authors acknowledge this and suggest that future work could explore more cost-effective sensor setups or ways to leverage existing infrastructure, such as stadium cameras or player-worn sensors.

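Additionally, while the paper reports extensive quantitative experiments on the accuracy and robustness of its multi-player tracking and pose estimation framework, the abstract presents the VR/AR viewing experience only through visualization results and does not describe a user study. Further research would be needed to understand the practical implications and real-world benefits of this technology for sports fans, coaches, and analysts.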

It would also be interesting to see how this system could be extended to other sports or even non-athletic contexts, such as dance or theater performances, where detailed motion capture could enhance the viewing experience.

Conclusion

This research presents an innovative approach to sports analysis and viewing that leverages advanced sensor technology and immersive VR experiences. By capturing detailed player movements and incorporating them into virtual environments, the system has the potential to transform the way we engage with and understand sports events.

While the technical implementation details are complex, the core idea of using technology to create more immersive and insightful sports experiences is compelling. As the researchers continue to refine and validate their system, it could pave the way for a new era of sports viewing and analysis that brings fans and experts closer to the action than ever before.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Augmenting Sports Videos with VisCommentator

Chen Zhu-Tian, Shuainan Ye, Xiangtong Chu, Haijun Xia, Hui Zhang, Huamin Qu, Yingcai Wu

Visualizing data in sports videos is gaining traction in sports analytics, given its ability to communicate insights and explicate player strategies engagingly. However, augmenting sports videos with such data visualizations is challenging, especially for sports analysts, as it requires considerable expertise in video editing. To ease the creation process, we present a design space that characterizes augmented sports videos at an element-level (what the constituents are) and clip-level (how those constituents are organized). We do so by systematically reviewing 233 examples of augmented sports videos collected from TV channels, teams, and leagues. The design space guides selection of data insights and visualizations for various purposes. Informed by the design space and close collaboration with domain experts, we design VisCommentator, a fast prototyping tool, to ease the creation of augmented table tennis videos by leveraging machine learning-based data extractors and design space-based visualization recommendations. With VisCommentator, sports analysts can create an augmented video by selecting the data to visualize instead of manually drawing the graphical marks. Our system can be generalized to other racket sports (e.g., tennis, badminton) once the underlying datasets and models are available. A user study with seven domain experts shows high satisfaction with our system, confirms that the participants can reproduce augmented sports videos in a short period, and provides insightful implications into future improvements and opportunities.

5/14/2024

cs.HC cs.GR

Video2MR: Automatically Generating Mixed Reality 3D Instructions by Augmenting Extracted Motion from 2D Videos

Keiichi Ihara, Kyzyl Monteiro, Mehrad Faridan, Rubaiat Habib Kazi, Ryo Suzuki

This paper introduces Video2MR, a mixed reality system that automatically generates 3D sports and exercise instructions from 2D videos. Mixed reality instructions have great potential for physical training, but existing works require substantial time and cost to create these 3D experiences. Video2MR overcomes this limitation by transforming arbitrary instructional videos available online into MR 3D avatars with AI-enabled motion capture (DeepMotion). Then, it automatically enhances the avatar motion through the following augmentation techniques: 1) contrasting and highlighting differences between the user and avatar postures, 2) visualizing key trajectories and movements of specific body parts, 3) manipulation of time and speed using body motion, and 4) spatially repositioning avatars for different perspectives. Developed on HoloLens 2 and Azure Kinect, we showcase various use cases, including yoga, dancing, soccer, tennis, and other physical exercises. The study results confirm that Video2MR provides more engaging and playful learning experiences, compared to existing 2D video instructions.

5/30/2024

cs.HC

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.

4/26/2024

cs.CV cs.GR cs.RO

Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction

Antoine Maiorca, Seyed Abolfazl Ghasemzadeh, Thierry Ravet, François Cresson, Thierry Dutoit, Christophe De Vleeschouwer

Virtual Reality (VR) applications have revolutionized user experiences by immersing individuals in interactive 3D environments. These environments find applications in numerous fields, including healthcare, education, or architecture. A significant aspect of VR is the inclusion of self-avatars, representing users within the virtual world, which enhances interaction and embodiment. However, generating lifelike full-body self-avatar animations remains challenging, particularly in consumer-grade VR systems, where lower-body tracking is often absent. One method to tackle this problem is by providing an external source of motion information that includes lower body information such as full Cartesian positions estimated from RGB(D) cameras. Nevertheless, the limitations of these systems are multiple: the desynchronization between the two motion sources and occlusions are examples of significant issues that hinder the implementations of such systems. In this paper, we aim to measure the impact on the reconstruction of the articulated self-avatar's full-body pose of (1) the latency between the VR motion features and estimated positions, (2) the data acquisition rate, (3) occlusions, and (4) the inaccuracy of the position estimation algorithm. In addition, we analyze the motion reconstruction errors using ground truth and 3D Cartesian coordinates estimated from YOLOv8 pose estimation. These analyses show that the studied methods are significantly sensitive to any degradation tested, especially regarding the velocity reconstruction error.

4/30/2024

cs.CV