MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

Read original: arXiv:2404.11987 - Published 4/19/2024 by Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas


Overview

This paper proposes MultiPhys, a method for recovering the 3D motion of multiple people from monocular video while respecting physical constraints. It focuses on keeping the spatial placement between pairs of interacting individuals coherent.
The approach feeds the motion estimated by a kinematic-based method into a physics simulator in an autoregressive manner, so the final estimates are both kinematically coherent and physically compliant.
Because it is physics-aware, the method is robust to jitter and occlusions and eliminates interpenetration between the two people.

Plain English Explanation

The paper describes a new way to estimate the 3D motion of multiple people in a scene, taking into account the laws of physics. Current methods for 3D pose estimation often treat each person independently, but in real life, people interact with each other and with the physical environment.

MultiPhys first estimates each person's 3D motion with a conventional kinematic method that works from what is visible in the video. It then passes these estimates through a physics simulator, so that the final motion respects physical constraints such as balance, contact with the ground, and the fact that two bodies cannot pass through each other.

This physics-aware approach helps the system handle close interactions between people, where bodies touch or partially hide each other, without the reconstructed bodies interpenetrating. It also makes the estimates more robust to jitter and occlusions. By incorporating these physical realities, the method produces more physically plausible 3D motion for multiple people in the same scene.

Technical Explanation

The MultiPhys framework consists of three main components:

2D Pose Estimation: The method first uses a state-of-the-art 2D pose estimation model to detect the locations of body parts (e.g. shoulders, elbows, knees) in each video frame.
3D Pose Reconstruction: A kinematic-based method then uses these observations to estimate an initial 3D pose and motion for each person.
Physics-based Refinement: Finally, the kinematic estimates are fed into a physics simulator in an autoregressive manner. Simulation enforces physical constraints such as balance, contact with the ground, and non-penetration between the two people, while dedicated components let the model benefit from the simulator without compromising the accuracy of the kinematic estimates.
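The abstract describes the physics stage as feeding kinematic estimates into a physics simulator in an autoregressive manner. The sketch below is a deliberately tiny stand-in built on loud assumptions: each of the two people is a 2D disc rather than a full body model, the "simulator" is a hand-written PD-controlled point mass with a floor and a non-penetration projection, and every constant (`RADIUS`, `KP`, `KD`, the 30 fps frame time) and helper name is hypothetical. It is not the authors' implementation; it only illustrates the autoregressive pattern, in which each frame's simulation starts from the previous simulated state rather than from the raw kinematic estimate.

```python
import math
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the paper.
RADIUS = 0.3          # each person approximated as a disc of this radius (m)
DT = 1.0 / 30.0       # duration of one video frame at 30 fps (s)
KP, KD = 400.0, 40.0  # PD gains (critically damped for unit mass)
GRAVITY = -9.81       # m/s^2 along the vertical (y) axis


@dataclass
class Body:
    """Toy body: centre position (x horizontal, y vertical) and velocity."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def _resolve_contacts(bodies):
    """Push the two people apart if they overlap, then keep both above the floor."""
    a, b = bodies
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy)
    overlap = 2 * RADIUS - dist
    if overlap > 0 and dist > 1e-9:
        nx, ny = dx / dist, dy / dist
        # Split the correction evenly between the two people
        a.x -= nx * overlap / 2
        a.y -= ny * overlap / 2
        b.x += nx * overlap / 2
        b.y += ny * overlap / 2
    for body in bodies:
        if body.y < RADIUS:  # ground contact: no sinking into the floor
            body.y, body.vy = RADIUS, max(body.vy, 0.0)


def step(bodies, targets, substeps=10):
    """Advance one frame, tracking the kinematic targets with PD control."""
    h = DT / substeps
    for _ in range(substeps):
        for body, (tx, ty) in zip(bodies, targets):
            ax = KP * (tx - body.x) - KD * body.vx
            ay = KP * (ty - body.y) - KD * body.vy + GRAVITY
            # Semi-implicit Euler integration
            body.vx += ax * h
            body.vy += ay * h
            body.x += body.vx * h
            body.y += body.vy * h
        _resolve_contacts(bodies)
    return [(body.x, body.y) for body in bodies]


def refine(kinematic_frames):
    """Autoregressive rollout: each frame starts from the previous simulated state."""
    (xa, ya), (xb, yb) = kinematic_frames[0]
    bodies = [Body(xa, max(ya, RADIUS)), Body(xb, max(yb, RADIUS))]
    return [step(bodies, targets) for targets in kinematic_frames]
```

For example, kinematic targets that place the two people 0.2 m apart (closer than two radii) produce simulated positions that stay separated by about two radii, so the interpenetration in the kinematic input is removed. The PD tracking term loosely plays the role of keeping the physical result close to the kinematic estimate.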

The key innovation is this physics stage, which lets the model reason about physical interactions between the two people and with their environment. The authors report that, across three challenging datasets with substantial inter-person interaction, it significantly reduces penetration and foot-skating errors while performing competitively with the state of the art on motion accuracy and smoothness.
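The abstract reports that MultiPhys significantly reduces penetration and foot-skating errors. Since the exact metric definitions are not given here, the following is a hedged sketch of one common way to compute such quantities, not the paper's evaluation code: penetration as the largest overlap between spheres placed on two people's joints, and foot skating as the mean horizontal slide of a foot joint across consecutive frames in which it stays near the ground. The joint radius, the contact threshold, the y-up convention, and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple

# Assumed convention: (x, y, z) in metres with y as the up axis.
Point = Tuple[float, float, float]


def penetration_depth(joints_a: Sequence[Point], joints_b: Sequence[Point],
                      radius: float = 0.05) -> float:
    """Largest overlap between spheres of `radius` on person A's and person B's joints."""
    worst = 0.0
    for pa in joints_a:
        for pb in joints_b:
            # Two spheres overlap when their centres are closer than 2 * radius
            worst = max(worst, 2 * radius - math.dist(pa, pb))
    return worst


def foot_skating(foot_track: Sequence[Point], contact_height: float = 0.05) -> float:
    """Mean horizontal foot displacement per frame while the foot stays in contact."""
    slides = []
    for prev, cur in zip(foot_track, foot_track[1:]):
        # Only count frame pairs where the foot is near the ground in both frames
        if prev[1] < contact_height and cur[1] < contact_height:
            slides.append(math.hypot(cur[0] - prev[0], cur[2] - prev[2]))
    return sum(slides) / len(slides) if slides else 0.0
```

For instance, `penetration_depth([(0, 0, 0)], [(0.06, 0, 0)])` returns about 0.04: two 5 cm spheres whose centres are 6 cm apart overlap by 4 cm.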

Critical Analysis

The authors acknowledge several limitations of the MultiPhys approach. First, the method relies on accurate 2D pose estimation, which can be challenging in crowded scenes with occlusions. Additionally, the physics-based optimization step increases the computational complexity of the system, which may limit its real-time application.

The paper also does not thoroughly evaluate the model's ability to handle complex interactions, such as people carrying objects or supporting each other's weight. Further research may be needed to assess the generalization capabilities of the approach in diverse real-world scenarios.

That said, the core idea of integrating physics-based reasoning into 3D pose estimation is a promising direction that could lead to significant improvements in modeling realistic human motion, especially in multi-person contexts. Future work building on this foundation may yield even more robust and versatile motion capture systems.

Conclusion

The MultiPhys method represents a step forward in multi-person 3D motion estimation by incorporating physical constraints and interactions into the modeling process. By combining kinematic motion estimates with physics simulation, the approach produces physically plausible 3D motion for interacting people, with fewer penetration and foot-skating artifacts and accuracy competitive with the state of the art.

While the current implementation has some limitations, the core ideas of the paper have the potential to substantially advance the field of human motion capture and enable a wide range of applications, from virtual reality and gaming to human-robot interaction and sports analysis.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for details.


Related Papers

MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas

We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and effectively eliminates penetration issues between the two individuals. We devise a pipeline in which the motion estimated by a kinematic-based method is fed into a physics simulator in an autoregressive manner. We introduce distinct components that enable our model to harness the simulator's properties without compromising the accuracy of the kinematic estimates. This results in final motion estimates that are both kinematically coherent and physically compliant. Extensive evaluations on three challenging datasets characterized by substantial inter-person interaction show that our method significantly reduces errors associated with penetration and foot skating, while performing competitively with the state-of-the-art on motion accuracy and smoothness. Results and code can be found on our project page (http://www.iri.upc.edu/people/nugrinovic/multiphys/).

4/19/2024

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering intricate and complete 3D human shapes from short video sequences, intensifying the level of difficulty. To tackle these challenges, we first define a layered neural representation for the entire scene, composited by individual human and background models. We learn the layered neural representation from videos via our layer-wise differentiable volume rendering. This learning process is further enhanced by our hybrid instance segmentation approach which combines the self-supervised 3D segmentation and the promptable 2D segmentation module, yielding reliable instance segmentation supervision even under close human interaction. A confidence-guided optimization formulation is introduced to optimize the human poses and shape/appearance alternately. We incorporate effective objectives to refine human poses via photometric information and impose physically plausible constraints on human dynamics, leading to temporally consistent 3D reconstructions with high fidelity. The evaluation of our method shows the superiority over prior art on publicly available datasets and in-the-wild videos.

6/4/2024


Multi-person 3D pose estimation from unlabelled data

Daniel Rodriguez-Criado, Pilar Bachiller, George Vogiatzis, Luis J. Manso

Its numerous applications make multi-human 3D pose estimation a remarkably impactful area of research. Nevertheless, assuming a multiple-view system composed of several regular RGB cameras, 3D multi-pose estimation presents several challenges. First of all, each person must be uniquely identified in the different views to separate the 2D information provided by the cameras. Secondly, the 3D pose estimation process from the multi-view 2D information of each person must be robust against noise and potential occlusions in the scenario. In this work, we address these two challenges with the help of deep learning. Specifically, we present a model based on Graph Neural Networks capable of predicting the cross-view correspondence of the people in the scenario along with a Multilayer Perceptron that takes the 2D points to yield the 3D poses of each person. These two models are trained in a self-supervised manner, thus avoiding the need for large datasets with 3D annotations.

4/10/2024

AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos

Feichi Lu, Zijian Dong, Jie Song, Otmar Hilliges

Despite progress in human motion capture, existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people. This difficulty arises from reliance on accurate 2D joint estimations, which are hard to obtain due to occlusions and body contact when people are in close interaction. To address this, we propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior, which significantly improves the robustness and precision of this challenging pose estimation task. Concretely, the avatars are efficiently reconstructed via layered volume rendering from sparse multi-view videos. The reconstructed avatar prior allows for the direct optimization of 3D poses based on color and silhouette rendering loss, bypassing the issues associated with noisy 2D detections. To handle interpenetration, we propose a collision loss on the overlapping shape regions of avatars to add penetration constraints. Moreover, both 3D poses and avatars are optimized in an alternating manner. Our experimental results demonstrate state-of-the-art performance on several public datasets.

8/21/2024