Universal Humanoid Motion Representations for Physics-Based Control

2310.04582

Published 4/15/2024 by Zhengyi Luo, Jinkun Cao, Josh Merel, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

Abstract

We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. Due to the high dimensionality of humanoids and the inherent difficulties in reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles (e.g. locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability in complex tasks. We close this gap by significantly increasing the coverage of our motion representation space. To achieve this, we first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator. This is achieved by using an encoder-decoder structure with a variational information bottleneck. Additionally, we jointly learn a prior conditioned on proprioception (humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks. By sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks using human-like behavior. We demonstrate the effectiveness of our motion representation by solving generative tasks (e.g. strike, terrain traversal) and motion tracking using VR controllers.

Overview

This paper introduces a universal motion representation for physics-based humanoid control that covers a broad range of motor skills.
The representation is a latent space obtained by first training a motion imitator on a large, unstructured motion dataset and then distilling its skills through an encoder-decoder with a variational information bottleneck.
A jointly learned prior, conditioned on the humanoid's own pose and velocities, allows long, stable, and diverse motion to be sampled, and the latent space supports hierarchical RL for tasks such as striking, terrain traversal, and motion tracking from VR controllers.

Plain English Explanation

The paper describes a new way to represent the movements and actions of human-like virtual characters, called "humanoids," in computer simulations. The key idea is to create a "latent space" - a mathematical representation that can capture the essential features of how humans move.

This latent space has several advantages over previous approaches. First, it covers a much wider range of movements than earlier skill representations, which were typically learned from specialized datasets and limited to narrow styles such as locomotion. Second, it comes with a learned "prior" that depends on the character's current pose and velocities, so sampling from it produces long, stable, and varied motion. And third, the motions are generated inside a physics simulation, so the character moves under simulated physical forces instead of simply replaying recorded poses.

By developing this new latent space representation, the researchers have created a powerful tool for animating humanoid characters in physics-based simulations. This could be useful for applications like video games, movie visual effects, and even robotics, where realistic and varied human-like motion is important.

Technical Explanation

The paper presents a latent motion representation for physics-based humanoid control that encodes a wide range of human motor skills. Prior skill embeddings were learned from specialized motion datasets and covered narrow movement styles (e.g. locomotion, game characters), which limits their use in complex tasks; this work aims to widen that coverage substantially.

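The representation is built in two stages. First, a motion imitator is trained to imitate human motion from a large, unstructured motion dataset. Second, the imitator's skills are distilled into an encoder-decoder with a variational information bottleneck, whose latent code serves as the motion representation. A prior conditioned on proprioception (the humanoid's own pose and velocities) is learned jointly, which the authors report improves model expressiveness and sampling efficiency for downstream tasks.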
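The abstract describes distillation through a variational information bottleneck with a jointly learned prior. A minimal sketch of what such an objective could look like follows, assuming diagonal Gaussian distributions for both the encoder output and the prior. The function names, the mean-squared action-matching term, and the weight `beta` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def sample_latent(mu, sigma, rng):
    """Reparameterized sample: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def distillation_loss(teacher_action, student_action,
                      mu_q, sigma_q, mu_p, sigma_p, beta=0.01):
    """Assumed objective: match the imitator's action, plus a beta-weighted KL
    that keeps the encoder's latent distribution close to the learned prior."""
    n = len(teacher_action)
    mse = sum((t - s) ** 2 for t, s in zip(teacher_action, student_action)) / n
    return mse + beta * kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder and prior outputs for a 2-D latent.
    z = sample_latent([0.2, -0.1], [0.5, 0.5], rng)
    loss = distillation_loss([1.0, 0.0], [0.8, 0.1],
                             [0.2, -0.1], [0.5, 0.5],
                             [0.0, 0.0], [1.0, 1.0])
    print(z, loss)
```

In this sketch the KL term is measured against the prior's output, not a fixed standard normal, which is one way a prior conditioned on proprioception could enter the training loss.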

The researchers demonstrate the representation in two ways. Sampling from the prior generates long, stable, and diverse human motion. Using the latent space for hierarchical RL, the resulting policies solve generative tasks (e.g. strike, terrain traversal) and motion tracking from VR controllers with human-like behavior.
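The abstract mentions two uses of the latent space: sampling from the prior to generate motion, and hierarchical RL in which a task policy works through the latent space. The toy sketch below shows how both could share one control loop. Everything here is a hypothetical stand-in: `toy_prior`, `toy_decoder`, and `toy_step` replace the learned prior, the learned decoder, and the physics simulator with simple arithmetic so that the loop is runnable.

```python
import math
import random

LATENT_DIM = 3  # hypothetical latent size, chosen only for this toy example


def toy_prior(proprio):
    """Stand-in for the learned prior: (mean, std) of the latent given the state."""
    mean = [0.5 * p for p in proprio]
    std = [0.1] * LATENT_DIM
    return mean, std


def toy_decoder(proprio, z):
    """Stand-in for the decoder: maps (state, latent) to a bounded action."""
    return [math.tanh(p + zi) for p, zi in zip(proprio, z)]


def toy_step(proprio, action):
    """Stand-in for the physics simulator: blends the action into the state."""
    return [0.9 * p + 0.1 * a for p, a in zip(proprio, action)]


def rollout(choose_latent, steps, seed=0):
    """Run the loop: choose a latent, decode it to an action, step the simulator."""
    rng = random.Random(seed)
    proprio = [0.0] * LATENT_DIM
    trajectory = [proprio]
    for _ in range(steps):
        z = choose_latent(proprio, rng)
        action = toy_decoder(proprio, z)
        proprio = toy_step(proprio, action)
        trajectory.append(proprio)
    return trajectory


def sample_from_prior(proprio, rng):
    """Unconditional generation: draw the latent from the state-conditioned prior."""
    mean, std = toy_prior(proprio)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def make_task_policy(goal):
    """Hypothetical high-level policy: picks a latent that points toward a goal."""
    def policy(proprio, rng):
        return [g - p for g, p in zip(goal, proprio)]
    return policy


if __name__ == "__main__":
    random_motion = rollout(sample_from_prior, steps=5)
    task_motion = rollout(make_task_policy([0.5, 0.5, 0.5]), steps=10)
    print(random_motion[-1], task_motion[-1])
```

The point of the design is that the low-level decoder stays the same in both cases; only the source of the latent code changes between random generation and task-driven control.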

Critical Analysis

The paper presents a compelling approach for representing humanoid motion in a physics-based control system. Its key strengths are the breadth of motor skills the representation covers, the ability to sample long and diverse motion from the learned prior, and the human-like behavior of policies trained on top of it.

However, the paper does not explicitly address the potential limitations of the latent space representation. For example, it is unclear how well the method would scale to more complex or diverse motion patterns, or how sensitive it is to variations in the underlying physics simulation. Additionally, the paper does not discuss potential challenges in deploying the system in real-world applications, such as hardware constraints or the need for robust control algorithms.

Further research could explore the generalization capabilities of the latent space representation, as well as its integration with other techniques for humanoid control and animation. Incorporating additional feedback signals, such as sensory information or user input, could also enhance the versatility and responsiveness of the system.

Conclusion

This paper introduces a latent space representation that encodes a broad range of humanoid motor skills for physics-based control of simulated characters. By distilling a general motion imitator through a variational information bottleneck and learning a prior conditioned on proprioception, the method supports both motion generation and hierarchical RL on tasks such as striking, terrain traversal, and VR-controller tracking. This makes it a promising approach for applications in video games, visual effects, and robotics, though further research is needed to fully explore its limitations and potential for real-world deployment.

Related Papers

ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space

Yashuai Yan, Esteve Valls Mascaro, Dongheui Lee

This paper introduces a novel deep-learning approach for human-to-robot motion retargeting, enabling robots to mimic human poses accurately. Contrary to prior deep-learning-based works, our method does not require paired human-to-robot data, which facilitates its translation to new robots. First, we construct a shared latent space between humans and robots via adaptive contrastive learning that takes advantage of a proposed cross-domain similarity metric between the human and robot poses. Additionally, we propose a consistency term to build a common latent space that captures the similarity of the poses with precision while allowing direct robot motion control from the latent space. For instance, we can generate in-between motion through simple linear interpolation between two projected human poses. We conduct a comprehensive evaluation of robot control from diverse modalities (i.e., texts, RGB videos, and key poses), which facilitates robot control for non-expert users. Our model outperforms existing works regarding human-to-robot retargeting in terms of efficiency and precision. Finally, we implemented our method in a real robot with self-collision avoidance through a whole-body controller to showcase the effectiveness of our approach. More information on our website https://evm7.github.io/UnsH2R/

4/9/2024

cs.RO cs.AI

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su

Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer

6/3/2024

cs.LG cs.CV cs.RO

HumanPlus: Humanoid Shadowing and Imitation from Humans

Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only a RGB camera, i.e. shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/

6/18/2024

cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning

Liu Qiyuan

The existing Motion Imitation models typically require expert data obtained through MoCap devices, but the vast amount of training data needed is difficult to acquire, necessitating substantial investments of financial resources, manpower, and time. This project combines 3D human pose estimation with reinforcement learning, proposing a novel model that simplifies Motion Imitation into a prediction problem of joint angle values in reinforcement learning. This significantly reduces the reliance on vast amounts of training data, enabling the agent to learn an imitation policy from just a few seconds of video and exhibit strong generalization capabilities. It can quickly apply the learned policy to imitate human arm motions in unfamiliar videos. The model first extracts skeletal motions of human arms from a given video using 3D human pose estimation. These extracted arm motions are then morphologically retargeted onto a robotic manipulator. Subsequently, the retargeted motions are used to generate reference motions. Finally, these reference motions are used to formulate a reinforcement learning problem, enabling the agent to learn a policy for imitating human arm motions. This project excels at imitation tasks and demonstrates robust transferability, accurately imitating human arm motions from other unfamiliar videos. This project provides a lightweight, convenient, efficient, and accurate Motion Imitation model. While simplifying the complex process of Motion Imitation, it achieves notably outstanding performance.

5/3/2024

cs.RO cs.LG