Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Read original: arXiv:2311.16096 - Published 5/28/2024 by Zhe Li, Yipengjing Sun, Zerong Zheng, Lizhen Wang, Shengping Zhang, Yebin Liu

Overview

This paper presents Animatable Gaussians, a method for creating high-fidelity, animatable, and relightable 3D human avatars from RGB videos using pose-dependent Gaussian maps.
The avatar is represented by 3D Gaussians stored as the pixels of front and back canonical Gaussian maps, which a StyleGAN-based 2D CNN predicts from the body pose.
The authors report that the method outperforms other state-of-the-art approaches in their experiments.

Plain English Explanation

The paper describes a new way to create realistic, animated 3D human avatars. Current methods for generating these avatars often struggle to capture fine details or move smoothly when animated. This new approach uses a technique called "Gaussian maps" to build the avatars.

Essentially, the researchers train a neural network to predict the position, size, and orientation of a large set of 3D Gaussians (soft, ellipsoid-shaped blobs) that together form a detailed 3D human model. As the model is animated, the network predicts the Gaussians for each new pose and they move with the body, creating natural-looking movement and preserving details like facial features and clothing folds.

Compared to previous work, the authors report that Animatable Gaussians produces avatars that look more realistic and generalize better to new poses. This could enable better virtual characters in games, movies, and other applications that rely on realistic human avatars.

Technical Explanation

The key innovation in this paper is the use of pose-dependent Gaussian maps to represent the avatar's geometry and appearance. Rather than relying on an MLP-based neural radiance field (NeRF), which struggles to regress pose-dependent garment details, the method learns a parametric template from the input videos and parameterizes it on two canonical Gaussian maps (front and back) in which each pixel represents a 3D Gaussian.
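To make the "each pixel represents a 3D Gaussian" idea from the paper's abstract concrete, here is a minimal NumPy sketch of unpacking one Gaussian map into per-Gaussian parameters. The channel layout, the activation functions, the validity mask, and the name `unpack_gaussian_map` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack_gaussian_map(gaussian_map, valid_mask):
    """Turn an (H, W, C) Gaussian map into per-Gaussian parameter arrays.

    Hypothetical channel layout (an assumption for this sketch):
    0:3 position offset, 3:6 log-scale, 6:10 rotation quaternion,
    10 opacity logit, 11:14 colour logits.
    """
    # Boolean indexing keeps one row per pixel covered by the template: (N, C).
    params = gaussian_map[valid_mask]
    quat = params[:, 6:10]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternions
    return {
        "offset": params[:, 0:3],
        "scale": np.exp(params[:, 3:6]),      # exp keeps scales positive
        "rotation": quat,
        "opacity": sigmoid(params[:, 10]),    # squashed into (0, 1)
        "color": sigmoid(params[:, 11:14]),
    }

# Toy 4x4 "front" map with random values standing in for CNN output.
rng = np.random.default_rng(0)
front_map = rng.normal(size=(4, 4, 14))
front_mask = np.zeros((4, 4), dtype=bool)
front_mask[1:3, 1:3] = True                   # only 4 pixels lie on the body
gaussians = unpack_gaussian_map(front_map, front_mask)
print(gaussians["offset"].shape)              # (4, 3)
```

Storing the Gaussians on a 2D grid is what lets a 2D CNN predict all of them in one pass; a back map would be unpacked the same way and concatenated with the front one.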

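The researchers train a StyleGAN-based CNN to predict the parameters of these Gaussians (such as position, scale, and orientation) from the character's pose, using RGB videos of the subject as training data. During animation, the Gaussians are transformed according to the character's pose, allowing the avatar to move naturally while preserving fine details. A pose projection strategy improves generalization to novel poses, and physically-based rendering decomposes the avatar's materials from the environment illumination so that the avatar can be relit.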
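The summary does not say how the Gaussians are transformed by the pose. A common choice in animatable-avatar work is linear blend skinning (LBS), and the sketch below assumes it purely for illustration: each Gaussian's mean is moved by a weighted blend of bone transforms, and its covariance is rotated by the linear part of that blend. The function name, argument shapes, and toy skeleton are hypothetical.

```python
import numpy as np

def skin_gaussians(means, covs, skin_weights, bone_transforms):
    """Move canonical Gaussians to posed space with linear blend skinning.

    means:           (N, 3)    canonical Gaussian centres
    covs:            (N, 3, 3) canonical covariances
    skin_weights:    (N, J)    per-Gaussian bone weights, rows sum to 1
    bone_transforms: (J, 4, 4) homogeneous bone transforms for the target pose
    """
    # Blend the bone transforms for every Gaussian: (N, 4, 4).
    blended = np.einsum("nj,jab->nab", skin_weights, bone_transforms)
    linear = blended[:, :3, :3]           # blended rotation part
    translation = blended[:, :3, 3]
    posed_means = np.einsum("nab,nb->na", linear, means) + translation
    # A covariance transforms as A @ Sigma @ A^T under a linear map A.
    posed_covs = linear @ covs @ np.transpose(linear, (0, 2, 1))
    return posed_means, posed_covs

# Toy skeleton: bone 0 stays put, bone 1 rotates 90 degrees about z and lifts by 1.
identity = np.eye(4)
rot_z = np.eye(4)
rot_z[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
rot_z[:3, 3] = [0.0, 0.0, 1.0]
bone_transforms = np.stack([identity, rot_z])

means = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
covs = np.stack([np.diag([4.0, 1.0, 1.0])] * 2)   # elongated along x
skin_weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # each Gaussian follows one bone

posed_means, posed_covs = skin_gaussians(means, covs, skin_weights, bone_transforms)
print(posed_means[1])           # [0. 1. 1.]
print(np.diag(posed_covs[1]))   # [1. 4. 1.]
```

When a Gaussian is influenced by several bones, the blended linear part is generally no longer a pure rotation, which is a known weakness of LBS; a real system might re-orthogonalize it or blend rotations differently.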

Experimental results show that Animatable Gaussians outperforms other state-of-the-art approaches. The GomAvatar and LayGa methods are identified as particularly relevant prior art.

Critical Analysis

The paper presents a compelling solution to the challenge of creating high-fidelity, animatable human avatars. The use of Gaussian maps is a novel and effective way to capture fine details while enabling smooth animation.

However, the approach does have some limitations. Training requires RGB video footage of the subject from which the template is learned, and such footage may not be readily available in all scenarios. Additionally, the method may struggle to accurately model complex garments or accessories, as the Gaussians may not fully capture their detailed geometry.

Further research could explore ways to make the training process more flexible, for example by relaxing its data requirements. Investigating how to better model intricate clothing and accessories would also be a valuable area of study.

Overall, this work represents a significant advancement in the field of 3D human avatar modeling and animation, with the potential to enable more realistic virtual characters across a wide range of applications.

Conclusion

The Animatable Gaussians method presents a novel approach to generating high-fidelity, animatable 3D human avatars using pose-dependent Gaussian maps. By modeling the avatar as a set of 3D Gaussians that are predicted from the pose and deformed with the body, the technique produces realistic, smoothly animated avatars that the authors report outperform previous methods.

This innovation could have far-reaching implications, enabling more immersive and believable virtual characters in games, movies, and other applications that rely on human avatars. As the field of 3D human modeling continues to evolve, the insights and techniques presented in this paper are likely to be influential for future research and development.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Zhe Li, Yipengjing Sun, Zerong Zheng, Lizhen Wang, Shengping Zhang, Yebin Liu

Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front & back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. To tackle the realistic relighting of animatable avatars, we introduce physically-based rendering into the avatar representation for decomposing avatar materials and environment illumination. Overall, our method can create lifelike avatars with dynamic, realistic, generalized and relightable appearances. Experiments show that our method outperforms other state-of-the-art approaches.

5/28/2024

Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars

Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang

Neural radiance fields are capable of reconstructing high-quality drivable human avatars but are expensive to train and render and not suitable for multi-human scenes with complex shadows. To reduce consumption, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming 3D Gaussians to posed space according to the input poses. We introduce a multi-head hash encoder for pose-dependent shape and appearance and a time-dependent ambient occlusion module to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method achieves higher reconstruction quality than InstantAvatar with less training time (1/60), less GPU memory (1/4), and faster rendering speed (7x). Our method can be easily extended to multi-human scenes and achieve comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.

7/30/2024

Interactive Rendering of Relightable and Animatable Gaussian Avatars

Youyi Zhan, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

Creating relightable and animatable avatars from multi-view or monocular videos is a challenging task for digital human creation and virtual reality applications. Previous methods rely on neural radiance fields or ray tracing, resulting in slow training and rendering processes. By utilizing Gaussian Splatting, we propose a simple and efficient method to decouple body materials and lighting from sparse-view or monocular avatar videos, so that the avatar can be rendered simultaneously under novel viewpoints, poses, and lightings at interactive frame rates (6.9 fps). Specifically, we first obtain the canonical body mesh using a signed distance function and assign attributes to each mesh vertex. The Gaussians in the canonical space then interpolate from nearby body mesh vertices to obtain the attributes. We subsequently deform the Gaussians to the posed space using forward skinning, and combine the learnable environment light with the Gaussian attributes for shading computation. To achieve fast shadow modeling, we rasterize the posed body mesh from dense viewpoints to obtain the visibility. Our approach is not only simple but also fast enough to allow interactive rendering of avatar animation under environmental light changes. Experiments demonstrate that, compared to previous works, our method can render higher quality results at a faster speed on both synthetic and real datasets.

7/16/2024

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

4/1/2024