Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

2405.12069

Published 5/22/2024 by Tianhao Wu, Jing Yang, Zhilin Guo, Jingyi Wan, Fangcheng Zhong, Cengiz Oztireli

Abstract

By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show our method achieves superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject.
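
The abstract's argument that single-color primitives need to be very numerous to carry spatially varying texture can be illustrated with a toy 1D example. The sketch below is not from the paper: it assumes a flat surface with alternating stripes, replaces Gaussians with hard, constant-color primitives (no blending, no view dependence), and uses hypothetical function names. It only shows that the error stays high until there is at least one primitive per stripe.

```python
import numpy as np

def stripe_pattern(x, num_stripes):
    """Alternating 0/1 stripes on [0, 1): a stand-in for a detailed texture."""
    return np.floor(x * num_stripes).astype(int) % 2

def constant_primitive_error(num_primitives, num_stripes, num_samples=1024):
    """Mean absolute error when each primitive carries exactly one color.

    Primitives are evenly spaced; every sample takes the color of the
    primitive whose cell it falls in (a toy assumption, not real splatting).
    """
    x = (np.arange(num_samples) + 0.5) / num_samples
    target = stripe_pattern(x, num_stripes)
    centers = (np.arange(num_primitives) + 0.5) / num_primitives
    colors = stripe_pattern(centers, num_stripes)  # one color per primitive
    nearest = np.minimum((x * num_primitives).astype(int), num_primitives - 1)
    return float(np.mean(np.abs(colors[nearest] - target)))

for count in (2, 4, 8, 16):
    print(count, constant_primitive_error(count, num_stripes=8))
```

A texture sidesteps this scaling: the geometry stays simple, and the detail lives in the texels that each pixel looks up.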

Overview

This paper introduces a novel approach for creating high-fidelity neural upper body avatars, called "Gaussian Head & Shoulders".
The key innovations are a neural texture for the clothed body, rendered through a warping field that a sparse set of anchor Gaussians constrains, and an accelerated inference method that avoids MLP queries and renders at around 130 FPS.
The authors report superior reconstruction quality and robustness on self- and cross-reenactment tasks.

Plain English Explanation

The paper describes a new way to create realistic-looking digital avatars, particularly for the head and upper body region. The core idea is to use a neural network model that can generate high-quality textures and efficiently render the avatar in real-time, even when the avatar is moving and changing expressions.

The key innovation is a technique called "Anchor Gaussian Guided Texture Warping". Instead of covering the clothed chest and shoulders with a very large number of single-colored Gaussians, the model stores the body's appearance in a learned texture and uses a small set of anchor Gaussians to decide which part of that texture each pixel should show as the avatar moves.

Overall, the goal is to create animatable head and upper body avatars that look and behave realistically, for applications like virtual reality, games, and digital communications.

Technical Explanation

The paper introduces a novel neural architecture called "Gaussian Head & Shoulders" for generating high-fidelity upper body avatars. The key technical components include:

Anchor Gaussian Guided Texture Warping: A neural warping field maps image plane coordinates to texture space for each view and pose. A sparse set of "anchor" Gaussians is optimized to constrain this mapping, so the body texture can be rendered without accurate geometry or UV mapping.
Neural Texture: Rather than representing the clothed chest and shoulders with dense Gaussians, the model stores body appearance in a neural texture made of coarse colors and pose-dependent fine colors. This avoids the blur and floaters that arise because each Gaussian carries a single directional radiance with no spatial variation.
Accelerated Inference: The trained model can be evaluated without Multi-Layer Perceptron (MLP) queries, which the authors report gives a stable rendering speed of around 130 FPS.
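
The texture warping idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the neural warping field is replaced by a plain affine map, the anchor constraint is assumed to mean "the projected anchor position should land on that anchor's texture coordinate", and every name below (`affine_warp`, `anchor_loss`, `fit_affine_to_anchors`, `bilinear_sample`) is hypothetical.

```python
import numpy as np

def bilinear_sample(texture, uv):
    """Sample an (H, W, C) texture at continuous uv coordinates in [0, 1]^2."""
    h, w, _ = texture.shape
    x = np.clip(uv[:, 0], 0.0, 1.0) * (w - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = (x - x0)[:, None], (y - y0)[:, None]
    top = texture[y0, x0] * (1 - fx) + texture[y0, x1] * fx
    bottom = texture[y1, x0] * (1 - fx) + texture[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def affine_warp(pixels, A, t):
    """Assumed stand-in for the warping field: pixel coordinates -> uv."""
    return pixels @ A.T + t

def anchor_loss(anchor_pixels, anchor_uv, A, t):
    """Mean squared distance between warped anchors and their uv targets."""
    diff = affine_warp(anchor_pixels, A, t) - anchor_uv
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def fit_affine_to_anchors(anchor_pixels, anchor_uv):
    """Least-squares fit of the affine warp to the anchor constraints."""
    X = np.hstack([anchor_pixels, np.ones((len(anchor_pixels), 1))])
    sol, *_ = np.linalg.lstsq(X, anchor_uv, rcond=None)
    return sol[:2].T, sol[2]

# Toy data: a 4x4 texture with a horizontal color ramp, and four anchors
# whose image positions (in a 100x100 image) map to the texture corners.
texture = np.tile(np.linspace(0.0, 1.0, 4)[None, :, None], (4, 1, 3))
anchor_pixels = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
anchor_uv = anchor_pixels / 100.0

A, t = fit_affine_to_anchors(anchor_pixels, anchor_uv)
center_color = bilinear_sample(texture, affine_warp(np.array([[50.0, 50.0]]), A, t))
print(anchor_loss(anchor_pixels, anchor_uv, A, t), center_color)
```

In the toy setup a handful of anchors is enough to pin down the whole mapping, which is the intuition behind using only a sparse set of them. A learned, pose-dependent warp would replace `affine_warp` in any real system.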

The authors evaluate the method on casual phone-captured and internet videos and report higher reconstruction quality and robustness than prior methods in both self- and cross-reenactment.
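
The abstract reports an accelerated inference method that avoids MLP queries, but it does not say how this is done. One generic way to remove network queries at render time is to precompute a network's outputs on a grid and interpolate between them. The sketch below shows that generic idea only, with a made-up 1D input and random weights. It should not be read as the paper's procedure, and `tiny_mlp` and `lookup` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
# A tiny fixed network standing in for some trained MLP (weights are random).
W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def tiny_mlp(x):
    """Map scalar inputs in [0, 1], shape (N,), to 3-channel outputs (N, 3)."""
    hidden = np.tanh(x[:, None] @ W1.T + b1)
    return hidden @ W2.T + b2

# Bake once: evaluate the network on a dense grid and keep only the table.
GRID = np.linspace(0.0, 1.0, 256)
TABLE = tiny_mlp(GRID)

def lookup(x):
    """Approximate tiny_mlp(x) by linear interpolation in the baked table."""
    return np.stack([np.interp(x, GRID, TABLE[:, c]) for c in range(3)], axis=1)

queries = np.linspace(0.0, 1.0, 1000)
print(np.max(np.abs(lookup(queries) - tiny_mlp(queries))))
```

The trade-off in this kind of baking is memory for speed: the table grows quickly with input dimension, so it suits low-dimensional inputs. For scale, 130 FPS corresponds to a budget of roughly 7.7 ms per frame.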

Critical Analysis

The paper presents a compelling approach for creating realistic-looking digital avatars, addressing key challenges in texture representation and real-time rendering. The use of anchor Gaussians to guide texture warping and the MLP-free inference path are particularly innovative and show promising results.

However, the paper does not address some potential limitations and areas for further research:

Generalization to Diverse Body Types: The avatar models presented in the paper are based on a relatively narrow range of body types and skin tones. It would be important to evaluate the model's ability to generalize to a more diverse population.
Handling Occlusion and Complex Dynamics: While the model handles head and upper body motion well, it is unclear how it would handle more complex full-body animations, including occlusions and interactions with external objects or environments.
Data and Privacy Considerations: The creation of high-fidelity digital avatars raises important questions about data privacy and the ethical use of such technology, which the paper does not address.

Overall, the "Gaussian Head & Shoulders" approach represents an exciting advancement in the field of neural avatar generation, but further research is needed to address these potential limitations and ensure the responsible development of such technologies.

Conclusion

The "Gaussian Head & Shoulders" paper presents a novel and effective approach for creating high-fidelity neural upper body avatars. By combining Anchor Gaussian Guided Texture Warping with a neural body texture, the model reconstructs high-frequency detail on the clothed upper body and supports both self- and cross-reenactment.

The technical innovations introduced in this paper have the potential to significantly advance the state-of-the-art in avatar generation, enabling more immersive and lifelike experiences in virtual reality, gaming, and digital communication applications. While the paper does not address all the potential limitations and ethical considerations, it represents an important step forward in the field of neural avatar research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation

Jie Wang, Jiu-Cheng Xie, Xianyan Li, Feng Xu, Chi-Man Pun, Hao Gao

Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and multi-resolution tri-plane are constructed respectively to deal with the head's dynamic geometry and complex texture. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. Our code is available at: https://github.com/chiehwangs/gaussian-head.

5/31/2024

cs.CV

PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu

Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time ($\ge$ 25 fps at a resolution of 512 $\times$ 512).

6/26/2024

cs.GR cs.CV

MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.

5/1/2024

cs.CV

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

4/1/2024

cs.CV cs.GR cs.LG