Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

2404.01543

Published 4/3/2024 by Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang

cs.CV cs.GR

Abstract

3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static scenes, these methods cannot be simply employed to support realistic facial expressions, such as in the case of a dynamic facial performance. To address these challenges, we propose a novel fast 3D neural implicit head avatar model that achieves real-time rendering while maintaining fine-grained controllability and high rendering quality. Our key idea lies in the introduction of local hash table blendshapes, which are learned and attached to the vertices of an underlying face parametric model. These per-vertex hash-tables are linearly merged with weights predicted via a CNN, resulting in expression dependent embeddings. Our novel representation enables efficient density and color predictions using a lightweight MLP, which is further accelerated by a hierarchical nearest neighbor search method. Extensive experiments show that our approach runs in real-time while achieving comparable rendering quality to state-of-the-arts and decent results on challenging expressions.

Overview

This paper presents a fast 3D neural implicit head avatar model built on mesh-anchored hash table blendshapes.
Small hash tables are learned and attached to the vertices of an underlying parametric face model, and a CNN predicts the weights that linearly merge each vertex's tables into expression-dependent embeddings.
A lightweight MLP predicts density and color from these embeddings, and a hierarchical nearest neighbor search accelerates it further, so the avatar renders in real time with quality comparable to state-of-the-art methods.

Plain English Explanation

The researchers have developed a faster way to render 3D digital avatars of human heads. Avatars built with neural implicit volumetric representations look highly photorealistic, but they are expensive to compute, which makes them hard to use in real-time settings such as virtual reality and teleconferencing. Fast rendering methods exist for static scenes, but they cannot simply be reused for a face that changes expression.

At the core of the approach are "blendshapes". In classical facial animation, blendshapes are basis expressions that are mixed together to produce a new expression. Here, each blendshape is a small hash table of learned features, a compact lookup structure that is fast to query, rather than a piece of 3D geometry. For a given expression, a neural network (a CNN) predicts how strongly to mix the tables.

Crucially, the hash tables are "anchored" to the vertices of an underlying face mesh, so each table describes the region around its own vertex. A small network then converts the mixed features into density and color for rendering. Keeping the features local to the mesh and the network lightweight is what lets the system render realistic avatars in real time.
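
As a rough illustration of the linear merging described in the abstract, the sketch below blends a few toy hash tables attached to a single vertex, using one weight per table. The table contents, sizes, and weights are made-up values, and `blend_tables` is a hypothetical helper rather than code from the paper. In the paper, the weights are predicted by a CNN.

```python
from typing import List

Table = List[List[float]]  # one hash table: a list of feature vectors


def blend_tables(blendshapes: List[Table], weights: List[float]) -> Table:
    """Linearly merge K hash tables of identical shape into one table."""
    if len(blendshapes) != len(weights):
        raise ValueError("need exactly one weight per blendshape table")
    rows, dims = len(blendshapes[0]), len(blendshapes[0][0])
    merged = [[0.0] * dims for _ in range(rows)]
    for table, w in zip(blendshapes, weights):
        for r in range(rows):
            for d in range(dims):
                merged[r][d] += w * table[r][d]
    return merged


# Two toy blendshape tables for one vertex (2 entries, 2 features each).
tables = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[3.0, 4.0], [2.0, 0.0]],
]
# Assumed weights for illustration; a CNN would predict these per expression.
weights = [0.25, 0.75]

merged = blend_tables(tables, weights)
print(merged)  # [[2.5, 3.0], [1.5, 0.5]]
```

Because the merge is a plain weighted sum, its cost grows with the number of tables and their size, not with the complexity of the expression.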

Technical Explanation

The key components of the researchers' approach are:

Hash Table Blendshapes: The model learns a set of small local hash tables, the blendshapes, and attaches them to the vertices of an underlying parametric face model.
Expression-Dependent Blending: A CNN predicts blending weights, and the per-vertex hash tables are linearly merged with those weights to produce expression-dependent embeddings.
Lightweight Decoding: A lightweight MLP predicts density and color from these embeddings, which keeps the cost of each prediction low.
Accelerated Search: A hierarchical nearest neighbor search method further accelerates prediction, allowing the model to render in real time for applications such as virtual reality and teleconferencing.
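
The abstract mentions a hierarchical nearest neighbor search and hash tables attached to mesh vertices, but gives no implementation details. The sketch below shows one way such a query could look, under stated assumptions: a two-level coarse-to-fine vertex search, followed by a lookup in the chosen vertex's local table. The XOR-with-primes hash is borrowed from multiresolution hash grid encodings, and all names (`hierarchical_nearest`, `hash_cell`, `query_feature`, `cell_size`) are hypothetical; none of this is taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

# Primes commonly used for spatial hashing of 3D grid cells (an assumption here).
PRIMES = (1, 2654435761, 805459861)


def nearest(query: Point, points: Sequence[Point], candidates: Sequence[int]) -> int:
    """Return the candidate index whose point is closest to the query."""
    return min(candidates, key=lambda i: math.dist(query, points[i]))


def hierarchical_nearest(query: Point, vertices: Sequence[Point],
                         coarse_ids: Sequence[int],
                         children: Dict[int, List[int]]) -> int:
    """Coarse-to-fine search: pick the closest coarse vertex, then search its children."""
    coarse = nearest(query, vertices, coarse_ids)
    return nearest(query, vertices, children[coarse])


def hash_cell(cell: Tuple[int, int, int], table_size: int) -> int:
    """Map an integer grid cell to a slot in a fixed-size table."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size


def query_feature(query, vertices, coarse_ids, children, tables, cell_size=0.05):
    """Find the anchoring vertex, then look up a feature in that vertex's local table."""
    v = hierarchical_nearest(query, vertices, coarse_ids, children)
    # Express the query in coordinates relative to the anchoring vertex.
    local = [q - p for q, p in zip(query, vertices[v])]
    cell = tuple(math.floor(x / cell_size) for x in local)
    table = tables[v]
    return v, table[hash_cell(cell, len(table))]


# Toy mesh: four vertices, two of which (0 and 2) act as coarse representatives.
vertices = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
coarse_ids = [0, 2]
children = {0: [0, 1], 2: [2, 3]}
# One small table per vertex; entry i of vertex v holds the features [v, i].
tables = {v: [[float(v), float(i)] for i in range(4)] for v in range(4)}

vertex_id, feature = query_feature((1.08, 0.0, 0.0), vertices, coarse_ids, children, tables)
print(vertex_id, feature)  # 3 [3.0, 3.0]
```

A coarse-to-fine search of this kind only checks a few candidates per query instead of every vertex. The trade-off is that it can miss the true nearest vertex near cluster boundaries, so a real system would need to handle that case.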

Critical Analysis

The abstract supports a few cautious observations. The authors describe the rendering quality as "comparable" to state-of-the-art methods and the results on challenging expressions as "decent". This suggests that the main gain is speed rather than a jump in visual fidelity, and that extreme expressions remain difficult.

Additionally, the hash tables are attached to an underlying parametric face model, so the quality of the avatar likely depends on how well that model fits and tracks the subject. Further research could explore how robust the approach is when the underlying mesh is inaccurate.

That said, the core ideas, mesh-anchored hash table blendshapes merged with CNN-predicted weights, address a real obstacle to using neural implicit head avatars in real time. If the reported results hold up in practice, the method could be useful in virtual reality, teleconferencing, and similar settings.

Conclusion

This paper presents an efficient approach to rendering 3D head avatars in real time. By attaching learned hash table blendshapes to the vertices of a parametric face mesh and decoding the blended features with a lightweight MLP, the researchers have developed a system that renders realistic, controllable digital heads without the heavy computational cost of earlier neural implicit methods.

While there are some limitations to the current approach, the core innovations represent an important step forward in making high-quality 3D avatars more accessible and practical for a wide range of applications. As virtual and augmented reality technologies continue to evolve, solutions like this will play a crucial role in enabling more natural and immersive digital experiences.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

3D Gaussian Blendshapes for Head Avatar Animation

Shengjie Ma, Yanlin Weng, Tianjia Shao, Kun Zhou

We introduce 3D Gaussian blendshapes for modeling photorealistic head avatars. Taking a monocular video as input, we learn a base head model of neutral expression, along with a group of expression blendshapes, each of which corresponds to a basis expression in classical parametric face models. Both the neutral model and expression blendshapes are represented as 3D Gaussians, which contain a few properties to depict the avatar appearance. The avatar model of an arbitrary expression can be effectively generated by combining the neutral model and expression blendshapes through linear blending of Gaussians with the expression coefficients. High-fidelity head avatar animations can be synthesized in real time using Gaussian splatting. Compared to state-of-the-art methods, our Gaussian blendshape representation better captures high-frequency details exhibited in input video, and achieves superior rendering performance.

5/3/2024

cs.GR cs.CV

InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering

Antonio Canela, Pol Caselles, Ibrar Malik, Eduard Ramon, Jaime García, Jordi Sánchez-Riera, Gil Triginer, Francesc Moreno-Noguer

Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve an unprecedented accuracy, they take several minutes, or even hours, due to the expensive optimization process required. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from few images (down to just one) in a few seconds on commodity hardware. In order to speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. In order to overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid based architecture. The use of this prior model, in combination with other design choices, results into a system that achieves 3D head reconstructions with comparable accuracy as the state-of-the-art with a 100x speed-up.

4/8/2024

cs.CV

MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.

5/1/2024

cs.CV

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui

Recently, we have witnessed the explosive growth of various volumetric representations in modeling animatable head avatars. However, due to the diversity of frameworks, there is no practical method to support high-level applications like 3D head avatar editing across different representations. In this paper, we propose a generic avatar editing approach that can be universally applied to various 3DMM driving volumetric head avatars. To achieve this goal, we design a novel expression-aware modification generative model, which enables lift 2D editing from a single image to a consistent 3D modification field. To ensure the effectiveness of the generative modification process, we develop several techniques, including an expression-dependent modification distillation scheme to draw knowledge from the large-scale head avatar model and 2D facial texture editing tools, implicit latent space guidance to enhance model convergence, and a segmentation-based loss reweight strategy for fine-grained texture inversion. Extensive experiments demonstrate that our method delivers high-quality and consistent results across multiple expression and viewpoints. Project page: https://zju3dv.github.io/geneavatar/

4/3/2024

cs.CV