ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

2312.05941

Published 4/16/2024 by Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann

Abstract

Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.

Overview

This paper presents ASH (Animatable Gaussian Splats for Efficient and Photoreal Human Rendering), a method for rendering photorealistic, pose-controllable human characters in real time.
ASH uses a novel representation called "animatable Gaussian splats" to model the appearance and geometry of human bodies.
The method can be used for tasks like real-time animation, virtual try-on, and mixed reality applications.

Plain English Explanation

The researchers in this paper have developed a new way to create digital human characters that look very realistic and can be easily animated. Their approach, called ASH, uses a special type of visual representation called "animatable Gaussian splats" to model the shape and appearance of the human body.

Traditionally, creating highly detailed and realistic digital humans has been challenging and computationally expensive. ASH solves this problem by using a more efficient representation that can capture the nuances of human appearance and movement. This makes it possible to render human characters in real-time for applications like video games, virtual try-on of clothing, or mixed reality experiences where digital humans interact with the physical world.

At the core of ASH is the idea of "Gaussian splats" - essentially, overlapping blobs of color and texture that together form the surface of the human body. These splats are "animatable," meaning they can be deformed and moved to match the motion of the character. By using this flexible representation, ASH can generate photorealistic renderings of humans without requiring the high computational power that more traditional approaches need.

The researchers demonstrate the capabilities of ASH through various experiments, showing how it can be used to create lifelike digital humans that can be efficiently animated and integrated into different applications. This work represents an important step forward in making high-quality virtual humans more accessible and practical for a wide range of uses.

Technical Explanation

The key innovation in this paper is the "animatable Gaussian splat" representation used by the ASH method. This representation models the geometry and appearance of the human body using a set of overlapping, deformable Gaussian primitives.
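To make the idea of a Gaussian primitive concrete, here is a minimal sketch of one common way to parameterize a 3D Gaussian: a center, per-axis scales, a rotation quaternion, a color, and an opacity, with the covariance built as R S Sᵀ Rᵀ. This follows the usual 3D Gaussian splatting convention and is an assumption for illustration; the summary does not state ASH's exact parameterization, and the names `Gaussian3D`, `quat_to_rotation`, and `covariance` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    """One splat. Field layout is an illustrative assumption, not ASH's."""
    mean: tuple      # (x, y, z) center of the Gaussian
    scale: tuple     # per-axis standard deviations (sx, sy, sz)
    quat: tuple      # rotation as a quaternion (w, x, y, z)
    color: tuple     # RGB in [0, 1]
    opacity: float   # peak opacity in [0, 1]


def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Covariance Sigma = R S S^T R^T, where S = diag(scale)."""
    R = quat_to_rotation(g.quat)
    s2 = [s * s for s in g.scale]
    return [
        [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


# An axis-aligned splat: the covariance is diagonal with the squared scales.
splat = Gaussian3D(mean=(0.0, 0.0, 0.0), scale=(1.0, 2.0, 3.0),
                   quat=(1.0, 0.0, 0.0, 0.0), color=(1.0, 0.0, 0.0),
                   opacity=0.8)
sigma = covariance(splat)
```

Building the covariance from a rotation and a scale keeps it symmetric and positive semi-definite by construction, which is one reason this factorization is popular when the parameters are optimized.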

Each Gaussian splat encodes information about the local shape, color, and texture of the human surface. By deforming and blending these splats, the method can capture the complex geometry and motion of the human body in an efficient, continuous manner.
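The blending of splats can be sketched as front-to-back alpha compositing at a single pixel, which is how Gaussian splatting renderers typically combine overlapping primitives. The sketch below assumes the splats have already been projected to 2D and carry an inverse 2x2 covariance; the dictionary keys and the function names `splat_alpha` and `composite` are hypothetical, and this is not taken from the ASH implementation.

```python
import math


def splat_alpha(pixel, center, inv_cov, opacity):
    """Opacity contributed by one projected 2D Gaussian at a pixel."""
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    # Squared Mahalanobis distance d^T * inv_cov * d.
    m = (dx * (inv_cov[0][0] * dx + inv_cov[0][1] * dy)
         + dy * (inv_cov[1][0] * dx + inv_cov[1][1] * dy))
    return opacity * math.exp(-0.5 * m)


def composite(pixel, splats):
    """Blend splats front to back; returns (rgb, remaining transmittance)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for s in sorted(splats, key=lambda s: s["depth"]):  # nearest first
        a = splat_alpha(pixel, s["center"], s["inv_cov"], s["opacity"])
        for ch in range(3):
            color[ch] += transmittance * a * s["color"][ch]
        transmittance *= 1.0 - a
    return color, transmittance


identity = [[1.0, 0.0], [0.0, 1.0]]
splats = [
    {"center": (0.0, 0.0), "inv_cov": identity, "opacity": 1.0,
     "color": (0.0, 0.0, 1.0), "depth": 2.0},  # opaque blue, farther away
    {"center": (0.0, 0.0), "inv_cov": identity, "opacity": 0.5,
     "color": (1.0, 0.0, 0.0), "depth": 1.0},  # half-transparent red, nearer
]
rgb, remaining = composite((0.0, 0.0), splats)
```

At the shared center, the nearer red splat contributes half of the final color and the opaque blue splat behind it fills the rest, so nothing is left for the background.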

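According to the abstract, the Gaussians are attached to a deformable character model, and their parameters are learned in 2D texture space rather than directly in 3D. This lets the method use efficient 2D convolutional architectures that scale easily with the required number of Gaussians. Because the splats are tied to the character model, they follow it as the pose changes, which is what allows the avatar to be animated and rendered in real time.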
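The abstract describes attaching the Gaussians to a deformable character model and learning their parameters in 2D texture space. A toy sketch of that layout follows: each texel of a small 2D map owns one Gaussian, the texel's (u, v) coordinate picks a point on the canonical surface, a per-texel offset adjusts it, and the posed model carries the result along. Everything here is a hypothetical stand-in: the cylinder replaces a real character model, a single rotation replaces real pose deformation, and the hand-filled `offset_map` replaces what a 2D network would predict.

```python
import math


def canonical_point(u, v):
    """Toy canonical surface: a unit-radius cylinder around the y-axis."""
    angle = 2.0 * math.pi * u
    return (math.cos(angle), v, math.sin(angle))


def apply_pose(point, pose_angle):
    """Toy pose deformation: rotate the whole surface about the z-axis."""
    x, y, z = point
    c, s = math.cos(pose_angle), math.sin(pose_angle)
    return (x * c - y * s, x * s + y * c, z)


def gaussians_from_texture(offset_map, pose_angle):
    """Turn an H x W map of 3D offsets into H*W posed Gaussian centers."""
    height, width = len(offset_map), len(offset_map[0])
    centers = []
    for row in range(height):
        for col in range(width):
            # Texel centers in [0, 1) x [0, 1) texture coordinates.
            u = (col + 0.5) / width
            v = (row + 0.5) / height
            base = canonical_point(u, v)
            offset = offset_map[row][col]
            canonical = tuple(b + o for b, o in zip(base, offset))
            centers.append(apply_pose(canonical, pose_angle))
    return centers


# A 2 x 4 texture with zero offsets yields 8 Gaussians on the surface.
offset_map = [[(0.0, 0.0, 0.0) for _ in range(4)] for _ in range(2)]
rest = gaussians_from_texture(offset_map, pose_angle=0.0)
bent = gaussians_from_texture(offset_map, pose_angle=math.pi / 2)
```

The number of Gaussians equals the number of texels, so increasing the texture resolution increases the Gaussian count without changing the structure of the code, which mirrors the scaling argument made in the abstract.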

The abstract motivates this design by contrast with neural implicit rendering, which has reached a high level of photorealism but has mostly shown real-time performance for static scenes only. Explicit 3D Gaussians can be splatted into image space efficiently, and learning their parameters in 2D texture space avoids the severe compute cost of learning them naively in 3D.

The authors benchmark ASH against competing methods on pose-controllable avatars. They report that it outperforms existing real-time methods by a large margin and achieves results comparable to, or better than, offline methods. Virtual try-on and mixed reality are plausible downstream uses of such an avatar.

Critical Analysis

One potential limitation of ASH is its reliance on a deformable character model, since the Gaussians are attached to that model. Creating a high-quality digital human for a subject without such a model may therefore still be challenging.

Additionally, while the paper demonstrates impressive results, there may be cases where the Gaussian splat representation is not flexible enough to capture every nuance of human appearance and motion. Further research may be needed to extend the method to more complex deformations and to a wider range of body types and poses.

That said, the core ideas behind ASH represent an important step forward in the field of digital human rendering. By developing a more efficient and continuous representation, the authors have made significant progress towards the goal of creating highly realistic and animatable virtual humans. As the technology continues to evolve, we can expect to see even more advanced and practical applications of this work.

Conclusion

The ASH method presented in this paper offers a novel and efficient approach to rendering photorealistic digital humans. By using animatable Gaussian splats to model the geometry and appearance of the human body, the authors have created a representation that can be animated in real-time while maintaining a high level of detail and visual fidelity.

This work has important implications for a wide range of applications, from video games and virtual try-on to mixed reality experiences. By making it easier and more efficient to create high-quality virtual humans, ASH has the potential to enable new forms of interaction and storytelling in the digital realm.

As the field of digital human rendering continues to advance, we can expect to see even more innovative approaches like ASH emerge, further pushing the boundaries of what is possible in terms of creating lifelike and compelling virtual characters. This research represents an exciting step forward in this important area of computer graphics and visualization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.

4/5/2024

cs.CV

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

4/1/2024

cs.CV cs.GR cs.LG

Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Zhe Li, Yipengjing Sun, Zerong Zheng, Lizhen Wang, Shengping Zhang, Yebin Liu

Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front & back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. To tackle the realistic relighting of animatable avatars, we introduce physically-based rendering into the avatar representation for decomposing avatar materials and environment illumination. Overall, our method can create lifelike avatars with dynamic, realistic, generalized and relightable appearances. Experiments show that our method outperforms other state-of-the-art approaches.

5/28/2024

cs.CV cs.GR

PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu

Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time (≥ 25 fps at a resolution of 512 × 512).

6/26/2024

cs.GR cs.CV