LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting

2405.12663

Published 5/22/2024 by Jia Gong, Shenyu Ji, Lin Geng Foo, Kang Chen, Hossein Rahmani, Jun Liu


Abstract

Creating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable avatars with diverse garments. By decoupling garments from the avatar, our framework empowers users to conveniently edit avatars at the garment level. Our approach begins by modeling the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. To generate high-quality garments for each layer, we introduce a coarse-to-fine strategy for diverse garment generation and a novel dual-SDS loss function to maintain coherence between the generated garments and avatar components, including the human body and other garments. Moreover, we introduce three regularization losses to guide the movement of Gaussians for garment transfer, allowing garments to be freely transferred to various avatars. Extensive experimentation demonstrates that our approach surpasses existing methods in the generation of 3D clothed humans.


Overview

Developing 3D clothed avatars from text descriptions is a crucial but challenging task.
Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to mix and match garments.
To address this limitation, the researchers present the LAyered Gaussian Avatar (LAGA) framework, which enables the creation of high-fidelity, decomposable avatars with diverse garments.

Plain English Explanation

The paper describes a new way to create 3D avatars, or digital representations of people, that can be dressed in a variety of clothing. Traditional methods for creating 3D avatars have treated the human body and the clothing as a single, inseparable unit. This makes it difficult for users to freely mix and match different garments or outfits on their avatar.

To solve this problem, the researchers developed the LAGA framework, which models the avatar as a set of 3D Gaussians (small, soft ellipsoids, each with its own position, size, orientation, color, and opacity, that together describe the avatar's shape and appearance) organized in layers. Each layer corresponds to a specific garment or to the human body itself. Because the layers are separate, the body and each piece of clothing can be edited independently, giving users more flexibility to customize their avatar's appearance.
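To make the layered idea concrete, here is a minimal Python sketch of how such an avatar might be organized in memory. The names (`Gaussian`, `LayeredAvatar`, `swap_garment`) are hypothetical and not taken from the paper; each `Gaussian` simply holds the attributes a standard Gaussian splatting primitive carries. Swapping a layer here only changes which Gaussians get rendered; fitting a garment to a different body is the separate problem that the transfer losses address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    """One 3D Gaussian primitive, as used in Gaussian splatting."""
    mean: Vec3                                   # centre position in 3D
    scale: Vec3                                  # extent along each local axis
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    color: Vec3                                  # RGB in [0, 1]
    opacity: float                               # in [0, 1]


@dataclass
class LayeredAvatar:
    """Hypothetical container: one layer for the body, one per garment slot."""
    body: List[Gaussian]
    garments: Dict[str, List[Gaussian]] = field(default_factory=dict)

    def all_gaussians(self) -> List[Gaussian]:
        # The renderer sees the union of every layer.
        out = list(self.body)
        for layer in self.garments.values():
            out.extend(layer)
        return out

    def swap_garment(self, slot: str, new_layer: List[Gaussian]) -> None:
        # Replacing one garment leaves the body and the other garments untouched.
        self.garments[slot] = list(new_layer)


if __name__ == "__main__":
    def g(x: float) -> Gaussian:
        return Gaussian((x, 0.0, 0.0), (0.01, 0.01, 0.01),
                        (1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5), 0.9)

    avatar = LayeredAvatar(body=[g(0.0), g(0.1)],
                           garments={"top": [g(0.2)], "pants": [g(0.3)]})
    avatar.swap_garment("top", [g(0.5), g(0.6)])  # mix and match a new top
    print(len(avatar.all_gaussians()))  # prints 5 (2 body + 2 top + 1 pants)
```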

To generate high-quality garments for each layer, the researchers use a coarse-to-fine strategy that starts with a rough garment shape and then refines it. They also introduce a novel "dual-SDS loss function" (SDS stands for Score Distillation Sampling, a technique that uses a pretrained text-to-image diffusion model to guide 3D generation) to keep the generated garments coherent with the avatar's other components, such as the body and other garments.

Furthermore, the researchers introduce three regularization losses to guide the movement of the Gaussian points when transferring garments from one avatar to another. This allows garments to be freely transferred between different avatars without losing their shape or fit.

Overall, the LAGA framework provides a more flexible and powerful way to create 3D clothed avatars, which could have applications in areas like gaming, virtual fashion, and digital entertainment.

Technical Explanation

The LAGA framework models the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. This decoupling of the avatar and clothing allows for greater flexibility in editing and customizing the avatar's appearance.
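One reason a layered Gaussian representation composes cleanly is the way Gaussian splatting renders: the splats covering a pixel are sorted by depth and alpha-blended front to back, so a garment Gaussian in front of a body Gaussian naturally occludes it. The sketch below shows that blending rule for a single pixel, assuming (our assumption, not confirmed by the summary) that all layers are merged and rendered together; `composite` is a hypothetical helper, and each splat's alpha is taken as already including the Gaussian's falloff at that pixel.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Splat = Tuple[float, Color, float]  # (depth, rgb, alpha at this pixel)


def composite(splats: List[Splat]) -> Color:
    """Front-to-back alpha blending of the splats covering one pixel.

    C = sum_i c_i * a_i * T_i, where T_i = prod_{j<i} (1 - a_j) is the
    transmittance left after the closer splats.
    """
    r = g = b = 0.0
    transmittance = 1.0
    for _, (cr, cg, cb), alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        r += weight * cr
        g += weight * cg
        b += weight * cb
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # optional early exit once the pixel is nearly opaque
            break
    return (r, g, b)


# A blue garment splat (depth 1.0) in front of a red body splat (depth 2.0):
# the result is approximately (0.1, 0.0, 0.8), i.e. mostly garment colour.
print(composite([(2.0, (1.0, 0.0, 0.0), 0.5), (1.0, (0.0, 0.0, 1.0), 0.8)]))
```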

To generate high-quality garments for each layer, the researchers introduce a coarse-to-fine strategy for diverse garment generation. This involves starting with a rough garment shape and then refining it through multiple steps to achieve the desired level of detail and realism.
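The summary does not say what distinguishes LAGA's coarse and fine stages, so the following is only a hypothetical two-stage schedule illustrating the general pattern. It borrows two ideas common in text-to-3D optimization: rendering at a lower resolution first, and sampling large diffusion timesteps early (which shape coarse structure) before restricting to small ones (which refine detail). `StageConfig`, `stage_for_iteration`, and every number in it are placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    render_resolution: int  # side length of rendered training views, in pixels
    t_min: float            # lower bound for the sampled diffusion timestep (0..1)
    t_max: float            # upper bound for the sampled diffusion timestep (0..1)


def stage_for_iteration(it: int, coarse_iters: int = 1000) -> StageConfig:
    """Hypothetical coarse-to-fine schedule (placeholder numbers).

    Coarse stage: low resolution and the full timestep range, so the
    garment's overall shape settles first.
    Fine stage: higher resolution and only small timesteps, so the
    optimization focuses on details without reshaping the garment.
    """
    if it < coarse_iters:
        return StageConfig("coarse", render_resolution=256, t_min=0.02, t_max=0.98)
    return StageConfig("fine", render_resolution=512, t_min=0.02, t_max=0.5)


for it in (0, 999, 1000, 3000):
    print(it, stage_for_iteration(it))
```

Keeping the schedule in one pure function makes it easy to swap stage boundaries or add a third stage without touching the optimization loop.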

The researchers also propose a novel dual-SDS loss function to maintain coherence between the generated garments and the avatar's other components, including the human body and other garments. This helps ensure that the added clothing integrates seamlessly with the rest of the avatar.
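For readers unfamiliar with SDS: Score Distillation Sampling, introduced in DreamFusion, optimizes 3D parameters by nudging rendered images toward what a pretrained text-to-image diffusion model considers likely for a prompt. The first equation is the standard SDS gradient, where x is a rendering of the Gaussians θ via a differentiable renderer R. The second equation is only our assumed reading of "dual-SDS": one term scores the garment rendered alone and one scores the full clothed avatar, with the body layer held fixed. The summary does not spell out the two terms, and the weights λ_g, λ_a and prompts y_g, y_a are illustrative symbols.

```latex
% Standard SDS gradient; x = \mathcal{R}(\theta), z_t = \alpha_t x + \sigma_t \epsilon
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}(x; y)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_{\phi}(z_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right]

% Assumed dual form: body Gaussians \theta_b frozen, garment Gaussians \theta_g optimized
\mathcal{L}_{\mathrm{dual}}
  = \lambda_{g}\,\mathcal{L}_{\mathrm{SDS}}\big(\mathcal{R}(\theta_g);\, y_g\big)
  + \lambda_{a}\,\mathcal{L}_{\mathrm{SDS}}\big(\mathcal{R}(\theta_b \cup \theta_g);\, y_a\big)
```

Under this reading, the garment-only term pushes each garment to look like a complete, plausible garment on its own, while the full-avatar term pushes it to sit coherently on the body and alongside the other garments.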

Moreover, the researchers introduce three regularization losses to guide the movement of the Gaussian points during garment transfer. This allows garments to be freely transferred between different avatars without losing their shape or fit, enabling users to mix and match garments across a variety of avatars.
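The summary does not name the three regularization losses, so the sketch below uses three common kinds of transfer regularizers purely as illustration; they are not necessarily the terms LAGA uses. The three terms are a displacement penalty (do not move Gaussians far from their source positions), a rigidity penalty (keep distances between neighboring Gaussians roughly constant), and a penetration penalty (keep garment Gaussians outside the new body). To stay self-contained, the body is approximated by a sphere; all function names and weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def displacement_loss(src: Sequence[Vec3], dst: Sequence[Vec3]) -> float:
    """Mean squared distance each garment Gaussian moved during transfer."""
    if not src:
        return 0.0
    return sum(math.dist(p, q) ** 2 for p, q in zip(src, dst)) / len(src)


def rigidity_loss(src: Sequence[Vec3], dst: Sequence[Vec3],
                  neighbors: List[Tuple[int, int]]) -> float:
    """Penalize changes in distance between neighboring Gaussians
    (an as-rigid-as-possible-style term)."""
    if not neighbors:
        return 0.0
    return sum((math.dist(dst[i], dst[j]) - math.dist(src[i], src[j])) ** 2
               for i, j in neighbors) / len(neighbors)


def penetration_loss(dst: Sequence[Vec3], body_center: Vec3,
                     body_radius: float, margin: float = 0.0) -> float:
    """Toy body proxy (a sphere): penalize Gaussians that sink inside it."""
    if not dst:
        return 0.0
    total = 0.0
    for p in dst:
        sdf = math.dist(p, body_center) - body_radius  # > 0 outside, < 0 inside
        total += max(0.0, margin - sdf) ** 2
    return total / len(dst)


def transfer_regularizer(src, dst, neighbors, body_center, body_radius,
                         weights=(1.0, 1.0, 10.0)) -> float:
    """Weighted sum of the three illustrative terms (placeholder weights)."""
    w_disp, w_rigid, w_pen = weights
    return (w_disp * displacement_loss(src, dst)
            + w_rigid * rigidity_loss(src, dst, neighbors)
            + w_pen * penetration_loss(dst, body_center, body_radius))
```

In an actual pipeline these terms would be added to the generation loss and minimized with respect to the garment Gaussians' positions while the garment is fitted to the new avatar.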

The researchers extensively evaluate their LAGA framework and compare it to existing methods, demonstrating its superiority in generating high-quality 3D clothed humans.

Critical Analysis

The paper presents a robust and well-designed framework for creating 3D clothed avatars from textual descriptions. The key strengths of the LAGA approach are its ability to decouple the avatar and clothing, allowing for greater customization, and its use of a layered Gaussian representation to enable flexible garment transfer.

However, the paper does not address the potential limitations of the Gaussian representation, such as its ability to capture complex surface details or handle extreme garment deformations. Additionally, the researchers do not provide an in-depth discussion of the computational complexity or real-time performance of their framework, which could be important considerations for certain applications.

Further research could explore integrating the LAGA framework with other recent avatar generation techniques, such as the Gaussian Head and Shoulders or 3DGS approaches, to create even more realistic and versatile 3D avatars. Applying LAGA's ideas to related problems, such as text-to-layer-wise 3D clothing generation or animatable 3D Gaussian avatars, could further broaden its utility and impact.

Conclusion

The LAyered Gaussian Avatar (LAGA) framework represents a significant advance in 3D clothed avatar generation. By decoupling the avatar and clothing, the researchers have created a flexible system that lets users freely mix and match garments on their digital characters. The layered Gaussian representation, together with the new garment generation and transfer techniques, enables the creation of high-fidelity, decomposable 3D avatars.

This research could benefit a wide range of applications, including gaming, virtual fashion, and digital entertainment. As demand for realistic and customizable 3D avatars grows, LAGA offers a promising approach to creating and editing virtual 3D characters at the garment level.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu

Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.

5/14/2024

cs.CV

ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

Yufei Liu, Junshu Tang, Chu Zheng, Shijie Zhang, Jinkun Hao, Junwei Zhu, Dongjin Huang

High-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches via Score Distillation Sampling (SDS) have enabled new possibilities but either intricately couple with human body or struggle to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text prompts. We propose a novel representation Disentangled Clothe Gaussian Splatting (DCGS) to enable separate optimization. DCGS represents clothed avatar as one Gaussian model but freezes body Gaussian splats. To enhance quality and completeness, we incorporate bidirectional SDS to supervise clothed avatar and garment RGBD renderings respectively with pose conditions and propose a new pruning strategy for loose clothing. Our approach can also support custom clothing templates as input. Benefiting from our design, the synthetic 3D garment can be easily applied to virtual try-on and support physically accurate animation. Extensive experiments showcase our method's superior and competitive performance. Our project page is at https://ggxxii.github.io/clothedreamer.

6/26/2024

cs.CV


GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

4/1/2024

cs.CV cs.GR cs.LG

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.

4/5/2024

cs.CV