LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

arXiv: 2405.07319

Published 5/14/2024 by Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu

Abstract

Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.

Overview

Animatable clothing transfer is a challenging problem in virtual character creation
Most avatar systems entangle the representations of the human body and clothing, making it difficult to transfer garments across characters
Existing methods also struggle to accurately track the sliding motion of garments
This paper introduces Layered Gaussian Avatars (LayGA), a new representation that models body and clothing as separate layers for photorealistic animatable clothing transfer

Plain English Explanation

Animatable clothing transfer refers to the ability to take a garment from one virtual character and put it on a different character, while also being able to animate the garment realistically as the character moves. This is a difficult problem because most existing avatar systems combine the representations of the human body and the clothing in a way that makes it hard to transfer the garment to a new character. Additionally, these entangled representations often fail to accurately track how the garment would slide and move as the character animates.

To overcome these limitations, the researchers present a new representation called Layered Gaussian Avatars (LayGA). This approach models the body and clothing as two separate layers, which makes it possible to transfer garments between characters. The representation is built upon a Gaussian map-based avatar, chosen for its ability to represent garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface rather than a smooth explicit surface, which makes it challenging to track garments accurately and to handle collisions between the body and clothing.
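To make the Gaussian map idea concrete: in the Gaussian map-based avatar that LayGA builds on, each pixel of a 2D map represents one 3D Gaussian. The following is a minimal sketch of unpacking such a map into a flat list of Gaussians. The `Gaussian` fields, the use of `None` for pixels that carry no Gaussian, and the function name `unpack_gaussian_map` are assumptions made for illustration, not the authors' data layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical, reduced set of per-Gaussian attributes."""
    position: Vec3
    color: Vec3
    opacity: float


# Assumption: a pixel is None when it holds no Gaussian, otherwise a tuple of
# (position, color, opacity). A real map would carry more attributes.
Pixel = Optional[Tuple[Vec3, Vec3, float]]


def unpack_gaussian_map(gaussian_map: List[List[Pixel]]) -> List[Gaussian]:
    """Turn a 2D map into an unordered list with one Gaussian per valid pixel."""
    gaussians = []
    for row in gaussian_map:
        for pixel in row:
            if pixel is None:
                continue  # this pixel does not correspond to a Gaussian
            position, color, opacity = pixel
            gaussians.append(Gaussian(position, color, opacity))
    return gaussians


# A toy 2x2 map with two valid pixels.
toy_map = [
    [None, ((0.0, 1.0, 0.0), (0.8, 0.1, 0.1), 0.9)],
    [((0.1, 0.9, 0.0), (0.8, 0.1, 0.1), 0.7), None],
]
gaussians = unpack_gaussian_map(toy_map)
```

The output is just a point-like collection with no connectivity, which illustrates why the summary describes the Gaussians as unstructured: nothing in the list says which Gaussians are neighbors on a surface.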

To address these issues, the LayGA approach uses a two-stage training process. In the first stage, single-layer reconstruction, it applies a series of geometric constraints to reconstruct smooth surfaces and, at the same time, obtain the segmentation between body and clothing. In the second stage, multi-layer fitting, it trains two separate models to represent the body and clothing, using the reconstructed clothing geometry as 3D supervision for more accurate garment tracking.
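As a rough illustration of what "3D supervision" can mean, the sketch below measures how far predicted clothing points lie from a reference clothing geometry, using a one-sided nearest-point distance. This is a generic stand-in chosen for clarity; the summary does not state which loss the paper uses, and the names `one_sided_surface_loss`, `reference_cloth`, and `predicted_cloth` are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def nearest_distance(point: Vec3, targets: List[Vec3]) -> float:
    """Euclidean distance from `point` to the closest point in `targets`."""
    return min(math.dist(point, t) for t in targets)


def one_sided_surface_loss(predicted: List[Vec3], reference: List[Vec3]) -> float:
    """Mean squared distance from each predicted point to its nearest reference point."""
    return sum(nearest_distance(p, reference) ** 2 for p in predicted) / len(predicted)


# Toy data: the reference stands in for the clothing geometry reconstructed in
# stage one, the prediction for the clothing model fitted in stage two.
reference_cloth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_cloth = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.0)]
loss = one_sided_surface_loss(predicted_cloth, reference_cloth)
```

A brute-force nearest-point search like this is quadratic in the number of points, so it only suits small examples; it is meant to show the shape of the supervision signal, not a practical training loss.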

Additionally, LayGA uses geometry and rendering layers to achieve both high-quality geometric reconstruction and high-fidelity rendering. Overall, the authors report photorealistic animations and virtual try-on results that outperform the baseline methods.

Technical Explanation

The key technical elements of the LayGA approach are:

Separation of Body and Clothing Representations: Instead of entangling the representations of the human body and clothing, LayGA models them as two separate layers. This allows for more flexibility in transferring garments across characters.
Two-Stage Training: The first stage, single-layer reconstruction, uses a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. The second stage, multi-layer fitting, trains two separate models for the body and clothing, using the reconstructed clothing geometry as 3D supervision for more accurate garment tracking.
Geometry and Rendering Layers: LayGA proposes geometry and rendering layers to obtain both high-quality geometric reconstruction and high-fidelity rendering, which are important for photorealistic animation and virtual try-on.
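The benefit of keeping body and clothing in separate layers can be sketched with a small data structure: once the layers are separate, transferring a garment amounts to pairing one character's clothing layer with another character's body layer, after which body-clothing collisions have to be checked. Everything below is a simplified illustration under stated assumptions: `LayeredAvatar`, `transfer_clothing`, and `penetration_depth` are hypothetical names, the body is approximated by sample points with unit outward normals, and the sketch ignores the fitting of the garment to the new body shape and pose that a real system needs.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SurfacePoint:
    position: Vec3
    normal: Vec3  # assumed to be a unit vector pointing out of the body


@dataclass
class LayeredAvatar:
    body: List[SurfacePoint]  # body layer
    clothing: List[Vec3]      # clothing layer, reduced to point positions


def transfer_clothing(source: LayeredAvatar, target: LayeredAvatar) -> LayeredAvatar:
    """Pair the target's body layer with a copy of the source's clothing layer."""
    return LayeredAvatar(body=target.body, clothing=list(source.clothing))


def penetration_depth(point: Vec3, body: List[SurfacePoint]) -> float:
    """How far `point` lies behind the nearest body sample (0.0 if outside)."""
    nearest = min(body, key=lambda s: math.dist(point, s.position))
    offset = tuple(p - q for p, q in zip(point, nearest.position))
    signed = sum(o * n for o, n in zip(offset, nearest.normal))
    return max(0.0, -signed)


# Toy example: one body sample at the origin facing +y on the target character.
source = LayeredAvatar(
    body=[SurfacePoint((0.0, -0.1, 0.0), (0.0, 1.0, 0.0))],
    clothing=[(0.0, -0.02, 0.0), (0.0, 0.05, 0.0)],
)
target = LayeredAvatar(body=[SurfacePoint((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))], clothing=[])

dressed = transfer_clothing(source, target)
depths = [penetration_depth(c, dressed.body) for c in dressed.clothing]
```

In this toy case the first clothing point ends up slightly inside the target body and the second stays outside, which is the kind of collision a layered method must resolve after transfer. With an entangled representation there is no separate clothing layer to move in the first place.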

According to the abstract, LayGA realizes photorealistic animations and virtual try-on, and outperforms the baseline methods it is compared against.

Critical Analysis

The LayGA approach addresses some important limitations of existing avatar systems, such as the difficulty of transferring garments across characters and the inaccurate tracking of garment motion. By separating the representations of the body and clothing, the method provides more flexibility and control over the virtual characters.

However, the paper does not discuss the computational complexity or inference speed of the LayGA model, which could be an important consideration for real-time applications. Additionally, the paper does not mention how well the method generalizes to a wide range of garment types and styles, or how it might handle more complex interactions between the body and clothing, such as wrinkles or deformations.

Further research could explore ways to make the LayGA approach more efficient and robust, potentially by incorporating physics-based simulation or implicit surface representations to improve garment tracking and collision handling. Combining LayGA with efficient avatar modeling techniques could also be an interesting direction for future work.

Conclusion

The LayGA approach presented in this paper advances animatable clothing transfer for virtual characters. By separating the representations of the body and clothing, the method makes garment transfer across characters more flexible and garment tracking during animation more accurate. The two-stage training process and the geometry and rendering layers contribute to the method's ability to produce photorealistic results.

While the paper does not address all the potential challenges and limitations of the approach, it demonstrates the value of this new representation and lays the groundwork for further advancements in virtual character creation and animation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting

Jia Gong, Shenyu Ji, Lin Geng Foo, Kang Chen, Hossein Rahmani, Jun Liu

Creating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable avatars with diverse garments. By decoupling garments from avatar, our framework empowers users to conveniently edit avatars at the garment level. Our approach begins by modeling the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. To generate high-quality garments for each layer, we introduce a coarse-to-fine strategy for diverse garment generation and a novel dual-SDS loss function to maintain coherence between the generated garments and avatar components, including the human body and other garments. Moreover, we introduce three regularization losses to guide the movement of Gaussians for garment transfer, allowing garments to be freely transferred to various avatars. Extensive experimentation demonstrates that our approach surpasses existing methods in the generation of 3D clothed humans.

5/22/2024

cs.GR cs.CV

Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Zhe Li, Yipengjing Sun, Zerong Zheng, Lizhen Wang, Shengping Zhang, Yebin Liu

Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front & back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. To tackle the realistic relighting of animatable avatars, we introduce physically-based rendering into the avatar representation for decomposing avatar materials and environment illumination. Overall, our method can create lifelike avatars with dynamic, realistic, generalized and relightable appearances. Experiments show that our method outperforms other state-of-the-art approaches.

5/28/2024

cs.CV cs.GR

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along with the physical parameters of the fabric of their clothes. For this purpose, we adopt a mesh-aligned 4D Gaussian technique for spatio-temporal mesh tracking as well as a physically based inverse renderer to estimate the intrinsic material properties. PhysAvatar integrates a physics simulator to estimate the physical parameters of the garments using gradient-based optimization in a principled manner. These novel capabilities enable PhysAvatar to create high-quality novel-view renderings of avatars dressed in loose-fitting clothes under motions and lighting conditions not seen in the training data. This marks a significant advancement towards modeling photorealistic digital humans using physically based inverse rendering with physics in the loop. Our project website is at: https://qingqing-zhao.github.io/PhysAvatar

4/10/2024

cs.GR cs.CV

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

4/1/2024

cs.CV cs.GR cs.LG