TELA: Text to Layer-wise 3D Clothed Human Generation

Read original: arXiv:2404.16748 - Published 4/26/2024 by Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

Overview

Addresses the task of generating 3D clothed human models from textual descriptions
Previous works generate the whole human body and clothes as a single model, making it difficult to edit the clothes or exert fine-grained control over the generation process
Proposes a layer-wise clothed human representation and progressive optimization strategy to produce clothing-disentangled 3D human models with more control over the generation process

Plain English Explanation

The paper focuses on the challenge of creating 3D models of clothed human figures from written descriptions. Previous methods have treated the human body and clothes as a single, unified model, which makes it hard to edit the clothing or precisely control the generation process.

To solve this, the researchers developed a new approach that represents the human body and clothing as separate layers. This layer-based representation, combined with a step-by-step optimization strategy, allows the system to generate high-quality 3D models where the clothing is disentangled from the underlying body. This provides more flexibility to edit the clothes or control individual stages of the generation process.

The key innovation is a "stratified compositional rendering" method that fuses the multiple layers (body, clothing, etc.) while the clothing is being generated. The researchers also introduced a new loss function to help ensure the clothing is properly separated from the body during the generation process.

Overall, this technique advances the state-of-the-art in 3D clothed human generation, and also enables new applications like virtual clothing try-on. The layer-based representation and progressive optimization provide more control and editability compared to previous holistic approaches.

Technical Explanation

The paper proposes a layer-wise clothed human representation and progressive optimization strategy to generate 3D clothed human models from textual descriptions. Previous methods usually encode the human body and clothes as a single holistic model and generate it in a single-stage optimization, which makes it difficult to edit the clothing or keep fine-grained control over the generation process.

To address this, the researchers developed a multi-layer representation where the minimal-clothed human body and clothing layers are generated progressively. During clothing generation, a novel "stratified compositional rendering" technique is used to fuse the multiple layers into a single 3D model. Additionally, a new loss function is introduced to help decouple the clothing model from the underlying human body.
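The layer fusion described above can be illustrated with a small sketch. This summary does not give the paper's exact stratified compositional rendering formulation, so the code below is only an assumption: it applies the standard volume-rendering alpha compositing rule to depth-sorted samples pooled from several layers along one camera ray. The function name composite_ray, the sample format, and the far parameter are hypothetical and chosen for illustration.

```python
import math


def composite_ray(layer_samples, far):
    """Composite samples from several layers along a single ray.

    layer_samples: dict mapping a layer name (e.g. "body", "clothing") to a
        list of (depth, density, rgb) tuples sampled along the ray.
    far: depth at which the ray ends; bounds the last sample's interval.

    Returns (rgb, contribution per layer, remaining transmittance).
    This is a generic sketch, not the paper's implementation.
    """
    # Pool the samples of all layers and order them front to back.
    merged = sorted(
        (
            (depth, density, rgb, name)
            for name, samples in layer_samples.items()
            for (depth, density, rgb) in samples
        ),
        key=lambda sample: sample[0],
    )

    color = [0.0, 0.0, 0.0]
    contribution = {name: 0.0 for name in layer_samples}
    transmittance = 1.0  # fraction of light not yet absorbed

    for i, (depth, density, rgb, name) in enumerate(merged):
        # Interval length to the next pooled sample (or to the far bound).
        next_depth = merged[i + 1][0] if i + 1 < len(merged) else far
        delta = max(next_depth - depth, 0.0)

        # Standard volume-rendering opacity for this interval.
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha

        for channel in range(3):
            color[channel] += weight * rgb[channel]
        contribution[name] += weight
        transmittance *= 1.0 - alpha

    return color, contribution, transmittance


# Usage: a dense blue clothing sample in front of a dense red body sample.
color, contribution, transmittance = composite_ray(
    {
        "body": [(2.0, 50.0, (1.0, 0.0, 0.0))],
        "clothing": [(1.0, 50.0, (0.0, 0.0, 1.0))],
    },
    far=3.0,
)
print(color, contribution, transmittance)
```

Because the samples are pooled before sorting, a clothing sample that lies in front of the body occludes it, and the per-layer contributions show how much of the pixel each layer explains. Tracking contributions per layer is a design choice of this sketch; it is the kind of quantity a decoupling loss could use, but the paper's actual loss is not specified here.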

The progressive optimization and layer-based representation provide more control over the generation process compared to previous single-stage approaches. This allows for higher-quality disentanglement of the clothing from the body, enabling applications like virtual clothing try-on.

Extensive experiments demonstrate that this method achieves state-of-the-art performance on 3D clothed human generation tasks, while also supporting flexible clothing editing capabilities.

Critical Analysis

The paper presents an innovative approach to 3D clothed human generation that addresses some key limitations of prior work. The layer-wise representation and progressive optimization strategy provide more control and editability over the generated models.

However, the paper does not provide a detailed analysis of the computational complexity or runtime performance of the proposed method. The authors also do not discuss potential challenges in scaling the approach to handle a wider variety of clothing styles and body shapes.

Additionally, while the reported experiments show strong results, the paper could benefit from a more thorough discussion of failure cases, limitations, and potential avenues for future research. For example, it would be interesting to understand how well the method generalizes to more diverse or unconventional clothing styles.

Overall, this research represents a significant advancement in the field of 3D clothed human generation, but there are still opportunities to further improve the robustness, efficiency, and real-world applicability of the technique.

Conclusion

This paper presents a novel layer-based representation and progressive optimization approach for generating high-quality 3D clothed human models from textual descriptions. By disentangling the clothing from the underlying body, the proposed method provides more control and editability compared to previous holistic generation techniques.

The key innovations include a stratified compositional rendering method and a new loss function to help separate the clothing from the body during the generation process. Extensive experiments demonstrate state-of-the-art performance on 3D clothed human generation tasks, as well as support for flexible clothing editing applications like virtual try-on.

While the paper does not fully address all potential limitations, this research represents an important step forward in the field of 3D clothed human generation. The layer-based representation and progressive optimization strategy open up new possibilities for creating more realistic, customizable, and editable 3D human models from textual inputs.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TELA: Text to Layer-wise 3D Clothed Human Generation

Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle for clothing editing and meanwhile lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control capacity for the generation process. The basic idea is progressively generating a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which thereby provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: http://jtdong.com/tela_layer/

4/26/2024

HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model

Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-kun Lai, Kun Li

This paper aims to generate physically-layered 3D humans from text prompts. Existing methods either generate 3D clothed humans as a whole or support only tight and simple clothing generation, which limits their applications to virtual try-on and part-level editing. To achieve physically-layered 3D human generation with reusable and complex clothing, we propose a novel layer-wise dressed human representation based on a physically-decoupled diffusion model. Specifically, to achieve layer-wise clothing generation, we propose a dual-representation decoupling framework for generating clothing decoupled from the human body, in conjunction with an innovative multi-layer fusion volume rendering method. To match the clothing with different body shapes, we propose an SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. Extensive experiments demonstrate that our approach not only achieves state-of-the-art layered 3D human generation with complex clothing but also supports virtual try-on and layered human animation.

8/22/2024

Layered 3D Human Generation via Semantic-Aware Diffusion Model

Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Yebin Liu, Kun Li

The generation of 3D clothed humans has attracted increasing attention in recent years. However, existing work cannot generate layered high-quality 3D humans with consistent body structures. As a result, these methods are unable to arbitrarily and separately change and edit the body and clothing of the human. In this paper, we propose a text-driven layered 3D human generation framework based on a novel physically-decoupled semantic-aware diffusion model. To keep the generated clothing consistent with the target text, we propose a semantic-confidence strategy for clothing that can eliminate the non-clothing content generated by the model. To match the clothing with different body shapes, we propose a SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. Besides, we introduce uniform shape priors based on the SMPL model for body and clothing, respectively, which generates more diverse 3D content without being constrained by specific templates. The experimental results demonstrate that the proposed method not only generates 3D humans with consistent body structures but also allows free editing in a layered manner. The source code will be made public.

7/23/2024

WordRobe: Text-Guided Generation of Textured 3D Garments

Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma

In this paper, we tackle a new and challenging problem of text-driven generation of 3D garments with high-quality textures. We propose WordRobe, a novel framework for the generation of unposed & textured 3D garment meshes from user-friendly text prompts. We achieve this by first learning a latent representation of 3D garments using a novel coarse-to-fine training strategy and a loss for latent disentanglement, promoting better latent interpolation. Subsequently, we align the garment latent space to the CLIP embedding space in a weakly supervised manner, enabling text-driven 3D garment generation and editing. For appearance modeling, we leverage the zero-shot generation capability of ControlNet to synthesize view-consistent texture maps in a single feed-forward inference step, thereby drastically decreasing the generation time as compared to existing methods. We demonstrate superior performance over current SOTAs for learning 3D garment latent space, garment interpolation, and text-driven texture synthesis, supported by quantitative evaluation and qualitative user study. The unposed 3D garment meshes generated using WordRobe can be directly fed to standard cloth simulation & animation pipelines without any post-processing.

7/16/2024