GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Read original: arXiv:2405.07472 - Published 5/24/2024 by Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

Overview

  • The growing popularity of e-commerce has highlighted the importance of Virtual Try-On (VTON) technology.
  • Previous VTON research has primarily focused on 2D environments and required extensive training data.
  • 3D VTON research has mainly addressed the compatibility between garment and body shape, which has already been extensively covered in 2D VTON.
  • Advances in 3D scene editing have led to the adaptation of 2D diffusion models for 3D editing using multiple viewpoints.

Plain English Explanation

The rise of online shopping has made Virtual Try-On (VTON) technology increasingly important. Previous VTON research has mostly dealt with 2D scenarios and required a lot of training data. The 3D VTON research that has been done has focused on how well the clothes fit the person's body shape, which has already been thoroughly examined in 2D VTON.

However, improvements in 3D scene editing have allowed researchers to take a 2D diffusion model and adapt it for 3D editing by using multiple viewpoints. In this new study, the researchers propose a 3D VTON system called GaussianVTON that combines Gaussian Splatting (GS) editing with 2D VTON.

To make the transition from 2D to 3D VTON more seamless, the researchers use only images as editing prompts for the 3D editing, instead of requiring more complex inputs. They also develop a three-stage refinement strategy to address issues like blurry faces, inaccurate garments, and lower-quality viewpoints during the editing process.

Additionally, the researchers introduce a new editing strategy called Edit Recall Reconstruction (ERR) to overcome the limitations of previous editing strategies when it comes to making complex geometric changes.

Technical Explanation

The researchers propose GaussianVTON, a 3D VTON pipeline that integrates Gaussian Splatting (GS) editing with 2D VTON. To facilitate the transition from 2D to 3D VTON, they use only images as editing prompts for the 3D editing, instead of the more complex inputs typically required.
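As a rough illustration of the render, edit, and refit loop that multi-view 3D editing of this kind generally follows, here is a toy sketch. It is not the authors' implementation: `ToyScene`, `toy_try_on`, and `edit_scene_with_image_prompt` are hypothetical names, the "scene" is just one stored image per camera rather than a set of 3D Gaussians, and the 2D try-on model is replaced by a simple pixel paste. The only point it makes is that the editing prompt is an image (the garment), not text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[float]]  # toy grayscale image: rows of pixel values


@dataclass
class ToyScene:
    """Hypothetical stand-in for a Gaussian Splatting scene: one image per camera."""
    views: Dict[str, Image]

    def render(self, camera: str) -> Image:
        # A real renderer would rasterize 3D Gaussians; here we copy the stored view.
        return [row[:] for row in self.views[camera]]

    def refit(self, edited: Dict[str, Image]) -> None:
        # A real pipeline would optimize Gaussian parameters against the edited views.
        self.views.update(edited)


def toy_try_on(person: Image, garment: Image) -> Image:
    # Hypothetical 2D try-on stand-in: paste garment pixels wherever they are non-zero.
    return [
        [g if g != 0 else p for p, g in zip(person_row, garment_row)]
        for person_row, garment_row in zip(person, garment)
    ]


def edit_scene_with_image_prompt(
    scene: ToyScene,
    garment: Image,
    try_on_2d: Callable[[Image, Image], Image],
    cameras: List[str],
) -> ToyScene:
    # 1. Render each viewpoint, 2. edit it in 2D with the garment image as the prompt,
    # 3. push the edited views back into the 3D representation.
    edited = {cam: try_on_2d(scene.render(cam), garment) for cam in cameras}
    scene.refit(edited)
    return scene


if __name__ == "__main__":
    scene = ToyScene(views={"front": [[1.0, 1.0], [1.0, 1.0]],
                            "back": [[2.0, 2.0], [2.0, 2.0]]})
    garment = [[0.0, 9.0], [0.0, 9.0]]
    edit_scene_with_image_prompt(scene, garment, toy_try_on, ["front", "back"])
    print(scene.views["front"])  # [[1.0, 9.0], [1.0, 9.0]]
```

Passing the 2D editor in as a callable keeps the loop independent of any particular try-on model; the refinement stages and the ERR strategy described in the paper are not modeled here.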

To address issues like face blurring, garment inaccuracy, and degraded viewpoint quality during the editing process, the researchers devised a three-stage refinement strategy to gradually mitigate these problems.

Furthermore, they introduced a new editing strategy called Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in handling complex geometric changes.

The researchers conducted comprehensive experiments to demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON and establishing a new starting point for image-prompting 3D scene editing.

Critical Analysis

The paper presents a promising approach to 3D Virtual Try-On (VTON) by integrating Gaussian Splatting editing with 2D VTON. Using only images as editing prompts, which the authors present as a first for 3D editing, simplifies the input requirements and eases the transition from 2D to 3D VTON.

The three-stage refinement strategy and the introduction of the Edit Recall Reconstruction (ERR) editing technique are also notable advancements that help address common issues in 3D editing, such as face blurring, garment inaccuracy, and degraded viewpoint quality.

However, the paper does not discuss the computational complexity or runtime performance of the GaussianVTON system, which could be an important consideration for real-world applications. Additionally, the evaluation of the system is primarily focused on visual quality, and the practical implications for e-commerce or virtual fashion use cases are not thoroughly explored.

Further research could investigate the scalability, robustness, and real-world deployment of the GaussianVTON system, as well as its potential integration with other 3D modeling or virtual fitting technologies. Exploring user experiences and the broader impact on the fashion industry would also be valuable.

Conclusion

The GaussianVTON system proposed in this research represents a significant advancement in 3D Virtual Try-On (VTON) technology. By integrating Gaussian Splatting editing with 2D VTON and using only image prompts for 3D editing, the researchers have developed a more accessible and user-friendly approach to 3D scene editing.

The novel editing strategies and refinement techniques introduced in this work help address longstanding challenges in 3D VTON, such as garment accuracy, viewpoint quality, and face blurring. These improvements could have far-reaching implications for the e-commerce and virtual fashion industries, potentially enhancing the online shopping experience and enabling more seamless virtual try-on of clothing and accessories.

While further research is needed to fully explore the practical applications and scalability of GaussianVTON, this study establishes a promising new direction for 3D VTON and image-based 3D scene editing more broadly.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.

5/24/2024

DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models

Zhenyu Xie, Haoye Dong, Yufei Gao, Zehua Ma, Xiaodan Liang

Image-based 3D Virtual Try-ON (VTON) aims to sculpt the 3D human according to person and clothes images, which is data-efficient (i.e., getting rid of expensive 3D data) but challenging. Recent text-to-3D methods achieve remarkable improvement in high-fidelity 3D human generation, demonstrating its potential for 3D virtual try-on. Inspired by the impressive success of personalized diffusion models (e.g., Dreambooth and LoRA) for 2D VTON, it is straightforward to achieve 3D VTON by integrating the personalization technique into the diffusion-based text-to-3D framework. However, employing the personalized module in a pre-trained diffusion model (e.g., StableDiffusion (SD)) would degrade the model's capability for multi-view or multi-domain synthesis, which is detrimental to the geometry and texture optimization guided by Score Distillation Sampling (SDS) loss. In this work, we propose a novel customizing 3D human try-on model, named DreamVTON, to separately optimize the geometry and texture of the 3D human. Specifically, a personalized SD with multi-concept LoRA is proposed to provide the generative prior about the specific person and clothes, while a Densepose-guided ControlNet is exploited to guarantee consistent prior about body pose across various camera views. Besides, to avoid the inconsistent multi-view priors from the personalized SD dominating the optimization, DreamVTON introduces a template-based optimization mechanism, which employs mask templates for geometry shape learning and normal/RGB templates for geometry/texture details learning. Furthermore, for the geometry optimization phase, DreamVTON integrates a normal-style LoRA into personalized SD to enhance normal map generative prior, facilitating smooth geometry modeling.

7/24/2024

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo

The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. However, existing methods solely focus on the frontal try-on using the frontal clothing. When the views of the clothing and person are significantly inconsistent, particularly when the person's view is non-frontal, the results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes. Given that single-view clothes provide insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and back views of the clothing, to encompass the complete view as much as possible. Moreover, we adopt diffusion models that have demonstrated superior abilities to perform our MV-VTON. In particular, we propose a view-adaptive selection method where hard-selection and soft-selection are applied to the global and local clothing feature extraction, respectively. This ensures that the clothing features are roughly fit to the person's view. Subsequently, we suggest joint attention blocks to align and fuse clothing features with person features. Additionally, we collect a MV-VTON dataset MVG, in which each person has multiple photos with diverse views and poses. Experiments show that the proposed method not only achieves state-of-the-art results on MV-VTON task using our MVG dataset, but also has superiority on frontal-view virtual try-on task using VITON-HD and DressCode datasets. Codes and datasets are publicly released at https://github.com/hywang2002/MV-VTON .

9/5/2024

VTON-IT: Virtual Try-On using Image Translation

Santosh Adhikari, Bishnu Bhusal, Prashant Ghimire, Anil Shrestha

Virtual Try-On (trying clothes virtually) is a promising application of the Generative Adversarial Network (GAN). However, it is an arduous task to transfer the desired clothing item onto the corresponding regions of a human body because of varying body size, pose, and occlusions like hair and overlapped clothes. In this paper, we try to produce photo-realistic translated images through semantic segmentation and a generative adversarial architecture-based image translation network. We present a novel image-based Virtual Try-On application VTON-IT that takes an RGB image, segments desired body part, and overlays target cloth over the segmented body region. Most state-of-the-art GAN-based Virtual Try-On applications produce unaligned pixelated synthesis images on real-life test images. However, our approach generates high-resolution natural images with detailed textures on such variant images.

5/8/2024