TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Read original: arXiv:2408.01291 - Published 8/6/2024 by Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang

Overview

TexGen generates textures for a given 3D mesh from a text description.
It builds on a pre-trained text-to-image diffusion model, using multi-view sampling and noise resampling to keep the texture consistent across viewpoints.
According to the authors, the resulting textures avoid the prominent seams and excessive smoothing seen in earlier methods.

Plain English Explanation

TexGen is a new method for creating 3D textures based on text descriptions. Instead of designing a texture by hand, a user supplies a 3D mesh and a text prompt, and the system generates a texture that matches the prompt.

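The key innovation of TexGen is its use of multi-view sampling and resampling. The system looks at the mesh from multiple viewpoints and, while a pre-trained text-to-image diffusion model denoises each view, keeps a single shared texture map that is updated after every sampling step. Because all views draw on the same texture map, the differences between them shrink progressively. This helps the final texture look good from any angle, without the seams or over-smoothing that can appear when views are generated separately and stitched together.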

Overall, TexGen lets users describe the desired texture in natural language, and the system then creates a 3D texture that reflects that description. This can save time and effort compared to crafting 3D textures by hand.

Technical Explanation

TexGen is a multi-view sampling and resampling framework that synthesizes textures for a given 3D mesh from a text prompt. It leverages a pre-trained text-to-image diffusion model to generate the appearance of each sampled view.

A key contribution is view-consistent sampling. TexGen maintains a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model, which progressively reduces the discrepancy between views. An attention-guided multi-view sampling strategy additionally broadcasts appearance information across views.
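
As a rough illustration of how per-view results can be merged into one shared texture map, the sketch below blends colors from several views with a weighted average. It is not the paper's update rule: the function name, the precomputed pixel-to-texel indices, and the per-pixel weights are all assumptions made for this example.

```python
import numpy as np

def fuse_views_into_texture(view_colors, view_uv, view_weights, tex_size):
    """Blend per-view pixel colors into one shared RGB texture map.

    Hypothetical helper, not taken from the paper.
    view_colors:  list of (N_i, 3) float arrays, colors predicted in view i
    view_uv:      list of (N_i, 2) int arrays, texel (row, col) hit by each pixel
    view_weights: list of (N_i,) float arrays, confidence of each pixel
                  (assumed here; e.g. higher for surfaces seen head-on)
    """
    color_sum = np.zeros((tex_size, tex_size, 3))
    weight_sum = np.zeros((tex_size, tex_size))
    for colors, uv, w in zip(view_colors, view_uv, view_weights):
        rows, cols = uv[:, 0], uv[:, 1]
        # np.add.at accumulates correctly when several pixels hit the same texel
        np.add.at(color_sum, (rows, cols), colors * w[:, None])
        np.add.at(weight_sum, (rows, cols), w)
    covered = weight_sum > 0
    texture = np.zeros_like(color_sum)
    # Weighted average on texels seen by at least one view; others stay zero
    texture[covered] = color_sum[covered] / weight_sum[covered][:, None]
    return texture, covered
```

In a full pipeline, a fusion of this kind would presumably run after each denoising step so that every view is re-rendered from the same map before the next step.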

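To preserve texture details, the paper also introduces a noise resampling technique. It aids in estimating the noise used to generate the inputs for subsequent denoising steps, guided by the text prompt and the current texture map. The authors note that the same technique can be applied to texture editing while preserving the original identity.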
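
The paper's abstract describes noise resampling as estimating noise with the help of the current texture map in order to produce inputs for subsequent denoising steps. The standard DDIM relations offer one way to picture this. The sketch below assumes that a clean estimate `x0_pred` (for example, a view rendered from the texture map) lives in the same space as the noisy sample `x_t`, which glosses over any encoding step the real method may need. The function names are hypothetical, and this is not claimed to be the authors' exact formulation.

```python
import numpy as np

def estimate_noise(x_t, x0_pred, alpha_bar_t):
    """Noise implied by the forward relation x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return (x_t - np.sqrt(alpha_bar_t) * x0_pred) / np.sqrt(1.0 - alpha_bar_t)

def ddim_step(x0_pred, eps, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) producing the input for the previous timestep."""
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps

# Usage: recover the noise consistent with a clean estimate, then step to t-1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))          # stand-in for a view rendered from the texture map
true_eps = rng.normal(size=(4, 4))
a_t, a_prev = 0.5, 0.7                # assumed cumulative noise-schedule values
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * true_eps
eps_hat = estimate_noise(x_t, x0, a_t)
x_prev = ddim_step(x0, eps_hat, a_prev)
```

Because the noise is recomputed from the clean estimate rather than reused unchanged, the next denoising input stays consistent with the shared texture.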

The authors report extensive qualitative and quantitative evaluations in which TexGen produces textures with a high degree of view consistency and rich appearance details for diverse 3D objects, outperforming the state-of-the-art methods it is compared against.

Critical Analysis

The paper presents a compelling approach for text-guided 3D texture generation. The multi-view sampling and resampling technique is a clever way to ensure consistency across different viewpoints, which is an important consideration for 3D assets.

That said, the paper does not fully address some potential limitations. For example, the system may struggle with highly complex or abstract text prompts that are difficult to translate into surface appearance. Additionally, running a diffusion model over many views and updating the texture map after every sampling step is likely to be computationally intensive, which could limit the practical application of TexGen in real-world scenarios.

Further research could explore ways to improve the system's robustness and efficiency, such as by investigating more efficient neural network architectures or leveraging domain-specific texture priors. Evaluating TexGen's performance on a broader range of text prompts and 3D applications would also help validate its practical usefulness.

Conclusion

TexGen represents an important step forward in the field of text-guided 3D texture generation. By incorporating multi-view sampling and resampling, the system can create high-quality 3D textures that maintain consistency across different viewpoints. This capability has significant implications for 3D content creation, as it can streamline the process of designing and applying textures to 3D models.

While the paper highlights the system's strong performance, further research is needed to address potential limitations and expand its practical applications. Nonetheless, TexGen demonstrates the potential of using language as a powerful interface for 3D texture design, opening up new possibilities for creative expression and 3D asset creation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang

Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation leveraging a pre-trained text-to-image diffusion model. For view consistent sampling, first of all we maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce the view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast the appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in the estimation of noise, generating inputs for subsequent denoising steps, as directed by the text prompt and current texture map. Through an extensive amount of qualitative and quantitative evaluations, we demonstrate that our proposed method produces significantly better texture quality for diverse 3D objects with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our proposed texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/

8/6/2024

TexPainter: Generative Mesh Texturing with Multi-view Consistency

Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao

The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our method is based on the observation that latent space in a pre-trained diffusion model is noised separately for each camera view, making it difficult to achieve multi-view consistency by directly manipulating the latent codes. Based on the celebrated Denoising Diffusion Implicit Models (DDIM) scheme, we propose to use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method further relaxes the sequential dependency assumption among the camera views. By evaluating on a series of general 3D models, we find our simple approach improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts. Our implementation is available at: https://github.com/Quantuman134/TexPainter

6/28/2024

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang, Huaping Liu

Automatic text-to-3D generation that combines Score Distillation Sampling (SDS) with the optimization of volume rendering has achieved remarkable progress in synthesizing realistic 3D objects. Yet most existing text-to-3D methods by SDS and volume rendering suffer from inaccurate geometry, e.g., the Janus issue, since it is hard to explicitly integrate 3D priors into implicit 3D representations. Besides, it is usually time-consuming for them to generate elaborate 3D models with rich colors. In response, this paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation. GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under 3D point cloud diffusion prior along with the ordinary 2D SDS optimization, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative appearance refinement to enrich texture details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D assets with delicate details and accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Our code is available at https://github.com/gsgen3d/gsgen

4/3/2024

UV-free Texture Generation with Denoising and Geodesic Heat Diffusions

Simone Foti, Stefanos Zafeiriou, Tolga Birdal

Seams, distortions, wasted UV space, vertex-duplication, and varying resolution over the surface are the most prominent issues of the standard UV-based texturing of meshes. These issues are particularly acute when automatic UV-unwrapping techniques are used. For this reason, instead of generating textures in automatically generated UV-planes like most state-of-the-art methods, we propose to represent textures as coloured point-clouds whose colours are generated by a denoising diffusion probabilistic model constrained to operate on the surface of 3D objects. Our sampling and resolution agnostic generative model heavily relies on heat diffusion over the surface of the meshes for spatial communication between points. To enable processing of arbitrarily sampled point-cloud textures and ensure long-distance texture consistency we introduce a fast re-sampling of the mesh spectral properties used during the heat diffusion and introduce a novel heat-diffusion-based self-attention mechanism. Our code and pre-trained models are available at github.com/simofoti/UV3-TeD.

8/30/2024