EASI-Tex: Edge-Aware Mesh Texturing from Single Image

2405.17393

Published 5/28/2024 by Sai Raj Kishore Perla, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang


Abstract

We present a novel approach for single-image mesh texturing, which employs a diffusion model with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by conditioning a pre-trained Stable Diffusion generator with edges describing the mesh through ControlNet, and features extracted from the input image using IP-Adapter to generate textures that respect the underlying geometry of the mesh and the input texture without any optimization or training. We also introduce Image Inversion, a novel technique to quickly personalize the diffusion model for a single concept using a single image, for cases where the pre-trained IP-Adapter falls short in capturing all the details from the input image faithfully. Experimental results demonstrate the efficiency and effectiveness of our edge-aware single-image mesh texturing approach, coined EASI-Tex, in preserving the details of the input texture on diverse 3D objects, while respecting their geometry.


Overview

The paper presents a novel approach for transferring textures from a single RGB image to a 3D mesh object.
The method uses a diffusion model with "judicious conditioning" to seamlessly apply the texture while respecting the underlying geometry of the mesh.
It does not assume the mesh and image are from the same object category, and can handle significant discrepancies in geometry and proportions.
The technique employs ControlNet to condition the diffusion model with mesh edges, and IP-Adapter to extract features from the input image.
It also introduces "Image Inversion" to quickly personalize the diffusion model for a single concept using a single image.

Plain English Explanation

The researchers have developed a new way to take the texture from a single 2D image and apply it to a 3D mesh object. This is useful for things like creating custom 3D models or enhancing existing ones.

The key innovation is that their method can handle cases where the 3D mesh and 2D image are quite different. For example, the mesh might belong to a different object category or have a very different shape from what's shown in the image. [link to EucliDreamer paper]

To do this, the approach uses a type of AI model called a "diffusion model", which is trained to generate new images. The researchers condition this diffusion model in two ways:

They provide the model with the edges, or outlines, of the 3D mesh using a technique called ControlNet. This helps the model understand the underlying geometry it needs to respect.
They extract visual features from the input 2D image using something called an "IP-Adapter". This allows the model to capture the important details of the texture that should be transferred.

The researchers also introduce a novel "Image Inversion" technique. This lets them quickly customize the diffusion model for a specific visual concept using just a single example image. This is helpful when the pre-trained IP-Adapter doesn't fully capture all the details from the input.

Overall, this new texture transfer method, which the researchers call "EASI-Tex", faithfully applies the details of a 2D texture to a 3D mesh while respecting the mesh's underlying shape and structure. [link to Infinite Texture and TexSliders papers]

Technical Explanation

The core of the researchers' approach is a diffusion model that is carefully conditioned to transfer textures from a single RGB image to a 3D mesh object. Diffusion models are generative models trained to reverse a gradual noising process. A fixed forward process corrupts images with Gaussian noise, and a neural network learns to remove that noise step by step, so that starting from pure noise the model can synthesize a new image.
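As a rough illustration of the noising and denoising described above, the sketch below implements the standard DDPM forward process on a toy three-value "image". It is not EASI-Tex code. The linear schedule (1e-4 to 0.02 over 1000 steps) is the DDPM default rather than anything this paper specifies, Stable Diffusion actually runs the process in a learned latent space rather than on pixels, and the helper names are hypothetical. The point it shows is that the noising step is fixed; the only learned piece is a network that predicts the added noise, and a perfect prediction recovers the clean input.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, linearly spaced (DDPM default range)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s); shrinks toward 0 as t grows."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, eps, alpha_bar):
    """Fixed forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * e for x, e in zip(x0, eps)]

def predict_x0(xt, eps_hat, alpha_bar):
    """Undo the forward step given a noise estimate eps_hat (what the network is trained to predict)."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / signal for x, e in zip(xt, eps_hat)]

if __name__ == "__main__":
    random.seed(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.25, 1.0]                       # toy three-value "image"
    eps = [random.gauss(0.0, 1.0) for _ in x0]   # one Gaussian noise sample
    t = 500
    xt = add_noise(x0, eps, alpha_bars[t])
    print("noisy:", xt)
    # With a perfect noise predictor the clean values come back (up to float error).
    print("recovered:", predict_x0(xt, eps, alpha_bars[t]))
```

In a real model eps_hat comes from a learned network (conditioned, in EASI-Tex's case, on the mesh edges and the input-image features), so recovery is only approximate and sampling proceeds over many steps.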

To condition the diffusion model for the texture transfer task, the researchers employ two key components:

ControlNet: This module provides the diffusion model with the edges, or outlines, of the target 3D mesh. This helps the model understand the underlying geometry it needs to respect when generating the texture.
IP-Adapter: This extracts visual features from the input 2D image. This allows the diffusion model to capture the important details of the texture that should be transferred to the mesh.
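The summary does not say how the mesh edges are produced. One plausible route, assumed here purely for illustration, is to render a depth buffer of the mesh from the current camera and mark pixels where depth jumps, which captures silhouettes and sharp creases. The hypothetical `depth_edges` helper below does this on a small grid, with background pixels holding infinity; the threshold is arbitrary, and the real pipeline may instead feed ControlNet Canny, normal-based, or other line renderings.

```python
import math

def depth_edges(depth, threshold=0.1):
    """Return a 0/1 edge map marking foreground pixels at depth discontinuities.

    `depth` is a list of rows of floats; background pixels are math.inf.
    A foreground pixel is an edge if any in-frame 4-neighbour is background
    (silhouette) or differs in depth by more than `threshold` (crease/occlusion).
    Neighbours outside the frame are ignored.
    """
    height, width = len(depth), len(depth[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = depth[y][x]
            if math.isinf(d):
                continue  # background pixels are never marked
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < height and 0 <= nx < width):
                    continue
                n = depth[ny][nx]
                if math.isinf(n) or abs(d - n) > threshold:
                    edges[y][x] = 1
                    break  # one discontinuity is enough to mark this pixel
    return edges

if __name__ == "__main__":
    inf = math.inf
    # A flat 3x3 square at depth 1.0 in the middle of an otherwise empty 5x5 view.
    depth = [[inf] * 5] + [[inf, 1.0, 1.0, 1.0, inf] for _ in range(3)] + [[inf] * 5]
    for row in depth_edges(depth):
        print(row)
```

For the example grid this prints a ring of 1s around a single interior 0: the square's outline is marked and its flat interior is not, which is the kind of geometry-only signal an edge-conditioned ControlNet consumes.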

Importantly, the researchers do not assume the mesh and image are from the same object category, or that their geometries and proportions perfectly align. The ControlNet and IP-Adapter components help the diffusion model overcome these discrepancies.
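On the image side of this conditioning, IP-Adapter injects image features through decoupled cross-attention: the UNet's queries attend separately to the text tokens and to the image tokens, and the two results are summed, with a scale λ weighting the image branch. The minimal sketch below shows that combination for a single query vector. In the real adapter the image keys and values come from learned projections of a CLIP image embedding; here they are passed in directly, the function names are hypothetical, and the scale EASI-Tex uses is not given in this summary.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector over lists of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    width = len(values[0])
    return [sum(w * value[i] for w, value in zip(weights, values)) for i in range(width)]

def decoupled_cross_attention(query, text_kv, image_kv, image_scale=1.0):
    """IP-Adapter-style combination: text attention + image_scale * image attention.

    text_kv and image_kv are (keys, values) pairs; image_scale plays the role of lambda.
    """
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + image_scale * i for t, i in zip(text_out, image_out)]

if __name__ == "__main__":
    text_kv = ([[1.0, 0.0]], [[2.0, 3.0]])      # one text token
    image_kv = ([[0.0, 1.0]], [[10.0, -1.0]])   # one image token
    print(decoupled_cross_attention([1.0, 1.0], text_kv, image_kv, image_scale=0.5))
    # → [7.0, 2.5]
```

Setting `image_scale=0` reduces the output to plain text conditioning; raising it pushes generation toward the reference image's appearance. Keeping the image branch separate is what lets the image features be added to a pre-trained model without retraining its existing text cross-attention.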

The researchers also introduce a novel "Image Inversion" technique. This allows them to quickly personalize the diffusion model for a single visual concept using just a single example image. This is helpful when the pre-trained IP-Adapter does not fully capture all the details from the input.
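The summary gives no details of how Image Inversion is parameterized. By analogy with Textual Inversion, which learns a single new token embedding for a concept while the diffusion model stays frozen, the toy sketch below optimizes only an input embedding against a frozen model. Everything here is an assumption for illustration: a fixed linear map stands in for the frozen generator, a squared-error loss stands in for the denoising loss, and `invert_embedding` is a hypothetical name. EASI-Tex's actual Image Inversion may optimize different parameters.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def invert_embedding(frozen_w, target, steps=500, lr=0.1):
    """Gradient descent on the embedding only; frozen_w is read but never updated.

    Minimises sum((frozen_w @ e - target) ** 2) and returns (embedding, final_loss).
    """
    rows, cols = len(frozen_w), len(frozen_w[0])
    embedding = [0.0] * cols  # start from a neutral embedding
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(frozen_w, embedding), target)]
        # Gradient of the squared error with respect to the embedding: 2 * W^T * residual
        grad = [2.0 * sum(frozen_w[r][c] * residual[r] for r in range(rows)) for c in range(cols)]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    final_residual = [p - t for p, t in zip(matvec(frozen_w, embedding), target)]
    return embedding, sum(r * r for r in final_residual)

if __name__ == "__main__":
    frozen_w = [[1.0, 0.0, 0.5],
                [0.0, 1.0, 0.5]]   # stand-in for the frozen generator
    target = [1.0, 2.0]            # stand-in for features of the single input image
    embedding, loss = invert_embedding(frozen_w, target)
    print(embedding, loss)
```

Because only the embedding changes, the learned concept is stored as a handful of numbers and the model itself is left untouched, which is the property that makes this style of single-image personalization lightweight.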

Experimental results demonstrate that this edge-aware single-image mesh texturing approach, called "EASI-Tex", is effective at preserving the details of the input texture while respecting the geometry of the 3D mesh. [link to Enhancing Text-to-Image Editing via Hybrid paper]

Critical Analysis

The researchers present a compelling approach for transferring textures from 2D images to 3D mesh objects. A key strength is the ability to handle significant discrepancies between the image and mesh, which expands the practical applicability of the technique.

However, the paper does not provide much discussion of the limitations or potential issues with the method. For example, it's unclear how well EASI-Tex would perform on highly complex or irregular mesh geometries, or how it would handle challenging texture patterns like fine details or repetitive structures.

Additionally, the researchers mention that the Image Inversion technique is introduced to address shortcomings of the pre-trained IP-Adapter, but they don't provide much insight into the specific limitations of the adapter or the tradeoffs involved in using Image Inversion.

Further research could explore the boundaries of EASI-Tex's capabilities, such as the types of meshes and textures it can handle, as well as comparisons to other diffusion-based texturing approaches like [link to EucliDreamer paper] or [link to Infinite Texture paper]. Exploring ways to make the method more robust or efficient would also be valuable.

Overall, the paper presents an interesting and potentially impactful innovation in 3D texture transfer, but more thorough analysis of the method's strengths, weaknesses, and areas for improvement would strengthen the research.

Conclusion

The researchers have developed a novel approach for transferring textures from a single 2D image to a 3D mesh object. Their "EASI-Tex" method uses a diffusion model with carefully designed conditioning to seamlessly apply the texture while respecting the underlying geometry of the mesh.

The key innovations are the use of ControlNet to provide the diffusion model with mesh edges, and IP-Adapter to extract visual features from the input image. This allows the technique to handle significant discrepancies between the image and mesh, expanding its practical applications.

The researchers also introduce an "Image Inversion" technique to quickly personalize the diffusion model for a single visual concept. This helps address limitations of the pre-trained IP-Adapter.

Overall, EASI-Tex demonstrates the potential of diffusion models for 3D texture transfer, with the ability to preserve intricate details while adapting to diverse mesh geometries. Further research could explore the boundaries of the method's capabilities and ways to make it even more robust and efficient.

This summary was produced with help from an AI and may contain inaccuracies; check out the links to read the original source documents!

Related Papers

Single Mesh Diffusion Models with Field Latents for Texture Generation

Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia

We introduce a framework for intrinsic latent diffusion models operating directly on the surfaces of 3D shapes, with the goal of synthesizing high-quality textures. Our approach is underpinned by two contributions: field latents, a latent representation encoding textures as discrete vector fields on the mesh vertices, and field latent diffusion models, which learn to denoise a diffusion process in the learned latent space on the surface. We consider a single-textured-mesh paradigm, where our models are trained to generate variations of a given texture on a mesh. We show the synthesized textures are of superior fidelity compared to those from existing single-textured-mesh generative models. Our models can also be adapted for user-controlled editing tasks such as inpainting and label-guided generation. The efficacy of our approach is due in part to the equivariance of our proposed framework under isometries, allowing our models to seamlessly reproduce details across locally similar regions and opening the door to a notion of generative texture transfer.

5/30/2024

cs.CV cs.GR cs.LG

TexPainter: Generative Mesh Texturing with Multi-view Consistency

Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao

The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our method is based on the observation that latent space in a pre-trained diffusion model is noised separately for each camera view, making it difficult to achieve multi-view consistency by directly manipulating the latent codes. Based on the celebrated Denoising Diffusion Implicit Models (DDIM) scheme, we propose to use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method further relaxes the sequential dependency assumption among the camera views. By evaluating on a series of general 3D models, we find our simple approach improves consistency and overall quality of the generated textures as compared to competing state-of-the-art methods. Our implementation is available at: https://github.com/Quantuman134/TexPainter

6/28/2024

cs.CV cs.GR

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts and meshes. The texture is parametrized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering. To generate high-quality textures, we leverage a depth-conditioned Stable Diffusion model guided by the depth image rendered from the mesh. We test our approach on 3D models in Objaverse and conduct a user study, which shows its superior quality compared to existing texturing methods like Text2Tex. In addition, our method converges 2 times faster than DreamFusion. Through text prompting, textures of diverse art styles can be produced. We hope EucliDreamer provides a viable solution to automate a labor-intensive stage in 3D content creation.

4/17/2024

cs.CV

Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

Yifan Wang, Aleksander Holynski, Brian L. Curless, Steven M. Seitz

We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.

5/15/2024

cs.CV