MatFusion: A Generative Diffusion Model for SVBRDF Capture

arXiv:2406.06539

Published 6/12/2024 by Sam Sartor, Pieter Peers

Abstract

We formulate SVBRDF estimation from photographs as a diffusion task. To model the distribution of spatially varying materials, we first train a novel unconditional SVBRDF diffusion backbone model on a large set of 312,165 synthetic spatially varying material exemplars. This SVBRDF diffusion backbone model, named MatFusion, can then serve as a basis for refining a conditional diffusion model to estimate the material properties from a photograph under controlled or uncontrolled lighting. Our backbone MatFusion model is trained using only a loss on the reflectance properties, and therefore refinement can be paired with more expensive rendering methods without the need for backpropagation during training. Because the conditional SVBRDF diffusion models are generative, we can synthesize multiple SVBRDF estimates from the same input photograph from which the user can select the one that best matches the user's expectations. We demonstrate the flexibility of our method by refining different SVBRDF diffusion models conditioned on different types of incident lighting, and show that for a single photograph under colocated flash lighting our method achieves equal or better accuracy than existing SVBRDF estimation methods.

Overview

The researchers formulate the problem of estimating spatially varying material properties (SVBRDF) from photographs as a diffusion task.
They train a novel unconditional SVBRDF diffusion backbone model, called MatFusion, on a large dataset of synthetic spatially varying material examples.
The MatFusion model can then be refined into a conditional diffusion model to estimate material properties from photographs, under controlled or uncontrolled lighting.
The conditional SVBRDF diffusion models are generative, allowing for the synthesis of multiple SVBRDF estimates from a single input photograph.
The researchers demonstrate the flexibility of their method by refining different SVBRDF diffusion models for various incident lighting conditions, and show that, for a single photograph under colocated flash lighting, their approach achieves equal or better accuracy than existing SVBRDF estimation methods.

Plain English Explanation

The researchers have developed a new way to estimate the material properties of surfaces in photographs. Material properties, such as how light reflects off a surface, can vary from point to point. This per-point description of reflectance is known as a spatially varying bidirectional reflectance distribution function (SVBRDF).

To model the distribution of spatially varying materials, the researchers first trained a diffusion model called MatFusion on a large dataset of synthetic material examples. This unconditional model can then be refined into a conditional model that estimates the material properties from a photograph taken under controlled or uncontrolled lighting.

The key advantage of this approach is that the conditional models are generative, meaning they can produce multiple estimates of the material properties from a single input photograph. This gives the user the flexibility to choose the estimate that best matches their expectations.

The researchers demonstrate that, for a single photograph taken with a colocated flash, their method achieves equal or better accuracy than existing SVBRDF estimation techniques. The same backbone can also be refined for other lighting conditions.

Technical Explanation

The researchers formulate the problem of SVBRDF estimation from photographs as a diffusion task. They first train a novel unconditional SVBRDF diffusion backbone model, named MatFusion, on a large dataset of 312,165 synthetic spatially varying material exemplars.
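To make the diffusion framing concrete, the following is a minimal sketch of the forward noising step and a loss computed purely in SVBRDF parameter space. It is an illustration under stated assumptions, not the authors' implementation: the 10-channel per-pixel layout, the cosine noise schedule, and the noise-prediction mean squared error are all common choices assumed here, and every function name is hypothetical.

```python
import math
import random

# Assumed per-pixel layout: diffuse RGB, specular RGB, roughness, normal XYZ.
CHANNELS = 10


def alpha_bar(t, num_steps):
    """Fraction of signal kept at step t under a cosine schedule (assumed)."""
    s = 0.008
    f = lambda u: math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def add_noise(x0, t, num_steps, rng):
    """Sample a noisy version of the clean SVBRDF values x0 at step t.

    Returns the noisy values and the Gaussian noise that was mixed in.
    """
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def reflectance_loss(predicted_eps, eps):
    """Mean squared error on the reflectance channels; no image is rendered."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, eps)) / len(eps)
```

In training, a denoising network would receive `xt` and the step `t` and predict `eps`. The point of the sketch is that the loss compares values in the space of material parameters rather than rendered pixels.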

This MatFusion model can then serve as a basis for refining a conditional diffusion model that estimates the material properties from a photograph, under controlled or uncontrolled lighting. The backbone MatFusion model is trained using only a loss on the reflectance properties, not on rendered images. As a result, refinement can be paired with more expensive rendering methods, because nothing needs to be backpropagated through the renderer during training.
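One way to picture the refinement step is sketched below. It assumes the conditional model receives the photograph as extra per-pixel input channels appended to the noisy SVBRDF; the summary does not say how the conditioning is wired, so this concatenation is an assumption. The names `build_denoiser_input`, `toy_render`, and `make_training_example` are hypothetical, and `toy_render` only stands in for an expensive renderer.

```python
def build_denoiser_input(noisy_svbrdf, photo):
    """Append the photo's RGB channels to each pixel of the noisy SVBRDF."""
    if len(noisy_svbrdf) != len(photo):
        raise ValueError("SVBRDF and photo must have the same number of pixels")
    return [list(s) + list(p) for s, p in zip(noisy_svbrdf, photo)]


def toy_render(svbrdf):
    """Hypothetical stand-in for a renderer: returns each pixel's diffuse RGB."""
    return [list(pixel[:3]) for pixel in svbrdf]


def make_training_example(clean_svbrdf, noisy_svbrdf, render=toy_render):
    """Build one conditional training input.

    The renderer is called as a black box to produce the conditioning photo.
    Its output is treated as plain data, so it never needs to be
    differentiable.
    """
    photo = render(clean_svbrdf)
    return build_denoiser_input(noisy_svbrdf, photo)
```

Because the photo is only an input and the loss stays on the reflectance channels, swapping `toy_render` for a slower path tracer would change the cost of data preparation but not the structure of the training step.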

The conditional SVBRDF diffusion models are generative, enabling the synthesis of multiple SVBRDF estimates from the same input photograph. This allows the user to select the estimate that best matches their expectations.
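The idea of drawing several candidates for one photograph can be sketched as follows. The sampler here is a toy placeholder, not a diffusion model, and ranking candidates by re-render error is an added assumption for illustration: the source only says that the user picks the estimate. All names are hypothetical.

```python
import random


def sample_estimates(sample_fn, photo, seeds):
    """Draw one candidate per seed; each seed gives a reproducible sample."""
    return [sample_fn(photo, random.Random(seed)) for seed in seeds]


def rerender_error(estimate, photo, render):
    """Mean squared difference between the re-rendered estimate and the photo."""
    rendered = render(estimate)
    return sum((r - p) ** 2 for r, p in zip(rendered, photo)) / len(photo)


def rank_estimates(estimates, photo, render):
    """Order candidates from lowest to highest re-render error."""
    return sorted(estimates, key=lambda e: rerender_error(e, photo, render))


def toy_sampler(photo, rng):
    """Placeholder for a conditional sampler: the photo plus random noise."""
    return [p + rng.gauss(0.0, 0.1) for p in photo]


def toy_render(estimate):
    """Placeholder renderer that returns the estimate unchanged."""
    return list(estimate)
```

A usage pattern would be to call `sample_estimates(toy_sampler, photo, [0, 1, 2, 3])` and present the ranked list to the user, who remains free to choose a candidate other than the top-ranked one.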

The researchers demonstrate the flexibility of their method by refining different SVBRDF diffusion models conditioned on various types of incident lighting. They show that for a single photograph under colocated flash lighting, their approach achieves equal or better accuracy than existing SVBRDF estimation methods.
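For readers unfamiliar with colocated flash lighting, the sketch below shades a single surface point when the light sits at the camera position, so the light, view, and half vectors coincide. It assumes a Lambertian diffuse term plus a Cook-Torrance specular term with a GGX distribution and the convention alpha = roughness squared. The summary does not state which reflectance model the paper uses, so treat this as a generic illustration with hypothetical names.

```python
import math


def colocated_flash_radiance(cos_theta, distance, diffuse, specular_f0,
                             roughness, intensity=1.0):
    """Reflected radiance for one colour channel under a colocated point light.

    cos_theta is the cosine between the surface normal and the direction to
    the camera (and light). With light and view colocated, the half vector
    equals both, so the Schlick Fresnel term reduces to specular_f0.
    """
    if cos_theta <= 0.0:
        return 0.0  # surface faces away from the flash
    alpha2 = max(roughness, 1e-3) ** 4  # alpha = roughness^2, alpha2 = alpha^2
    # GGX normal distribution evaluated at the half vector.
    d = alpha2 / (math.pi * (cos_theta ** 2 * (alpha2 - 1.0) + 1.0) ** 2)
    # Smith masking term; identical for light and view in this setup.
    g1 = 2.0 * cos_theta / (
        cos_theta + math.sqrt(alpha2 + (1.0 - alpha2) * cos_theta ** 2))
    spec = d * specular_f0 * g1 * g1 / (4.0 * cos_theta * cos_theta)
    # Inverse-square falloff of the point light.
    return (diffuse / math.pi + spec) * cos_theta * intensity / distance ** 2
```

This setup is popular for capture because a phone camera with its built-in flash approximates it, and the specular highlight near the image centre gives direct evidence about roughness.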

Critical Analysis

The researchers acknowledge that their method relies on a large dataset of synthetic material examples, which may not fully capture the complexity and diversity of real-world materials. Evaluating the performance of their approach on more diverse real-world datasets could provide valuable insights.

Additionally, the researchers do not address the computational efficiency of their diffusion-based approach, which can be a concern for practical applications that require real-time or near-real-time material estimation. Exploring ways to optimize the inference speed of their models could enhance the practicality of their technique.

While the researchers demonstrate the flexibility of their method in handling different lighting conditions, it would be interesting to see how their approach performs in more challenging scenarios, such as complex lighting environments or the presence of occlusions and shadows.

Finally, the researchers could potentially explore ways to incorporate additional information, such as geometric constraints or user input, to further improve the accuracy and reliability of their SVBRDF estimation.

Conclusion

The researchers have presented a novel approach to SVBRDF estimation from photographs, framing the problem as a diffusion task. By training an unconditional SVBRDF diffusion backbone model and refining it into conditional models, they have developed a flexible and generative framework for estimating material properties under various lighting conditions.

The key advantages of their method include the ability to synthesize multiple SVBRDF estimates from a single input photograph, and the potential to achieve equal or better accuracy compared to existing techniques. While the method shows promise, further research is needed to address potential limitations, such as the reliance on synthetic data and the computational efficiency of the diffusion-based approach.

Overall, this work represents an interesting and innovative step forward in the field of SVBRDF estimation, with potential applications in areas such as digital content creation, product design, and virtual reality.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ReflectanceFusion: Diffusion-based text to SVBRDF Generation

Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri

We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions. Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance as described by text prompts. Initially, we employ a pre-trained stable diffusion 2 model to generate a latent representation that informs the overall shape of the material and serves as our backbone model. Then, our ReflectanceUNet enables fine-tuning control over the material's physical appearance and generates SVBRDF maps. ReflectanceUNet module is trained on an extensive dataset comprising approximately 200,000 synthetic spatially varying materials. Our generative SVBRDF diffusion model allows for the synthesis of multiple SVBRDF estimates from a single textual input, offering users the possibility to choose the output that best aligns with their requirements. We illustrate our method's versatility by generating SVBRDF maps from a range of textual descriptions, both specific and broad. Our ReflectanceUNet model can integrate optional physical parameters, such as roughness and specularity, enhancing customization. When the backbone module is fixed, the ReflectanceUNet module refines the material, allowing direct edits to its physical attributes. Comparative evaluations demonstrate that ReflectanceFusion achieves better accuracy than existing text-to-material models, such as Text2Mat, while also providing the benefits of editable and relightable SVBRDF maps.

6/24/2024

cs.GR cs.CV

DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao

Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. Key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs, along with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.

7/2/2024

cs.CV cs.GR

DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

Yuqing Zhang, Yuan Liu, Zhiyu Xie, Lei Yang, Zhongyuan Liu, Mengzhou Yang, Runze Zhang, Qilong Kou, Cheng Lin, Wenping Wang, Xiaogang Jin

2D diffusion model, which often contains unwanted baked-in shading effects and results in unrealistic rendering effects in the downstream applications. Generating Physically Based Rendering (PBR) materials instead of just RGB textures would be a promising solution. However, directly distilling the PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as baked-in shading effects in albedo. We introduce DreamMat, an innovative approach to resolve the aforementioned problem, to generate high-quality PBR materials from text descriptions. We find out that the main reason for the incorrect material distillation is that large-scale 2D diffusion models are only trained to generate final shading colors, resulting in insufficient constraints on material decomposition during distillation. To tackle this problem, we first finetune a new light-aware 2D diffusion model to condition on a given lighting environment and generate the shading results on this specific lighting condition. Then, by applying the same environment lights in the material distillation, DreamMat can generate high-quality PBR materials that are not only consistent with the given geometry but also free from any baked-in shading effects in albedo. Extensive experiments demonstrate that the materials produced through our methods exhibit greater visual appeal to users and achieve significantly superior rendering quality compared to baseline methods, which are preferable for downstream tasks such as game and film production.

5/28/2024

cs.GR cs.AI

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce DifFUSER, a novel approach that leverages diffusion models for multi-modal fusion in 3D object detection and BEV map segmentation. Benefiting from the inherent denoising property of diffusion, DifFUSER is able to refine or even synthesize sensor features in case of sensor malfunction, thereby improving the quality of the fused output. In terms of architecture, our DifFUSER blocks are chained together in a hierarchical BiFPN fashion, termed cMini-BiFPN, offering an alternative architecture for latent diffusion. We further introduce a Gated Self-conditioned Modulated (GSM) latent diffusion module together with a Progressive Sensor Dropout Training (PSDT) paradigm, designed to add stronger conditioning to the diffusion process and robustness to sensor failures. Our extensive evaluations on the Nuscenes dataset reveal that DifFUSER not only achieves state-of-the-art performance with a 69.1% mIOU in BEV map segmentation tasks but also competes effectively with leading transformer-based fusion techniques in 3D object detection.

4/9/2024

cs.CV