∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

Read original: arXiv:2407.14709 - Published 7/23/2024 by Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras

Overview

The paper introduces "∞-Brush," a conditional diffusion model defined in an infinite-dimensional function space, for controllable synthesis of large, high-resolution images.
Key contributions include an infinite-dimensional function space model, a new training methodology, and interactive editing capabilities.
The model can generate diverse, high-quality images at arbitrary resolutions (the abstract reports up to 4096×4096 pixels), with fine-grained control over the synthesis process.

Plain English Explanation

The paper presents a new AI system called "∞-Brush" that can create detailed, high-quality images at a resolution the user chooses (the paper reports up to 4096×4096 pixels). Rather than being limited to a fixed resolution, ∞-Brush works in an "infinite-dimensional" space: it treats an image as a continuous function rather than a fixed grid of pixels, so the same model can produce outputs at different resolutions.

At the core of ∞-Brush is a type of machine learning model called a "diffusion model." Diffusion models are trained by gradually adding noise to images and learning to reverse that process; new images are then generated by starting from noise and denoising step by step. ∞-Brush takes this a step further by operating in an infinite-dimensional function space, so the model is not tied to a fixed pixel grid.
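To make the "adding noise" half of that description concrete, here is a minimal sketch of the forward noising step in a standard finite-dimensional, DDPM-style diffusion model. It is not taken from the ∞-Brush paper: the linear variance schedule, the function name forward_diffuse, and the 8×8 toy image are all assumptions chosen for illustration.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample a noisy x_t from a clean x0 at step t (DDPM-style closed form)."""
    # Cumulative product of (1 - beta) gives the fraction of signal kept by step t.
    alphas_bar = np.cumprod(1.0 - betas)
    a_bar = alphas_bar[t]
    noise = rng.standard_normal(x0.shape)
    # Mix the clean image with Gaussian noise; larger t means more noise.
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)       # assumed linear schedule
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))    # toy "image" with values in [-1, 1]

x_early, _ = forward_diffuse(x0, 10, betas, rng)    # still close to x0
x_late, _ = forward_diffuse(x0, 999, betas, rng)    # almost pure noise
```

A denoising network would be trained to predict the returned noise from x_t and t; generation then runs that prediction in reverse, starting from pure noise.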

With ∞-Brush, users can interactively edit and refine the generated images, adjusting the content, style, and other attributes. The system also allows for fine-grained control over the image generation, enabling users to create diverse and high-fidelity outputs at any desired resolution.

Technical Explanation

The key innovation in ∞-Brush is the use of an infinite-dimensional function space model to represent the images, rather than a fixed-resolution pixel grid. This allows the model to generate images at arbitrary scales, while maintaining high visual quality and detail.
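The idea of representing an image as a function rather than a pixel grid can be illustrated with a toy example. In the sketch below, a hand-written function stands in for a learned image function; it is sampled on two grids of different resolution without any change to the function itself. The names image_function and make_grid are hypothetical and are not part of the paper's code.

```python
import numpy as np

def image_function(coords):
    """A toy continuous 'image': maps (x, y) in [0, 1]^2 to an intensity in [0, 1]."""
    x, y = coords[..., 0], coords[..., 1]
    return 0.5 + 0.5 * np.sin(4 * np.pi * x) * np.cos(2 * np.pi * y)

def make_grid(height, width):
    """Pixel-centre coordinates in [0, 1]^2, shape (height, width, 2)."""
    ys = (np.arange(height) + 0.5) / height
    xs = (np.arange(width) + 0.5) / width
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x, grid_y], axis=-1)

# The same function rendered at two resolutions; only the query grid changes.
low = image_function(make_grid(64, 64))
high = image_function(make_grid(1024, 1024))
```

In a function-space model, the hand-written formula would be replaced by a learned mapping, but the rendering step is the same: choose a set of coordinates and evaluate.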

The authors train the diffusion model directly in this infinite-dimensional representation and, according to the abstract, propose a cross-attention neural operator to enable conditioning in function space. This conditioning is what gives control over the synthesis process, allowing users to create diverse and high-fidelity images at any desired resolution.
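The paper's abstract (reproduced under Related Papers) names a cross-attention neural operator as the mechanism for conditioning in function space. The sketch below is not that operator. It is plain scaled dot-product cross-attention, with random weights and hypothetical shapes, and it shows only one relevant property: each queried coordinate attends over a fixed set of conditioning tokens, so the number of query points can change freely while the conditioning input stays the same.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, cond_tokens, w_q, w_k, w_v):
    """Each query point attends over all conditioning tokens."""
    q = query_feats @ w_q                      # (n_points, d)
    k = cond_tokens @ w_k                      # (n_tokens, d)
    v = cond_tokens @ w_v                      # (n_tokens, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_points, n_tokens)
    weights = softmax(scores, axis=-1)         # rows sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_in, d_cond, d = 2, 16, 8                     # assumed toy dimensions
w_q = rng.standard_normal((d_in, d))
w_k = rng.standard_normal((d_cond, d))
w_v = rng.standard_normal((d_cond, d))
cond_tokens = rng.standard_normal((5, d_cond)) # stand-in conditioning embedding

# The same weights and conditioning serve 100 or 4000 query coordinates.
coarse = rng.uniform(size=(100, d_in))
fine = rng.uniform(size=(4000, d_in))
out_coarse, attn = cross_attention(coarse, cond_tokens, w_q, w_k, w_v)
out_fine, _ = cross_attention(fine, cond_tokens, w_q, w_k, w_v)
```

Because the weights act per query point, nothing in the layer depends on how many coordinates are evaluated, which is the property a resolution-free model needs from its conditioning mechanism.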

The ∞-Brush model builds on recent advances in progressive high-resolution image generation and resolution-invariant point diffusion models, further expanding the capabilities of diffusion-based approaches for large-scale image synthesis.

Critical Analysis

The authors acknowledge several limitations and areas for future work. For example, the current approach is computationally intensive, which may limit its practical deployment. Additionally, the paper does not provide a comprehensive evaluation of the model's performance across diverse image domains and tasks.

Further research could explore more efficient training and inference strategies, as well as investigate the model's robustness and generalization capabilities. Expanding the interactive editing capabilities and incorporating additional control mechanisms could also enhance the user experience and usability of the system.

Conclusion

The ∞-Brush system represents a significant advancement in the field of large-scale, high-resolution image synthesis using diffusion models. By operating in an infinite-dimensional function space, the model can generate diverse, high-quality images at arbitrary resolutions, with fine-grained control over the synthesis process.

This work has the potential to enable new applications and use cases in areas such as digital art, product design, and visual effects, where the ability to create and manipulate large-scale, high-fidelity imagery is crucial. As the field of generative AI continues to evolve, innovations like ∞-Brush may help push the boundaries of what is possible in terms of creative expression and visual representation.

Related Papers

∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras

Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce a novel conditional diffusion model in infinite dimensions, ∞-Brush for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To our best knowledge, ∞-Brush is the first conditional diffusion model in function space, that can controllably synthesize images at arbitrary resolutions of up to 4096×4096 pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.

7/23/2024

Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

Yifan Wang, Aleksander Holynski, Brian L. Curless, Steven M. Seitz

We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.

5/15/2024

Diffusion-based image inpainting with internal learning

Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson

Diffusion models are now the undisputed state-of-the-art for image generation and image restoration. However, they require large amounts of computational power for training and inference. In this paper, we propose lightweight diffusion models for image inpainting that can be trained on a single image, or a few images. We show that our approach competes with large state-of-the-art models in specific cases. We also show that training a model on a single image is particularly relevant for image acquisition modality that differ from the RGB images of standard learning databases. We show results in three different contexts: texture images, line drawing images, and materials BRDF, for which we achieve state-of-the-art results in terms of realism, with a computational load that is greatly reduced compared to concurrent methods.

6/7/2024

Streamlining Image Editing with Layered Diffusion Brushes

Peyman Gholami, Robert Xiao

Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time editing of images that provides users with fine-grained region-targeted supervision in addition to existing prompt-based controls. Our novel editing technique, termed Layered Diffusion Brushes, leverages prompt-guided and region-targeted alteration of intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor based on Layered Diffusion Brushes modifications, which incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers; regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (using inversion) and generated images, showcasing its usability and effectiveness compared to existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting for refining images. Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation, demonstrating its versatility and potential for enhancing creative workflows.

5/2/2024