Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models

2404.16556

Published 4/30/2024 by Parul Gupta, Munawar Hayat, Abhinav Dhall, Thanh-Toan Do

Abstract

Few-shot image synthesis entails generating diverse and realistic images of novel categories using only a few example images. While multiple recent efforts in this direction have achieved impressive results, the existing approaches are dependent only upon the few novel samples available at test time in order to generate new images, which restricts the diversity of the generated images. To overcome this limitation, we propose Conditional Distribution Modelling (CDM) -- a framework which effectively utilizes Diffusion models for few-shot image generation. By modelling the distribution of the latent space used to condition a Diffusion process, CDM leverages the learnt statistics of the training data to get a better approximation of the unseen class distribution, thereby removing the bias arising due to limited number of few shot samples. Simultaneously, we devise a novel inversion based optimization strategy that further improves the approximated unseen class distribution, and ensures the fidelity of the generated samples to the unseen class. The experimental results on four benchmark datasets demonstrate the effectiveness of our proposed CDM for few-shot generation.

Overview

This paper proposes a framework called Conditional Distribution Modelling (CDM) that leverages Diffusion models for few-shot image generation.
The key idea is to model the distribution of the latent space used to condition the Diffusion process, which helps capture the statistics of the training data and better approximate the unseen class distribution.
The authors also devise a novel inversion-based optimization strategy to further improve the approximated unseen class distribution and ensure the fidelity of the generated samples.

Plain English Explanation

The paper tackles the challenge of few-shot image synthesis, which is the task of generating diverse and realistic images of new categories using only a few example images. While recent approaches have made impressive progress, they rely solely on the limited few-shot samples available at test time, which restricts the diversity of the generated images.

To overcome this limitation, the researchers propose Conditional Distribution Modelling (CDM), a framework that uses Diffusion models for few-shot image generation. The key idea is to model the distribution of the latent space used to condition the Diffusion process. This allows the framework to leverage the learned statistics of the training data, which helps it better approximate the distribution of the unseen class, rather than being limited by the few-shot samples.

Additionally, the authors devise a novel inversion-based optimization strategy that further improves the approximated unseen class distribution and ensures the fidelity of the generated samples to the unseen class. This helps address the bias that can arise from the limited few-shot samples.

Technical Explanation

The paper presents the Conditional Distribution Modelling (CDM) framework, which applies Diffusion models to few-shot image generation. The key innovation is an explicit model of the distribution over the latent space that conditions the Diffusion process, which lets the framework use the statistics of the training data to better approximate the distribution of an unseen class.

Specifically, the authors first train a Diffusion model on the available training data. They then model the distribution of the latent space used to condition the Diffusion process, using a separate neural network. This latent space distribution model is trained to capture the statistics of the training data, which helps the framework generate samples that better reflect the unseen class distribution, rather than being limited by the few-shot samples.
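To make the idea of borrowing training-data statistics concrete, here is a minimal sketch. It is not the authors' implementation: it assumes each class is summarized by a diagonal Gaussian over its latents, and it uses a hypothetical blending rule (`approximate_unseen`, with made-up parameters `k` and `alpha`) that mixes the few-shot mean with the nearest seen classes. The latents are synthetic stand-ins for encoder outputs.

```python
import math
import random
from statistics import fmean, pvariance

def fit_diag_gaussian(latents):
    """Per-dimension mean and variance of a list of latent vectors."""
    dims = list(zip(*latents))
    return [fmean(d) for d in dims], [pvariance(d) for d in dims]

def approximate_unseen(few_shot, seen_stats, k=2, alpha=0.5):
    """Blend the few-shot mean with the k nearest seen-class means.

    Hypothetical rule for illustration only. The variance is taken entirely
    from the neighbours, since a variance estimated from a handful of
    samples is unreliable.
    """
    fs_mean, _ = fit_diag_gaussian(few_shot)
    nearest = sorted(seen_stats.values(), key=lambda s: math.dist(s[0], fs_mean))[:k]
    nb_mean = [fmean(col) for col in zip(*(m for m, _ in nearest))]
    nb_var = [fmean(col) for col in zip(*(v for _, v in nearest))]
    mean = [alpha * f + (1 - alpha) * n for f, n in zip(fs_mean, nb_mean)]
    return mean, nb_var

def sample_conditions(mean, var, n, rng):
    """Draw n conditioning vectors from the diagonal Gaussian."""
    return [[rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)] for _ in range(n)]

rng = random.Random(0)
dim = 4
# Synthetic latents for three seen classes, centred at 0, 1 and 2 (assumption).
seen = {c: [[rng.gauss(c, 1.0) for _ in range(dim)] for _ in range(50)] for c in range(3)}
seen_stats = {c: fit_diag_gaussian(z) for c, z in seen.items()}
# Three few-shot latents from an unseen class.
few_shot = [[rng.gauss(1.5, 1.0) for _ in range(dim)] for _ in range(3)]

mean, var = approximate_unseen(few_shot, seen_stats)
conditions = sample_conditions(mean, var, n=8, rng=rng)
```

Each vector in `conditions` would then condition one run of the Diffusion model, so diversity comes from the estimated class distribution rather than from the three few-shot latents alone.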

To further improve the generated samples, the authors also devise a novel inversion-based optimization strategy. This strategy iteratively refines the latent space representation of the few-shot samples, with the goal of ensuring that the generated samples are faithful to the unseen class. This helps address the potential bias introduced by the limited few-shot samples.
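The refinement loop can be illustrated with a toy example. This is a sketch under strong assumptions, not the paper's procedure: a fixed linear map `W` stands in for the frozen generative model, and plain gradient descent on a squared reconstruction error stands in for the inversion objective. The names `decode`, `invert`, `lr`, and `steps` are all hypothetical.

```python
def decode(W, z):
    """Toy frozen generator: a linear map standing in for the diffusion model."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]

def sq_error(W, z, target):
    """Squared reconstruction error between decode(W, z) and the target."""
    return sum((p - t) ** 2 for p, t in zip(decode(W, z), target))

def invert(W, target, z_init, lr=0.05, steps=300):
    """Refine z by gradient descent so that decode(W, z) matches the target."""
    z = list(z_init)
    for _ in range(steps):
        residual = [p - t for p, t in zip(decode(W, z), target)]
        # The gradient of the squared error with respect to z is 2 * W^T * residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

# Fixed 4x3 map: 3-d latent -> 4-d "image" (values chosen arbitrarily).
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5],
     [0.5, 0.5, 1.0],
     [1.0, -1.0, 0.0]]
z_true = [0.5, -1.0, 2.0]
target = decode(W, z_true)   # plays the role of a few-shot image
z0 = [0.0, 0.0, 0.0]         # e.g. a draw from the approximated class distribution
z_hat = invert(W, target, z0)
```

In this toy setting the refined latent `z_hat` reconstructs the target far better than the starting point `z0`. With a real Diffusion model the decoder is non-linear and each gradient step is much more expensive, which is the computational overhead discussed in the critical analysis.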

The experimental results on four benchmark datasets demonstrate the effectiveness of the proposed CDM framework for few-shot image generation, outperforming state-of-the-art approaches.

Critical Analysis

The paper presents a thoughtful and well-designed approach to the challenge of few-shot image synthesis. The key innovation of modeling the latent space distribution is a clever way to leverage the training data statistics and improve the generation of unseen class samples.

However, the paper does not address the potential limitations of this approach. For example, the effectiveness of the latent space distribution modeling may depend on the similarity between the training and unseen classes, and the framework may struggle with larger distributional shifts. Additionally, the computational overhead of the inversion-based optimization strategy could be a practical concern, especially for real-world deployment.

Further research could explore ways to make the framework more robust to distributional shifts, as well as investigate more efficient optimization techniques that maintain the generation quality. It would also be interesting to see how the CDM framework compares to joint conditional diffusion models or other efficient conditional diffusion models for few-shot image synthesis tasks.

Conclusion

The Conditional Distribution Modelling (CDM) framework proposed in this paper represents a significant advancement in the field of few-shot image synthesis. By modeling the latent space distribution and leveraging an inversion-based optimization strategy, the framework is able to generate diverse and realistic images of unseen classes while overcoming the limitations of relying solely on the few-shot samples.

The results demonstrate the effectiveness of this approach, and the insights gained from this research could inspire further innovations in few-shot image generation and other generative modeling tasks. As the field continues to evolve, these types of advancements will be crucial for unlocking the full potential of artificial intelligence in generating and manipulating visual content.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

CCDM: Continuous Conditional Diffusion Models for Image Generation

Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang

Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables known as regression labels. While Continuous conditional Generative Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced data, resulting in suboptimal outcomes. To enhance the quality of generated images, a promising alternative is to replace CcGANs with Conditional Diffusion Models (CDMs), renowned for their stable training process and ability to produce more realistic images. However, existing CDMs encounter challenges when applied to CCGM tasks due to several limitations such as inadequate U-Net architectures and deficient model fitting mechanisms for handling regression labels. In this paper, we introduce Continuous Conditional Diffusion Models (CCDMs), the first CDM designed specifically for the CCGM task. CCDMs address the limitations of existing CDMs by introducing specially designed conditional diffusion processes, a modified denoising U-Net with a custom-made conditioning mechanism, a novel hard vicinal loss for model fitting, and an efficient conditional sampling procedure. With comprehensive experiments on four datasets with varying resolutions ranging from 64x64 to 192x192, we demonstrate the superiority of the proposed CCDM over state-of-the-art CCGM models, establishing new benchmarks in CCGM. Extensive ablation studies validate the model design and implementation configuration of the proposed CCDM. Our code is publicly available at https://github.com/UBCDingXin/CCDM.

5/7/2024

cs.CV cs.LG

Bayesian Conditioned Diffusion Models for Inverse Problems

Alper Gungor, Bahri Batuhan Bilecen, Tolga Çukur

Diffusion models have recently been shown to excel in many image reconstruction tasks that involve inverse problems based on a forward measurement operator. A common framework uses task-agnostic unconditional models that are later post-conditioned for reconstruction, an approach that typically suffers from suboptimal task performance. While task-specific conditional models have also been proposed, current methods heuristically inject measured data as a naive input channel that elicits sampling inaccuracies. Here, we address the optimal conditioning of diffusion models for solving challenging inverse problems that arise during image reconstruction. Specifically, we propose a novel Bayesian conditioning technique for diffusion models, BCDM, based on score-functions associated with the conditional distribution of desired images given measured data. We rigorously derive the theory to express and train the conditional score-function. Finally, we show state-of-the-art performance in image dealiasing, deblurring, super-resolution, and inpainting with the proposed technique.

6/17/2024

cs.CV cs.AI cs.LG

Diffusion Models Are Innate One-Step Generators

Bowen Zheng, Tianming Yang

Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkable high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation methods have been proposed to distill a one-step generator from a DM by having a simpler student model mimic a more complex teacher model. Yet, our research reveals an inherent limitation in these methods: the teacher model, with more steps and more parameters, occupies different local minima compared to the student model, leading to suboptimal performance when the student model attempts to replicate the teacher. To avoid this problem, we introduce a novel distributional distillation method, which uses an exclusive distributional loss. This method exceeds state-of-the-art (SOTA) results while requiring significantly fewer training images. Additionally, we show that DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step. Freezing most of the convolutional layers in a DM during distributional distillation enables this innate capability and leads to further performance improvements. Our method achieves the SOTA results on CIFAR-10 (FID 1.54), AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency. Most of those results are obtained with only 5 million training images within 6 hours on 8 A100 GPUs.

6/10/2024

cs.CV

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffusion models that can increase generation diversity, especially at high guidance scales, with minimal loss of sample quality. Our sampling strategy anneals the conditioning signal by adding scheduled, monotonically decreasing Gaussian noise to the conditioning vector during inference to balance diversity and condition alignment. Our Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm, and we show that it boosts the diversity of diffusion models in various conditional generation tasks. Further, using an existing pretrained diffusion model, CADS achieves a new state-of-the-art FID of 1.70 and 2.31 for class-conditional ImageNet generation at 256×256 and 512×512 respectively.

5/14/2024

cs.CV