SFDDM: Single-fold Distillation for Diffusion models

2405.14961

Published 5/27/2024 by Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen

SFDDM: Single-fold Distillation for Diffusion models

Abstract

While diffusion models effectively generate remarkable synthetic images, a key limitation is the inference inefficiency, requiring numerous sampling steps. To accelerate inference and maintain high-quality synthesis, teacher-student distillation is applied to compress the diffusion models in a progressive and binary manner by retraining, e.g., reducing the 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model of any desired step, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion, we minimize not only the output distance but also the distribution of the hidden variables between the teacher and student model. Extensive experiments on four datasets demonstrate that our student model trained by the proposed SFDDM is able to sample high-quality data with steps reduced to as little as approximately 1%, thus, trading off inference time. Our remarkable performance highlights that SFDDM effectively transfers knowledge in single-fold distillation, achieving semantic consistency and meaningful image interpolation.

Create account to get full access

Overview

Introduces a new method called "Single-fold Distillation for Diffusion models" (SFDDM) to efficiently distill large diffusion models into smaller models.
Aims to improve upon existing distillation techniques for diffusion models, which can be computationally expensive.
Proposes a simpler distillation approach that only requires a single forward pass through the student model, in contrast to multi-fold distillation methods.

Plain English Explanation

Diffusion models are a powerful type of machine learning model that can generate high-quality images and other types of data. However, these models can be very large and complex, making them computationally expensive to use. To address this, researchers have developed "distillation" techniques to compress large diffusion models into smaller, more efficient models.

The paper introduces a new distillation method called "Single-fold Distillation for Diffusion models" (SFDDM). This approach aims to simplify the distillation process by only requiring a single forward pass through the student model, rather than the multiple passes needed in some previous methods.

The key idea behind SFDDM is to directly match the output of the student model to the output of the large, pre-trained teacher model. This is done by minimizing the difference between the two model outputs, which encourages the student model to learn the same behavior as the teacher model. The researchers show that this simple approach can achieve competitive performance compared to more complex distillation techniques, while being much more computationally efficient.

Technical Explanation

The paper introduces a new distillation method called "Single-fold Distillation for Diffusion models" (SFDDM) that aims to efficiently compress large diffusion models into smaller, more computationally efficient models.

In contrast to previous multi-fold distillation techniques, such as those used in Improved Distribution Matching Distillation for Fast Image Synthesis, Distilling Diffusion Models into Conditional GANs, and Scott: Accelerating Diffusion Models via Stochastic Consistency Distillation, SFDDM only requires a single forward pass through the student model during the distillation process.

The paper also explores variants of SFDDM, including a "Directly Denoising Diffusion Model" (D2M) approach that directly matches the student's denoised output to the teacher's, as proposed in Directly Denoising Diffusion Model, and a "Score-Identity Distillation" (SID) approach that matches the student's score function to the teacher's, as in Score-Identity Distillation: Exponentially Fast Distillation of Pretrained Models.

The researchers evaluate SFDDM and its variants on several diffusion model benchmarks, demonstrating its ability to achieve strong performance while being significantly more computationally efficient than previous distillation approaches.

Critical Analysis

The paper presents a novel and promising approach to efficiently distilling large diffusion models into smaller, more computationally efficient models. The key strength of SFDDM is its simplicity, requiring only a single forward pass through the student model during the distillation process, which is a significant improvement over more complex multi-fold distillation techniques.

However, the paper does not provide a deep analysis of the limitations or potential downsides of SFDDM. For example, it would be useful to understand how SFDDM's performance compares to the teacher model, as well as the impact of the distillation process on the student model's ability to generalize to diverse inputs.

Additionally, the paper could have explored more variations of the SFDDM approach, such as incorporating additional loss terms or exploring alternative ways to match the student and teacher model outputs. Further research in these areas could help to identify the optimal distillation strategy for different use cases and model architectures.

Overall, the SFDDM method presented in the paper is a valuable contribution to the field of diffusion model compression and optimization. However, continued research and a more thorough analysis of the method's strengths, weaknesses, and potential improvements would further strengthen the work.

Conclusion

The paper introduces a new method called "Single-fold Distillation for Diffusion models" (SFDDM) that aims to efficiently compress large diffusion models into smaller, more computationally efficient models. SFDDM achieves this by directly matching the output of the student model to the output of the large, pre-trained teacher model, requiring only a single forward pass through the student model.

The researchers demonstrate that SFDDM and its variants can achieve competitive performance compared to more complex distillation techniques, while being significantly more computationally efficient. This makes SFDDM a promising approach for deploying large diffusion models in real-world applications where computational resources are limited.

Overall, the SFDDM method presented in this paper represents an important advance in the field of diffusion model compression and optimization, and the researchers' findings suggest that further exploration of this approach could lead to even more efficient and effective ways of leveraging the power of diffusion models in a wide range of applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Diffusion Models Are Innate One-Step Generators

Bowen Zheng, Tianming Yang

Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkable high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation methods have been proposed to distill a one-step generator from a DM by having a simpler student model mimic a more complex teacher model. Yet, our research reveals an inherent limitations in these methods: the teacher model, with more steps and more parameters, occupies different local minima compared to the student model, leading to suboptimal performance when the student model attempts to replicate the teacher. To avoid this problem, we introduce a novel distributional distillation method, which uses an exclusive distributional loss. This method exceeds state-of-the-art (SOTA) results while requiring significantly fewer training images. Additionally, we show that DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step. Freezing most of the convolutional layers in a DM during distributional distillation enables this innate capability and leads to further performance improvements. Our method achieves the SOTA results on CIFAR-10 (FID 1.54), AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency. Most of those results are obtained with only 5 million training images within 6 hours on 8 A100 GPUs.

6/10/2024

cs.CV

🖼️

Improved Distribution Matching Distillation for Fast Image Synthesis

Tianwei Yin, Michael Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman

Recent approaches have shown promises distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model, and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting, by simulating inference-time generator samples during training time. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500X reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods.

5/27/2024

cs.CV

Plug-and-Play Diffusion Distillation

Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot

Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half, and only requires 1% trainable parameters of the base model. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without the need for additional training: this plug-and-play functionality drastically improves inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach is able to produce visually appealing results and achieve a comparable FID score to the teacher with as few as 8 to 16 steps.

6/17/2024

cs.CV

📉

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models -- DMD, SDXL-Turbo, and SDXL-Lightning -- on the zero-shot COCO benchmark.

6/17/2024

cs.CV cs.GR cs.LG