LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models

2404.11098

Published 4/22/2024 by Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, Haonan Lu

Abstract

In the era of AIGC, the demand for low-budget or even on-device applications of diffusion models has emerged. For compressing Stable Diffusion models (SDMs), several approaches have been proposed, most of which rely on handcrafted layer removal to obtain smaller U-Nets, along with knowledge distillation to recover network performance. However, handcrafted layer removal is inefficient and lacks scalability and generalization, and the feature distillation employed in the retraining phase faces an imbalance issue in which a few numerically significant feature loss terms dominate the others throughout retraining. To this end, we propose layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff). We 1) introduce a layer pruning method to compress SDM's U-Net automatically and propose an effective one-shot pruning criterion whose one-shot performance is guaranteed by its good additivity property, surpassing other layer pruning and handcrafted layer removal methods, and 2) propose normalized feature distillation for retraining, which alleviates the imbalance issue. Using the proposed LAPTOP-Diff, we compressed the U-Nets of SDXL and SDM-v1.5 with the most advanced performance, achieving a minimal 4.0% decline in PickScore at a pruning ratio of 50%, while the comparative methods' minimal PickScore decline is 8.2%. We will release our code.
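The imbalance issue described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' code: it assumes each layer's feature loss is a mean squared error and that "normalized" means dividing each term by the mean squared activation of the teacher's feature at that layer (the paper's exact normalization may differ). The function name `feature_loss_terms` and the toy feature values are hypothetical; in practice the features would be tensors taken from the teacher and student U-Nets.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_loss_terms(teacher_feats, student_feats, normalize=False, eps=1e-8):
    """Return one feature-distillation loss term per layer.

    With normalize=False this is plain feature distillation (per-layer MSE).
    With normalize=True each term is divided by the teacher feature's mean
    squared activation (an assumed normalization), so layers with large
    activations no longer dominate the summed loss.
    """
    terms = []
    for t, s in zip(teacher_feats, student_feats):
        term = mse(t, s)
        if normalize:
            scale = sum(x * x for x in t) / len(t)
            term /= scale + eps  # eps guards against all-zero teacher features
        terms.append(term)
    return terms


# Toy features for two layers: layer 0 has large activations, layer 1 small ones.
teacher = [[100.0, 100.0], [1.0, 1.0]]
student = [[90.0, 90.0], [0.9, 0.9]]  # the same 10% relative error in both layers

raw = feature_loss_terms(teacher, student)
norm = feature_loss_terms(teacher, student, normalize=True)
print("raw terms:", raw, "total:", sum(raw))
print("normalized terms:", norm, "total:", sum(norm))
```

Without normalization, the first layer's term is roughly 10,000 times the second, even though both student layers are off by the same relative amount; with normalization, both terms are about 0.01, so neither dominates the total loss.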

Overview

This paper introduces LAPTOP-Diff, a method for compressing diffusion models by layer pruning and normalized distillation.
Diffusion models are a powerful class of generative models, but they can be computationally expensive and resource-intensive, limiting their practical applications.
The authors propose LAPTOP-Diff as a way to reduce the size and inference cost of the U-Nets in Stable Diffusion models (SDXL and SDM-v1.5) without significantly compromising image quality.

Plain English Explanation

Diffusion models are a type of machine learning model that can generate new images, text, or other data by learning patterns from existing data. These models have shown impressive results, but they can be very complex and require a lot of computing power to run.

The researchers who wrote this paper developed a new method called LAPTOP-Diff to make diffusion models smaller and faster, without losing too much of their capability. The key ideas are:

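Layer Pruning: The researchers identify the layers of the model's U-Net (its main denoising network) that contribute least to the output and remove, or "prune," them. Unlike earlier approaches that chose which layers to drop by hand, LAPTOP-Diff makes this choice automatically. This reduces the size of the model.
Normalized Distillation: The researchers then retrain the smaller, pruned model to imitate the original, larger model, a technique called "distillation." The "normalized" part puts the many internal comparisons between the two models on a common scale, so that a few large ones do not drown out the rest. This helps the smaller model recover the original model's performance.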

By combining these two techniques, the researchers were able to create compressed versions of diffusion models that are much smaller and faster to run, while still producing high-quality results. This could make diffusion models more practical for real-world uses such as low-budget or on-device image generation.

Technical Explanation

The paper introduces LAPTOP-Diff, a method for compressing diffusion models using a combination of layer pruning and a novel normalized distillation technique.

Diffusion models, which have demonstrated impressive performance in tasks like image and text generation, can be computationally expensive and resource-intensive, limiting their practical applications. To address this, the authors propose LAPTOP-Diff, which consists of two key components:

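Layer Pruning: Rather than removing layers by hand, as earlier Stable Diffusion compression work did, the researchers prune layers of the U-Net automatically. They propose a one-shot pruning criterion that scores how much each layer contributes to the model's output. Because the criterion has a good additivity property (roughly, the effect of removing several layers is close to the sum of their individual effects), the layers to remove can be chosen in a single pass. The authors report that this criterion surpasses other layer pruning and handcrafted layer removal methods.
Normalized Distillation: After pruning, the researchers retrain the pruned model (the student) with knowledge distillation from the original model (the teacher), including feature distillation on intermediate features. Plain feature distillation suffers from an imbalance: a few numerically large feature loss terms dominate the others throughout retraining. Normalizing the feature loss terms alleviates this imbalance and helps the compressed model recover the original model's performance.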
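A one-shot pruning step of the kind described above can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the authors' algorithm: each candidate layer is assumed to carry a precomputed score (for example, the output loss measured when only that layer is removed), and additivity is taken to mean that the loss of removing a set of layers is roughly the sum of the individual scores. Under that assumption, the layers can be chosen in one pass without re-measuring after each removal. The greedy score-per-parameter ordering, the `Layer` class, `one_shot_prune`, and the layer names, sizes, and scores in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str      # hypothetical layer identifier
    params: int    # number of parameters in the layer (must be > 0)
    score: float   # assumed: output loss when ONLY this layer is removed


def one_shot_prune(layers, ratio):
    """Choose layers to remove in a single pass.

    Layers are ranked by score per parameter, and the cheapest ones are taken
    until at least `ratio` of all parameters are pruned. Relies on the
    additivity assumption: the loss of removing a set of layers is estimated
    as the sum of their individual scores, so no re-scoring is needed.
    """
    total = sum(layer.params for layer in layers)
    target = ratio * total
    ranked = sorted(layers, key=lambda layer: layer.score / layer.params)

    removed, pruned = [], 0
    for layer in ranked:
        if pruned >= target:
            break
        removed.append(layer)
        pruned += layer.params

    estimated_loss = sum(layer.score for layer in removed)  # additive estimate
    return [layer.name for layer in removed], pruned / total, estimated_loss


layers = [
    Layer("down.attn1", params=40, score=0.02),
    Layer("down.res1", params=20, score=0.30),
    Layer("mid.attn", params=30, score=0.01),
    Layer("up.attn2", params=50, score=0.05),
    Layer("up.res3", params=60, score=0.50),
]
names, fraction, est_loss = one_shot_prune(layers, ratio=0.5)
print(names, fraction, est_loss)  # removes mid.attn, down.attn1, up.attn2 (60% of parameters)
```

With these toy numbers, the 50% target (100 of 200 parameters) is overshot to 60% because whole layers are removed. Greedy ranking by score per parameter only approximates the best subset under a parameter budget; an exact knapsack-style solver could be swapped in, and additivity is what lets either approach work from per-layer scores alone.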

The authors apply LAPTOP-Diff to the U-Nets of SDXL and SDM-v1.5 for text-to-image generation. At a pruning ratio of 50%, the compressed model shows a 4.0% decline in PickScore, whereas the smallest decline among the comparative methods is 8.2%.
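The abstract reports PickScore declines of 4.0% for LAPTOP-Diff and 8.2% for the best comparative method at a 50% pruning ratio. Assuming these are relative declines with respect to the uncompressed model (an interpretation this summary does not confirm), and using the hypothetical symbols $S_{\text{orig}}$ and $S_{\text{pruned}}$ for the PickScore of the original and pruned models, the comparison can be written as:

```latex
\Delta = \frac{S_{\text{orig}} - S_{\text{pruned}}}{S_{\text{orig}}} \times 100\%,
\qquad
\frac{\Delta_{\text{LAPTOP-Diff}}}{\Delta_{\text{best baseline}}} = \frac{4.0\%}{8.2\%} \approx 0.49
```

Under this reading, LAPTOP-Diff loses roughly half as much PickScore as the strongest comparative method at the same pruning ratio.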

Critical Analysis

The paper presents a compelling approach to compressing diffusion models, addressing an important problem in the field of generative AI. The combination of layer pruning and normalized distillation appears to be an effective strategy for reducing the size and inference time of these models without significantly compromising their performance.

One potential limitation of the LAPTOP-Diff method is that it may not be equally effective for all types of diffusion models or tasks. The authors evaluate their approach on text-to-image generation with the U-Nets of SDXL and SDM-v1.5, and it is unclear how well it would generalize to other domains, such as text generation or audio synthesis.

Additionally, the paper does not provide a detailed analysis of the computational and memory requirements of LAPTOP-Diff, which would be valuable for understanding the practical implications of the method. Further research could explore the tradeoffs between model size, inference time, and performance in more depth.

Overall, the LAPTOP-Diff method represents a promising step towards more efficient and practical diffusion models, and the ideas presented in the paper could inspire future work in this area.

Conclusion

The LAPTOP-Diff paper introduces a novel approach for compressing diffusion models by combining layer pruning and normalized distillation. At a pruning ratio of 50%, the technique limits the PickScore decline to 4.0%, reducing the cost of running these powerful generative models without significantly compromising their output quality.

The results on SDXL and SDM-v1.5 suggest that this method could make diffusion models more practical for real-world applications, such as low-budget or on-device image generation. As the field of generative AI continues to evolve, the ideas presented in this paper could contribute to the development of more efficient and deployable diffusion models.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for details.

Related Papers

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward and task-agnostic method for evaluating model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance, independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster convergence during training, as the model has less information to re-learn, thereby addressing the high computational cost of training. Consequently, our approach achieves a compressed model that offers improved inference speed and reduced parameter count, while maintaining minimal performance degradation. We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, Unconditional Image Generation (UIG) and Unconditional Audio Generation (UAG). Notably, we reduce the inference time of Stable Diffusion (SD) by 34.9% while simultaneously improving its FID by 5.2% on MS-COCO T2I benchmark. This work paves the way for more efficient pruning methods for LDMs, enhancing their applicability.

4/19/2024

cs.LG cs.AI cs.CV

Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model

Jinyin Chen, Xiaoming Zhao, Haibin Zheng, Xiao Li, Sheng Xiang, Haifeng Guo

Benefiting from well-trained deep neural networks (DNNs), model compression has captured special attention for resource-limited equipment, especially edge devices. Knowledge distillation (KD) is one of the widely used compression techniques for edge deployment, obtaining a lightweight student model from a well-trained teacher model released on public platforms. However, it has been empirically noticed that the backdoor in the teacher model will be transferred to the student model during the process of KD. Although numerous KD methods have been proposed, most of them focus on the distillation of a high-performing student model without robustness consideration. Besides, some research adopts KD techniques as effective backdoor mitigation tools, but they fail to perform model compression at the same time. Consequently, it is still an open problem to achieve both objectives of robust KD, i.e., student model performance and backdoor mitigation. To address these issues, we propose RobustKD, a robust knowledge distillation method that compresses the model while mitigating backdoors based on feature variance. Specifically, RobustKD differs from previous works in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the main task performance of the student model is comparable to that of the teacher model; (2) robustness: by reducing the characteristic variance between the teacher model and the student model, it mitigates the backdoor of the student model under the backdoored teacher model scenario; (3) generality: RobustKD still performs well across multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet).

6/6/2024

cs.LG cs.AI

SFDDM: Single-fold Distillation for Diffusion models

Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen

While diffusion models effectively generate remarkable synthetic images, a key limitation is the inference inefficiency, requiring numerous sampling steps. To accelerate inference and maintain high-quality synthesis, teacher-student distillation is applied to compress the diffusion models in a progressive and binary manner by retraining, e.g., reducing the 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model of any desired step, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion, we minimize not only the output distance but also the distribution of the hidden variables between the teacher and student model. Extensive experiments on four datasets demonstrate that our student model trained by the proposed SFDDM is able to sample high-quality data with steps reduced to as little as approximately 1%, thus, trading off inference time. Our remarkable performance highlights that SFDDM effectively transfers knowledge in single-fold distillation, achieving semantic consistency and meaningful image interpolation.

5/27/2024

cs.CV cs.LG

Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices

Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng

In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtained a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. Firstly, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely-used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.

5/21/2024

cs.CV cs.AI