ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

Read original: arXiv:2407.02763 - Published 7/4/2024 by Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li
Total Score

0

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

Sign in to get full access

or

If you already have an account, we'll log you in

Overview

  • This paper introduces a new post-training quantization (PTQ) method called ADFQ-ViT, which is designed to improve the accuracy of quantized Vision Transformers (ViTs).
  • The key idea is to leverage the distribution of activations in ViTs to guide the quantization process, resulting in models that are more robust to quantization errors.
  • ADFQ-ViT is tested on several ViT models and datasets, demonstrating improved accuracy compared to existing PTQ methods.

Plain English Explanation

The paper focuses on a problem called quantization, which is a way to make AI models smaller and more efficient by reducing the precision of their internal calculations. This is important for deploying AI on devices with limited computing power, like phones or sensors.

The researchers noticed that existing quantization methods don't work as well for a type of AI model called a Vision Transformer (ViT). ViTs are good at processing images, but they have some unique internal structures that make them more sensitive to quantization errors.

To address this, the researchers developed a new quantization method called ADFQ-ViT. The key idea is to carefully analyze how the internal activations (the signals flowing through the ViT) are distributed, and use that information to guide the quantization process. This helps the ViT retain more of its original accuracy even after being quantized.

The researchers tested ADFQ-ViT on several ViT models and datasets, and found that it outperformed existing quantization methods in terms of maintaining the model's accuracy. This is an important step towards making ViTs more practical for deployment on resource-constrained devices.

Technical Explanation

The paper proposes a new post-training quantization (PTQ) method called ADFQ-ViT (Activation-Distribution-Friendly PTQ for ViTs) to address the quantization challenges of ViT models.

Existing PTQ methods often struggle with ViTs due to their unique architecture, which includes a self-attention mechanism that is sensitive to quantization errors. ADFQ-ViT aims to mitigate this by leveraging the distribution of activations in ViTs to guide the quantization process.

The key steps of ADFQ-ViT are:

  1. Activation Analysis: The method analyzes the distribution of activations in the ViT model to identify the optimal bit-width for quantization.
  2. Quantization Parameter Optimization: It then optimizes the quantization parameters, such as the min and max values, to minimize the quantization error while preserving the activation distribution.
  3. Quantization-Aware Fine-tuning: Finally, the quantized model is fine-tuned on the training data to further improve its accuracy.

The researchers evaluate ADFQ-ViT on several ViT models (e.g., ViT, DeiT, and CaiT) and datasets (e.g., ImageNet, CIFAR-100, and ADE20K). The results show that ADFQ-ViT outperforms existing PTQ methods in terms of maintaining the model's accuracy after quantization, demonstrating the effectiveness of the proposed approach.

Critical Analysis

The paper presents a well-designed and thorough evaluation of the ADFQ-ViT method, with experiments conducted on a diverse set of ViT models and datasets. The researchers have also provided insightful discussions on the limitations of existing PTQ methods and how ADFQ-ViT addresses these challenges.

One potential area for further research is to investigate the performance of ADFQ-ViT on more complex or larger ViT models, as the current evaluation is limited to relatively small-scale models. Additionally, it would be interesting to see how ADFQ-ViT compares to other state-of-the-art PTQ methods, such as those based on hybrid vision transformers or efficient softmax approximations.

The paper also does not address the potential impact of ADFQ-ViT on the model's robustness or fairness, which are important considerations for real-world deployment. Further research could explore these aspects and provide a more comprehensive evaluation of the method.

Overall, the ADFQ-ViT method presented in this paper is a significant contribution to the field of ViT quantization, and the researchers have demonstrated its effectiveness through rigorous experimentation. The insights and techniques developed in this work could inform future research on efficient and accurate quantization of transformer-based models.

Conclusion

This paper introduces ADFQ-ViT, a novel post-training quantization method designed to improve the accuracy of quantized Vision Transformer (ViT) models. The key innovation is the use of activation distribution analysis to guide the quantization process, making it more robust to the unique characteristics of ViTs.

The experimental results show that ADFQ-ViT outperforms existing PTQ methods on a variety of ViT models and datasets, demonstrating its effectiveness in maintaining model accuracy after quantization. This is an important advance towards enabling the deployment of ViTs on resource-constrained devices, opening up new possibilities for efficient and high-performing computer vision applications.

The techniques developed in this work could also inform future research on quantization methods for other types of transformer-based models, helping to unlock the full potential of these powerful AI architectures across a wide range of real-world applications.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
Total Score

0

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li

Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low-bit. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full-precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 10.23% improvement in Top-1 accuracy on the ImageNet dataset.

Read more

7/4/2024

DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
Total Score

0

DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers

Lianwei Yang, Haisong Gong, Qingyi Gu

Vision transformers (ViTs) have garnered significant attention for their performance in vision tasks, but the high computational cost and significant latency issues have hindered widespread adoption. Post-training quantization (PTQ), a promising method for model compression, still faces accuracy degradation challenges with ViTs. There are two reasons for this: the existing quantization paradigm does not fit the power-law distribution of post-Softmax activations well, and accuracy inevitably decreases after reparameterizing post-LayerNorm activations. We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers, named DopQ-ViT. DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ. TanQ focuses more on values near 1, more accurately preserving the power-law distribution of post-Softmax activations, and achieves favorable results. Besides, during the reparameterization of post-LayerNorm activations from channel-wise to layer-wise quantization, the accuracy degradation is mainly due to the significant impact of outliers in the scaling factors. Therefore, DopQ-ViT proposes a method to select Median as the Optimal Scaling Factor, denoted as MOSF, which compensates for the influence of outliers and preserves the performance of the quantization model. DopQ-ViT has been extensively validated and significantly improves the performance of quantization models, especially in low-bit settings.

Read more

8/19/2024

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Total Score

0

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang

Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. Despite the high accuracy, deploying it in real applications raises critical challenges including the high computational cost and inference latency. Recently, the post-training quantization (PTQ) technique has emerged as a promising way to enhance ViT's efficiency. Nevertheless, existing PTQ approaches for ViT suffer from the inflexible quantization on the post-Softmax and post-GELU activations that obey the power-law-like distributions. To address these issues, we propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer. It optimizes the logarithmic base to accommodate the power-law-like distribution of activations, while simultaneously allowing for hardware-friendly quantization and de-quantization. By employing the bias reparameterization, the AdaLog quantizer is applicable to both the post-Softmax and post-GELU activations. Moreover, we develop an efficient Fast Progressive Combining Search (FPCS) strategy to determine the optimal logarithm base for AdaLog, as well as the scaling factors and zero points for the uniform quantizers. Extensive experimental results on public benchmarks demonstrate the effectiveness of our approach for various ViT-based architectures and vision tasks including classification, object detection, and instance segmentation. Code is available at https://github.com/GoatWu/AdaLog.

Read more

7/19/2024

👀

Total Score

0

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training quantization (PTQ) methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributed to the four following challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters ($<$5M). To overcome these challenges, we propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). We achieve a significant improvement of 17.73% for 8-bit and 29.75% for 6-bit on average, respectively, compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT)}. We plan to release our code at https://gitlab.com/ones-ai/q-hyvit.

Read more

5/20/2024