AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

Read original: arXiv:2407.12951 - Published 7/19/2024 by Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang

Overview

This paper presents AdaLog, a post-training quantization (PTQ) method for Vision Transformers (ViTs) built around a non-uniform Adaptive Logarithm (AdaLog) quantizer.
The key idea is to optimize the logarithmic base of the quantizer so that it fits the power-law-like distributions of post-Softmax and post-GELU activations, rather than relying on an inflexible, one-size-fits-all quantizer.
The authors report that AdaLog maintains high accuracy under aggressive quantization across classification, object detection, and instance segmentation, outperforming other state-of-the-art PTQ methods for ViTs.

Plain English Explanation

The paper introduces a new way to compress and speed up Vision Transformers, which are a type of deep learning model used for image recognition tasks. The main challenge is how to reduce the size and computational cost of these models without significantly impacting their accuracy.

The proposed method, called AdaLog, uses a clever technique called "post-training quantization" to achieve this. Quantization involves reducing the number of bits used to represent the model's parameters, which reduces the model's size and speeds up its inference. However, naively applying quantization can hurt accuracy.
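To make the idea of quantization concrete, here is a minimal sketch of plain uniform quantization in Python. It is a generic illustration, not AdaLog's quantizer: the function names and the sample values are invented for this example, and real frameworks quantize whole tensors rather than short lists.

```python
def uniform_quantize(values, num_bits):
    """Map floats to unsigned integer codes of num_bits bits, then back to floats."""
    qmax = 2 ** num_bits - 1                 # largest integer code
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Round each value to the nearest grid point and clamp to the valid code range.
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights, chosen only for illustration.
weights = [-0.9, -0.31, -0.02, 0.0, 0.07, 0.44, 1.2]

codes8, deq8 = uniform_quantize(weights, 8)
codes3, deq3 = uniform_quantize(weights, 3)

err8 = mean_squared_error(weights, deq8)
err3 = mean_squared_error(weights, deq3)
print(f"8-bit error: {err8:.6f}, 3-bit error: {err3:.6f}")
```

Fewer bits mean a coarser grid, so the 3-bit reconstruction error is larger than the 8-bit one. That gap is the accuracy loss a PTQ method tries to keep small.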

AdaLog solves this by adaptively determining the best quantization parameters for each layer of the Vision Transformer. It looks at the statistical properties of the activations (the intermediate results) in each layer and uses that to pick the right quantization approach. This is better than using a one-size-fits-all quantization scheme across the whole model.

By doing this, AdaLog can aggressively compress the Vision Transformer models while still maintaining high accuracy, outperforming other state-of-the-art quantization techniques for these types of models. This makes it easier to deploy powerful Vision Transformer models on resource-constrained devices like smartphones.

Technical Explanation

The key innovation in this paper is the Adaptive Logarithm (AdaLog) quantizer, a non-uniform quantizer designed for post-training quantization (PTQ) of Vision Transformers (ViTs).

According to the authors, existing PTQ approaches for ViTs quantize post-Softmax and post-GELU activations inflexibly, even though these activations follow power-law-like distributions. In contrast, AdaLog adaptively determines the quantization parameters for each layer of the ViT based on the statistical properties of the feature activations in that layer.
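The general idea of choosing quantization parameters from each layer's own activations can be sketched as below. This is a simplified stand-in, assuming non-negative activations, a uniform quantizer, and a brute-force search over clipping thresholds; the layer names, sample activations, and helper functions are hypothetical and do not come from the paper.

```python
def fake_quantize(values, clip, num_bits):
    """Uniformly quantize non-negative values to [0, clip] and map them back to floats."""
    qmax = 2 ** num_bits - 1
    scale = clip / qmax
    return [min(qmax, max(0, round(v / scale))) * scale for v in values]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def calibrate_layer(values, num_bits, num_candidates=50):
    """Try evenly spaced clipping thresholds and keep the one with the lowest error."""
    peak = max(values)
    best_clip, best_err = peak, float("inf")
    for i in range(1, num_candidates + 1):
        clip = peak * i / num_candidates
        err = mse(values, fake_quantize(values, clip, num_bits))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


# Hypothetical calibration activations for two layers with different statistics.
calibration = {
    "layer_a": [0.01, 0.02, 0.03, 0.05, 0.04, 0.02, 0.9],   # long-tailed
    "layer_b": [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9],         # evenly spread
}

per_layer = {
    name: calibrate_layer(acts, num_bits=3)
    for name, acts in calibration.items()
}
for name, (clip, err) in per_layer.items():
    print(f"{name}: clip={clip:.3f}, mse={err:.6f}")
```

Each layer gets its own threshold, picked only from that layer's calibration data. No retraining is involved, which is what makes the procedure "post-training".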

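Specifically, AdaLog quantizes activations on a logarithmic scale whose base is optimized rather than fixed, which lets it fit the power-law-like, long-tailed distributions of post-Softmax and post-GELU activations while remaining hardware-friendly. A bias reparameterization makes the same quantizer applicable to both kinds of activations. The authors then use a Fast Progressive Combining Search (FPCS) strategy to determine the logarithm base for AdaLog, as well as the scaling factors and zero points of the uniform quantizers.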
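A logarithmic quantizer with a tunable base can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's implementation: inputs are assumed to lie in (0, 1] as attention probabilities do, the candidate bases are arbitrary, and a plain exhaustive search stands in for the authors' search strategy. All function names are hypothetical.

```python
import math


def log_quantize(values, base, num_bits):
    """Quantize values in (0, 1] to integer exponents of `base`, then map back to floats."""
    qmax = 2 ** num_bits - 1
    out = []
    for v in values:
        if v <= 0.0:
            code = qmax                      # non-positive input: use the smallest level
        else:
            code = round(-math.log(v, base)) # nearest integer exponent
            code = min(qmax, max(0, code))   # clamp to the representable range
        out.append(base ** (-code))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_base(values, num_bits, candidates):
    """Exhaustively pick the candidate base with the lowest reconstruction error."""
    return min(
        candidates,
        key=lambda b: mse(values, log_quantize(values, b, num_bits)),
    )


# Hypothetical post-Softmax probabilities with a long tail.
probs = [0.62, 0.2, 0.09, 0.05, 0.02, 0.01, 0.006, 0.004]

# Arbitrary candidate bases; base 2 corresponds to the usual log2 quantizer.
candidates = [2 ** (1 / k) for k in (1, 2, 3, 4)]

best_base = search_base(probs, num_bits=4, candidates=candidates)
print(f"selected base: {best_base:.4f}")
```

A smaller base places the quantization levels closer together near 1, at the cost of covering a narrower range of small values. Searching over the base trades these two effects against each other for the data at hand.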

The authors show that this adaptive, layer-wise quantization approach outperforms other state-of-the-art PTQ methods for ViTs, such as ADFQ-ViT, PTQ4ViT, Q-HyViT, ADPQ, and TrioViT. AdaLog achieves higher accuracy at lower bit-widths, enabling more aggressive model compression.

Critical Analysis

The paper provides a thorough evaluation of AdaLog on various ViT models and datasets, demonstrating its effectiveness. However, a few potential limitations and areas for further research are worth noting:

The authors only consider post-training quantization, and it would be interesting to see how AdaLog compares to other quantization approaches like quantization-aware training.
The search used to find the quantization parameters adds computational cost at calibration time. The authors describe their Fast Progressive Combining Search (FPCS) as efficient, but the overhead could still limit practical applicability, especially for larger models.
The paper does not explore the effects of AdaLog on the model's robustness or other non-accuracy metrics, which could be important in real-world deployments.

Overall, AdaLog represents a promising step forward in making powerful Vision Transformer models more efficient and deployable on resource-constrained devices. Further research addressing the above limitations could help unlock the full potential of this approach.

Conclusion

This paper presents AdaLog, a novel post-training quantization method for Vision Transformers that uses an Adaptive Logarithm Quantizer to determine the optimal quantization parameters for each layer of the model. By adapting the quantization to the statistical properties of the activations in each layer, AdaLog is able to achieve high accuracy even with aggressive model compression, outperforming other state-of-the-art PTQ techniques.

The key contribution of this work is the insight that a one-size-fits-all quantization approach is suboptimal for complex models like ViTs, and that an adaptive, layer-wise quantization strategy can unlock significant efficiency gains. As Vision Transformers continue to advance and find widespread adoption, techniques like AdaLog will play an important role in making these powerful models more practical for real-world applications on resource-constrained devices.



Related Papers

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang

Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. Despite the high accuracy, deploying it in real applications raises critical challenges including the high computational cost and inference latency. Recently, the post-training quantization (PTQ) technique has emerged as a promising way to enhance ViT's efficiency. Nevertheless, existing PTQ approaches for ViT suffer from the inflexible quantization on the post-Softmax and post-GELU activations that obey the power-law-like distributions. To address these issues, we propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer. It optimizes the logarithmic base to accommodate the power-law-like distribution of activations, while simultaneously allowing for hardware-friendly quantization and de-quantization. By employing the bias reparameterization, the AdaLog quantizer is applicable to both the post-Softmax and post-GELU activations. Moreover, we develop an efficient Fast Progressive Combining Search (FPCS) strategy to determine the optimal logarithm base for AdaLog, as well as the scaling factors and zero points for the uniform quantizers. Extensive experimental results on public benchmarks demonstrate the effectiveness of our approach for various ViT-based architectures and vision tasks including classification, object detection, and instance segmentation. Code is available at https://github.com/GoatWu/AdaLog.

7/19/2024

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li

Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low-bit. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full-precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 10.23% improvement in Top-1 accuracy on the ImageNet dataset.

7/4/2024

DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers

Lianwei Yang, Haisong Gong, Qingyi Gu

Vision transformers (ViTs) have garnered significant attention for their performance in vision tasks, but the high computational cost and significant latency issues have hindered widespread adoption. Post-training quantization (PTQ), a promising method for model compression, still faces accuracy degradation challenges with ViTs. There are two reasons for this: the existing quantization paradigm does not fit the power-law distribution of post-Softmax activations well, and accuracy inevitably decreases after reparameterizing post-LayerNorm activations. We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers, named DopQ-ViT. DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ. TanQ focuses more on values near 1, more accurately preserving the power-law distribution of post-Softmax activations, and achieves favorable results. Besides, during the reparameterization of post-LayerNorm activations from channel-wise to layer-wise quantization, the accuracy degradation is mainly due to the significant impact of outliers in the scaling factors. Therefore, DopQ-ViT proposes a method to select Median as the Optimal Scaling Factor, denoted as MOSF, which compensates for the influence of outliers and preserves the performance of the quantization model. DopQ-ViT has been extensively validated and significantly improves the performance of quantization models, especially in low-bit settings.

8/19/2024

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

Quantization is one of the most effective methods to compress neural networks, which has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods performed not well on vision transformer, resulting in more than 1% accuracy drop even in 8-bit quantization. Therefore, we analyze the problems of quantization on vision transformers. We observe the distributions of activation values after softmax and GELU functions are quite different from the Gaussian distribution. We also observe that common quantization metrics, such as MSE and cosine distance, are inaccurate to determine the optimal scaling factor. In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values. And we propose to use a Hessian guided metric to evaluate different scaling factors, which improves the accuracy of calibration at a small cost. To enable the fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.

6/26/2024