Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Read original: arXiv:2306.07215 - Published 8/21/2024 by Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Overview

This paper presents an efficient quantization-aware training (QAT) approach with adaptive coreset selection.
QAT aims to train deep neural networks that can be efficiently deployed on resource-constrained devices by quantizing model parameters to low-precision.
The key innovation is an adaptive coreset selection technique that selects a small subset of the training data to optimize the quantization-aware training process.

Plain English Explanation

The researchers have developed a new way to train neural networks that can run efficiently on devices with limited computing power, like smartphones or edge devices.

The main idea is to quantize the neural network's parameters - this means representing them using fewer bits, which reduces the memory and computation required. However, this quantization process can negatively impact the network's performance.

To address this, the researchers use an adaptive coreset selection technique. This selects a small, representative subset of the training data that is most important for the quantization process. By focusing the training on this subset, they can optimize the quantized network's performance without needing to use the full training dataset.

This efficient quantization-aware training approach allows neural networks to be deployed on resource-constrained devices while maintaining high accuracy - a key requirement for many real-world applications like mobile apps or edge computing.

Technical Explanation

The paper proposes an Adaptive Coreset Selection (ACS) technique to improve the efficiency of Quantization-Aware Training (QAT) for deep neural networks.

QAT aims to train models that can be efficiently deployed on resource-constrained devices by quantizing model parameters to low-precision representations. However, the quantization process can degrade model performance. The key innovation in this work is the use of an adaptive coreset selection method to address this challenge.

The ACS approach selects a small subset of the training data that is most important for the quantization process. This "coreset" is used to optimize the quantized network, rather than using the full training set. The coreset is adaptively updated during training to ensure it remains representative and informative.

Experiments show that this ACS-QAT approach outperforms standard QAT techniques across a range of computer vision and natural language processing tasks. It achieves higher accuracy for the same computational budget, enabling efficient deployment of high-performing models on edge devices.

The paper also analyzes the properties of the selected coresets, demonstrating that they capture diverse and informative samples from the original training distribution.

Critical Analysis

The paper makes a strong contribution by developing an effective technique to improve the efficiency of quantization-aware training for deep neural networks. The adaptive coreset selection approach is a clever way to focus the training process on the most relevant data, avoiding the need to process the full training set.

One potential limitation is that the coreset selection process adds some additional computational overhead during training. The authors do not provide a detailed analysis of this overhead and how it scales with dataset size. It would be helpful to understand the tradeoffs involved and the practical limitations of the approach.

Additionally, the experiments are conducted on relatively standard benchmark tasks. It would be valuable to see how the ACS-QAT approach performs on more complex, real-world applications where the benefits of efficient deployment may be more substantial.

Overall, this is a well-designed and impactful piece of research that advances the state-of-the-art in quantization-aware training. The adaptive coreset selection technique is a clever and effective solution to a significant challenge in deploying high-performing deep learning models on resource-constrained hardware.

Conclusion

This paper presents an efficient quantization-aware training approach that uses adaptive coreset selection to optimize the training process for low-precision neural network deployment. By focusing the training on a small, representative subset of the data, the method can achieve higher accuracy compared to standard QAT techniques, while still enabling efficient model execution on edge devices.

The adaptive coreset selection is a key innovation that addresses a major challenge in quantization-aware training. This work demonstrates the potential of targeted data selection to improve the efficiency of deep learning model optimization and deployment, with important implications for a wide range of real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. In addition, the potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on analysis of loss and gradient of quantized weights: error vector score and disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we proposed a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets(CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and under different quantization settings. Specifically, our method can achieve an accuracy of 68.39% of 4-bit quantized ResNet-18 on the ImageNet-1K dataset with only a 10% subset, which has an absolute gain of 4.24% compared to the baseline. Our method can also improve the robustness of QAT by removing noisy samples in the training set.

8/21/2024

🏋️

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

C'edric Gernigon (TARAN), Silviu-Ioan Filip (TARAN), Olivier Sentieys (TARAN), Cl'ement Coggiola (CNES), Mickael Bruno (CNES)

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios.Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.

4/29/2024

Dataset Quantization with Active Learning based Adaptive Sampling

Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, the training on such datasets elevates costs and computational demands. To address this, various techniques like coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sample distributions across different classes, our research demonstrates that maintaining performance is feasible even with uneven distributions. We find that for certain classes, the variation in sample quantity has a minimal impact on performance. Inspired by this observation, an intuitive idea is to reduce the number of samples for stable classes and increase the number of samples for sensitive classes to achieve a better performance with the same sampling ratio. Then the question arises: how can we adaptively select samples from a dataset to achieve optimal performance? In this paper, we propose a novel active learning based adaptive sampling strategy, Dataset Quantization with Active Learning based Adaptive Sampling (DQAS), to optimize the sample selection. In addition, we introduce a novel pipeline for dataset quantization, utilizing feature space from the final stage of dataset quantization to generate more precise dataset bins. Our comprehensive evaluations on the multiple datasets show that our approach outperforms the state-of-the-art dataset compression methods.

7/11/2024

Low-Rank Quantization-Aware Training for LLMs

Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

Large language models (LLMs) are omnipresent, however their practical deployment is challenging due to their ever increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and memory efficient. Quantization-aware training (QAT) methods, generally produce the best quantized performance, however it comes at the cost of potentially long training time and excessive memory usage, making it impractical when applying for LLMs. Inspired by parameter-efficient fine-tuning (PEFT) and low-rank adaptation (LoRA) literature, we propose LR-QAT -- a lightweight and memory-efficient QAT algorithm for LLMs. LR-QAT employs several components to save memory without sacrificing predictive performance: (a) low-rank auxiliary weights that are aware of the quantization grid; (b) a downcasting operator using fixed-point or double-packed integers and (c) checkpointing. Unlike most related work, our method (i) is inference-efficient, leading to no additional overhead compared to traditional PTQ; (ii) can be seen as a general extended pretraining framework, meaning that the resulting model can still be utilized for any downstream task afterwards; (iii) can be applied across a wide range of quantization settings, such as different choices quantization granularity, activation quantization, and seamlessly combined with many PTQ techniques. We apply LR-QAT to LLaMA-1/2/3 and Mistral model families and validate its effectiveness on several downstream tasks. Our method outperforms common post-training quantization (PTQ) approaches and reaches the same model performance as full-model QAT at the fraction of its memory usage. Specifically, we can train a 7B LLM on a single consumer grade GPU with 24GB of memory. Our source code is available at https://github.com/qualcomm-ai-research/LR-QAT

9/4/2024