Compact Model Training by Low-Rank Projection with Energy Transfer

Read original: arXiv:2204.05566 - Published 8/15/2024 by Kailing Guo, Zhenquan Lin, Canyang Chen, Xiaofen Xing, Fang Liu, Xiangmin Xu

Overview

Low-rank compression is a well-established technique in traditional machine learning but is used far less often in deep learning.
Most previous low-rank methods compress a pre-trained model and then retrain it, but the optimum under a low-rank constraint can differ substantially from the unconstrained optimum, so a well-trained model is a poor initialization.
As a result, networks compressed this way often suffer significant performance degradation.
In recent years, low-rank compression has attracted less attention than alternatives such as pruning.

Plain English Explanation

Low-rank compression is a way to make machine learning models smaller and more efficient by approximating the original model's weights with a lower-dimensional representation. This technique has been commonly used in traditional machine learning, but hasn't been as widely adopted in the world of deep learning.
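To make the idea of a "lower-dimensional representation" concrete, here is a minimal NumPy sketch that factors a weight matrix with a truncated SVD and compares parameter counts. The matrix shape, the rank, and the helper name `low_rank_factors` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Return (A, B) such that A @ B is the truncated-SVD rank-`rank` approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank): left vectors scaled by singular values
    B = Vt[:rank, :]             # shape (rank, n): right singular vectors
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))   # hypothetical dense layer weight
A, B = low_rank_factors(W, rank=32)

dense_params = W.size                 # 256 * 512 = 131072
factored_params = A.size + B.size     # 32 * (256 + 512) = 24576
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(dense_params, factored_params, rel_error)
```

Storing the two factors instead of the full matrix is where the savings come from. A random matrix like this one is far from low-rank, so its approximation error is large; the method summarized here aims to train weights that fit the low-rank form well.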

One reason is that most previous low-rank methods start from a well-trained model, replace its weight matrices with low-rank approximations, and then retrain. However, the best solution under a low-rank constraint may be quite different from the unconstrained weights, so the pre-trained model is not a good starting point for the constrained one. As a result, the compressed model often loses significant accuracy even after retraining.

In contrast, other compression techniques such as pruning have attracted more attention in deep learning in recent years.

Technical Explanation

In this paper, the authors propose a new training method called "Low-Rank Projection with Energy Transfer" (LRPET) that can train low-rank compressed deep learning models from scratch, without the need for pretraining.

The key ideas are:

Alternating optimization: Training alternates between stochastic gradient descent steps and projecting each weight matrix onto the corresponding low-rank manifold. Because the weights are free to leave the manifold between projections, the full model capacity remains available during training.
Energy transfer: Projection reduces a matrix's "energy" (the sum of its squared singular values). To compensate, the energy of the pruned singular values is transferred uniformly to the remaining ones, which the authors show theoretically eases the gradient vanishing that projection would otherwise cause.
BN rectification: In modern networks, a batch normalization (BN) layer can be merged into the preceding convolution layer for inference, which changes the optimal low-rank approximation of that layer. The authors propose "BN rectification" to cut off this effect.
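The BN point in the list above can be illustrated with a small NumPy sketch. It shows only the motivation, namely that folding a BN scale into the preceding weights changes which low-rank approximation is best. It does not implement the authors' BN rectification, whose details this summary does not give. The helper names `fold_bn` and `truncate`, the shapes, and the random BN statistics are all assumptions.

```python
import numpy as np

def fold_bn(W, gamma, running_var, eps=1e-5):
    """Fold the BN scale into the preceding layer's weights (one row per output channel).

    At inference BN multiplies channel c by gamma[c] / sqrt(running_var[c] + eps).
    The bias and shift terms do not change the weight matrix and are omitted here.
    """
    scale = gamma / np.sqrt(running_var + eps)
    return scale[:, None] * W

def truncate(W, rank):
    """Truncated-SVD rank-`rank` approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))                 # hypothetical flattened conv weight
gamma = rng.uniform(0.1, 3.0, size=16)            # hypothetical BN scales
running_var = rng.uniform(0.5, 2.0, size=16)      # hypothetical BN running variances
rank = 4

W_folded = fold_bn(W, gamma, running_var)

# Option A: truncate the raw weights, then fold BN into the result.
approx_a = fold_bn(truncate(W, rank), gamma, running_var)
# Option B: fold BN first, then truncate the matrix that inference actually uses.
approx_b = truncate(W_folded, rank)

err_a = np.linalg.norm(W_folded - approx_a)
err_b = np.linalg.norm(W_folded - approx_b)
print(err_a, err_b)
```

Both candidates have rank at most 4, and by the Eckart-Young theorem option B is the best such approximation of the folded matrix in Frobenius norm, so `err_b` cannot exceed `err_a`. The gap between the two is the kind of BN-induced mismatch that the rectification step is meant to remove.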

By training the low-rank model from scratch and using energy transfer, the authors are able to better utilize the model capacity and avoid the significant performance degradation seen in previous low-rank compression methods.
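The alternating scheme can be sketched on a toy linear regression problem, as below. This is a sketch under stated assumptions, not the authors' implementation. In particular, "uniform" energy transfer is read here as adding an equal share of the pruned energy to each retained squared singular value, which is one plausible interpretation. The data, the learning rate, the projection interval, and the function name `project_with_energy_transfer` are hypothetical.

```python
import numpy as np

def project_with_energy_transfer(W, rank):
    """Project W to rank `rank` while keeping its total energy ||W||_F^2.

    Assumption: each retained squared singular value receives an equal share
    of the energy held by the pruned singular values.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    pruned_energy = np.sum(s[rank:] ** 2)
    s_kept = np.sqrt(s[:rank] ** 2 + pruned_energy / rank)
    return (U[:, :rank] * s_kept) @ Vt[:rank]

def half_mse(W, X, Y):
    """0.5 * mean over samples of the squared prediction error."""
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

# Toy data: targets come from a rank-4 linear map (hypothetical setup).
rng = np.random.default_rng(0)
n_in, n_out, rank = 32, 16, 4
W_true = rng.standard_normal((n_out, rank)) @ rng.standard_normal((rank, n_in))
X = rng.standard_normal((512, n_in))
Y = X @ W_true.T

W = 0.1 * rng.standard_normal((n_out, n_in))   # trained from scratch, no pre-training
lr, batch, project_every = 0.05, 64, 20        # illustrative hyperparameters
loss_start = half_mse(W, X, Y)

for step in range(1, 401):
    idx = rng.integers(0, len(X), size=batch)
    xb, yb = X[idx], Y[idx]
    grad = (xb @ W.T - yb).T @ xb / batch      # gradient of the mini-batch half-MSE
    W -= lr * grad                             # unconstrained SGD step
    if step % project_every == 0:
        W = project_with_energy_transfer(W, rank)   # return to the low-rank set

loss_end = half_mse(W, X, Y)
final_rank = np.linalg.matrix_rank(W)
print(final_rank, loss_start, loss_end)
```

Between projections the weights move freely in the full space, and the loop ends on a projection step so the final matrix has the target rank. A real network would apply the projection to each layer's weight matrix separately.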

Critical Analysis

The authors support energy transfer with a theoretical argument that it eases the gradient vanishing caused by projection. However, some potential limitations or areas for further research include:

Evaluating the LRPET method on a wider range of deep learning architectures and tasks beyond the image classification experiments presented.
Investigating the computational and memory efficiency of the LRPET training process compared to other compression techniques like pruning or low-rank decomposition.
Exploring the interaction between low-rank compression and other model optimization techniques, such as model quantization.

Overall, the LRPET method represents an interesting approach to training low-rank compressed deep learning models, and the authors' theoretical analysis provides a solid foundation for further research in this area.

Conclusion

This paper presents a new training method called LRPET that can effectively train low-rank compressed deep learning models from scratch, avoiding the performance degradation seen in previous approaches. By alternating optimization with low-rank projections and using an energy transfer mechanism, LRPET is able to better utilize the model capacity within the low-rank constraint. The authors also introduce BN rectification to address the influence of batch normalization layers on the optimal low-rank approximation.

While low-rank compression has been less popular in deep learning compared to other techniques like pruning, the LRPET method demonstrates the potential for low-rank approaches to achieve competitive performance. Further research exploring the practical benefits and limitations of LRPET across a wider range of deep learning applications could help drive more widespread adoption of low-rank compression in the field.


Related Papers

Compact Model Training by Low-Rank Projection with Energy Transfer

Kailing Guo, Zhenquan Lin, Canyang Chen, Xiaofen Xing, Fang Liu, Xiangmin Xu

Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one with low-rank constraint. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods attract less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since solution space is relaxed back to Euclidean space after projection. The matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation of the previous layer. We propose BN rectification to cut off its effect on the optimal low-rank approximation, which further improves the performance.

8/15/2024

Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition

Xitong Zhang, Ismail R. Alkhouri, Rongrong Wang

Deep Neural Networks (DNNs) have achieved remarkable success in addressing many previously unsolvable tasks. However, the storage and computational requirements associated with DNNs pose a challenge for deploying these trained models on resource-limited devices. Therefore, a plethora of compression and pruning techniques have been proposed in recent years. Low-rank decomposition techniques are among the approaches most utilized to address this problem. Compared to post-training compression, compression-promoted training is still under-explored. In this paper, we present a theoretically-justified novel approach, termed Low-Rank Induced Training (LoRITa), that promotes low-rankness through the composition of linear layers and compresses by using singular value truncation. This is achieved without the need to change the structure at inference time or require constrained and/or additional optimization, other than the standard weight decay regularization. Moreover, LoRITa eliminates the need to (i) initialize with pre-trained models and (ii) specify rank selection prior to training. Our experimental results (i) demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 on Convolutional Neural Networks, and (ii) illustrate that we achieve either competitive or SOTA results when compared to leading structured pruning methods in terms of FLOPs and parameters drop.

5/7/2024

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

Yang Li, Changsheng Zhao, Hyungtak Lee, Ernie Chang, Yangyang Shi, Vikas Chandra

Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7b and -13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.

5/28/2024

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Chenyang Li, Jihoon Chung, Mengnan Du, Haimin Wang, Xianlian Zhou, Bo Shen

Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited in many applications. This paper focuses on two model compression techniques: low-rank approximation and weight pruning in neural networks, which are very popular nowadays. However, training NN with low-rank approximation and weight pruning always suffers significant accuracy loss and convergence issues. In this paper, a holistic framework is proposed for model compression from a novel perspective of nonconvex optimization by designing an appropriate objective function. Then, we introduce NN-BCD, a block coordinate descent (BCD) algorithm to solve the nonconvex optimization. One advantage of our algorithm is that an efficient iteration scheme can be derived with closed-form, which is gradient-free. Therefore, our algorithm will not suffer from vanishing/exploding gradient problems. Furthermore, with the Kurdyka-Łojasiewicz (KŁ) property of our objective function, we show that our algorithm globally converges to a critical point at the rate of O(1/k), where k denotes the number of iterations. Lastly, extensive experiments with tensor train decomposition and weight pruning demonstrate the efficiency and superior performance of the proposed framework. Our code implementation is available at https://github.com/ChenyangLi-97/NN-BCD

8/16/2024