Compressing neural network by tensor network with exponentially fewer variational parameters

Read original: arXiv:2305.06058 - Published 5/6/2024 by Yong Qing, Ke Li, Peng-Fei Zhou, Shi-Ju Ran

Overview

The paper proposes a general compression scheme that significantly reduces the variational parameters of neural networks (NNs) by encoding them in deep automatically-differentiable tensor networks (ADTNs).
The ADTN representation contains exponentially fewer free parameters than the original NN, while maintaining or even improving the model's performance.
The compression method is demonstrated on several widely-recognized NN architectures and datasets, achieving superior compression rates with improved or comparable accuracy.

Plain English Explanation

Neural networks are powerful machine learning models that can tackle challenging tasks, but they often contain a massive number of adjustable parameters. If these parameters are not properly constrained, the neural network can become overly complex, leading to issues like overfitting and high hardware costs.

To address this, the researchers have developed a new compression technique that can significantly reduce the number of parameters in a neural network. Their approach is to represent the network's parameters using a special mathematical structure called a deep automatically-differentiable tensor network (ADTN). This ADTN representation has far fewer free parameters than the original neural network, yet it can still capture the essential patterns and relationships in the data.

The researchers have tested their compression method on several well-known neural network architectures, such as LeNet-5, AlexNet, and VGG-16. They found that they could compress these networks by a significant amount, often reducing the number of parameters by orders of magnitude, while maintaining or even improving the model's performance on standard benchmark datasets like MNIST, CIFAR-10, and CIFAR-100.

This work suggests that tensor networks, a powerful mathematical framework, can be an exceptionally efficient way to represent the parameters of neural networks, potentially leading to more compact and efficient models for a wide range of applications.

Technical Explanation

The paper presents a general compression scheme that significantly reduces the variational parameters of neural networks (NNs) by encoding them in deep automatically-differentiable tensor networks (ADTNs). Instead of storing each weight of a layer as an independent trainable number, the scheme represents the weights through an ADTN, whose free parameters are exponentially fewer than those of the original NN, while maintaining or even improving the model's performance.
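To give a feel for how a tensor network can stand in for a large weight matrix, here is a minimal NumPy sketch. It uses a matrix product state (tensor train) with bond dimension 2 as a simpler stand-in; it is not the paper's ADTN layout, and the helper names `random_tt_cores` and `contract_tt`, the 1024 x 1024 target shape, and the random initialization are all assumptions made for illustration.

```python
import numpy as np


def random_tt_cores(num_modes, mode_dim=2, bond_dim=2, seed=0):
    """Build random tensor-train cores (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    cores = []
    for k in range(num_modes):
        left = 1 if k == 0 else bond_dim
        right = 1 if k == num_modes - 1 else bond_dim
        cores.append(rng.normal(size=(left, mode_dim, right)))
    return cores


def contract_tt(cores):
    """Contract the cores left to right into a flat vector of mode_dim**num_modes entries."""
    result = cores[0]  # shape (1, d, r)
    for core in cores[1:]:
        # Contract the shared bond: (1, d^k, r) with (r, d, r') -> (1, d^k, d, r')
        result = np.tensordot(result, core, axes=1)
        # Merge the physical indices collected so far into one axis
        result = result.reshape(1, -1, core.shape[-1])
    return result.reshape(-1)


num_modes = 20  # 2**20 entries, i.e. a 1024 x 1024 weight matrix
cores = random_tt_cores(num_modes)
weights = contract_tt(cores).reshape(1024, 1024)

dense_params = weights.size              # grows exponentially with num_modes
tt_params = sum(c.size for c in cores)   # grows linearly with num_modes

print(f"dense parameters:          {dense_params}")
print(f"tensor-network parameters: {tt_params}")
```

In a compression setting, the small cores would be the trainable quantities and the dense matrix would be rebuilt from them by contraction. The trade-off is that the reachable weight matrices are restricted to those the chosen network structure can express.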

The researchers demonstrate the superior compression performance of their scheme on several widely-recognized NN architectures, including FC-2, LeNet-5, AlexNet, ZFNet, and VGG-16, across various datasets (MNIST, CIFAR-10, and CIFAR-100).

For example, the researchers compressed two linear layers in VGG-16 with approximately 10^7 parameters into two ADTNs with just 424 parameters, while improving the testing accuracy on CIFAR-10 from 90.17% to 91.74%. This demonstrates the exceptional compressibility of the tensor network representation compared to the commonly-used matrices and multi-way arrays.
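The VGG-16 figures quoted above can be turned into a compression ratio with a few lines of arithmetic. This is a back-of-the-envelope sketch: it treats the approximate 10^7 figure as exact, and the variable names are illustrative.

```python
import math

dense_params = 10**7   # approximate parameter count of the two linear layers
adtn_params = 424      # parameter count of the two ADTNs

ratio = dense_params / adtn_params
accuracy_gain = 91.74 - 90.17          # percentage points on CIFAR-10
log2_dense = math.log2(dense_params)   # number of binary indices needed to label 10^7 entries

print(f"compression ratio: about {ratio:,.0f}x")
print(f"accuracy change:   {accuracy_gain:+.2f} percentage points")
print(f"log2(dense size):  {log2_dense:.2f}")
```

The last quantity gives one reading of "exponentially fewer": roughly 23 binary indices are enough to label 10^7 weights, so a parameter count that grows with the number of indices rather than with the number of entries stays in the hundreds.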

Critical Analysis

The paper presents a promising approach for significantly reducing the parameter count of neural networks while maintaining or improving their performance. However, the authors acknowledge that the compression method may not be suitable for all types of neural network architectures and applications. For instance, the paper focuses on relatively moderate-sized networks, and it's unclear how well the compression scheme would scale to larger and more complex models, such as those used in 3D medical imaging or natural language processing.

Additionally, the paper does not provide a detailed analysis of the computational and memory overhead associated with the ADTN representation, which could be an important consideration for real-world deployment. The authors also do not discuss the potential limitations or challenges of their automated tensor network construction algorithm, which is a crucial component of the proposed compression scheme.

Overall, the research presents an interesting and potentially impactful approach to neural network compression, but further investigation is needed to fully understand its capabilities, limitations, and practical implications across a broader range of applications.

Conclusion

The paper introduces a general compression scheme that can significantly reduce the number of parameters in neural networks by encoding them in deep automatically-differentiable tensor networks (ADTNs). The ADTN representation contains exponentially fewer free parameters than the original neural network, while maintaining or even improving the model's performance.

The researchers have demonstrated the superior compression performance of their approach on several well-known neural network architectures and datasets, achieving substantial reductions in parameter count without compromising (and in some cases, enhancing) the models' accuracy. This work suggests that tensor networks can be an exceptionally efficient mathematical structure for representing the parameters of neural networks, potentially leading to more compact and efficient models for a wide range of applications in the future.

Related Papers

Compressing neural network by tensor network with exponentially fewer variational parameters

Yong Qing, Ke Li, Peng-Fei Zhou, Shi-Ju Ran

Neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains massive variational parameters. High complexity of NN, if unbounded or unconstrained, might unpredictably cause severe issues including over-fitting, loss of generalization power, and unbearable cost of hardware. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NN by encoding them to deep automatically-differentiable tensor network (ADTN) that contains exponentially-fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely-recognized NN's (FC-2, LeNet-5, AlexNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTN's with just 424 parameters, where the testing accuracy on CIFAR-10 is improved from $90.17\%$ to $91.74\%$. Our work suggests TN as an exceptionally efficient mathematical structure for representing the variational parameters of NN's, which exhibits superior compressibility over the commonly-used matrices and multi-way arrays.

5/6/2024

Variational autoencoder-based neural network model compression

Liang Cheng, Peiyuan Guan, Amir Taherkordi, Lei Liu, Dapeng Lan

Variational Autoencoders (VAEs), as a form of deep generative model, have been widely used in recent years, and shown great performance in a number of different domains, including image generation and anomaly detection. This paper aims to explore neural network model compression method based on VAE. The experiment uses different neural network models for MNIST recognition as compression targets, including Feedforward Neural Network (FNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). These models are the most basic models in deep learning, and other more complex and advanced models are based on them or inherit their features and evolve. In the experiment, the first step is to train the models mentioned above, each trained model will have different accuracy and number of total parameters. And then the variants of parameters for each model are processed as training data in VAEs separately, and the trained VAEs are tested by the true model parameters. The experimental results show that using the latent space as a representation of the model compression can improve the compression rate compared to some traditional methods such as pruning and quantization, meanwhile the accuracy is not greatly affected using the model parameters reconstructed based on the latent space. In the future, a variety of different large-scale deep learning models will be used more widely, so exploring different ways to save time and space on saving or transferring models will become necessary, and the use of VAE in this paper can provide a basis for these further explorations.

8/28/2024

Tensor network compressibility of convolutional models

Sukhbinder Singh, Saeed S. Jahromi, Roman Orus

Convolutional neural networks (CNNs) are one of the most widely used neural network architectures, showcasing state-of-the-art performance in computer vision tasks. Although larger CNNs generally exhibit higher accuracy, their size can be effectively reduced by "tensorization" while maintaining accuracy, namely, replacing the convolution kernels with compact decompositions such as Tucker, Canonical Polyadic decompositions, or quantum-inspired decompositions such as matrix product states, and directly training the factors in the decompositions to bias the learning towards low-rank decompositions. But why doesn't tensorization seem to impact the accuracy adversely? We explore this by assessing how truncating the convolution kernels of dense (untensorized) CNNs impacts their accuracy. Specifically, we truncated the kernels of (i) a vanilla four-layer CNN and (ii) ResNet-50 pre-trained for image classification on CIFAR-10 and CIFAR-100 datasets. We found that kernels (especially those inside deeper layers) could often be truncated along several cuts resulting in significant loss in kernel norm but not in classification accuracy. This suggests that such "correlation compression" (underlying tensorization) is an intrinsic feature of how information is encoded in dense CNNs. We also found that aggressively truncated models could often recover the pre-truncation accuracy after only a few epochs of re-training, suggesting that compressing the internal correlations of convolution layers does not often transport the model to a worse minimum. Our results can be applied to tensorize and compress CNN models more effectively.

8/20/2024

MCNC: Manifold Constrained Network Compression

Chayne Thrash, Ali Abbasi, Parsa Nooralinejad, Soroush Abbasi Koohpayegani, Reed Andreas, Hamed Pirsiavash, Soheil Kolouri

The outstanding performance of large foundational models across diverse tasks, from computer vision to speech and natural language processing, has significantly increased their demand. However, storing and transmitting these models pose significant challenges due to their massive size (e.g., 350GB for GPT-3). Recent literature has focused on compressing the original weights or reducing the number of parameters required for fine-tuning these models. These compression methods typically involve constraining the parameter space, for example, through low-rank reparametrization (e.g., LoRA) or quantization (e.g., QLoRA) during model training. In this paper, we present MCNC as a novel model compression method that constrains the parameter space to low-dimensional pre-defined and frozen nonlinear manifolds, which effectively cover this space. Given the prevalence of good solutions in over-parameterized deep neural networks, we show that by constraining the parameter space to our proposed manifold, we can identify high-quality solutions while achieving unprecedented compression rates across a wide variety of tasks. Through extensive experiments in computer vision and natural language processing tasks, we demonstrate that our method, MCNC, significantly outperforms state-of-the-art baselines in terms of compression, accuracy, and/or model reconstruction time.

6/28/2024