Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

2405.20541

Published 6/3/2024 by Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

Abstract

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected by the domain composition of the data being pruned. We demonstrate that for multiple dataset compositions, perplexity-based pruning of pretraining data can emph{significantly} improve downstream task performance: pruning based on perplexities computed with a 125 million parameter model improves the average performance on downstream tasks of a 3 billion parameter model by up to 2.04 and achieves up to a $1.45times$ reduction in pretraining steps to reach commensurate baseline performance. Furthermore, we demonstrate that such perplexity-based data pruning also yields downstream performance gains in the over-trained and data-constrained regimes.

Create account to get full access

Overview

This paper explores a new approach for pruning large language models using perplexity-based data pruning with small reference models.
The authors propose a method to efficiently prune large language models by leveraging small reference models to estimate the perplexity of the training data, and then selectively retaining the most informative data points.
This technique can help reduce the computational and memory requirements of large language models without significantly impacting their performance.

Plain English Explanation

The paper focuses on a challenge faced when training large language models - the massive amount of data required can make the training process computationally expensive and memory-intensive. The authors present a potential solution, which is to selectively prune the training data in a way that retains the most important information.

The key idea is to use a small reference model to estimate the perplexity of each data point in the training set. Perplexity is a measure of how well the model can predict the next word in a sequence. Data points with higher perplexity are likely to be less informative for the model, so the authors propose pruning those points and only keeping the most informative data.

This approach has several potential benefits. By reducing the size of the training set, it can make the overall training process more efficient, requiring less compute power and memory. And since the most relevant data is retained, the performance of the final language model may not be significantly impacted.

The authors demonstrate the effectiveness of this technique through experiments on several language modeling tasks. They show that their perplexity-based pruning method can substantially reduce the size of the training set while maintaining the performance of the final language model.

Technical Explanation

The paper proposes a novel approach for pruning large language models using perplexity-based data pruning with small reference models. The key idea is to leverage a small language model as a reference to estimate the perplexity of each data point in the training set, and then selectively retain the most informative data points.

The authors first train a small reference model, which is significantly smaller and less computationally expensive than the target large language model. They then use this reference model to compute the perplexity of each data point in the training set. Data points with higher perplexity are likely to be less informative for the target model, so the authors prune these points and only retain the most informative data.

This approach has several advantages over existing pruning techniques. First, it is computationally efficient, as the reference model is much smaller than the target model. Second, it is adaptive, as the pruning is based on the specific characteristics of the training data, rather than being a one-size-fits-all approach. And third, it can be applied iteratively, with multiple rounds of pruning to further refine the training set.

The authors evaluate their perplexity-based pruning method on several language modeling tasks, including machine translation, text generation, and question answering. Their results show that this technique can substantially reduce the size of the training set, in some cases by over 90%, while maintaining the performance of the final language model.

Critical Analysis

The paper presents a promising approach for efficiently pruning large language models, but it also raises some potential concerns and areas for further research.

One key limitation is that the performance of the pruning method relies heavily on the quality of the small reference model. If the reference model is not well-trained or does not capture the relevant characteristics of the data, the perplexity estimates may be inaccurate, leading to suboptimal pruning decisions. The authors acknowledge this and suggest exploring ways to improve the reference model selection process.

Additionally, the paper does not address the potential for data bias introduced by the pruning process. By selectively retaining the most "informative" data points, the authors may be inadvertently amplifying certain biases present in the original training set. This could lead to fairness and robustness issues in the final language model, which is an important consideration for real-world applications.

Another potential area for further research is the generalizability of this approach. The authors demonstrate its effectiveness on a few specific language modeling tasks, but it remains to be seen how well it would transfer to other domains or tasks, such as multi-modal or few-shot learning scenarios.

Overall, the paper presents a novel and promising approach to the challenging problem of pruning large language models. While it has some limitations, the authors' work opens up new avenues for research in efficient model optimization and data selection strategies.

Conclusion

This paper introduces a novel approach for pruning large language models using perplexity-based data pruning with small reference models. The key idea is to leverage a small, computationally efficient language model to estimate the perplexity of each data point in the training set, and then selectively retain the most informative data points.

The authors demonstrate the effectiveness of this technique through experiments on several language modeling tasks, showing that it can substantially reduce the size of the training set while maintaining the performance of the final language model. This has important implications for making the training of large language models more efficient and accessible, potentially expanding their applications and impact in the field of natural language processing.

While the paper highlights some limitations and areas for further research, the authors' work represents an important step forward in the ongoing pursuit of more efficient and scalable deep learning models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Large Language Model Pruning

Hanjuan Huang (Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology, Taipei, Taiwan), Hao-Jia Song (Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology, Taipei, Taiwan), Hsing-Kuo Pao (Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology, Taipei, Taiwan)

We surely enjoy the larger the better models for their superior performance in the last couple of years when both the hardware and software support the birth of such extremely huge models. The applied fields include text mining and others. In particular, the success of LLMs on text understanding and text generation draws attention from researchers who have worked on NLP and related areas for years or even decades. On the side, LLMs may suffer from problems like model overfitting, hallucination, and device limitation to name a few. In this work, we suggest a model pruning technique specifically focused on LLMs. The proposed methodology emphasizes the explainability of deep learning models. By having the theoretical foundation, we obtain a trustworthy deep model so that huge models with a massive number of model parameters become not quite necessary. A mutual information-based estimation is adopted to find neurons with redundancy to eliminate. Moreover, an estimator with well-tuned parameters helps to find precise estimation to guide the pruning procedure. At the same time, we also explore the difference between pruning on large-scale models vs. pruning on small-scale models. The choice of pruning criteria is sensitive in small models but not for large-scale models. It is a novel finding through this work. Overall, we demonstrate the superiority of the proposed model to the state-of-the-art models.

6/4/2024

cs.CL cs.AI cs.LG

🛠️

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder

The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate these burdens by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heuristics instead of optimization-based techniques, potentially resulting in suboptimal compression. In this paper, we introduce ALPS, an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned conjugate gradient-based post-processing step. Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency. ALPS substantially outperforms state-of-the-art methods in terms of the pruning objective and perplexity reduction, particularly for highly sparse models. On the OPT-30B model with 70% sparsity, ALPS achieves a 13% reduction in test perplexity on the WikiText dataset and a 19% improvement in zero-shot benchmark performance compared to existing methods.

6/13/2024

cs.LG

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of the multiple decoder layers, general methods often employ common estimation approaches for pruning. These approaches lead to a decline in accuracy for specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure. Meanwhile, it can adaptively fuse coarse-grained and finegrained estimations based on the results from complex and multilayer structures. All aspects of our design seamlessly integrate into the endto-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMa-7B,Vicuna-7B, Baichuan-7B, and Bloom-7b1, respectively.

5/16/2024

cs.CL cs.AI cs.LG

💬

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

Various Large Language Models~(LLMs) from the Generative Pretrained Transformer(GPT) family have achieved outstanding performances in a wide range of text generation tasks. However, the enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiencies of LLMs through quantization, pruning, and other means has been a key issue in LLM studies. In this work, we propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need of any retraining. It allocates sparsity adaptively based on sensitivity, allowing us to reduce pruning-induced error while maintaining the overall sparsity level. The advantages of the proposed method exhibit even more when the sparsity is extremely high. Furthermore, our method is compatible with quantization, enabling further compression of LLMs. We have released the available code.

4/24/2024

cs.CL cs.AI