Extracting Memorized Training Data via Decomposition

Read original: arXiv:2409.12367 - Published 9/20/2024 by Ellen Su, Anu Vellore, Amy Chang, Raffaele Mura, Blaine Nelson, Paul Kassianik, Amin Karbasi

Overview

The paper demonstrates a simple, query-based method for extracting memorized training data from large language models (LLMs)
This is an important issue because LLMs can reproduce the exact contents of their training data, creating security, safety, and privacy risks
The method decomposes an extraction request into a series of smaller instructions that each elicit a fragment of a training document; applied to two frontier LLMs, it recovered verbatim sentences from New York Times articles

Plain English Explanation

The paper focuses on a security problem: large language models can sometimes reproduce, word for word, text they were trained on. Developers add alignment safeguards to stop models from behaving in risky ways, but these safeguards do not fully prevent such leaks. The researchers show a simple way around them. Instead of asking a model outright to recite a document, they split the request into a series of smaller questions, each of which draws out a small piece of the original text. Pieced together, the answers can reproduce parts of real news articles, which suggests those articles were in the model's training data. Because the technique only sends ordinary queries, it works against the deployed model as-is; the model is not retrained or modified in any way.

Technical Explanation

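The key technical idea is that the decomposition happens at the level of the query, not the model. Current alignment procedures restrict common risky behaviors but, as the authors note, do not completely prevent LLMs from leaking data. Rather than asking a model to reproduce a training document in a single request, the researchers use instruction decomposition: the extraction task is broken into a sequence of smaller instructions, each of which elicits a fragment of the target text, so that training data is extracted incrementally.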
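The abstract describes the decomposition as query-based and incremental but does not list the actual prompts, so the following is only a minimal sketch of what such a loop could look like. It assumes a hypothetical `query` callable that sends one prompt to a chat model and returns its reply; the function name `decomposed_extraction`, the prompt wording, and the one-sentence-per-step granularity are illustrative assumptions, not the authors' method.

```python
from typing import Callable, List

# Hypothetical interface (assumption): sends one prompt to a chat LLM and
# returns the reply text. The paper's real prompts are not reproduced here.
QueryFn = Callable[[str], str]


def decomposed_extraction(title: str, query: QueryFn, max_steps: int = 10) -> List[str]:
    """Illustrative sketch: request one fragment of an article per query.

    Rather than asking for the whole article at once, each small
    sub-instruction asks only for the next sentence, conditioned on the
    fragments recovered so far.
    """
    fragments: List[str] = []
    for _ in range(max_steps):
        if not fragments:
            prompt = f'What is the first sentence of the news article titled "{title}"?'
        else:
            so_far = " ".join(fragments)
            prompt = (
                f'The news article titled "{title}" begins: "{so_far}" '
                "What is the next sentence?"
            )
        reply = query(prompt).strip()
        if not reply:  # stop once the model returns nothing further
            break
        fragments.append(reply)
    return fragments


if __name__ == "__main__":
    # Stand-in for a real model: replays canned replies, then an empty one.
    canned = iter(["First sentence.", "Second sentence.", ""])
    print(decomposed_extraction("Example headline", lambda prompt: next(canned)))
    # → ['First sentence.', 'Second sentence.']
```

Conditioning each query on the text recovered so far is what makes the extraction incremental in this sketch; a practical version would also need stopping rules for refusals or repeated output.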

Prior work showed that LLMs can be tricked into divulging training data with out-of-distribution queries or adversarial techniques. The decompositional method is simpler by comparison: it works purely through queries, the authors describe it as generalizable, and it does not fine-tune or otherwise change the production model.

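The authors applied the method to two frontier LLMs, targeting 3,723 New York Times articles. They extracted at least one verbatim sentence from 73 of the articles, and over 20% of the sentences verbatim from 6 of them. Because the generated texts are reliable reproductions of the news articles, the authors conclude that they likely originate from the models' training data.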
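The results in the abstract are reported as verbatim sentences recovered per article. A minimal sketch of such a score follows; the naive regex sentence splitter, whitespace normalization, exact substring matching, and the helper names (`split_sentences`, `verbatim_fraction`, `tally`) are assumptions for illustration, since the summary does not say how the authors segmented or matched sentences.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(text: str) -> str:
    # Collapse runs of whitespace so line breaks do not block an exact match.
    return " ".join(text.split())


def verbatim_fraction(article: str, generated: str) -> float:
    """Share of the article's sentences found word for word in `generated`."""
    sentences = split_sentences(article)
    if not sentences:
        return 0.0
    output = normalize(generated)
    hits = sum(1 for s in sentences if normalize(s) in output)
    return hits / len(sentences)


def tally(fractions: Dict[str, float], threshold: float = 0.2) -> Dict[str, int]:
    """Corpus-level counts in the style of the abstract's results."""
    return {
        "articles": len(fractions),
        "at_least_one_sentence": sum(1 for f in fractions.values() if f > 0),
        "over_threshold": sum(1 for f in fractions.values() if f > threshold),
    }


if __name__ == "__main__":
    article = "The storm hit at dawn. Roads were closed. Schools stayed open."
    generated = "The storm hit at dawn. Roads were closed by noon."
    score = verbatim_fraction(article, generated)
    print(round(score, 3))  # → 0.333
    print(tally({"a": score, "b": 0.0, "c": 0.1}))
    # → {'articles': 3, 'at_least_one_sentence': 2, 'over_threshold': 1}
```

Under this sketch's definitions, the abstract's figures would correspond to `at_least_one_sentence` = 73 and `over_threshold` = 6 across 3,723 articles, assuming the authors' matching works analogously.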

Critical Analysis

The paper addresses an important problem: the training data behind large language models can be extracted even from aligned production systems. Its main contribution is showing that a simple, query-based decomposition can bypass current safeguards, which has implications for the responsible development and deployment of LLMs.

The results should be read with their scale in mind. Verbatim sentences were recovered from 73 of 3,723 articles (about 2%), and more than 20% of an article's sentences from only 6 articles (about 0.16%). The authors themselves frame the broader implications conditionally, as risks that would follow if the methodology is replicable at scale. At the same time, because the attack is purely query-based and leaves the production model unchanged, it applies to models as they are deployed.

Further research could test whether the method scales to larger sets of documents and to other models, and develop defenses that detect or block decomposed extraction queries. Exploring the tradeoffs between such defenses and model usefulness would also be an important area for future work.

Conclusion

This paper presents a simple, query-based decomposition method for extracting memorized training data from large language models, demonstrated by recovering verbatim sentences of New York Times articles from two frontier LLMs. The method is generalizable and requires no changes to the production model. The authors argue that, if replicable at scale, it could expose new security and safety vulnerabilities, including privacy risks and unauthorized data leaks, that require careful consideration from model development through end use.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Extracting Memorized Training Data via Decomposition

Ellen Su, Anu Vellore, Amy Chang, Raffaele Mura, Blaine Nelson, Paul Kassianik, Amin Karbasi

The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.

9/20/2024

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Victoria Smith, Ali Shahin Shamsabadi, Carolyn Ashurst, Adrian Weller

Large Language Models (LLMs) have shown greatly enhanced performance in recent years, attributed to increased size and extensive training data. This advancement has led to widespread interest and adoption across industries and the public. However, training data memorization in Machine Learning models scales with model size, particularly concerning for LLMs. Memorized text sequences have the potential to be directly leaked from LLMs, posing a serious threat to data privacy. Various techniques have been developed to attack LLMs and extract their training data. As these models continue to grow, this issue becomes increasingly critical. To help researchers and policymakers understand the state of knowledge around privacy attacks and mitigations, including where more work is needed, we present the first SoK on data privacy for LLMs. We (i) identify a taxonomy of salient dimensions where attacks differ on LLMs, (ii) systematize existing attacks, using our taxonomy of dimensions to highlight key trends, (iii) survey existing mitigation strategies, highlighting their strengths and limitations, and (iv) identify key gaps, demonstrating open problems and areas for concern.

6/19/2024

Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel

In this paper we develop state-of-the-art privacy attacks against Large Language Models (LLMs), where an adversary with some access to the model tries to learn something about the underlying training data. Our headline results are new membership inference attacks (MIAs) against pretrained LLMs that perform hundreds of times better than baseline attacks, and a pipeline showing that over 50% (!) of the fine-tuning dataset can be extracted from a fine-tuned LLM in natural settings. We consider varying degrees of access to the underlying model, pretraining and fine-tuning data, and both MIAs and training data extraction. For pretraining data, we propose two new MIAs: a supervised neural network classifier that predicts training data membership on the basis of (dimensionality-reduced) model gradients, as well as a variant of this attack that only requires logit access to the model by leveraging recent model-stealing work on LLMs. To our knowledge this is the first MIA that explicitly incorporates model-stealing information. Both attacks outperform existing black-box baselines, and our supervised attack closes the gap between MIA attack success against LLMs and the strongest known attacks for other machine learning models. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models. Our code is available at github.com/safr-ai-lab/pandora-llm.

7/16/2024

Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs

Fatemeh Shiri, Van Nguyen, Farhad Moghimifar, John Yoo, Gholamreza Haffari, Yuan-Fang Li

Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.

6/4/2024