DAGER: Exact Gradient Inversion for Large Language Models

Read original: arXiv:2405.15586 - Published 5/27/2024 by Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev


Overview

Federated learning allows for collaborative training without sharing private client data, but prior work has shown that the data can be recovered using gradient inversion attacks.
While these attacks work well on images, they have limitations in the text domain and can only approximately reconstruct small batches and short input sequences.
This paper proposes DAGER, the first algorithm to exactly recover whole batches of input text.

Plain English Explanation

Federated learning is a way for multiple computers or devices to work together on a machine learning task without having to share their private data. Instead of sending the actual data, the computers send only the changes, or "gradients," that they want to make to the machine learning model. The server then combines these gradients to update the model.
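The exchange described above can be sketched in a few lines. This is a minimal illustration only, assuming a toy linear model with a squared-error loss; the names `client_gradient` and `server_step` are hypothetical and do not come from the paper.

```python
import numpy as np

def client_gradient(weights, x, y):
    """Gradient of the mean squared error for a linear model y ~ x @ weights."""
    residual = x @ weights - y
    return 2.0 * x.T @ residual / len(y)

def server_step(weights, client_grads, lr=0.1):
    """Average the clients' gradients and take one gradient descent step."""
    avg_grad = np.mean(client_grads, axis=0)
    return weights - lr * avg_grad

rng = np.random.default_rng(0)
weights = np.zeros(3)

# Four clients, each holding private (x, y) data that never leaves the client.
clients = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(4)]

# Only the gradients are sent to the server.
grads = [client_gradient(weights, x, y) for x, y in clients]
new_weights = server_step(weights, grads)
```

The server never sees `x` or `y`, only `grads`. Gradient inversion attacks ask how much of the private data can be inferred from those gradients alone.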

However, prior research has shown that the server can actually use these gradients to figure out what the original data was. This is called a "gradient inversion attack." While these attacks work well on things like images, they have limitations when it comes to text data.

In this paper, the researchers propose a new algorithm called DAGER that can exactly recover whole batches of input text, even for large language models. DAGER takes advantage of the way self-attention layers and word embeddings work to efficiently search for the original text that matches the gradients.

Technical Explanation

The key innovation in DAGER is leveraging the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. This allows DAGER to exactly recover full batches of text, rather than just approximate reconstructions of small fragments.
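The membership check can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a single linear layer applied directly to token embeddings (no positional embeddings or normalization), a random embedding table, and a random stand-in for the upstream gradient. Under those assumptions the weight gradient is `X.T @ upstream`, whose rank is at most the number of tokens, so the client's embeddings lie in its column span.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50, 32
embeddings = rng.normal(size=(vocab_size, dim))  # hypothetical embedding table

# Client side: private tokens and the gradient they induce on a linear layer.
secret_tokens = [3, 17, 42, 8]
X = embeddings[secret_tokens]                           # (n_tokens, dim)
upstream = rng.normal(size=(len(secret_tokens), dim))   # stand-in for dL/dZ
grad_W = X.T @ upstream                                 # (dim, dim), rank <= n_tokens

def tokens_in_span(grad, embeddings, tol=1e-6):
    """Return vocabulary ids whose embedding lies in the column span of grad."""
    u, s, _ = np.linalg.svd(grad)
    rank = int(np.sum(s > tol * s[0]))
    basis = u[:, :rank]                                 # orthonormal basis of the span
    projected = embeddings @ basis @ basis.T
    residual = np.linalg.norm(embeddings - projected, axis=1)
    relative = residual / np.linalg.norm(embeddings, axis=1)
    return [int(t) for t in np.flatnonzero(relative < tol)]

# Server side: test every vocabulary entry against the observed gradient.
recovered = tokens_in_span(grad_W, embeddings)
```

Because the vocabulary is finite, the attacker can test every entry rather than optimize over continuous inputs. This toy check only reveals which tokens occur, not their order or which sequence they belong to; the paper's search procedures address that part.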

The paper describes two versions of DAGER: one that uses an exhaustive heuristic search for encoder-based architectures, and one that uses a greedy approach for decoder-based architectures such as large language models (LLMs). The researchers provide an efficient GPU implementation and report that it recovers full batches of up to 128 sequences, outperforming prior attacks in speed (20x faster at the same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
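For readers unfamiliar with the ROUGE-1/2 figures quoted above, the following sketch shows one common way to compute an n-gram overlap score between a reference text and a reconstruction. It assumes lowercase whitespace tokenization and the F1 variant; the paper's exact evaluation setup may differ, and `rouge_n_f1` is a hypothetical helper rather than a library function.

```python
from collections import Counter

def rouge_n_f1(reference, candidate, n=1):
    """F1 score over overlapping n-grams (n=1 for ROUGE-1, n=2 for ROUGE-2)."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

exact = rouge_n_f1("the cat sat on the mat", "the cat sat on the mat")
one_word_off = rouge_n_f1("the cat sat on the mat", "the cat sat on a mat")
```

A score of 1.0 means every n-gram in the reference appears in the reconstruction and vice versa, so scores above 0.99 indicate near-verbatim recovery.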

The two insights work together. The low-rank structure means the observed gradient confines the client's input embeddings to a small subspace, and the discreteness of tokens means the attacker can test each vocabulary entry against that subspace instead of optimizing over continuous inputs.

Critical Analysis

While DAGER represents a significant advance in gradient inversion attacks on text data, the paper acknowledges some important limitations and areas for further research. For example, the attacks are demonstrated in an "honest-but-curious" setting, where the server follows the protocol but tries to recover the data. More adversarial threat models, where the server actively tries to subvert the protocol, are an important area for future work.

Additionally, the experiments cover bounded sequence lengths and batch sizes (up to 128 in the reported results), which remain modest compared to the scale of real-world language models and datasets. Extending the attack to longer sequences and larger batches will be crucial for understanding the real-world implications of gradient leakage in federated learning.

Overall, this paper makes an important contribution by demonstrating that even text data, which was believed to be more resistant to gradient inversion, can be vulnerable to attacks under certain conditions. This highlights the need for continued research into secure federated learning protocols that can protect against these types of threats.

Conclusion

This paper introduces DAGER, the first algorithm capable of exactly recovering whole batches of input text from the gradients shared in a federated learning setting. By exploiting the low-rank structure of self-attention gradients and the discrete nature of token embeddings, DAGER outperforms prior attacks in terms of speed, scalability, and reconstruction quality.

While DAGER represents a significant advance, the paper also acknowledges important limitations and areas for future research, such as exploring more adversarial threat models and scaling up the attacks to handle larger language models and datasets. Overall, this work highlights the ongoing challenge of protecting the privacy of user data in federated learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


DAGER: Exact Gradient Inversion for Large Language Models

Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied on images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).

5/27/2024

SPEAR: Exact Gradient Inversion of Batches in Federated Learning

Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for a batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b > 1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.

6/4/2024

Federated Learning under Attack: Improving Gradient Inversion for Batch of Images

Luiz Leite, Yuri Santo, Bruno L. Dalmazo, André Riker

Federated Learning (FL) has emerged as a machine learning approach able to preserve the privacy of users' data. Applying FL, clients train machine learning models on a local dataset and a central server aggregates the learned parameters coming from the clients, training a global machine learning model without sharing users' data. However, the state of the art shows several approaches to attacking FL systems. For instance, gradient inversion or gradient leakage attacks can find, with high precision, the local dataset used during the training phase of the FL. This paper presents an approach, called Deep Leakage from Gradients with Feedback Blending (DLG-FB), which is able to improve the inverting gradient attack, considering the spatial correlation that typically exists in batches of images. The performed evaluation shows an improvement of 19.18% and 48.82% in terms of attack success rate and the number of iterations per attacked image, respectively.

9/27/2024

Corpus Poisoning via Approximate Greedy Gradient Descent

Jinyan Su, John X. Morris, Preslav Nakov, Claire Cardie

Dense retrievers are widely used in information retrieval and have also been successfully extended to other knowledge intensive areas such as language models, e.g., Retrieval-Augmented Generation (RAG) systems. Unfortunately, they have recently been shown to be vulnerable to corpus poisoning attacks in which a malicious user injects a small fraction of adversarial passages into the retrieval corpus to trick the system into returning these passages among the top-ranked results for a broad set of user queries. Further study is needed to understand the extent to which these attacks could limit the deployment of dense retrievers in real-world applications. In this work, we propose Approximate Greedy Gradient Descent (AGGD), a new attack on dense retrieval systems based on the widely used HotFlip method for efficiently generating adversarial passages. We demonstrate that AGGD can select a higher quality set of token-level perturbations than HotFlip by replacing its random token sampling with a more structured search. Experimentally, we show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains. Notably, our method is extremely effective in attacking the ANCE retrieval model, achieving attack success rates that are 17.6% and 13.37% higher on the NQ and MS MARCO datasets, respectively, compared to HotFlip. Additionally, we demonstrate AGGD's potential to replace HotFlip in other adversarial attacks, such as knowledge poisoning of RAG systems. Code can be found at https://github.com/JinyanSu1/AGGD

6/10/2024