FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Read original: arXiv:2310.16152 - Published 5/28/2024 by Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz

Overview

Federated learning (FL) is a key component in various language modeling applications, but the privacy of the data used to train these models is a concern.
Existing attacks aim to extract data from federated language models, regardless of how sensitive or personal the data is.
This paper introduces two novel findings about leaking privacy-sensitive user data from federated large language models.

Plain English Explanation

Federated learning (https://aimodels.fyi/papers/arxiv/federated-learning-privacy-attacks-defenses-applications-policy) is a way of training machine learning models using data from many different sources, without the need to share the raw data itself. This is particularly useful for applications like machine translation, next-word prediction, and medical record analysis, where the data may contain sensitive information like healthcare records, phone numbers, or login credentials.
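As a rough illustration of how a model can be trained without pooling raw data, the sketch below shows a size-weighted average of client model weights, in the style of the widely used FedAvg aggregation rule. This is an assumption made for illustration: the summary does not say which aggregation scheme the paper uses, and the function name `fedavg`, the client weights, and the dataset sizes are all hypothetical.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client weight vectors, weighting each by its local dataset size.

    Hypothetical helper: only the weights leave each client, never the raw data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client weight vector")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Three made-up clients, each holding a two-parameter local model.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 10, 20]
global_weights = fedavg(clients, sizes)
print(global_weights)
```

The server in this sketch sees only weight vectors, which is the privacy argument for federated learning. The attacks summarized below target exactly those shared weights.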

Although federated learning helps protect privacy by keeping the raw data private, the researchers in this paper found that the model itself can still leak sensitive information. Specifically, they discovered that the intermediate model snapshots during the training process can reveal more private data than the final trained model. They also identified that an attacker can target specific parts of the model that are responsible for memorizing the sensitive training data, making it easier to extract that information.

The researchers showed how a malicious participant in the federated learning process can leak the private data of other users without any cooperation from the central server. Their best-performing attack improved membership inference recall (the fraction of actual training records that the attacker correctly flags as such) by 29% and reconstructed up to 71% of the private data, outperforming existing attacks that assume a more capable adversary.

Technical Explanation

The paper introduces two key findings regarding the privacy leakage of federated language models:

Model snapshots from the intermediate rounds of federated learning can cause greater privacy leakage than the final trained model. This is because the intermediate models may retain more information about the sensitive training data (https://aimodels.fyi/papers/arxiv/can-public-large-language-models-help-private).
Privacy leakage can be aggravated by tampering with selected model weights, namely those specifically responsible for memorizing the sensitive training data. The researchers show how a malicious client can leak the privacy-sensitive data of other users in the federated learning process, even without any cooperation from the server (https://aimodels.fyi/papers/arxiv/efficiency-privacy-attacks-federated-learning, https://aimodels.fyi/papers/arxiv/local-model-reconstruction-attacks-federated-learning-their).
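To make the idea of "selective weights" more concrete, the toy sketch below ranks weights by how much they changed between two intermediate snapshots and keeps the top k. This is only one plausible proxy for locating weights tied to newly seen data; the summary does not describe the paper's actual selection criterion, and the function `top_changed_weights`, the snapshot names, and all values are hypothetical.

```python
from typing import Dict, List


def top_changed_weights(before: Dict[str, float],
                        after: Dict[str, float],
                        k: int) -> List[str]:
    """Return the names of the k weights with the largest absolute change.

    Hypothetical proxy: a large change between consecutive snapshots is taken
    as a hint that a weight absorbed information from the latest round.
    """
    deltas = {name: abs(after[name] - before[name])
              for name in before if name in after}
    return sorted(deltas, key=lambda name: deltas[name], reverse=True)[:k]


# Made-up snapshots of the same four weights from two consecutive rounds.
round_3 = {"w0": 0.10, "w1": -0.40, "w2": 0.25, "w3": 0.00}
round_4 = {"w0": 0.12, "w1": 0.30, "w2": 0.24, "w3": -0.20}
selected = top_changed_weights(round_3, round_4, k=2)
print(selected)
```

A client receives the global model in each round it participates in, so comparing snapshots like this needs no help from the server. That is consistent with the threat model described above.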

The researchers evaluate their attack methods on federated language models and find that their best-performing approach improves the membership inference recall by 29% and achieves up to 71% private data reconstruction, outperforming existing attacks (https://aimodels.fyi/papers/arxiv/sok-gradient-leakage-federated-learning).
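For readers unfamiliar with the metric, the sketch below computes membership inference recall for a simple loss-threshold attack, in which a record is guessed to be a training member when the model's loss on it falls below a threshold. The threshold attack is a common baseline chosen here for illustration, not the attack from the paper, and the losses, labels, and threshold are invented.

```python
from typing import Sequence


def membership_recall(losses: Sequence[float],
                      is_member: Sequence[bool],
                      threshold: float) -> float:
    """Recall of a loss-threshold membership guess: TP / (TP + FN)."""
    predicted = [loss < threshold for loss in losses]
    true_pos = sum(1 for p, m in zip(predicted, is_member) if p and m)
    false_neg = sum(1 for p, m in zip(predicted, is_member) if not p and m)
    actual_members = true_pos + false_neg
    return true_pos / actual_members if actual_members else 0.0


# Invented per-record losses; the first three records are true members.
losses = [0.2, 0.9, 0.4, 1.5, 0.3, 2.0]
is_member = [True, True, True, False, False, False]
recall = membership_recall(losses, is_member, threshold=0.5)
print(f"recall = {recall:.3f}")
```

Recall counts only how many true members were found, so it ignores the non-member with loss 0.3 that this attack wrongly flags. The summary does not state whether the reported 29% improvement is absolute or relative.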

Critical Analysis

The paper highlights significant privacy concerns with federated learning, particularly for language models trained on sensitive user data. While the researchers demonstrate effective attack methods, the attack scenarios and assumptions they study may not always match real-world deployments.

Additionally, the paper does not provide in-depth discussion of potential mitigation strategies or defenses against these types of privacy attacks. Further research is needed to develop robust privacy-preserving techniques for federated learning systems (https://aimodels.fyi/papers/arxiv/federated-learning-privacy-attacks-defenses-applications-policy).

Conclusion

This paper makes important contributions to the understanding of privacy risks in federated learning, particularly for language modeling applications. The findings suggest that the intermediate model snapshots and targeted weight manipulation can lead to significant privacy leakage, even when the final trained model may appear to be secure. These insights highlight the need for continued research and development of effective privacy-preserving techniques for federated learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz

Federated learning (FL) has become a key component in various language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets from many FL participants that often include privacy-sensitive data, such as healthcare records, phone/credit card numbers, login credentials, etc. Although FL enables computation without necessitating clients to share their raw data, determining the extent of privacy leakage in federated language models is challenging and not straightforward. Moreover, existing attacks aim to extract data regardless of how sensitive or naive it is. To fill this research gap, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated large language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other users in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 71% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

5/28/2024

Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey

Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth

Deep learning has shown incredible potential across a vast array of tasks and accompanying this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology enabling collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be reverse engineered to infer information about the private training data. It has been shown under a wide variety of settings that this premise for privacy does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.

5/7/2024

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang

Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.

9/2/2024

Addressing Membership Inference Attack in Federated Learning with Model Compression

Gergely Dániel Németh, Miguel Ángel Lozano, Novi Quadrianto, Nuria Oliver

Federated Learning (FL) has been proposed as a privacy-preserving solution for machine learning. However, recent works have reported that FL can leak private client data through membership inference attacks. In this paper, we show that the effectiveness of these attacks on the clients negatively correlates with the size of the client's datasets and model complexity. Based on this finding, we study the capabilities of model-agnostic Federated Learning to preserve privacy, as it enables the use of models of varying complexity in the clients. To systematically study this topic, we first propose a taxonomy of model-agnostic FL methods according to the strategies adopted by the clients to select the sub-models from the server's model. This taxonomy provides a framework for existing model-agnostic FL approaches and leads to the proposal of new FL methods to fill the gaps in the taxonomy. Next, we analyze the privacy-performance trade-off of all the model-agnostic FL architectures as per the proposed taxonomy when subjected to 3 different membership inference attacks on the CIFAR-10 and CIFAR-100 vision datasets. In our experiments, we find that randomness in the strategy used to select the server's sub-model to train the clients' models can control the clients' privacy while keeping competitive performance on the server's side.

7/8/2024