Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey

2405.03636

Published 5/7/2024 by Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth

cs.CR cs.LG


Abstract

Deep learning has shown incredible potential across a vast array of tasks, and accompanying this growth has been an insatiable appetite for data. However, much of the data needed to enable deep learning is stored on personal devices, and recent privacy concerns have further highlighted the challenges of accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without the need to send raw, potentially sensitive data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving holds only if the updates cannot be reverse engineered to infer information about the private training data. It has been shown under a wide variety of settings that this premise does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.


Overview

Deep learning has enabled incredible advancements across many tasks, but requires large datasets which are often stored on personal devices
Privacy concerns have led to the development of federated learning (FL), which allows collaborative model training without sharing raw data
However, recent research has shown that the model updates sent in FL can be reverse-engineered to reveal private training data

Plain English Explanation

Federated learning is a technique that allows multiple devices, like smartphones or personal computers, to collaboratively train a machine learning model without sharing their private data. The idea is that each device trains the model on its own data, and then only sends the model updates to a central server. This way, the raw personal data never leaves the device, preserving privacy.

However, researchers have found that even these model updates can sometimes be used to figure out the original private data that was used to train the model. This means the fundamental premise of federated learning, that it protects privacy, does not always hold.

The paper provides a comprehensive review of the different privacy attacks that can be used to exploit federated learning, as well as some of the ways researchers have tried to defend against these attacks. It also looks at real-world applications of federated learning and the emerging privacy regulations around it.

Technical Explanation

The paper first outlines how federated learning works: client devices train a shared model on their local data and send only the model updates to a central server, which aggregates them to improve the global model. The key premise is that this preserves privacy because the raw training data never leaves the client devices.
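The training loop described above can be sketched as a minimal FedAvg-style round. This is an illustrative sketch under stated assumptions, not the survey's code: the one-feature linear model, the squared loss, the plain SGD client step, and the names `local_update` and `fed_avg` are hypothetical choices made for brevity. The server weights each client's returned model by its number of local examples, as in the standard FedAvg averaging rule.

```python
from typing import List, Sequence, Tuple

Weights = List[float]                    # [bias, slope] of a hypothetical 1-D linear model
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(weights: Weights, data: Dataset, lr: float = 0.1, epochs: int = 1) -> Weights:
    """Hypothetical client step: plain SGD on squared loss 0.5 * (w0 + w1*x - y)^2."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y
            w0 -= lr * err        # d loss / d w0 = err
            w1 -= lr * err * x    # d loss / d w1 = err * x
    return [w0, w1]


def fed_avg(global_weights: Weights, client_datasets: Sequence[Dataset],
            lr: float = 0.1, epochs: int = 1) -> Weights:
    """One round: every client trains locally, then the server averages weighted by dataset size."""
    client_models = [local_update(list(global_weights), data, lr, epochs) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    total = sum(sizes)
    return [
        sum(n * model[k] for n, model in zip(sizes, client_models)) / total
        for k in range(len(global_weights))
    ]


if __name__ == "__main__":
    # Two clients hold disjoint samples of y = 2x + 1; only model weights are shared.
    clients = [
        [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
        [(1.5, 4.0), (2.0, 5.0)],
    ]
    w = [0.0, 0.0]
    for _ in range(200):
        w = fed_avg(w, clients, lr=0.1, epochs=5)
    print(w)  # close to [1.0, 2.0]
```

The server in this sketch only ever receives weight vectors, which is exactly the information the attacks discussed next try to exploit.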

However, the paper then discusses how this premise can break down. Researchers have demonstrated a variety of attacks that reverse-engineer client updates to extract information about the private training data. These range from membership inference attacks, which determine whether a particular record was part of a client's training data, to reconstruction attacks, which recover the original training samples themselves.
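A minimal sketch of why updates can leak data, assuming the simplest possible setting: a single training example passed through one fully connected layer with a bias. For z = Wx + b, the gradients satisfy dL/dW[i][j] = (dL/dz_i) * x_j and dL/db_i = dL/dz_i, so any row of the weight gradient divided by the matching bias gradient returns the input x exactly. This analytical trick is one well-known instance of a reconstruction attack, not a method taken from the survey; the helper names, the squared loss, and the toy numbers are hypothetical. Batched updates and deeper networks generally require optimization-based gradient-matching attacks instead.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def client_gradients(W: Matrix, b: Vector, x: Vector, target: Vector) -> Tuple[Matrix, Vector]:
    """Gradients a client would share for one example under a hypothetical loss 0.5*||Wx + b - target||^2."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]
    g = [z_i - t_i for z_i, t_i in zip(z, target)]   # dL/dz for the squared loss
    grad_W = [[g_i * x_j for x_j in x] for g_i in g]  # outer product g x^T
    grad_b = list(g)                                   # dL/db = dL/dz
    return grad_W, grad_b


def invert_linear_layer(grad_W: Matrix, grad_b: Vector, eps: float = 1e-12) -> Vector:
    """Server-side reconstruction: x_j = dL/dW[i][j] / dL/db_i for any i with dL/db_i != 0."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))  # largest bias gradient, for stability
    if abs(grad_b[i]) < eps:
        raise ValueError("all bias gradients are ~0; the input cannot be recovered this way")
    return [gw / grad_b[i] for gw in grad_W[i]]


if __name__ == "__main__":
    W = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
    b = [0.1, -0.3]
    secret_x = [3.0, -1.0, 2.0]  # private input that never leaves the client
    grad_W, grad_b = client_gradients(W, b, secret_x, target=[0.0, 1.0])
    print(invert_linear_layer(grad_W, grad_b))  # recovers secret_x up to float rounding
```

The attacker here needs nothing beyond the shared gradients; the model weights and the loss function do not even enter the reconstruction step.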

The paper surveys the different defense mechanisms that have been proposed, such as differentially private aggregation, secure multiparty computation, and personalized models. It also examines real-world industry applications of federated learning and the emerging privacy regulations around it.
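Secure aggregation, one of the cryptographic defenses in this family, can be illustrated with pairwise additive masking: each pair of clients shares a random seed, one adds the derived mask and the other subtracts it, so every individual upload looks random while the masks cancel in the sum. This is a simplified sketch in the spirit of practical secure-aggregation protocols, not the survey's construction. It assumes integer-encoded updates, pre-shared seeds (a real protocol would derive them through key agreement), and no client dropouts; the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

MOD = 2 ** 32  # arithmetic is done modulo 2^32 on integer-encoded updates


def prg_mask(seed: int, dim: int) -> List[int]:
    """Expand a shared seed into a mask (illustrative PRG, NOT cryptographically secure)."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(dim)]


def mask_update(client_id: int, update: List[int], pair_seeds: Dict[Tuple[int, int], int]) -> List[int]:
    """Client side: add the pair mask if this client is first in the pair, subtract it if second."""
    masked = [u % MOD for u in update]
    for (i, j), seed in pair_seeds.items():
        if client_id not in (i, j):
            continue
        sign = 1 if client_id == i else -1
        mask = prg_mask(seed, len(update))
        masked = [(v + sign * m) % MOD for v, m in zip(masked, mask)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: sum the masked vectors; pairwise masks cancel, leaving only the total."""
    return [sum(column) % MOD for column in zip(*masked_updates)]


if __name__ == "__main__":
    updates = {0: [1, 2, 3], 1: [10, 20, 30], 2: [100, 200, 300]}
    pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # hypothetical pre-shared seeds
    uploads = [mask_update(cid, upd, pair_seeds) for cid, upd in updates.items()]
    print(aggregate(uploads))  # [111, 222, 333]
```

A deployment would replace `random.Random` with a cryptographic PRG and secret-share the seeds so the sum can still be recovered when some clients drop out. Note that the server still learns the exact aggregate, which is why masking is often combined with differential privacy.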

Critical Analysis

The paper provides a thorough overview of the key privacy challenges facing federated learning. It is clear that the assumption of privacy-preservation through model updates alone is flawed, as a range of attacks can exploit these updates. The defense mechanisms discussed, while promising, still have limitations in terms of the privacy guarantees they can provide.

One area the paper does not deeply explore is the tradeoffs between privacy and model performance. Stronger privacy defenses may come at the cost of reduced model accuracy. Finding the right balance will be crucial for the practical adoption of federated learning.
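The privacy/accuracy tension can be made concrete with the Gaussian mechanism, a standard building block for differentially private aggregation. This is a minimal sketch under stated assumptions: each client update is clipped to L2 norm C, so adding or removing one client changes the sum by at most C, and the noise uses the classic calibration σ = sqrt(2 ln(1.25/δ)) · C / ε, which is valid for 0 < ε < 1. Smaller ε (stronger privacy) means proportionally more noise on the aggregate, which is the cost to model accuracy. The function names and parameter values are hypothetical, and a multi-round training run would additionally need a privacy accountant to compose the per-round guarantees.

```python
import math
import random
from typing import List, Sequence, Tuple


def clip_l2(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale an update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Classic Gaussian-mechanism noise scale; this bound assumes 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def dp_sum(updates: Sequence[Sequence[float]], clip_norm: float, epsilon: float,
           delta: float, rng: random.Random) -> Tuple[List[float], float]:
    """Clip each client update, sum, and add Gaussian noise to every coordinate (one release)."""
    clipped = [clip_l2(u, clip_norm) for u in updates]
    total = [sum(column) for column in zip(*clipped)]
    sigma = gaussian_sigma(epsilon, delta, sensitivity=clip_norm)
    return [t + rng.gauss(0.0, sigma) for t in total], sigma


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[0.3, -0.4], [3.0, 4.0], [0.1, 0.2]]  # toy client updates
    for eps in (0.9, 0.5, 0.1):
        noisy, sigma = dp_sum(updates, clip_norm=1.0, epsilon=eps, delta=1e-5, rng=rng)
        print(f"epsilon={eps}: noise std per coordinate = {sigma:.2f}")
```

Because σ scales as 1/ε, moving from ε = 0.9 to ε = 0.1 multiplies the per-coordinate noise by nine while the clipped signal stays bounded by the number of clients times C.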

Additionally, the paper focuses primarily on centralized federated learning architectures. Decentralized approaches, where there is no central server, may offer better inherent privacy properties and warrant further investigation.
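To make the contrast with a central server concrete, the communication pattern of a decentralized setup can be sketched as gossip averaging on a ring: in each round, every node replaces its parameters with the average of its own and its two neighbors'. This is a minimal illustrative sketch, not an approach evaluated in the survey; the ring topology, the uniform 1/3 weights, and the function name are assumptions. It shows only how consensus is reached without a server and adds no privacy protection by itself, since neighbors still receive each node's parameters.

```python
from typing import List

Model = List[float]


def gossip_round(models: List[Model]) -> List[Model]:
    """One synchronous round on a ring: each node averages itself with its two neighbors (weights 1/3)."""
    n = len(models)
    if n < 3:
        raise ValueError("this ring gossip sketch assumes at least 3 nodes")
    return [
        [(a + b + c) / 3.0 for a, b, c in zip(models[(i - 1) % n], models[i], models[(i + 1) % n])]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Five nodes with different local models; no server is involved.
    models = [[0.0, 1.0], [3.0, 1.0], [6.0, 1.0], [9.0, 1.0], [12.0, 1.0]]
    for _ in range(100):
        models = gossip_round(models)
    print(models[0])  # every node ends up near the global mean [6.0, 1.0]
```

Because the uniform 1/3 weights form a doubly stochastic matrix, the average of all node models is preserved in every round, so the nodes converge to the same value a server would compute by averaging equally weighted clients.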

Overall, this paper serves as an important wake-up call that simply using federated learning does not automatically ensure privacy. Significant further research is needed to develop truly private and robust federated learning systems.

Conclusion

This survey paper provides a comprehensive review of the privacy challenges facing federated learning. While the premise of federated learning (training accurate models without sharing raw data) is appealing, the paper demonstrates that the model updates themselves can be exploited to reveal private information.

The paper catalogues the various privacy attacks that have been developed, as well as some of the defense mechanisms researchers have proposed. It also examines real-world applications and the emerging regulatory landscape around federated learning.

Ultimately, the paper highlights that significant further work is needed to realize the vision of federated learning as a privacy-preserving machine learning paradigm. Balancing privacy, performance, and practical deployment remains an open challenge that will require innovative solutions from the research community.

This summary was produced with help from an AI and may contain inaccuracies; consult the original source documents for details.

Related Papers

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh, Sina Ebrahimi, David López-Pérez

Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computational load, FL targets the resolution of privacy issues and the reduction of communication costs simultaneously. To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data. Specifically, individuals train ML models locally using their own data and then upload the results in the form of weights and gradients to the cloud for aggregation into the global model. This strategy is also advantageous in environments with limited bandwidth or high communication costs, as it prevents the transmission of large data volumes. With the increasing volume of data and rising privacy concerns, alongside the emergence of large-scale ML models like Large Language Models (LLMs), FL presents itself as a timely and relevant solution. It is therefore essential to review current FL algorithms to guide future research that meets the rapidly evolving ML demands. This survey provides a comprehensive analysis and comparison of the most recent FL algorithms, evaluating them on various fronts including mathematical frameworks, privacy protection, resource allocation, and applications. Beyond summarizing existing FL methods, this survey identifies potential gaps, open areas, and future challenges based on the performance reports and algorithms used in recent studies. This survey enables researchers to readily identify existing limitations in the FL field for further exploration.

5/28/2024

cs.LG cs.AI cs.CR cs.DC


Federated Learning in Healthcare: Model Misconducts, Security, Challenges, Applications, and Future Research Directions -- A Systematic Review

Md Shahin Ali, Md Manjurul Ahsan, Lamia Tasnim, Sadia Afrin, Koushik Biswas, Md Maruf Hossain, Md Mahfuz Ahmed, Ronok Hashan, Md Khairul Islam, Shivakumar Raman

Data privacy has become a major concern in healthcare due to the increasing digitization of medical records and data-driven medical research. Protecting sensitive patient information from breaches and unauthorized access is critical, as such incidents can have severe legal and ethical complications. Federated Learning (FL) addresses this concern by enabling multiple healthcare institutions to collaboratively learn from decentralized data without sharing it. FL's scope in healthcare covers areas such as disease prediction, treatment customization, and clinical trial research. However, implementing FL poses challenges, including model convergence in non-IID (independent and identically distributed) data environments, communication overhead, and managing multi-institutional collaborations. A systematic review of FL in healthcare is necessary to evaluate how effectively FL can provide privacy while maintaining the integrity and usability of medical data analysis. In this study, we analyze existing literature on FL applications in healthcare. We explore the current state of model security practices, identify prevalent challenges, and discuss practical applications and their implications. Additionally, the review highlights promising future research directions to refine FL implementations, enhance data security protocols, and expand FL's use to broader healthcare applications, which will benefit future researchers and practitioners.

5/24/2024

cs.CR cs.AI cs.LG


Privacy-Preserving Edge Federated Learning for Intelligent Mobile-Health Systems

Amin Aminifar, Matin Shokri, Amir Aminifar

Machine Learning (ML) algorithms are generally designed for scenarios in which all data is stored in one data center, where the training is performed. However, in many applications, e.g., in the healthcare domain, the training data is distributed among several entities, e.g., different hospitals or patients' mobile devices/sensors. At the same time, transferring the data to a central location for learning is certainly not an option, due to privacy concerns and legal issues, and in certain cases, because of the communication and computation overheads. Federated Learning (FL) is the state-of-the-art collaborative ML approach for training an ML model across multiple parties holding local data samples, without sharing them. However, enabling learning from distributed data over such edge Internet of Things (IoT) systems (e.g., mobile-health and wearable technologies, involving sensitive personal/medical data) in a privacy-preserving fashion presents a major challenge mainly due to their stringent resource constraints, i.e., limited computing capacity, communication bandwidth, memory storage, and battery lifetime. In this paper, we propose a privacy-preserving edge FL framework for resource-constrained mobile-health and wearable technologies over the IoT infrastructure. We evaluate our proposed framework extensively and provide the implementation of our technique on Amazon's AWS cloud platform based on the seizure detection application in epilepsy monitoring using wearable technologies.

5/10/2024

cs.LG cs.CR

Safely Learning with Private Data: A Federated Learning Framework for Large Language Model

JiaYing Zheng, HaiNan Zhang, LingXiang Wang, WangJie Qiu, HongWei Zheng, ZhiMing Zheng

Private data, being larger and of higher quality than public data, can greatly improve large language models (LLMs). However, due to privacy concerns, this data is often dispersed in multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLMs due to their high computational demands on clients. An alternative, split learning, offloads most training parameters to the server while training embedding and output layers locally, making it more suitable for LLMs. Nonetheless, it faces significant challenges in security and efficiency. Firstly, the gradients of embeddings are prone to attacks, leading to potential reverse engineering of private data. Furthermore, the server's limitation of handling only one client's training request at a time hinders parallel training, severely impacting training efficiency. In this paper, we propose a Federated Learning framework for LLMs, named FL-GLM, which prevents data leakage caused by both server-side and peer-client attacks while improving training efficiency. Specifically, we first place the input block and output block on the local client to prevent embedding gradient attacks from the server. Secondly, we employ key-encryption during client-server communication to prevent reverse engineering attacks from peer-clients. Lastly, we employ optimization methods like client-batching or server-hierarchical, adopting different acceleration methods based on the actual computational capabilities of the server. Experimental results on NLU and generation tasks demonstrate that FL-GLM achieves metrics comparable to the centralized ChatGLM model, validating the effectiveness of our federated learning framework.

6/27/2024

cs.CR cs.CL