Privacy Attacks in Decentralized Learning

Read original: arXiv:2402.10001 - Published 6/5/2024 by Abdellah El Mrini, Edwige Cyffers, Aurélien Bellet


Overview

This paper introduces a new attack against Decentralized Gradient Descent (D-GD), a collaborative learning method that allows users to train models without sharing their private data.
The attack enables a user (or group of users) to reconstruct the private data of other users outside their immediate network neighborhood.
The approach is based on a reconstruction attack against the gossip averaging protocol used in D-GD, which the authors extend to handle the additional challenges of the D-GD setting.
The authors validate the effectiveness of their attack on real graphs and datasets, showing that a single attacker or a small number of attackers can compromise a surprisingly large number of users.
They also investigate factors that affect the performance of the attack, such as graph topology, number of attackers, and attacker position in the graph.

Plain English Explanation

Decentralized Gradient Descent (D-GD) is a way for a group of users to work together on a machine learning task without having to share their private data with each other. Instead of sending their data directly, the users update a shared model on their own devices and then share those updates with their neighbors in the network. By iteratively averaging these local updates, the group can collaboratively train the model.

The authors of this paper show that this approach is not as private as it might seem. They developed a new attack that allows a user (or a few users) to figure out the private data of other users, even if they are not directly connected to them in the network. The attack exploits the fact that the averaged updates an attacker receives from its neighbors still carry information about users farther away in the network.

The authors tested their attack on real-world data and networks, and found that it can compromise a surprisingly large number of users, even if only a small number of attackers are involved. They also looked at how factors like the structure of the network and the position of the attackers in the network affect the success of the attack.

This research is important because it shows that even decentralized and privacy-preserving machine learning approaches like D-GD may have vulnerabilities that could allow bad actors to access private user data. It highlights the need for continued research and development of secure and private distributed learning algorithms.

Technical Explanation

The authors propose the first attack against Decentralized Gradient Descent (D-GD) that lets a user (or set of users) reconstruct the private data of users outside their immediate neighborhood. D-GD is a collaborative learning method that allows a set of users to train a shared model without directly sharing their private data. Instead, users iteratively average local model updates with their neighbors in a network graph.
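As a rough illustration of the update described above, the following Python sketch runs D-GD on a toy problem. Everything specific in it is an assumption made for illustration, not the paper's setup: a 4-node path graph, Metropolis-style averaging weights, one scalar private value per user, a quadratic local loss, and the ordering "local gradient step, then neighbor averaging" (other orderings exist). The names `metropolis_weights` and `dgd_step` are hypothetical.

```python
def metropolis_weights(edges, n):
    """Build a symmetric, doubly stochastic averaging matrix from an
    undirected edge list (Metropolis-style weights; an assumption here)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # diagonal is still 0 here, so this sums the neighbors
    return W


def dgd_step(theta, data, W, lr):
    """One D-GD round: local gradient step, then averaging with neighbors."""
    # Local step on the toy loss f_i(theta) = 0.5 * (theta - x_i)^2,
    # whose gradient is (theta - x_i). The private value x_i never leaves node i.
    half = [t - lr * (t - x) for t, x in zip(theta, data)]
    # Averaging step: each node mixes the updated models of its neighbors.
    n = len(theta)
    return [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]


edges = [(0, 1), (1, 2), (2, 3)]   # path graph 0 - 1 - 2 - 3
data = [1.0, 4.0, -2.0, 5.0]       # one private scalar per user (toy values)
W = metropolis_weights(edges, 4)

theta = [0.0] * 4                  # every node starts from the same model
for _ in range(200):
    theta = dgd_step(theta, data, W, lr=0.1)

# Because W is doubly stochastic, the network-wide average of the local
# models approaches the minimizer of the average loss, i.e. the mean of the data.
network_mean = sum(theta) / len(theta)
```

Only model values cross the edges of the graph in this sketch, which is the property that makes D-GD look private at first sight.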

The authors' attack is based on a reconstruction attack against the gossip averaging protocol used in D-GD. They then extend this attack to handle the additional challenges raised by D-GD, where averaging steps are interleaved with local model updates.
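To make the idea of reconstructing values from gossip averaging concrete, here is a simplified Python sketch. It is not the authors' algorithm. It assumes plain gossip averaging with a known, fixed averaging matrix, scalar private values, no noise, and an attacker who records its own value and its neighbor's value at every round. Under those assumptions each observation is a known linear combination of the initial values, so the attacker can solve a linear system. The helper names (`gossip_trajectory`, `observation_rows`, `solve_exact`) are hypothetical, and exact rational arithmetic is used only to keep the toy example free of rounding issues.

```python
from fractions import Fraction


def metropolis_weights(edges, n):
    """Symmetric, doubly stochastic averaging matrix (assumed known to the attacker)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = Fraction(1, 1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1 - sum(W[i])
    return W


def gossip_trajectory(W, x0, steps):
    """Values held by every node at rounds 0 .. steps-1 of gossip averaging."""
    n, traj, x = len(x0), [], list(x0)
    for _ in range(steps):
        traj.append(x)
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return traj


def observation_rows(W, observed, steps):
    """Rows e_v^T W^t: how each observed value depends on the initial values."""
    n, rows = len(W), []
    for v in observed:
        row = [Fraction(int(k == v)) for k in range(n)]
        for _ in range(steps):
            rows.append(row)
            row = [sum(row[k] * W[k][j] for k in range(n)) for j in range(n)]
    return rows


def solve_exact(K, y):
    """Gauss-Jordan solve of the consistent system K x = y; None if rank deficient."""
    n = len(K[0])
    A = [row[:] + [b] for row, b in zip(K, y)]
    for c in range(n):
        pivot = next((i for i in range(c, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            return None
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [a / p for a in A[c]]
        for i in range(len(A)):
            if i != c and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][n] for i in range(n)]


edges = [(0, 1), (1, 2), (2, 3)]                 # path graph 0 - 1 - 2 - 3
W = metropolis_weights(edges, 4)
x0 = [Fraction(3), Fraction(-1), Fraction(7), Fraction(2)]  # private values (toy)

observed, steps = [0, 1], 4                      # attacker is node 0, its neighbor is node 1
traj = gossip_trajectory(W, x0, steps)
y = [traj[t][v] for v in observed for t in range(steps)]    # what the attacker sees
K = observation_rows(W, observed, steps)         # what the attacker can compute from W

recovered = solve_exact(K, y)                    # includes nodes 2 and 3, which are not neighbors
```

In this toy setting the attacker at one end of the path recovers the initial values of nodes 2 and 3 even though it never communicates with them directly.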

The authors validate the effectiveness of their attack on real graphs and datasets, demonstrating that the number of users compromised by a single or a handful of attackers is often surprisingly large. They empirically investigate several factors that affect the performance of the attack, including the graph topology, the number of attackers, and their position in the graph.
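One way to build intuition for why graph topology and attacker position matter is to count how many independent linear constraints an attacker's observations place on the other users' initial values. The Python sketch below does this for plain gossip averaging on two toy graphs. It is a crude proxy chosen for illustration, not the metric or the experiments used in the paper. The Metropolis-style weights, the graphs, and the helper names (`observation_rows`, `rank`) are all assumptions.

```python
from fractions import Fraction


def metropolis_weights(edges, n):
    """Symmetric, doubly stochastic averaging matrix for an undirected graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = Fraction(1, 1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1 - sum(W[i])
    return W


def observation_rows(W, observed, steps):
    """Rows e_v^T W^t for each observed node v and round t = 0 .. steps-1."""
    n, rows = len(W), []
    for v in observed:
        row = [Fraction(int(k == v)) for k in range(n)]
        for _ in range(steps):
            rows.append(row)
            row = [sum(row[k] * W[k][j] for k in range(n)) for j in range(n)]
    return rows


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    A = [r[:] for r in rows]
    rk = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(rk, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for i in range(rk + 1, len(A)):
            f = A[i][c] / A[rk][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[rk])]
        rk += 1
    return rk


n = 5

# Path graph 0 - 1 - 2 - 3 - 4, attacker at the end (node 0), observing nodes 0 and 1.
path = metropolis_weights([(0, 1), (1, 2), (2, 3), (3, 4)], n)
path_rank = rank(observation_rows(path, observed=[0, 1], steps=n))

# Star graph with center 0, attacker at leaf 1, observing itself and the center.
star = metropolis_weights([(0, 1), (0, 2), (0, 3), (0, 4)], n)
star_rank = rank(observation_rows(star, observed=[1, 0], steps=n))
```

On the path, the observations have full rank (5), so every initial value is determined. On the star, the three other leaves play identical roles from the attacker's point of view, so the rank is only 3 and the attacker learns their sum but not the individual values.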

The authors find that the absence of direct communication between non-neighbor nodes in D-GD does not necessarily prevent users from inferring precise information about the data of others. Their attack shows that a user (or set of users) can in fact reconstruct the private data of other users outside their immediate neighborhood.

Critical Analysis

The authors present a compelling attack against the privacy guarantees of Decentralized Gradient Descent (D-GD), a collaborative learning method designed to protect user data. While the lack of direct communication between non-neighbor nodes in D-GD may seem to provide privacy, the authors effectively demonstrate that users can still infer sensitive information about each other's data.

One limitation of the research is that it does not explore defenses or countermeasures against this type of attack. The authors acknowledge that further work is needed to develop more secure decentralized learning algorithms that can withstand such reconstruction attacks.

Additionally, the authors' attack is based on certain assumptions about the attackers' knowledge and capabilities, such as their ability to observe the network graph and the model updates shared by their neighbors. It would be valuable to understand how the attack's effectiveness might change under different attacker models or network conditions.

Overall, this research raises important questions about the privacy guarantees of decentralized learning approaches like D-GD and highlights the need for continued innovation in this area. The authors' attack serves as a cautionary tale and a call to action for the development of more robust and secure distributed machine learning systems.

Conclusion

This paper introduces a new attack against Decentralized Gradient Descent (D-GD), a collaborative learning method designed to protect user privacy. The authors demonstrate that despite the lack of direct communication between non-neighbor nodes in D-GD, users can still reconstruct the private data of other users outside their immediate network neighborhood.

The authors' attack, based on a reconstruction attack against the gossip averaging protocol used in D-GD, is shown to be effective on real graphs and datasets. The number of users compromised by a single attacker or a small group of attackers can be surprisingly large, and the authors identify several factors that influence the attack's performance: the graph topology, the number of attackers, and their position in the graph.

This research is a significant contribution to the understanding of privacy risks in decentralized learning systems. It highlights the need for continued innovation in the development of secure and private distributed machine learning algorithms, such as BGAD, DistTack, and VVPFD, as well as privacy-preserving aggregation techniques. By addressing the vulnerabilities identified in this paper, researchers can work towards building more robust and trustworthy decentralized learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Privacy Attacks in Decentralized Learning

Abdellah El Mrini, Edwige Cyffers, Aurélien Bellet

Decentralized Gradient Descent (D-GD) allows a set of users to perform collaborative learning without sharing their data by iteratively averaging local model updates with their neighbors in a network graph. The absence of direct communication between non-neighbor nodes might lead to the belief that users cannot infer precise information about the data of others. In this work, we demonstrate the opposite, by proposing the first attack against D-GD that enables a user (or set of users) to reconstruct the private data of other users outside their immediate neighborhood. Our approach is based on a reconstruction attack against the gossip averaging protocol, which we then extend to handle the additional challenges raised by D-GD. We validate the effectiveness of our attack on real graphs and datasets, showing that the number of users compromised by a single or a handful of attackers is often surprisingly large. We empirically investigate some of the factors that affect the performance of the attack, namely the graph topology, the number of attackers, and their position in the graph.

6/5/2024


Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging

Edwige Cyffers, Mathieu Even, Aurélien Bellet, Laurent Massoulié

Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.

6/12/2024


Differentially Private Decentralized Learning with Random Walks

Edwige Cyffers, Aurélien Bellet, Jalaj Upadhyay

The popularity of federated learning comes from the possibility of better scalability and the ability for participants to keep control of their data, improving data security and sovereignty. Unfortunately, sharing model updates also creates a new privacy attack surface. In this work, we characterize the privacy guarantees of decentralized learning with random walk algorithms, where a model is updated by traveling from one node to another along the edges of a communication graph. Using a recent variant of differential privacy tailored to the study of decentralized algorithms, namely Pairwise Network Differential Privacy, we derive closed-form expressions for the privacy loss between each pair of nodes where the impact of the communication topology is captured by graph theoretic quantities. Our results further reveal that random walk algorithms tend to yield better privacy guarantees than gossip algorithms for nodes close to each other. We supplement our theoretical results with empirical evaluation on synthetic and real-world graphs and datasets.

6/5/2024


Local Differential Privacy in Graph Neural Networks: a Reconstruction Approach

Karuna Bhaila, Wen Huang, Yongkai Wu, Xintao Wu

Graph Neural Networks have achieved tremendous success in modeling complex graph data in a variety of applications. However, there are limited studies investigating privacy protection in GNNs. In this work, we propose a learning framework that can provide node privacy at the user level, while incurring low utility loss. We focus on a decentralized notion of Differential Privacy, namely Local Differential Privacy, and apply randomization mechanisms to perturb both feature and label data at the node level before the data is collected by a central server for model training. Specifically, we investigate the application of randomization mechanisms in high-dimensional feature settings and propose an LDP protocol with strict privacy guarantees. Based on frequency estimation in statistical analysis of randomized data, we develop reconstruction methods to approximate features and labels from perturbed data. We also formulate this learning framework to utilize frequency estimates of graph clusters to supervise the training procedure at a sub-graph level. Extensive experiments on real-world and semi-synthetic datasets demonstrate the validity of our proposed model.

8/7/2024