Beyond Noise: Privacy-Preserving Decentralized Learning with Virtual Nodes

2404.09536

Published 4/16/2024 by Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos

cs.DC cs.AI cs.CR cs.LG

Abstract

Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training data. Conventional privacy defenses such as differential privacy and secure aggregation fall short in effectively safeguarding user privacy in DL. We introduce Shatter, a novel DL approach in which nodes create virtual nodes (VNs) to disseminate chunks of their full model on their behalf. This enhances privacy by (i) preventing attackers from collecting full models from other nodes, and (ii) hiding the identity of the original node that produced a given model chunk. We theoretically prove the convergence of Shatter and provide a formal analysis demonstrating how Shatter reduces the efficacy of attacks compared to when exchanging full models between participating nodes. We evaluate the convergence and attack resilience of Shatter with existing DL algorithms, with heterogeneous datasets, and against three standard privacy attacks, including gradient inversion. Our evaluation shows that Shatter not only renders these privacy attacks infeasible when each node operates 16 VNs but also exhibits a positive impact on model convergence compared to standard DL. This enhanced privacy comes with a manageable increase in communication volume.

Overview

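This paper introduces Shatter, a decentralized learning (DL) approach in which each node creates "virtual nodes" (VNs) that disseminate chunks of its full model on its behalf
Shatter targets a privacy gap in DL: the models that nodes share can still be used to infer training data, and the authors argue that differential privacy and secure aggregation fall short as defenses
Splitting models and relaying them through VNs prevents attackers from collecting a node's full model and hides which node produced a given chunk
Experiments report that, with 16 VNs per node, three standard privacy attacks (including gradient inversion) become infeasible, while convergence improves over standard DL at the cost of a manageable increase in communication volume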

Plain English Explanation

The paper presents a new way to do decentralized machine learning while protecting people's privacy. In decentralized learning, devices train a model together without a central server and without raw training data leaving each device. The catch is that the models the devices exchange can still reveal information about the data they were trained on. The authors introduce Shatter, which uses "virtual nodes" to address this.

Instead of sending its whole model to its peers, each device splits the model into chunks and hands them to virtual nodes that pass them on. An attacker therefore never receives a complete model from any single device and cannot tell which device a given chunk came from. According to the authors, conventional defenses such as differential privacy and secure aggregation do not protect users well enough in this setting, which is why Shatter takes a different route.

Through experiments, the researchers show that with 16 virtual nodes per device, three standard privacy attacks stop being feasible, and the model converges better than with standard decentralized learning. The price is a manageable increase in communication. This matters because privacy is a major concern in many real-world applications of machine learning.

Technical Explanation

The paper introduces Shatter, a privacy-preserving DL approach in which nodes create virtual nodes (VNs) that disseminate chunks of their full model on their behalf. The motivation is that models exchanged directly between nodes can be used to infer training data, and that conventional defenses such as differential privacy and secure aggregation fall short in DL.
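As a rough illustration of the chunking idea described in the abstract, the sketch below splits a flat parameter vector into disjoint chunks, one per virtual node. The function names, the flat-list model representation, and the round-robin assignment of parameters to chunks are all assumptions made for illustration; the abstract does not say how Shatter actually partitions a model.

```python
from typing import Dict, List


def split_into_chunks(model: List[float], k: int) -> List[Dict[int, float]]:
    """Split a flat parameter vector into k disjoint chunks (hypothetical helper).

    Each chunk maps parameter index -> value and would be handed to one
    virtual node. Round-robin assignment is an assumption, not the paper's rule.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    chunks: List[Dict[int, float]] = [{} for _ in range(k)]
    for idx, value in enumerate(model):
        chunks[idx % k][idx] = value
    return chunks


def reassemble(chunks: List[Dict[int, float]], size: int) -> List[float]:
    """Rebuild the full vector; only possible when every chunk is available."""
    model = [0.0] * size
    for chunk in chunks:
        for idx, value in chunk.items():
            model[idx] = value
    return model


model = [0.1 * i for i in range(10)]     # toy 10-parameter "model"
chunks = split_into_chunks(model, k=4)   # one chunk per virtual node
chunk_sizes = [len(c) for c in chunks]
```

In this toy setup, a recipient that obtains a single chunk holds at most 3 of the 10 parameters, which is the intuition behind preventing attackers from collecting full models.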

Chunking and relaying through VNs improves privacy in two ways: (i) attackers cannot collect full models from other nodes, and (ii) the identity of the node that produced a given chunk is hidden. The evaluation considers Shatter alongside existing DL algorithms.
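On the receiving side, a node ends up with model chunks from several anonymous VNs and has to merge them into its own model. The abstract does not state the aggregation rule, so the following is only a minimal sketch that assumes parameter-wise averaging over the node's own value and every value received for that index. The function name and data layout are hypothetical.

```python
from typing import Dict, List


def aggregate_chunks(own_model: List[float],
                     received: List[Dict[int, float]]) -> List[float]:
    """Parameter-wise average of the local model and the received chunks.

    Assumption: each parameter is averaged over the local value plus all
    received values for that index. Indices nobody sent keep the local value.
    """
    sums = list(own_model)
    counts = [1] * len(own_model)
    for chunk in received:
        for idx, value in chunk.items():
            sums[idx] += value
            counts[idx] += 1
    return [s / c for s, c in zip(sums, counts)]


own = [1.0, 1.0, 1.0, 1.0]
# Two chunks from unknown senders, each covering only part of the model.
received = [{0: 3.0, 2: 5.0}, {0: 5.0, 1: 3.0}]
new_model = aggregate_chunks(own, received)
```

Note that the receiver never needs to know which node a chunk came from, which is consistent with the identity-hiding property the abstract describes.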

The authors theoretically prove that Shatter converges and provide a formal analysis of how it reduces the efficacy of attacks compared with exchanging full models between participating nodes.

The evaluation covers existing DL algorithms, heterogeneous datasets, and three standard privacy attacks, including gradient inversion. With 16 VNs per node, Shatter renders these attacks infeasible and has a positive effect on model convergence compared with standard DL, at the cost of a manageable increase in communication volume.

Critical Analysis

The paper addresses a real weakness of DL: models shared between nodes can leak training data, and established defenses such as differential privacy and secure aggregation are, according to the authors, not effective enough. Virtual nodes offer a principled alternative, backed by a convergence proof and a formal analysis of attack efficacy.

However, some questions remain open based on the abstract. For example, it is unclear how Shatter scales to very large decentralized systems, or how it behaves under Byzantine failures and adversarial attacks beyond the three privacy attacks evaluated.

Additionally, the reported experiments use heterogeneous datasets and standard attacks, but evaluations in real-world deployments would help assess the practical implications and deployability of the method. Exploring settings beyond basic supervised learning, such as vertical federated learning or group decision-making, could also yield valuable insights.

Overall, the paper presents a promising approach, but further research is needed to fully understand its capabilities, limitations, and potential impact in diverse decentralized learning scenarios.

Conclusion

This paper introduces Shatter, a privacy-preserving decentralized learning approach in which nodes use "virtual nodes" to disseminate chunks of their models. This prevents attackers from collecting full models and hides which node produced each chunk, addressing a gap that the authors say differential privacy and secure aggregation leave open.

Experimental results show that Shatter renders three standard privacy attacks infeasible with 16 VNs per node, improves convergence relative to standard DL, and requires a manageable increase in communication volume. This work is a notable advance in decentralized machine learning, with potential applications in domains where privacy is a critical concern, such as healthcare, finance, and personal communications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
