Secure Aggregation Meets Sparsification in Decentralized Learning

Read original: arXiv:2405.07708 - Published 5/15/2024 by Sayan Biswas, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic


Overview

Decentralized learning (DL) is vulnerable to privacy breaches due to sophisticated attacks on machine learning (ML) models
Secure aggregation is a cryptographic technique that enables multiple parties to compute an aggregate of their private data without revealing individual inputs
Sparsification techniques are used in DL to enhance communication efficiency by selectively sharing only the most crucial parameters or gradients
Applying secure aggregation to sparsified models in DL is challenging because distinct nodes transmit disjoint parameter sets, which can prevent the masks used by secure aggregation from canceling out
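The two sparsifiers this summary refers to, TopK and random subsampling (the latter is the setting of the paper's headline experiments), can be sketched as follows. This is a minimal illustration that assumes a model update is a flat list of floats; the function names are hypothetical and not taken from the paper's code. TopK keeps the coordinates with the largest magnitude, while random subsampling keeps a uniformly random subset regardless of value.

```python
import random
from typing import Dict, List

def topk_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep the k coordinates with the largest absolute value (index -> value)."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(order[:k])}

def random_subsample(update: List[float], k: int, rng: random.Random) -> Dict[int, float]:
    """Keep k coordinates chosen uniformly at random, ignoring their values."""
    chosen = rng.sample(range(len(update)), k)
    return {i: update[i] for i in sorted(chosen)}

update = [0.1, -2.0, 0.05, 1.5, -0.3, 0.0]
print(topk_sparsify(update, 2))                       # {1: -2.0, 3: 1.5}
print(random_subsample(update, 2, random.Random(0)))  # two randomly chosen coordinates
```

Because TopK depends on each node's own values, different nodes generally keep different coordinates, which is the root of the disjoint-parameter problem.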

Plain English Explanation

Decentralized learning is a way for multiple computers or devices to work together to train a machine learning model without any central authority having access to all the private data. However, sophisticated attacks on machine learning models leave decentralized learning increasingly vulnerable to privacy breaches.

Secure aggregation is a cryptographic technique that lets these devices compute an aggregate of their private inputs (for example, the sum of their model updates) without revealing any individual input to the other participants or to any aggregator. This protects the privacy of each device's contribution.
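A common way to build secure aggregation is pairwise additive masking: every pair of nodes shares a random mask that one node adds and the other subtracts, so each masked input looks random while the masks cancel in the sum. The sketch below is a generic textbook version, not CESAR itself. It assumes inputs are already quantized to integers modulo 2^32, and it uses a seeded PRNG as a stand-in for the shared seeds that real protocols derive via key agreement; the helper names are hypothetical.

```python
import random
from typing import List

MOD = 2**32  # assumed ring for quantized updates

def pairwise_masks(n: int, dim: int, seed: int = 0) -> List[List[List[int]]]:
    """masks[i][j] is what node i adds for pair (i, j); masks[j][i] is its negation."""
    rng = random.Random(seed)  # stand-in for seeds agreed via key exchange
    masks = [[[0] * dim for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m = [rng.randrange(MOD) for _ in range(dim)]
            masks[i][j] = m
            masks[j][i] = [(-v) % MOD for v in m]
    return masks

def mask_input(i: int, x: List[int], masks: List[List[List[int]]]) -> List[int]:
    """Add all of node i's pairwise masks to its input, coordinate-wise mod MOD."""
    y = [v % MOD for v in x]
    for j, m in enumerate(masks[i]):
        if j != i:
            y = [(a + b) % MOD for a, b in zip(y, m)]
    return y

def aggregate(masked: List[List[int]]) -> List[int]:
    """Coordinate-wise sum of masked vectors; the pairwise masks cancel here."""
    return [sum(col) % MOD for col in zip(*masked)]

inputs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
masks = pairwise_masks(n=3, dim=3)
masked = [mask_input(i, x, masks) for i, x in enumerate(inputs)]
print(aggregate(masked))  # [12, 15, 18]
```

Cancellation relies on every node masking the same coordinates, which is exactly what sparsification breaks.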

To make decentralized learning more communication-efficient, devices often share only the most important parameters or gradients of the model, a process called sparsification. Applying secure aggregation to these sparse models is challenging, however: different devices end up sending different, often non-overlapping sets of parameters, so the random masks that secure aggregation adds to hide each input no longer cancel out when the inputs are combined.
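The cancellation failure can be reproduced in a few lines. In this toy example (hypothetical helpers, integer inputs modulo 2^32, and not CESAR), two nodes share one pairwise mask, but node 0 sends coordinates 0 and 1 while node 1 sends coordinates 1 and 2. Only the shared coordinate 1 comes out clean; on coordinates 0 and 2 the receiver is left with a masked value that is useless for averaging.

```python
import random
from typing import Dict, List

MOD = 2**32  # assumed ring for quantized updates

def sparse_masked(x: List[int], coords: List[int], mask: List[int], sign: int) -> Dict[int, int]:
    """Send only `coords` of x, each masked with +mask or -mask (mod MOD)."""
    return {c: (x[c] + sign * mask[c]) % MOD for c in coords}

def sum_received(messages: List[Dict[int, int]], dim: int) -> List[int]:
    """Receiver sums, per coordinate, whatever masked values arrived."""
    total = [0] * dim
    for msg in messages:
        for c, v in msg.items():
            total[c] = (total[c] + v) % MOD
    return total

rng = random.Random(42)
dim = 3
mask = [rng.randrange(1, MOD) for _ in range(dim)]  # shared pairwise mask, nonzero
x0, x1 = [10, 20, 30], [1, 2, 3]
m0 = sparse_masked(x0, [0, 1], mask, +1)  # node 0 keeps coordinates 0 and 1
m1 = sparse_masked(x1, [1, 2], mask, -1)  # node 1 keeps coordinates 1 and 2
total = sum_received([m0, m1], dim)
print(total[1] == x0[1] + x1[1])             # True: mask cancels on the shared coordinate
print(total[0] == x0[0], total[2] == x1[2])  # False False: leftover masks remain
```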

Technical Explanation

This paper introduces CESAR, a new secure aggregation protocol designed to be compatible with existing sparsification mechanisms in decentralized learning. CESAR provably defends against "honest-but-curious" adversaries (nodes that follow the protocol but try to learn private information from what they observe) and can be formally adapted to counteract collusion between such adversaries.
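One generic reason collusion matters, for any scheme that reveals an exact aggregate and not only CESAR: if a node learns the exact sum of k inputs and colludes with the owners of k - 1 of them, it recovers the remaining input by subtraction. The sketch assumes integer (quantized) updates; the helper name is hypothetical, and this is not the paper's collusion analysis.

```python
from typing import List

def recover_victim_input(agg_sum: List[int], colluder_inputs: List[List[int]]) -> List[int]:
    """Subtract the colluders' own inputs from a revealed exact sum."""
    return [s - sum(c[d] for c in colluder_inputs) for d, s in enumerate(agg_sum)]

inputs = [[3, 1], [4, 1], [5, 9]]         # three neighbors' quantized updates
agg = [sum(col) for col in zip(*inputs)]  # what secure aggregation reveals: [12, 11]
print(recover_victim_input(agg, inputs[:2]))  # [5, 9]: the third input is exposed
```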

The paper provides a foundational analysis of how the sparsification performed by the nodes interacts with the proportion of parameters shared under CESAR, in both colluding and non-colluding settings. This analysis explains how the protocol works and when it can be applied.
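As a back-of-the-envelope illustration of why the proportion of shared parameters matters (this is not the paper's analysis): if two nodes each independently keep a uniformly random fraction alpha of d coordinates, each coordinate is kept by both with probability alpha^2, so on average only alpha^2 * d coordinates overlap, i.e., an alpha fraction of what each node sends. The simulation below checks this; the function name and parameters are hypothetical.

```python
import random

def mean_overlap_fraction(dim: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Average fraction of all dim coordinates kept by BOTH of two nodes that
    independently keep a uniformly random alpha-fraction of coordinates."""
    rng = random.Random(seed)
    k = round(alpha * dim)
    shared = 0
    for _ in range(trials):
        a = set(rng.sample(range(dim), k))
        b = set(rng.sample(range(dim), k))
        shared += len(a & b)
    return shared / (trials * dim)

alpha = 0.3
print(mean_overlap_fraction(dim=1000, alpha=alpha, trials=200), alpha**2)  # both close to 0.09
```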

Experiments on a 48-node network arranged in a 3-regular topology show that, with random subsampling, CESAR always stays within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD), a standard decentralized learning algorithm, while adding only 11% data overhead. CESAR also surpasses the accuracy of TopK sparsification by up to 0.3% when data is independently and identically distributed (IID) across the nodes.
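For readers unfamiliar with the D-PSGD baseline, one round can be sketched as follows: each node averages its model with its neighbors' models and applies a gradient step computed on its own data. The sketch assumes scalar models, toy quadratic local losses, uniform mixing weights 1/(d+1), and a circulant 3-regular graph on 48 nodes (a ring plus a chord to the opposite node). The paper's actual 3-regular topology, models, and hyperparameters are not reproduced here, and the helper names are hypothetical.

```python
from typing import List

def three_regular_neighbors(n: int) -> List[List[int]]:
    """Circulant 3-regular graph: ring edges plus a chord to the opposite node (n even)."""
    assert n % 2 == 0 and n >= 4
    return [sorted({(i - 1) % n, (i + 1) % n, (i + n // 2) % n}) for i in range(n)]

def dpsgd_step(x: List[float], neighbors: List[List[int]], grads: List[float], lr: float) -> List[float]:
    """One D-PSGD round: uniform average over self and neighbors, then a local gradient step."""
    new_x = []
    for i, xi in enumerate(x):
        group = [xi] + [x[j] for j in neighbors[i]]
        new_x.append(sum(group) / len(group) - lr * grads[i])
    return new_x

n = 48
nbrs = three_regular_neighbors(n)
targets = [float(i) for i in range(n)]  # node i's local loss: 0.5 * (x - i)^2
x = [0.0] * n
for _ in range(200):
    grads = [xi - ci for xi, ci in zip(x, targets)]  # gradients at the current local models
    x = dpsgd_step(x, nbrs, grads, lr=0.1)
print(sum(x) / n)  # close to 23.5, the minimizer of the average loss
```

Uniform weights on a regular graph form a doubly stochastic mixing matrix, so the averaging step preserves the network-wide mean and the mean model converges to the minimizer of the average loss.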

Critical Analysis

The paper provides a strong theoretical foundation for the CESAR protocol and its interactions with sparsification. However, the experimental evaluation is limited to a relatively small network of 48 nodes. Larger-scale real-world deployments may reveal additional challenges or tradeoffs that are not captured here.

Additionally, the paper focuses on honest-but-curious adversaries and collusion between them. Further research may be needed to understand CESAR's robustness against more sophisticated adversaries, such as those that actively deviate from the protocol (Byzantine adversaries) or attempt to infer private information through side channels.

Finally, the headline comparison against TopK is reported for IID data. Non-IID data distributions are common in real-world decentralized learning, so understanding how CESAR performs under statistical heterogeneity across nodes is an important question for closer study.

Conclusion

This paper introduces CESAR, a novel secure aggregation protocol designed to work with sparsification techniques in decentralized learning. CESAR provably protects against honest-but-curious adversaries and can be formally adapted to counteract collusion. The theoretical analysis and experimental results show that CESAR achieves accuracy close to D-PSGD with modest communication overhead, and surpasses TopK sparsification in accuracy on IID data.

While the foundations laid in this work are promising, further research is needed to understand CESAR's performance and security guarantees in larger-scale, more complex decentralized learning scenarios. Exploring its robustness against a broader range of adversaries and its behavior with non-IID data distributions would be valuable next steps.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Secure Aggregation Meets Sparsification in Decentralized Learning

Sayan Biswas, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic

Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To enhance communication efficiency in DL, sparsification techniques are used, selectively sharing only the most crucial parameters or gradients in a model, thereby maintaining efficiency without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging due to the transmission of disjoint parameter sets by distinct nodes, which can prevent masks from canceling out effectively. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We provide a foundational understanding of the interaction between the sparsification carried out by the nodes and the proportion of the parameters shared under CESAR in both colluding and non-colluding environments, offering analytical insight into the working and applicability of the protocol. Experiments on a network with 48 nodes in a 3-regular topology show that with random subsampling, CESAR is always within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD), while adding only 11% of data overhead. Moreover, it surpasses the accuracy on TopK by up to 0.3% on independent and identically distributed (IID) data.

5/15/2024


Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness

Ali Reza Ghavamipour, Benjamin Zi Hao Zhao, Oguzhan Ersoy, Fatih Turkmen

Decentralized machine learning (DL) has been receiving increasing interest recently because it eliminates the single point of failure present in the federated learning setting. Yet it is threatened by Byzantine clients who intentionally disrupt the learning process by broadcasting arbitrary model updates to other clients, seeking to degrade the performance of the global model. In response, robust aggregation schemes have emerged as promising solutions to defend against such Byzantine clients, thereby enhancing the robustness of Decentralized Learning. Defenses against Byzantine adversaries, however, typically require access to the updates of other clients, a counterproductive privacy trade-off that in turn increases the risk of inference attacks on those same model updates. In this paper, we introduce SecureDL, a novel DL protocol designed to enhance the security and privacy of DL against Byzantine threats. SecureDL facilitates a collaborative defense, while protecting the privacy of clients' model updates through secure multiparty computation. The protocol employs efficient computation of cosine similarity and normalization of updates to robustly detect and exclude model updates detrimental to model convergence. By using MNIST, Fashion-MNIST, SVHN and CIFAR-10 datasets, we evaluated SecureDL against various Byzantine attacks and compared its effectiveness with four existing defense mechanisms. Our experiments show that SecureDL is effective even in the case of attacks by the malicious majority (e.g., 80% Byzantine clients) while preserving high training accuracy.

4/30/2024


Secure Aggregation is Not Private Against Membership Inference Attacks

Khac-Hoang Ngo, Johan Ostman, Giuseppe Durisi, Alexandre Graell i Amat

Secure aggregation (SecAgg) is a commonly-used privacy-enhancing mechanism in federated learning, affording the server access only to the aggregate of model updates while safeguarding the confidentiality of individual updates. Despite widespread claims regarding SecAgg's privacy-preserving capabilities, a formal analysis of its privacy is lacking, making such presumptions unjustified. In this paper, we delve into the privacy implications of SecAgg by treating it as a local differential privacy (LDP) mechanism for each local update. We design a simple attack wherein an adversarial server seeks to discern which update vector a client submitted, out of two possible ones, in a single training round of federated learning under SecAgg. By conducting privacy auditing, we assess the success probability of this attack and quantify the LDP guarantees provided by SecAgg. Our numerical results unveil that, contrary to prevailing claims, SecAgg offers weak privacy against membership inference attacks even in a single training round. Indeed, it is difficult to hide a local update by adding other independent local updates when the updates are of high dimension. Our findings underscore the imperative for additional privacy-enhancing mechanisms, such as noise injection, in federated learning.

7/16/2024

A survey on secure decentralized optimization and learning

Changxin Liu, Nicola Bastianello, Wei Huo, Yang Shi, Karl H. Johansson

Decentralized optimization has become a standard paradigm for solving large-scale decision-making problems and training large machine learning models without centralizing data. However, this paradigm introduces new privacy and security risks, with malicious agents potentially able to infer private data or impair the model accuracy. Over the past decade, significant advancements have been made in developing secure decentralized optimization and learning frameworks and algorithms. This survey provides a comprehensive tutorial on these advancements. We begin with the fundamentals of decentralized optimization and learning, highlighting centralized aggregation and distributed consensus as key modules exposed to security risks in federated and distributed optimization, respectively. Next, we focus on privacy-preserving algorithms, detailing three cryptographic tools and their integration into decentralized optimization and learning systems. Additionally, we examine resilient algorithms, exploring the design and analysis of resilient aggregation and consensus protocols that support these systems. We conclude the survey by discussing current trends and potential future directions.

8/19/2024