FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Read original: arXiv:2303.10837 - Published 6/18/2024 by Weizhao Jin, Yuhang Yao, Shanshan Han, Jiajun Gu, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He

Overview

Federated Learning (FL) trains machine learning models on distributed devices without sharing local data, addressing privacy concerns.
However, the aggregated local models on the server can still reveal sensitive personal information through inversion attacks.
Privacy-preserving methods like homomorphic encryption (HE) are necessary for secure FL training.
Despite HE's privacy advantages, its applications have suffered from impractical overheads, especially for large foundation models.

Plain English Explanation

FedML-HE: Practical Federated Learning with Efficient Homomorphic Encryption-based Secure Model Aggregation

Federated Learning (FL) is a way to train machine learning models without sharing people's private data. Instead of sending all the data to a central server, FL lets devices like phones or tablets train a model on their own data and then share just the updates to that model. This helps protect people's privacy.
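
To make the idea of "sharing just the updates" concrete, here is a minimal sketch of server-side aggregation. It assumes a FedAvg-style average weighted by each client's number of training samples; the summary does not say which aggregation rule FedML-HE uses, and the function name, client values, and sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Three hypothetical clients, each holding a two-parameter model.
# Only these parameter vectors reach the server, never the raw training data.
global_model = federated_average(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [10, 10, 20],
)
```

Note that the server in this sketch sees every client's parameters in the clear, which is exactly the exposure the rest of the article is concerned with.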

However, even though the raw data isn't shared, the model updates on the server could still reveal sensitive information through complex attacks. To fix this, the paper introduces FedML-HE, a new FL system that uses a technique called homomorphic encryption (HE) to securely combine the model updates.

HE allows mathematical operations to be performed on encrypted data without decrypting it first. This means the server can work with the encrypted model updates without ever seeing the underlying private information. But previous HE-based FL systems had very high computational and communication costs, making them impractical, especially for large, complex machine learning models.
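
As an illustration of computing on encrypted data, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. This is an assumption made for illustration only; the summary does not say which HE scheme FedML-HE uses, and the small fixed primes and the fixed-point `SCALE` are hypothetical choices that are not secure. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy key material: two known Mersenne primes (far too small to be secure).
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public modulus
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key
MU = pow(LAMBDA, -1, N)        # private key (modular inverse)
SCALE = 10**6                  # fixed-point scale for float updates


def encode(x: float) -> int:
    """Map a float to a residue mod N (negatives wrap around)."""
    return round(x * SCALE) % N


def decode(v: int) -> float:
    """Inverse of encode: residues above N/2 represent negative values."""
    if v > N // 2:
        v -= N
    return v / SCALE


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) % N_SQ) * pow(r, N, N_SQ) % N_SQ


def decrypt(c: int) -> int:
    """Paillier decryption using the private values LAMBDA and MU."""
    return (pow(c, LAMBDA, N_SQ) - 1) // N * MU % N


# Each hypothetical client encrypts one model-update value.
client_updates = [0.25, -0.10, 0.40]
ciphertexts = [encrypt(encode(u)) for u in client_updates]

# The server multiplies ciphertexts, which adds the hidden plaintexts.
# It never calls decrypt and never sees an individual update.
encrypted_sum = 1
for c in ciphertexts:
    encrypted_sum = encrypted_sum * c % N_SQ

# Only a holder of the private key can recover the aggregate.
decrypted_sum = decode(decrypt(encrypted_sum))
```

Every value in this sketch turns into a ciphertext with several times as many bits as the plaintext, plus modular exponentiations to produce it, which hints at why encrypting every parameter of a large model is expensive.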

The key innovation in FedML-HE is that it encrypts only the most sensitive parts of the model, significantly reducing the overhead. This makes it much more feasible to use HE for securing FL, even with large foundation models like BERT. The authors report overhead reductions of roughly 10x for ResNet-50 and up to roughly 40x for BERT.

Technical Explanation

FedML-HE is a practical federated learning system that uses efficient homomorphic encryption (HE)-based secure model aggregation. Federated learning trains machine learning models on distributed devices by aggregating local model updates instead of sharing local data, addressing privacy concerns. However, the aggregated local models on the server can still reveal sensitive personal information through inversion attacks.

To preserve privacy, FedML-HE proposes to selectively encrypt sensitive parameters during the federated learning process. This significantly reduces both the computation and communication overheads compared to applying HE to the entire model. The system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), making HE-based federated learning much more practical and scalable.
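
The selective step can be sketched as building a mask over the parameters and sending only the masked entries through HE. This is a minimal sketch under stated assumptions: the per-parameter sensitivity scores are taken as given (the summary does not say how FedML-HE measures sensitivity), and the function names, the `ratio` parameter, and all values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_encryption_mask(sensitivity: Sequence[float], ratio: float) -> List[bool]:
    """Mark the top `ratio` fraction of parameters, by sensitivity, for encryption."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    k = round(len(sensitivity) * ratio)
    ranked = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(sensitivity))]


def split_update(
    update: Sequence[float], mask: Sequence[bool]
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Split an update into the part to encrypt and the part sent in plaintext."""
    to_encrypt = {i: v for i, (v, m) in enumerate(zip(update, mask)) if m}
    plaintext = {i: v for i, (v, m) in enumerate(zip(update, mask)) if not m}
    return to_encrypt, plaintext


# Hypothetical sensitivity scores and update for a 10-parameter model.
sensitivity = [0.9, 0.1, 0.5, 0.05, 0.7, 0.2, 0.3, 0.01, 0.6, 0.15]
update = [0.01 * i for i in range(10)]

mask = build_encryption_mask(sensitivity, ratio=0.2)
to_encrypt, plaintext = split_update(update, mask)

# Ratio of total parameters to encrypted parameters in this toy setup.
encrypted_param_reduction = len(update) / len(to_encrypt)
```

Here `encrypted_param_reduction` only counts how many fewer parameters pass through HE in this toy setup; it is not the measured overhead reduction reported for ResNet-50 or BERT.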

The key technical contributions of FedML-HE include:

Selective parameter encryption: Only encrypting the most sensitive parameters to minimize HE overhead
Optimized HE-based secure aggregation: Leveraging techniques like Berrut approximation to further reduce HE costs
System integration and deployment: Integrating the HE-based secure aggregation into a real-world federated learning framework (FedML)

Critical Analysis

The FedML-HE paper presents a promising approach to making HE-based secure model aggregation practical for federated learning, especially for large foundation models. Encrypting only the sensitive parameters substantially reduces computation and communication overheads relative to encrypting the entire model.

However, the paper does not provide a thorough analysis of the privacy guarantees offered by the selective encryption approach. While the authors claim it preserves privacy, the specific privacy properties and the potential risks of partial parameter encryption are not fully explored. Additional research may be needed to rigorously evaluate the privacy-preserving capabilities of this technique.

Furthermore, the paper focuses on the efficiency and scalability of the HE-based aggregation, but does not discuss other potential challenges in federated learning, such as client heterogeneity, model convergence, or the impact of client dropout. These factors could also affect the practical deployment and performance of FedML-HE in real-world scenarios.

Finally, the evaluation is limited to a few benchmark tasks and models. Applying FedML-HE to a wider range of federated learning applications, especially those involving sensitive user data, would help further validate the system's capabilities and limitations.

Conclusion

The FedML-HE paper presents a significant advancement in making homomorphic encryption-based secure aggregation practical for federated learning, particularly for large-scale foundation models. By selectively encrypting sensitive parameters, the system is able to dramatically reduce the computational and communication overheads, overcoming a major barrier to the widespread adoption of privacy-preserving federated learning.

The key innovation of FedML-HE is its ability to balance privacy preservation and system efficiency, demonstrating the potential for scalable HE-based federated learning deployment. As machine learning models become larger and more powerful, preserving user privacy will be critical. The techniques introduced in this paper represent an important step towards realizing the benefits of federated learning while ensuring the protection of sensitive personal information.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Weizhao Jin, Yuhang Yao, Shanshan Han, Jiajun Gu, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He

Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.

6/18/2024

Lancelot: Towards Efficient and Privacy-Preserving Byzantine-Robust Federated Learning within Fully Homomorphic Encryption

Siyang Jiang, Hao Yang, Qipeng Xie, Chuan Ma, Sen Wang, Guoliang Xing

In sectors such as finance and healthcare, where data governance is subject to rigorous regulatory requirements, the exchange and utilization of data are particularly challenging. Federated Learning (FL) has risen as a pioneering distributed machine learning paradigm that enables collaborative model training across multiple institutions while maintaining data decentralization. Despite its advantages, FL is vulnerable to adversarial threats, particularly poisoning attacks during model aggregation, a process typically managed by a central server. However, in these systems, neural network models still possess the capacity to inadvertently memorize and potentially expose individual training instances. This presents a significant privacy risk, as attackers could reconstruct private data by leveraging the information contained in the model itself. Existing solutions fall short of providing a viable, privacy-preserving BRFL system that is both completely secure against information leakage and computationally efficient. To address these concerns, we propose Lancelot, an innovative and computationally efficient BRFL framework that employs fully homomorphic encryption (FHE) to safeguard against malicious client activities while preserving data privacy. Our extensive testing, which includes medical imaging diagnostics and widely-used public image datasets, demonstrates that Lancelot significantly outperforms existing methods, offering more than a twenty-fold increase in processing speed, all while maintaining data privacy.

8/13/2024

FLUE: Federated Learning with Un-Encrypted model weights

Elie Atallah

Federated Learning enables diverse devices to collaboratively train a shared model while keeping training data locally stored, avoiding the need for centralized cloud storage. Despite existing privacy measures, concerns arise from potential reverse engineering of gradients, even with added noise, revealing private data. To address this, recent research emphasizes using encrypted model parameters during training. This paper introduces a novel federated learning algorithm, leveraging coded local gradients without encryption, exchanging coded proxies for model parameters, and injecting surplus noise for enhanced privacy. Two algorithm variants are presented, showcasing convergence and learning rates adaptable to coding schemes and raw data characteristics. Two encryption-free implementations with fixed and random coding matrices are provided, demonstrating promising simulation results from both federated optimization and machine learning perspectives.

7/29/2024

An Efficient and Multi-private Key Secure Aggregation for Federated Learning

Xue Yang, Zifeng Liu, Xiaohu Tang, Rongxing Lu, Bo Liu

With the emergence of privacy leaks in federated learning, secure aggregation protocols that mainly adopt either homomorphic encryption or threshold secret sharing have been widely developed for federated learning to protect the privacy of the local training data of each client. However, these existing protocols suffer from many shortcomings, such as the dependence on a trusted third party, the vulnerability to clients being corrupted, low efficiency, the trade-off between security and fault tolerance, etc. To solve these disadvantages, we propose an efficient and multi-private key secure aggregation scheme for federated learning. Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation, which has two important advantages: 1) The server and each client can freely select public and private keys without introducing a trusted third party and 2) Compared to the variant ElGamal encryption, the plaintext space is relatively large, which is more suitable for the deep model. Besides, for the high dimensional deep model parameter, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D, which can greatly reduce encryption and decryption times as well as communication for ciphertext transmission. Detailed security analyses show that our proposed scheme achieves the semantic security of both individual local gradients and the aggregated result while achieving optimal robustness in tolerating both client collusion and dropped clients. Extensive simulations demonstrate that the accuracy of our scheme is almost the same as the non-private approach, while the efficiency of our scheme is much better than the state-of-the-art homomorphic encryption-based secure aggregation schemes. More importantly, the efficiency advantages of our scheme will become increasingly prominent as the number of model parameters increases.

6/3/2024