ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

Read original: arXiv:2308.12210 - Published 6/18/2024 by Fumiyuki Kato, Li Xiong, Shun Takagi, Yang Cao, Masatoshi Yoshikawa

Overview

Differentially Private Federated Learning (DP-FL) is a collaborative machine learning approach that ensures formal privacy.
Most DP-FL methods ensure differential privacy (DP) at the record-level within each silo (group) for cross-silo federated learning.
However, when a single user's data spans multiple silos, it remains an open question how to achieve the desired user-level DP guarantee.
This study presents Uldp-FL, a novel federated learning framework designed to guarantee user-level DP in cross-silo federated learning scenarios where a user's data may belong to multiple silos.

Plain English Explanation

Federated learning is a way for different organizations to collaborate on training a machine learning model without directly sharing their sensitive data. Differentially Private Federated Learning (DP-FL) is an approach that adds extra privacy protections to this process.

Most DP-FL methods focus on ensuring privacy at the individual record level within each organization or "silo". However, in reality, a single person's data may be spread across multiple silos. The researchers behind this paper wanted to find a way to protect the privacy of individual users, even when their data is split up.

To do this, they developed a new federated learning framework called Uldp-FL. This system directly ensures user-level differential privacy, meaning it protects the privacy of individual users, rather than just individual records. It does this through a novel technique called "per-user weighted clipping," which is different from previous "group-privacy" approaches.

The researchers also came up with ways to improve the usefulness of the Uldp-FL system, by carefully weighting the contributions of different users based on how their data is distributed. And they designed a new private protocol to ensure that no extra information is revealed to the organizations or the central server during the federated learning process.

Through experiments on real-world datasets, the researchers showed that their Uldp-FL framework provides substantially better privacy-utility tradeoffs compared to existing methods, when the goal is to protect the privacy of individual users in cross-silo federated learning.

Technical Explanation

The key innovation in this paper is the Uldp-FL framework, which is designed to provide user-level differential privacy guarantees in cross-silo federated learning scenarios.
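To make "user-level" concrete, the standard definition of differential privacy can be written with a user-level notion of neighboring datasets. The notation below (mechanism M, datasets D and D', output set O, group size k) is assumed here for illustration and is not taken from the paper. The second line is the textbook group-privacy bound, which shows why converting a record-level guarantee into a guarantee for a user with k records degrades the privacy parameters quickly.

```latex
% (\varepsilon, \delta)-DP: for all neighboring D, D' and all output sets O,
\Pr[M(D) \in O] \le e^{\varepsilon} \, \Pr[M(D') \in O] + \delta
% Record-level: D and D' differ in one record.
% User-level:   D and D' differ in all records of one user, across all silos.

% Group privacy: a record-level (\varepsilon, \delta)-DP mechanism satisfies,
% for datasets differing in at most k records,
\left( k\varepsilon, \; k \, e^{(k-1)\varepsilon} \, \delta \right)\text{-DP}
```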

Unlike prior DP-FL approaches that focus on record-level privacy within each silo, Uldp-FL directly ensures user-level DP by using a "per-user weighted clipping" technique, rather than a group-privacy approach. This means that the system protects the privacy of individual users, even when their data is spread across multiple silos.
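The idea of per-user weighted clipping can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, and the way noise is added per silo are all hypothetical. Each silo clips every user's model update to norm at most `clip_bound`, scales it by that user's weight in the silo, and sums the results; if a user's weights across silos sum to 1, that user's total influence on the aggregate has norm at most `clip_bound`, no matter how many silos hold their data.

```python
import math
import random


def clip_to_norm(delta, clip_bound):
    """Scale a vector down so its L2 norm is at most clip_bound."""
    norm = math.sqrt(sum(x * x for x in delta))
    scale = min(1.0, clip_bound / norm) if norm > 0 else 1.0
    return [x * scale for x in delta]


def silo_contribution(user_deltas, user_weights, clip_bound, noise_std, rng):
    """Hypothetical per-silo step: weighted sum of clipped per-user updates.

    user_deltas:  dict user_id -> list of floats (update from that user's records)
    user_weights: dict user_id -> weight of this silo for that user
                  (assumed to sum to 1 over all silos for each user)
    noise_std:    standard deviation of Gaussian noise added per coordinate
    """
    dim = len(next(iter(user_deltas.values())))
    total = [0.0] * dim
    for user_id, delta in user_deltas.items():
        clipped = clip_to_norm(delta, clip_bound)
        for i in range(dim):
            total[i] += user_weights[user_id] * clipped[i]
    # Gaussian noise; with noise_std = 0 the output is the exact weighted sum.
    return [t + rng.gauss(0.0, noise_std) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # "alice" appears in two silos with weight 0.5 each; "bob" only in silo A.
    silo_a = silo_contribution(
        {"alice": [3.0, 4.0], "bob": [0.1, 0.0]},
        {"alice": 0.5, "bob": 1.0},
        clip_bound=1.0, noise_std=0.0, rng=rng,
    )
    silo_b = silo_contribution(
        {"alice": [-6.0, 8.0]}, {"alice": 0.5},
        clip_bound=1.0, noise_std=0.0, rng=rng,
    )
    print([a + b for a, b in zip(silo_a, silo_b)])
```

In the noiseless demo, removing "alice" from both silos changes the aggregate by a vector of norm 0.8, which is below the clipping bound of 1.0. A real deployment would set `noise_std` from a privacy accountant, which this sketch does not attempt.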

The researchers provide a theoretical analysis of Uldp-FL's privacy and utility guarantees. They also enhance the utility of the system through an improved weighting strategy that considers the distribution of each user's data records. Additionally, they design a novel private protocol that ensures no extra information is revealed to the silos or the central server during the federated learning process.
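One plausible reading of a weighting strategy "based on user record distribution" is to give each silo a weight proportional to the share of a user's records it holds. The sketch below assumes that reading; the function name and data layout are hypothetical, and it computes the weights in the clear, whereas the paper describes a private protocol so that record counts are not revealed to the silos or the server.

```python
def record_share_weights(record_counts):
    """Compute per-silo, per-user weights proportional to record counts.

    record_counts: dict silo_id -> dict user_id -> number of records
    Returns a dict of the same shape where, for each user, the weights
    across all silos sum to 1.
    """
    # Total number of records per user across all silos.
    totals = {}
    for users in record_counts.values():
        for user_id, count in users.items():
            totals[user_id] = totals.get(user_id, 0) + count

    # Weight of silo s for user u = n[s][u] / sum over silos of n[.][u].
    return {
        silo_id: {
            user_id: count / totals[user_id]
            for user_id, count in users.items()
            if totals[user_id] > 0
        }
        for silo_id, users in record_counts.items()
    }


if __name__ == "__main__":
    counts = {"A": {"alice": 30, "bob": 5}, "B": {"alice": 10}}
    print(record_share_weights(counts))
```

Compared with a uniform weight per silo, this choice lets silos that hold most of a user's records contribute most of that user's signal, while the per-user weights still sum to 1.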

Experiments on real-world datasets demonstrate that Uldp-FL significantly outperforms baseline DP-FL methods in terms of privacy-utility tradeoffs when the goal is to achieve user-level differential privacy in cross-silo federated learning.

Related works in this space include "Differentially Private Hierarchical Federated Learning", "Noise-Aware Algorithm for Heterogeneous Differentially Private Federated", "FedLAP: DP Federated Learning by Sharing Differentially", and "Multi-Level Personalized Federated Learning for Heterogeneous Long".

Critical Analysis

The paper acknowledges that while Uldp-FL provides user-level differential privacy guarantees, there may be some inherent limitations to this approach. For example, the researchers note that their weighting strategy, while improving utility, could potentially reveal additional information about user data distributions to the central server.

Additionally, the paper does not explore how Uldp-FL would perform in scenarios with highly skewed or imbalanced user data distributions across silos. This could be an important consideration, as unequal representation of user data could impact the overall model performance and privacy-utility tradeoffs.

Further research could also investigate the scalability of Uldp-FL as the number of users and silos increases, as well as its robustness to potential attacks or adversarial behavior from malicious actors within the federated learning system.

Conclusion

This study presents Uldp-FL, a novel federated learning framework that directly ensures user-level differential privacy in cross-silo federated learning scenarios. By using a per-user weighted clipping approach and an enhanced weighting strategy, Uldp-FL is able to achieve substantial improvements in privacy-utility tradeoffs compared to baseline DP-FL methods.

According to the authors, this is, to the best of their knowledge, the first FL framework to effectively provide user-level differential privacy guarantees in the general cross-silo federated learning setting. It addresses a key limitation of prior DP-FL approaches, which protect individual records within a silo but not a user whose data spans several silos.

Related Papers

ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

Fumiyuki Kato, Li Xiong, Shun Takagi, Yang Cao, Masatoshi Yoshikawa

Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present Uldp-FL, a novel FL framework designed to guarantee user-level DP in cross-silo FL where a single user's data may belong to multiple silos. Our proposed algorithm directly ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. We provide a theoretical analysis of the algorithm's privacy and utility. Additionally, we enhance the utility of the proposed algorithm with an enhanced weighting strategy based on user record distribution and design a novel private protocol that ensures no additional information is revealed to the silos and the server. Experiments on real-world datasets show substantial improvements in our methods in privacy-utility trade-offs under user-level DP compared to baseline methods. To the best of our knowledge, our work is the first FL framework that effectively provides user-level DP in the general cross-silo FL setting.

6/18/2024

Cross-silo Federated Learning with Record-level Personalized Differential Privacy

Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, Xiaofeng Meng

Federated learning (FL) enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both uniform client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is how to determine the ideal per-record sampling probability $q$ given the personalized privacy budget $\varepsilon$. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\varepsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.

7/2/2024

Mitigating Disparate Impact of Differential Privacy in Federated Learning through Robust Clustering

Saber Malekmohammadi, Afaf Taik, Golnoosh Farnadi

Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Privacy (DP) to enhance privacy guarantees. Similar to previous work on DP in ML, we observed that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, in this paper, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy with DP guarantees. To this end, we propose to cluster clients based on both their model updates and training loss values. Our proposed approach also addresses the server's uncertainties in clustering clients' model updates by employing larger batch sizes along with Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide theoretical analysis of the effectiveness of our proposed approach. We also extensively evaluate our approach across diverse data distributions and privacy budgets and show its effectiveness in mitigating the disparate impact of DP in FL settings with a small computational cost.

5/30/2024

Universally Harmonizing Differential Privacy Mechanisms for Federated Learning: Boosting Accuracy and Convergence

Shuya Feng, Meisam Mohammady, Hanbin Hong, Shenao Yan, Ashish Kundu, Binghui Wang, Yuan Hong

Differentially private federated learning (DP-FL) is a promising technique for collaborative model training while ensuring provable privacy for clients. However, optimizing the tradeoff between privacy and accuracy remains a critical challenge. To the best of our knowledge, we propose the first DP-FL framework (namely UDP-FL), which universally harmonizes any randomization mechanism (e.g., an optimal one) with the Gaussian Moments Accountant (viz. DP-SGD) to significantly boost accuracy and convergence. Specifically, UDP-FL demonstrates enhanced model performance by mitigating the reliance on Gaussian noise. The key mediator variable in this transformation is the Rényi Differential Privacy notion, which is carefully used to harmonize privacy budgets. We also propose an innovative method to theoretically analyze the convergence for DP-FL (including our UDP-FL) based on mode connectivity analysis. Moreover, we evaluate our UDP-FL through extensive experiments benchmarked against state-of-the-art (SOTA) methods, demonstrating superior performance on both privacy guarantees and model performance. Notably, UDP-FL exhibits substantial resilience against different inference attacks, indicating a significant advance in safeguarding sensitive data in federated learning environments.

7/25/2024