Cross-silo Federated Learning with Record-level Personalized Differential Privacy

Read original: arXiv:2401.16251 - Published 7/2/2024 by Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, Xiaofeng Meng

Overview

The paper explores a novel approach to federated learning (FL) with personalized differential privacy at the record level.
Existing solutions typically assume a uniform privacy budget for all records, which may not be adequate to meet the varying privacy requirements of individual records.
The proposed framework, called rPDP-FL, employs a two-stage hybrid sampling scheme to accommodate varying privacy requirements.
A critical problem is determining the ideal per-record sampling probability given the personalized privacy budget, which is tackled using a versatile solution called Simulation-CurveFitting.

Plain English Explanation

Federated learning is a way to train machine learning models without having to share sensitive data between devices or organizations. Differential privacy is a technique used to protect the privacy of individual data records during the training process.

The paper explores a new approach that combines federated learning with personalized differential privacy. This means that each individual data record can have its own unique privacy requirements, rather than a one-size-fits-all solution.

The researchers propose a framework called rPDP-FL that uses a two-stage sampling scheme to accommodate these varying privacy needs. A key challenge is figuring out the best way to determine the sampling probability for each record based on its privacy budget. The researchers introduce a solution called Simulation-CurveFitting to solve this problem.

The goal is to provide better privacy protections for individuals while still allowing effective federated learning to take place. This could be useful in applications where data privacy is particularly important, such as healthcare or finance.

Technical Explanation

The paper presents a novel framework called rPDP-FL that combines federated learning with personalized differential privacy at the record level. Existing solutions typically assume a uniform privacy budget for all records, which may not be adequate to meet each record's unique privacy requirements.

The rPDP-FL framework employs a two-stage hybrid sampling scheme. The first stage involves uniform client-level sampling, while the second stage uses non-uniform record-level sampling to accommodate varying privacy needs. A critical challenge is determining the ideal per-record sampling probability q given the personalized privacy budget ε.
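
The two-stage scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `sample_round`, the `silos` layout, and the `client_rate` parameter are hypothetical, and record-level sampling is assumed to be independent (Poisson-style) inclusion with each record's own probability q.

```python
import random


def sample_round(silos, client_rate, rng):
    """Draw one training round's participants.

    silos: dict mapping a silo id to a list of (record_id, q) pairs,
           where q is that record's own sampling probability (assumed given).
    client_rate: probability shared by every silo (uniform client-level sampling).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    batch = {}
    for silo_id, records in silos.items():
        # Stage 1: uniform client-level sampling, same probability for every silo.
        if rng.random() >= client_rate:
            continue
        # Stage 2: non-uniform record-level sampling, one independent draw per record.
        batch[silo_id] = [rid for rid, q in records if rng.random() < q]
    return batch
```

Under these assumptions a record with probability q in a silo sampled at rate `client_rate` takes part in a round with probability `client_rate * q`, so a record with a tighter privacy budget would be given a smaller q and contribute to fewer updates.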

To address this, the authors introduce a solution called Simulation-CurveFitting. This versatile approach allows the researchers to uncover the nonlinear correlation between q and ε, and derive an elegant mathematical model to tackle the problem.
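
The general idea of simulating first and fitting a curve afterwards can be sketched as follows. This is a rough illustration, not the paper's model. The simulation step uses the standard integer-order Rényi DP bound for the Poisson-subsampled Gaussian mechanism. The noise multiplier `sigma`, the step count `steps`, the `delta` value, and the power-law form fitted in log-log space are all assumptions chosen for the sketch, since this summary does not state the functional form the authors derive.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def rdp_sampled_gaussian(q, sigma, alpha):
    """Renyi DP of integer order alpha >= 2 for one subsampled Gaussian step, 0 < q < 1."""
    terms = []
    for k in range(alpha + 1):
        log_binom = (math.lgamma(alpha + 1) - math.lgamma(k + 1)
                     - math.lgamma(alpha - k + 1))
        terms.append(log_binom + k * math.log(q) + (alpha - k) * math.log(1 - q)
                     + (k * k - k) / (2 * sigma ** 2))
    return _logsumexp(terms) / (alpha - 1)


def simulate_epsilon(q, sigma, steps, delta, alphas=range(2, 65)):
    """Simulation step: compose RDP over `steps` rounds and convert to (eps, delta)-DP."""
    return min(steps * rdp_sampled_gaussian(q, sigma, a)
               + math.log(1 / delta) / (a - 1) for a in alphas)


def fit_log_log(qs, eps):
    """Curve-fitting step (assumed form): least squares for log(eps) = a + b * log(q)."""
    xs = [math.log(q) for q in qs]
    ys = [math.log(e) for e in eps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def q_for_epsilon(target_eps, a, b):
    """Invert the fitted curve to get a sampling probability for a privacy budget."""
    q = math.exp((math.log(target_eps) - a) / b)
    return min(1.0, max(0.0, q))
```

A caller would evaluate `simulate_epsilon` on a grid of q values, pass the resulting pairs to `fit_log_log`, and then call `q_for_epsilon` once per record with that record's budget. Because the fit is only an approximation, a careful implementation would re-check the chosen q with the accountant before relying on it.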

The evaluation demonstrates that the proposed solution can provide significant performance gains over baselines that do not consider personalized privacy preservation. Related work on differential privacy in federated learning includes [mitigating disparate impact](https://aimodels.fyi/papers/arxiv/mitigating-disparate-impact-differential-privacy-federated-learning), [differentially private hierarchical federated learning](https://aimodels.fyi/papers/arxiv/differentially-private-hierarchical-federated-learning), and [QMGeo](https://aimodels.fyi/papers/arxiv/qmgeo-differentially-private-federated-learning-via-stochastic).

Critical Analysis

The paper presents a novel and promising approach to enhancing federated learning with personalized differential privacy. However, there are a few potential limitations and areas for further research:

The paper focuses on the cross-silo federated learning setting, so its findings may not carry over to cross-device scenarios that involve many distributed clients.
The Simulation-CurveFitting approach, while effective, relies on extensive simulations and may not be feasible for large-scale deployments with limited computational resources.
The paper does not explore the potential trade-offs between privacy, model accuracy, and system performance in depth, which would be important for practical applications.

Additionally, further research could investigate the impact of personalized differential privacy on model fairness and the potential for disparate treatment across different subgroups of the population. Exploring these areas could help to better understand the broader implications and limitations of the proposed approach.

Conclusion

This paper presents a novel framework called rPDP-FL that combines federated learning with personalized differential privacy at the record level. By employing a two-stage hybrid sampling scheme and the Simulation-CurveFitting solution, the researchers demonstrate significant performance gains over baselines that do not consider personalized privacy preservation.

The proposed approach has the potential to enable more effective and privacy-preserving federated learning, particularly in domains where data privacy is of utmost concern. Further research is needed to explore the broader implications and limitations of this approach, but the work represents an important step forward in the field of privacy-preserving machine learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Cross-silo Federated Learning with Record-level Personalized Differential Privacy

Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, Xiaofeng Meng

Federated learning (FL) enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both uniform client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is how to determine the ideal per-record sampling probability $q$ given the personalized privacy budget $\varepsilon$. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\varepsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.

7/2/2024

ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

Fumiyuki Kato, Li Xiong, Shun Takagi, Yang Cao, Masatoshi Yoshikawa

Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present Uldp-FL, a novel FL framework designed to guarantee user-level DP in cross-silo FL where a single user's data may belong to multiple silos. Our proposed algorithm directly ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. We provide a theoretical analysis of the algorithm's privacy and utility. Additionally, we enhance the utility of the proposed algorithm with an enhanced weighting strategy based on user record distribution and design a novel private protocol that ensures no additional information is revealed to the silos and the server. Experiments on real-world datasets show substantial improvements in our methods in privacy-utility trade-offs under user-level DP compared to baseline methods. To the best of our knowledge, our work is the first FL framework that effectively provides user-level DP in the general cross-silo FL setting.

6/18/2024

Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

Mahtab Talaei, Iman Izadi

Federated learning (FL), a novel branch of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, it is still possible to access the model updates (gradient updates of deep neural networks) transferred between clients and servers, potentially revealing sensitive local information to adversaries using model inversion attacks. Differential privacy (DP) offers a promising approach to addressing this issue by adding noise to the parameters. On the other hand, heterogeneities in data structure, storage, communication, and computational capabilities of devices can cause convergence problems and delays in developing the global model. A personalized weighted averaging of local parameters based on the resources of each device can yield a better aggregated model in each round. In this paper, to efficiently preserve privacy, we propose a personalized DP framework that injects noise based on clients' relative impact factors and aggregates parameters while considering heterogeneities and adjusting properties. To fulfill the DP requirements, we first analyze the convergence boundary of the FL algorithm when impact factors are personalized and fixed throughout the learning process. We then further study the convergence property considering time-varying (adaptive) impact factors.

6/27/2024

Differentially-Private Hierarchical Federated Learning

Frank Po-Chen Lin, Christopher Brinton

While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Hierarchical Federated Learning with Hierarchical Differential Privacy (H$^2$FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. Building upon recent proposals for Hierarchical Differential Privacy (HDP), one of the key concepts of H$^2$FDP is adapting DP noise injection at different layers of an established FL hierarchy (edge devices, edge servers, and cloud servers) according to the trust models within particular subnetworks. We conduct a comprehensive analysis of the convergence behavior of H$^2$FDP, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. Leveraging these relationships, we develop an adaptive control algorithm for H$^2$FDP that tunes properties of local model training to minimize communication energy, latency, and the stationarity gap while striving to maintain a sub-linear convergence rate and meet desired privacy criteria. Subsequent numerical evaluations demonstrate that H$^2$FDP obtains substantial improvements in these metrics over baselines for different privacy budgets, and validate the impact of different system configurations.

5/17/2024