Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Read original: arXiv:2405.20769 - Published 6/3/2024 by Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Overview

This paper presents methods to avoid privacy accounting pitfalls when composing subsampled differential privacy mechanisms.
The authors provide rigorous analysis and practical guidelines to help ensure accurate privacy guarantees when using subsampling techniques in differential privacy.
Key contributions include new theoretical results, connections to existing work, and empirical demonstrations of the proposed techniques.

Plain English Explanation

Differential privacy is a powerful framework for protecting the privacy of individuals in data analysis and machine learning. It works by adding carefully calibrated random noise to the output of computations, ensuring that the results don't reveal too much about any one person's data.

Differential privacy via distributionally robust optimization is one way to achieve differential privacy, and subsampling is a common technique used to make these mechanisms more efficient. However, correctly accounting for the privacy loss when composing subsampled mechanisms can be tricky.

This paper tackles that challenge by providing new mathematical analysis and practical guidelines. The authors show how to properly track the privacy loss when chaining together multiple subsampled differentially private computations, ensuring that the overall privacy guarantee is accurately captured. They also demonstrate these techniques through empirical examples, helping practitioners apply the methods effectively.

By addressing these nuanced privacy accounting issues, the work helps make differential privacy more robust and reliable, supporting its broader adoption in real-world data-driven applications.

Technical Explanation

The key technical contributions of the paper are:

New Theoretical Results: The authors derive tight composition theorems for subsampled differentially private mechanisms, providing improved privacy accounting over previous approaches. This includes novel privacy amplification bounds that quantify the privacy benefits of subsampling.
Connections to Existing Work: The paper establishes connections between their results and prior work on differential privacy, including universal exact compression of differentially private mechanisms and differentially private synthetic data generation. These connections yield insights and generalize the applicability of the proposed techniques.
Empirical Demonstrations: The authors provide experimental evaluations that showcase the practical benefits of their privacy accounting methods. They demonstrate how the techniques can improve the accuracy of differentially private algorithms, especially when composing multiple subsampled mechanisms.

The theoretical and empirical contributions of the paper help address key challenges in the rigorous application of differential privacy, supporting its use in real-world data analysis and machine learning tasks.

Critical Analysis

The paper provides a comprehensive treatment of the privacy accounting challenges that can arise when composing subsampled differentially private mechanisms. The authors thoroughly analyze the problem, derive new theoretical results, and demonstrate the practical value of their approach.

One potential limitation is that the paper focuses primarily on subsampling techniques, whereas there are other approaches (e.g., differentially private projection depth-based medians) that could also benefit from improved privacy accounting methods. Extending the analysis to a broader class of differential privacy mechanisms could further broaden the impact of this work.

Additionally, the paper does not explore the computational complexity implications of the proposed privacy accounting methods. Understanding the tradeoffs between the tightness of the privacy bounds and the computational efficiency could inform practical deployment scenarios.

Overall, the paper makes a valuable contribution to the field of differential privacy by providing rigorous theoretical analysis and practical guidance to help researchers and practitioners navigate the pitfalls of composing subsampled mechanisms.

Conclusion

This paper addresses an important challenge in the application of differential privacy: accurately accounting for privacy loss when composing subsampled mechanisms. The authors' theoretical analysis, connections to related work, and empirical demonstrations provide a comprehensive framework for practitioners to apply subsampling techniques while ensuring reliable privacy guarantees.

By addressing these nuanced privacy accounting issues, the work helps make differential privacy more robust and reliable, supporting its broader adoption in real-world data-driven applications, such as differentially private high-dimensional model selection. The insights and practical guidelines presented in this paper represent an important step forward in the practical application of differential privacy.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $varepsilon approx 1$ for Poisson subsampling and $varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.

6/3/2024

🌐

Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification

Jan Schuchardt, Mihail Stoian, Arthur Kosmala, Stephan Gunnemann

Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with R'enyi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.

6/12/2024

How Private are DP-SGD Implementations?

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

6/7/2024

📈

Notes on Sampled Gaussian Mechanism

Nikita P. Kalinin

In these notes, we prove a recent conjecture posed in the paper by Raisa, O. et al. [Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimization (2024)]. Theorem 6.2 of the paper asserts that for the Sampled Gaussian Mechanism - a composition of subsampling and additive Gaussian noise, the effective noise level, $sigma_{text{eff}} = frac{sigma(q)}{q}$, decreases as a function of the subsampling rate $q$. Consequently, larger subsampling rates are preferred for better privacy-utility trade-offs. Our notes provide a rigorous proof of Conjecture 6.3, which was left unresolved in the original paper, thereby completing the proof of Theorem 6.2.

9/10/2024