Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification

Read original: arXiv:2403.04867 - Published 6/12/2024 by Jan Schuchardt, Mihail Stoian, Arthur Kosmala, Stephan Gunnemann

🌐

Overview

Amplification by subsampling is a key technique in machine learning with differential privacy
Subsampling, or training on random batches instead of full datasets, can improve privacy
This paper proposes a new framework for deriving tighter, mechanism-specific privacy guarantees for subsampled mechanisms

Plain English Explanation

Differential privacy is a way to protect people's privacy when using their data to train machine learning models. One important technique in differential privacy is subsampling - instead of training on a full dataset, the model is trained on random subsets or "batches" of the data. This can help improve privacy, but the exact privacy guarantees are complex to calculate.

This paper introduces a new framework that can derive more precise privacy guarantees for subsampled mechanisms. Rather than relying on general rules, the framework leverages additional details about the specific mechanism being used to get tighter bounds on the privacy loss. This is especially important for privacy accounting, which is the process of tracking privacy over multiple rounds of training.

The paper shows how this framework can be used to analyze the privacy of subsampling for different types of differential privacy, like approximate DP and Rényi DP. It also looks at how subsampling affects the privacy of groups of users, going beyond classic "group privacy" results.

Overall, this new framework provides a more principled and powerful way to understand the privacy benefits of subsampling in machine learning with differential privacy.

Technical Explanation

The paper proposes a general framework for deriving mechanism-specific subsampling guarantees, which leverage additional details about the subsampled mechanism beyond just the original privacy parameters. This allows for tighter characterization of the subsampled mechanism's privacy compared to prior mechanism-agnostic approaches.

The framework is based on conditional optimal transport, which provides a way to relate the privacy of the subsampled mechanism to the privacy of the original mechanism. This lets the authors derive both existing and novel subsampling guarantees for different differential privacy variants, like approximate DP and Rényi DP, in a unified manner.

As a key application, the paper analyzes how subsampling affects the privacy of groups of users. The mechanism-specific bounds derived with this framework outperform both tight mechanism-agnostic bounds as well as classic group privacy results.

Critical Analysis

The paper makes a technical contribution by providing a principled framework for deriving mechanism-specific subsampling guarantees. This is an important advance, as prior work relied on more general, mechanism-agnostic rules that could not capture the full nuances of different DP mechanisms.

That said, the framework is quite complex and may be challenging for practitioners to apply directly. The authors acknowledge this and suggest the framework is more suited for privacy analysis by experts than for direct use in system design.

Additionally, the paper focuses on the analysis of subsampling, but does not address other important aspects of privacy accounting, such as the effects of other common techniques like gradient clipping or differential private stochastic gradient descent (DP-SGD). Further research is needed to fully understand the privacy properties of end-to-end DP machine learning systems.

Conclusion

This paper introduces a powerful new framework for deriving tight, mechanism-specific privacy guarantees for subsampled differential privacy mechanisms. By leveraging conditional optimal transport, the framework can capture more details about the specific mechanism being used, leading to improved privacy accounting compared to prior mechanism-agnostic approaches.

While the technical details may be complex, this work represents an important advance in the rigorous analysis of privacy amplification techniques. As machine learning systems continue to be deployed in sensitive domains, tools like this framework will be crucial for ensuring robust privacy protections for individuals' data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🌐

Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification

Jan Schuchardt, Mihail Stoian, Arthur Kosmala, Stephan Gunnemann

Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with R'enyi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.

6/12/2024

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $varepsilon approx 1$ for Poisson subsampling and $varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.

6/3/2024

🔎

Privacy Amplification for Matrix Mechanisms

Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose MMCC, the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $epsilonto0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our conditional composition theorem has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.

5/7/2024

How Private are DP-SGD Implementations?

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

6/7/2024