How Private are DP-SGD Implementations?

2403.17673

Published 6/7/2024 by Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

Overview

Explores the privacy guarantees of differentially private stochastic gradient descent (DP-SGD), a popular machine learning technique for training models while preserving user privacy.
Investigates the effectiveness of DP-SGD in protecting individual privacy and the factors that influence its privacy guarantees.
Provides insights into the practical considerations and limitations of using DP-SGD for real-world applications.

Plain English Explanation

Differentially private stochastic gradient descent (DP-SGD) is a machine learning technique that aims to train models while protecting the privacy of the individuals whose data is used for training. The key idea behind DP-SGD is to introduce carefully controlled noise into the training process, which helps to obscure the influence of any single individual's data on the final model.
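
The "carefully controlled noise" idea can be made concrete with a minimal sketch of one noisy gradient computation. This is an illustration in plain Python, not the paper's or any library's implementation: the names `clip`, `noisy_gradient_sum`, `clip_norm`, and `noise_multiplier` are assumptions chosen for this example. Each per-example gradient is rescaled to a bounded L2 norm, the clipped gradients are summed, and Gaussian noise proportional to the clipping bound is added.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    return [g * clip_norm / norm for g in grad]


def noisy_gradient_sum(per_example_grads, dim, clip_norm, noise_multiplier, rng):
    """Sum clipped per-example gradients and add Gaussian noise.

    `dim` is passed explicitly so that an empty batch (possible under
    Poisson subsampling) still yields a pure-noise vector of the right size.
    """
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Noise standard deviation scales with the clipping bound (hypothetical naming).
    sigma = noise_multiplier * clip_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
print(noisy_gradient_sum(grads, dim=2, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single example can move the sum, which is what lets a fixed amount of noise mask that example's contribution. How much privacy this buys over many steps depends on how the batches are drawn.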

This paper examines how private DP-SGD really is in practice. Its central comparison is between two ways of forming training batches: shuffling, which practical implementations commonly use, and Poisson subsampling, which has a well-understood privacy analysis. It also looks at factors that can affect the privacy guarantees of DP-SGD, such as the size of the training dataset, the number of training steps, and the amount of noise added to the gradients. The researchers also explore the trade-off between privacy and model performance: adding more noise can improve privacy but may degrade the accuracy of the trained model.

The paper provides a detailed technical analysis of DP-SGD, but also presents the findings in a way that is accessible to a general audience. The authors use clear language and provide helpful analogies to explain complex concepts, such as the notion of "differential privacy" and how it relates to protecting individual privacy in machine learning.

Overall, this paper offers valuable insights into the practical limitations and considerations of using DP-SGD for real-world applications, which can help researchers and practitioners make more informed decisions about when and how to apply this privacy-preserving technique.

Technical Explanation

The paper examines the privacy guarantees of differentially private stochastic gradient descent (DP-SGD), a popular machine learning technique for training models while preserving user privacy. The authors investigate the factors that influence the privacy bounds of DP-SGD, including the size of the training dataset, the number of training steps, and the amount of noise added to the gradients.
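
For reference, the "privacy guarantees" discussed here are usually stated through the standard definition of $(\varepsilon, \delta)$-differential privacy. The symbols $M$, $D$, $D'$, and $S$ below are notation introduced for this note and do not come from the summary: a randomized mechanism $M$ is $(\varepsilon, \delta)$-DP if, for every pair of adjacent datasets $D$ and $D'$ and every set of outputs $S$,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
```

Smaller $\varepsilon$ and $\delta$ mean a stronger guarantee. A privacy accountant computes, for a fixed $\delta$, the smallest $\varepsilon$ it can certify after all training steps, so reporting the $\varepsilon$ of one sampling scheme while running another can misstate the guarantee.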

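The researchers present a novel theoretical analysis of the privacy of DP-SGD, which builds on the work on nearly tight black-box auditing of differentially private machine learning. They also introduce the concepts of "adaptive batch linear queries" (ABLQ) and "batch samplers" to model the privacy analysis of DP-SGD more accurately: DP-SGD is treated as a post-processing of the ABLQ mechanism, and the batch sampler determines whether batches come from shuffling or from Poisson subsampling.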
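
The two batch samplers contrasted in the abstract can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the paper's formal definitions: the function names `shuffle_batches` and `poisson_batches` are hypothetical, and the Poisson inclusion probability is taken to be `batch_size / n` so that the expected batch size matches the shuffled case.

```python
import random


def shuffle_batches(n, batch_size, rng):
    """Shuffle all n indices once, then cut them into consecutive batches.

    Every example appears in exactly one batch, and batches have a fixed
    size when batch_size divides n.
    """
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]


def poisson_batches(n, batch_size, num_steps, rng):
    """Build each batch by including every example independently.

    Each example joins each batch with probability batch_size / n, so batch
    sizes vary, and an example may appear in several batches or in none.
    """
    q = batch_size / n
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_steps)]


rng = random.Random(0)
print([len(b) for b in shuffle_batches(12, 4, rng)])     # fixed-size batches
print([len(b) for b in poisson_batches(12, 4, 3, rng)])  # sizes vary per batch
```

The structural difference is visible in the output shapes: shuffling guarantees that each example is used exactly once per pass, whereas Poisson subsampling makes every inclusion decision independently. Privacy accountants written for the second scheme rely on that independence, which is why their output does not automatically transfer to the first.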

The paper's technical contributions include:

A tight analysis of the privacy guarantees of DP-SGD, which takes into account the adaptive nature of the algorithm and the batch sampling process.
A demonstration that the standard privacy analysis of DP-SGD, which applies Poisson subsampling accounting to shuffling-based training, can be overly optimistic, particularly in settings with small datasets or a large number of training steps.
Insights into the practical trade-offs between privacy and model performance when using DP-SGD, which can help guide the choice of hyperparameters in real-world applications.

The authors also discuss the connections between their work and other related research, such as Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition, LazyDP: co-designing algorithm and software for scalable training, and optimal rates for differentially private stochastic convex optimization in a single epoch.

Critical Analysis

The paper provides a comprehensive analysis of the privacy guarantees of DP-SGD, highlighting the importance of considering the adaptive nature of the algorithm and the batch sampling process when evaluating its privacy properties. The authors' theoretical analysis is rigorous and well-grounded in the existing literature on differential privacy.

One potential limitation of the study is that it focuses primarily on the privacy aspects of DP-SGD, without delving deeply into the implications for model performance and practical deployment. While the paper touches on the trade-offs between privacy and accuracy, further exploration of these trade-offs and their impact on real-world applications could be beneficial.

Additionally, the paper does not address the issue of uncertainty quantification in differentially private machine learning, which is an important consideration for the reliability and interpretability of DP-SGD models.

Overall, the paper makes a valuable contribution to the understanding of the privacy properties of DP-SGD, and its findings can inform the design and deployment of privacy-preserving machine learning systems. However, further research may be needed to fully address the practical challenges and considerations of using DP-SGD in real-world settings.

Conclusion

This paper provides a comprehensive analysis of the privacy guarantees of differentially private stochastic gradient descent (DP-SGD), a popular machine learning technique for training models while preserving user privacy. The authors present a novel theoretical analysis that takes into account the adaptive nature of DP-SGD and the batch sampling process, and they demonstrate that the standard privacy analysis can be overly optimistic in certain settings.

The paper offers valuable insights into the practical trade-offs between privacy and model performance when using DP-SGD, which can help guide the choice of hyperparameters in real-world applications. While the study focuses primarily on the privacy aspects, it also highlights the need for further research on the implications for model performance and practical deployment, as well as the importance of addressing uncertainty quantification in differentially private machine learning.

Overall, this paper contributes to the growing body of research on the practical challenges and considerations of using privacy-preserving techniques in machine learning, and it can serve as a valuable resource for researchers and practitioners seeking to understand and apply DP-SGD in their own work.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

This paper presents a nearly tight audit of the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box model. Our auditing procedure empirically estimates the privacy leakage from DP-SGD using membership inference attacks; unlike prior work, the estimates are appreciably close to the theoretical DP bounds. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained with theoretical $\varepsilon=10.0$ on MNIST and CIFAR-10, our auditing procedure yields empirical estimates of $7.21$ and $6.95$, respectively, on 1,000-record samples and $6.48$ and $4.96$ on the full datasets. By contrast, previous work achieved tight audits only in stronger (i.e., less realistic) white-box models that allow the adversary to access the model's inner parameters and insert arbitrary gradients. Our auditing procedure can be used to detect bugs and DP violations more easily and offers valuable insight into how the privacy analysis of DP-SGD can be further improved.

5/24/2024

cs.CR cs.LG

Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification

Jan Schuchardt, Mihail Stoian, Arthur Kosmala, Stephan Günnemann

Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with Rényi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.

6/12/2024

cs.CR cs.LG stat.ML

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.

6/3/2024

cs.CR cs.DS cs.LG stat.ML

Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation

Ossi Räisä, Joonas Jälkö, Antti Honkela

We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP deep learning, its properties have been widely studied, and recent works have empirically found large batch sizes to be beneficial. However, theoretical explanations of this benefit are currently heuristic at best. We first observe that the total gradient variance in DP-SGD can be decomposed into subsampling-induced and noise-induced variances. We then prove that in the limit of an infinite number of iterations, the effective noise-induced variance is invariant to the batch size. The remaining subsampling-induced variance decreases with larger batch sizes, so large batches reduce the effective total gradient variance. We confirm numerically that the asymptotic regime is relevant in practical settings when the batch size is not small, and find that outside the asymptotic regime, the total gradient variance decreases even more with large batch sizes. We also find a sufficient condition that implies that large batch sizes similarly reduce effective DP noise variance for one iteration of DP-SGD.

6/13/2024

stat.ML cs.CR cs.LG