Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Read original: arXiv:2407.05237 - Published 7/9/2024 by Weiwei Kong, Mónica Ribero

Overview

This paper examines the privacy of the last iterate in cyclically-sampled Differentially Private Stochastic Gradient Descent (DP-SGD) on nonconvex composite losses.
This variant is the one most commonly used in practice, but existing privacy accounting techniques make assumptions that diverge from it, such as a convex and Lipschitz continuous loss, randomly sampled batches, or no gradient clipping step.
The authors establish new Rényi differential privacy (RDP) bounds for the last iterate, assuming only that the loss is weakly convex and that the stepsize is sufficiently small.

Plain English Explanation

This research paper looks at how private the final result is when using a common machine learning technique called Differentially Private Stochastic Gradient Descent (DP-SGD) on nonconvex problems, where the loss surface can have many valleys rather than a single one.

DP-SGD is a way to train machine learning models while protecting the privacy of the training data. It works by clipping the gradients (the directions the model moves in as it learns) to a bounded size and then adding carefully calibrated random noise to them, making it hard for an attacker to figure out the original data.
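The noisy gradient step described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the paper's implementation. It also includes the gradient clipping step that the paper's abstract lists as part of practical DP-SGD. The function names and the `clip_norm` and `noise_multiplier` parameters are assumptions chosen for this example.

```python
import math
import random

def clip(grad, clip_norm):
    """Scale grad so that its Euclidean norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip(grad, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    # Assumption: the noise standard deviation is a multiple of the clipping bound.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(total[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
```

Clipping bounds how much any single training example can move the model, which is what lets the added noise mask that example's contribution.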

The researchers studied a specific way of using DP-SGD called "cyclical sampling," where the training data is divided into batches and used in a repeating pattern. This is common in practice, but hasn't been analyzed much before.
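The cyclic batching pattern described above, combined with releasing only the final model, might look like the following sketch. It is an illustration under stated assumptions rather than the authors' algorithm: the toy loss `0.5 * (w - x)^2` is convex and one-dimensional (chosen only to keep the example short, whereas the paper targets nonconvex composite losses), and all function and parameter names are hypothetical.

```python
import random

def cyclic_batches(data, batch_size, num_steps):
    """Yield batches from a fixed partition of data, visited in the same order on every pass."""
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for step in range(num_steps):
        yield batches[step % len(batches)]

def dp_sgd_last_iterate(data, batch_size, num_steps, stepsize,
                        clip_norm, noise_multiplier, seed=0):
    """Run DP-SGD on the toy loss 0.5 * (w - x)^2 and return only the final w."""
    rng = random.Random(seed)
    w = 0.0
    for batch in cyclic_batches(data, batch_size, num_steps):
        clipped_sum = 0.0
        for x in batch:
            g = w - x  # gradient of 0.5 * (w - x)^2 with respect to w
            clipped_sum += max(-clip_norm, min(clip_norm, g))  # scalar clipping
        noise = rng.gauss(0.0, noise_multiplier * clip_norm)
        w -= stepsize * (clipped_sum + noise) / len(batch)
    return w  # intermediate iterates are never returned to the caller
```

Because the batches are fixed and visited in order, each example is touched at known steps, in contrast to schemes that draw a fresh random batch at every step.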

The key finding is a new privacy guarantee for the final model produced by this cyclical DP-SGD approach, one that does not require the underlying problem to be convex. This is important because in practice, the final model is often the one that gets deployed, so understanding its privacy properties is crucial.

The paper provides new mathematical bounds that quantify how private the final model is when only that model, and none of the intermediate ones, is released. As the problem gets closer to convex, these bounds approach the ones already known for convex problems. This provides helpful guidance for machine learning practitioners on how to get the most out of DP-SGD while still protecting people's privacy.

Technical Explanation

The paper focuses on the privacy of the "last iterate" in cyclically-sampled Differentially Private Stochastic Gradient Descent (DP-SGD) on nonconvex composite losses. DP-SGD is a widely used technique for training machine learning models while providing strong privacy guarantees for the training data.

In cyclical DP-SGD, the training data is divided into batches that are used in a repeating pattern during optimization. This is a common technique in practice, but its theoretical privacy properties have not been well studied, especially when only the final, deployed model (the "last iterate") is released.

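The researchers derive new Rényi differential privacy (RDP) bounds for the last iterate in this setting. Specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, they prove bounds that hold when (i) the DP-SGD stepsize is small relative to constants of the loss function and (ii) the loss function is weakly convex. These bounds converge to previously established convex bounds as the weak-convexity parameter approaches zero. For non-Lipschitz smooth loss functions, the paper gives a weaker bound that scales well with the number of DP-SGD iterations.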
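For reference, the paper's abstract states its bounds in terms of Rényi differential privacy (RDP) and weak convexity. The standard definitions of both can be written as follows. The symbols used here are notation chosen for this summary and may differ from the paper's.

```latex
% Renyi divergence of order \alpha > 1 between distributions P and Q
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% A randomized mechanism M is (\alpha, \varepsilon)-RDP if,
% for all adjacent datasets D and D',
D_\alpha\big( M(D) \,\|\, M(D') \big) \le \varepsilon

% A function f is m-weakly convex (m \ge 0) if the map
x \mapsto f(x) + \frac{m}{2} \|x\|^2 \quad \text{is convex}
```

Setting m = 0 in the last definition recovers ordinary convexity, which is the regime in which the abstract says the new bounds converge to the earlier convex ones.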

This is an important result, as in many real-world applications, it is the final, deployed model that matters most for privacy and utility, not the intermediate training steps. The paper provides a rigorous theoretical analysis of the privacy guarantee in this practical setting, guiding machine learning practitioners on how to get the most out of DP-SGD.

Critical Analysis

The paper provides a strong theoretical analysis of the privacy guarantees for the last iterate in cyclically-sampled DP-SGD, an important practical setting that has not been well-studied before. The researchers use advanced mathematical techniques, including novel applications of differential privacy composition theorems and a gradient decomposition approach, to derive their main results.

One potential limitation is that the analysis is focused on nonconvex composite losses, which capture a broad class of machine learning problems but may not apply to all possible settings. It would be valuable to see if similar results hold for other loss functions or problem structures.

Additionally, the paper does not provide extensive experimental validation of the theoretical bounds. While the analysis is rigorous, it would strengthen the work to see how the privacy-utility trade-offs play out in practice on real-world datasets and tasks.

Overall, this is a technically sophisticated paper that makes an important contribution to the theoretical understanding of DP-SGD. The results provide guidance for machine learning practitioners on how to get the most out of this popular privacy-preserving technique, especially when deploying the final trained model. Further empirical validation and extensions to other problem settings would help solidify the paper's impact.

Conclusion

This paper provides a detailed theoretical analysis of the privacy guarantees for the last iterate in cyclically-sampled Differentially Private Stochastic Gradient Descent (DP-SGD) on nonconvex composite losses. The key finding is a set of new Rényi differential privacy bounds for the last iterate that require only weak convexity of the loss and a small stepsize, which is an important result for practical machine learning applications where the final deployed model is what matters most.

The researchers use advanced mathematical techniques, including novel applications of differential privacy composition theorems and a gradient decomposition approach, to derive tight privacy bounds for the last iterate. This provides valuable guidance for machine learning practitioners on how to get the most out of DP-SGD while still protecting people's privacy.

While the theoretical analysis is rigorous, further experimental validation and extensions to other problem settings could help solidify the paper's impact and applicability. Overall, this work represents an important step forward in understanding the privacy properties of practical DP-SGD approaches.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Weiwei Kong, Mónica Ribero

Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically - without assuming convexity, smoothness, or Lipschitz continuity of the loss function - we establish new Rényi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumption that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly-convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.

7/9/2024

It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss

Meenatchi Sundaram Muthu Selva Annamalai

Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular iterative algorithm used to train machine learning models while formally guaranteeing the privacy of users. However, the privacy analysis of DP-SGD makes the unrealistic assumption that all intermediate iterates (aka internal state) of the algorithm are released since, in practice, only the final trained model, i.e., the final iterate of the algorithm is released. In this hidden state setting, prior work has provided tighter analyses, albeit only when the loss function is constrained, e.g., strongly convex and smooth or linear. On the other hand, the privacy leakage observed empirically from hidden state DP-SGD, even when using non-convex loss functions, suggests that there is in fact a gap between the theoretical privacy analysis and the privacy guarantees achieved in practice. Therefore, it remains an open question whether hidden state privacy amplification for DP-SGD is possible for all (possibly non-convex) loss functions in general. In this work, we design a counter-example and show, both theoretically and empirically, that a hidden state privacy amplification result for DP-SGD for all loss functions in general is not possible. By carefully constructing a loss function for DP-SGD, we show that for specific loss functions, the final iterate of DP-SGD alone leaks as much information as the sequence of all iterates combined. Furthermore, we empirically verify this result by evaluating the privacy leakage from the final iterate of DP-SGD with our loss function and show that this exactly matches the theoretical upper bound guaranteed by DP. Therefore, we show that the current privacy analysis for DP-SGD is tight for general loss functions and conclude that no privacy amplification is possible for DP-SGD in general for all (possibly non-convex) loss functions.

8/22/2024

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot

Differentially private stochastic gradient descent (DP-SGD) is the canonical approach to private deep learning. While the current privacy analysis of DP-SGD is known to be tight in some settings, several empirical results suggest that models trained on common benchmark datasets leak significantly less privacy for many datapoints. Yet, despite past attempts, a rigorous explanation for why this is the case has not been reached. Is it because there exist tighter privacy upper bounds when restricted to these dataset settings, or are our attacks not strong enough for certain datapoints? In this paper, we provide the first per-instance (i.e., "data-dependent") DP analysis of DP-SGD. Our analysis captures the intuition that points with similar neighbors in the dataset enjoy better data-dependent privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints (when trained on common benchmarks) than the current data-independent guarantee. This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.

7/17/2024

How Private are DP-SGD Implementations?

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

6/7/2024