Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

Read original: arXiv:2310.14661 - Published 5/2/2024 by Yingyu Lin, Yi-An Ma, Yu-Xiang Wang, Rachel Redberg, Zhiqi Bu

📉

Overview

The paper proposes a new algorithm called Approximate SAample Perturbation (ASAP) that provides strong differential privacy guarantees while using approximate sampling methods.
Posterior sampling, which provides strict differential privacy, can suffer from unbounded privacy breach when using approximate sampling methods like Markov Chain Monte Carlo (MCMC).
ASAP perturbs an MCMC sample with noise proportional to its distance from a reference distribution that satisfies pure differential privacy.
The authors show that ASAP converges in Wasserstein-infinity distance and can be combined with a localization step to achieve optimal rates for the differentially private empirical risk minimization (DP-ERM) problem.

Plain English Explanation

Differentially private algorithms are designed to protect individual privacy when analyzing sensitive data. One approach is posterior sampling, which involves sampling from the posterior distribution of a statistical model. This provides strong privacy guarantees, as each sample is individually private.

However, in practice, we often need to use approximate sampling methods like Markov Chain Monte Carlo (MCMC) to draw samples from the posterior. This can reintroduce errors that reduce the privacy guarantees, making the overall system less private.

The ASAP algorithm proposed in this paper aims to bridge this gap. ASAP takes an MCMC sample and adds noise to it, where the amount of noise depends on how far the sample is from a "reference" distribution that is known to satisfy strict differential privacy. This ensures the final sample is also private, even when using approximate sampling.

The authors show that ASAP can be combined with a "localization" step to create an algorithm that is both private and efficient, achieving the best-known performance for the differentially private empirical risk minimization (DP-ERM) problem. DP-ERM is an important problem in machine learning, where we want to train a model while protecting the privacy of the training data.

Technical Explanation

The key technical contribution of the paper is the ASAP algorithm, which perturbs an MCMC sample to achieve differential privacy guarantees. Specifically, ASAP adds noise to the MCMC sample proportional to its Wasserstein-infinity distance from a "reference" distribution that satisfies pure differential privacy or pure Gaussian differential privacy (i.e., δ=0).

The authors prove that ASAP converges in Wasserstein-infinity distance, meaning the perturbed samples are close to the true posterior distribution. They then show that by combining ASAP with a localization step, they can obtain the first nearly linear-time algorithm that achieves optimal rates for the DP-ERM problem with strongly convex and smooth losses.

This is significant because DP-ERM is a fundamental problem in differentially private machine learning, with applications in areas like next-token prediction in large language models and distributed optimization under inequality constraints. The ASAP algorithm provides a way to achieve strong privacy guarantees while maintaining computational efficiency.

Critical Analysis

The key strength of the ASAP algorithm is that it provides strong differential privacy guarantees while using approximate sampling methods like MCMC. This addresses a significant practical limitation of posterior sampling, which can suffer from unbounded privacy breach when using such methods.

However, the paper does not discuss the potential limitations or downsides of the ASAP approach. For example, the noise addition step may introduce significant distortion in the final samples, potentially impacting the accuracy of downstream tasks. Additionally, the reliance on a "reference" distribution that satisfies pure differential privacy may be challenging to obtain in practice.

Furthermore, the paper focuses on the theoretical analysis and does not provide extensive empirical evaluation of ASAP's performance compared to other differentially private algorithms. More experimental results, especially on real-world datasets and tasks, would help demonstrate the practical benefits and limitations of the proposed approach.

Overall, the ASAP algorithm represents an interesting and potentially valuable contribution to the field of differentially private machine learning. However, further research is needed to fully understand its practical implications and limitations.

Conclusion

The ASAP algorithm proposed in this paper addresses a key challenge in differentially private machine learning: how to maintain strong privacy guarantees when using approximate sampling methods like MCMC. By perturbing MCMC samples based on their distance from a reference distribution, ASAP can provide strict differential privacy while still achieving computational efficiency.

The authors show that ASAP can be combined with a localization step to obtain the first nearly linear-time algorithm that achieves optimal rates for the differentially private empirical risk minimization (DP-ERM) problem. This is a significant result, as DP-ERM is a fundamental problem with applications in areas like language modeling and distributed optimization.

While the theoretical analysis is promising, further research is needed to fully understand the practical implications and limitations of the ASAP approach. Nonetheless, this work represents an important contribution to the field of differentially private machine learning and could pave the way for more efficient and private algorithms in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📉

Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

Yingyu Lin, Yi-An Ma, Yu-Xiang Wang, Rachel Redberg, Zhiqi Bu

Posterior sampling, i.e., exponential mechanism to sample from the posterior distribution, provides $varepsilon$-pure differential privacy (DP) guarantees and does not suffer from potentially unbounded privacy breach introduced by $(varepsilon,delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAample Perturbation (abbr. ASAP) algorithm which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_infty$ distance. We show that by combining our new techniques with a localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.

5/2/2024

🔎

Privacy Amplification for Matrix Mechanisms

Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose MMCC, the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $epsilonto0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our conditional composition theorem has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.

5/7/2024

Shifted Interpolation for Differential Privacy

Jinho Bok, Weijie Su, Jason M. Altschuler

Noisy gradient descent and its variants are the predominant algorithms for differentially private machine learning. It is a fundamental question to quantify their privacy leakage, yet tight characterizations remain open even in the foundational setting of convex losses. This paper improves over previous analyses by establishing (and refining) the privacy amplification by iteration phenomenon in the unifying framework of $f$-differential privacy--which tightly captures all aspects of the privacy loss and immediately implies tighter privacy accounting in other notions of differential privacy, e.g., $(varepsilon,delta)$-DP and R'enyi DP. Our key technical insight is the construction of shifted interpolated processes that unravel the popular shifted-divergences argument, enabling generalizations beyond divergence-based relaxations of DP. Notably, this leads to the first exact privacy analysis in the foundational setting of strongly convex optimization. Our techniques extend to many settings: convex/strongly convex, constrained/unconstrained, full/cyclic/stochastic batches, and all combinations thereof. As an immediate corollary, we recover the $f$-DP characterization of the exponential mechanism for strongly convex optimization in Gopi et al. (2022), and moreover extend this result to more general settings.

6/13/2024

🚀

Private Mean Estimation with Person-Level Differential Privacy

Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that [n = tilde Thetaleft(frac{d}{alpha^2 m} + frac{d}{alpha m^{1/2} varepsilon} + frac{d}{alpha^{k/(k-1)} m varepsilon} + frac{d}{varepsilon}right)] people are necessary and sufficient to estimate the mean up to distance $alpha$ in $ell_2$-norm under $varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.

7/22/2024