The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning

Read original: arXiv:2309.01243 - Published 6/24/2024 by Yun Lu, Malik Magdon-Ismail, Yu Wei, Vassilis Zikas

🌿

Overview

Differential Privacy (DP) is a popular method for machine learning on privacy-sensitive data
In big data analytics, randomized sketching/aggregation algorithms are used to process high-dimensional data
These randomized algorithms should provide some inherent privacy, but existing DP mechanisms don't fully leverage this randomness

Plain English Explanation

Differential Privacy (DP) is a way to protect people's privacy when doing machine learning on sensitive data. In big data analytics, researchers often use random algorithms to process high-dimensional data and make it easier to work with. Intuitively, these random algorithms should provide some privacy protection on their own. However, most existing DP techniques don't take full advantage of this inherent randomness, which can lead to unnecessary "noise" being added to the data.

The key question this research aims to answer is: Can we improve the usefulness of DP mechanisms for random machine learning algorithms by leveraging the randomness of the algorithms themselves?

The researchers' main contribution is proving what they call the "NDIS theorem." This is a mathematical result that allows them to efficiently calculate how "indistinguishable" two normal distributions are from a DP perspective. This has two important practical implications:

It provides a way to estimate the optimal privacy-utility tradeoff for DP mechanisms with normally-distributed outputs.
It allows analyzing DP mechanisms for a wider range of algorithms, not just those with normally-distributed outputs.

The researchers then apply the NDIS theorem to derive new DP mechanisms for specific types of machine learning algorithms, like Gaussian Random Projections and Ordinary Least Squares. Compared to existing techniques, these new DP mechanisms are able to achieve better privacy-utility tradeoffs by taking advantage of the randomness inherent in the underlying algorithms.

Technical Explanation

The key contribution of this work is the NDIS theorem, which provides a closed-form analytic computation for the (ε,δ)-indistinguishability-spectrum (IS) of two arbitrary normal distributions N1 and N2. This means the theorem gives an efficient way to calculate the optimal δ (for any given ε) such that N1 and N2 are (ε,δ)-close according to differential privacy.

The importance of the NDIS theorem is two-fold:

It yields efficient estimators for the IS, which is key for analyzing DP mechanisms with normally-distributed outputs.
It allows analyzing DP mechanisms for more general queries by leveraging their behavior on large inputs.

The researchers apply the NDIS theorem to derive new DP mechanisms for:

Gaussian Random Projections (RP): A common randomized sketching algorithm used in big data analytics.
Ordinary Least Squares (OLS): A widely used machine learning algorithm.

Compared to existing DP techniques, these new mechanisms achieve superior privacy-utility tradeoffs by taking advantage of the inherent randomness in the underlying algorithms.

The researchers also apply the NDIS theorem to analyze a data-driven DP notion called "relative DP," introduced by Lu et al. [S&P 2024]. Their method identifies the range of (ε,δ) values where no additional noise is needed to satisfy relative DP.

Critical Analysis

The main strength of this work is the NDIS theorem, which provides a powerful analytical tool for studying DP mechanisms with normally-distributed outputs. This allows the researchers to derive new DP algorithms that outperform existing techniques by better leveraging the inherent randomness in the underlying machine learning queries.

However, the applicability of this approach is limited to algorithms with normally-distributed outputs or those that can be analyzed in terms of their behavior on large inputs. It's not clear how well the NDIS theorem and the derived DP mechanisms would generalize to a broader class of machine learning algorithms.

Additionally, the researchers only evaluate their methods on synthetic data, so it would be important to see how they perform on real-world datasets and tasks. Further research is needed to understand the practical limitations and potential pitfalls of this approach.

Conclusion

This paper presents a novel theoretical result, the NDIS theorem, that enables more efficient analysis and design of differentially private mechanisms for machine learning algorithms with inherent randomness. By leveraging this theorem, the researchers derive new DP mechanisms that achieve superior privacy-utility tradeoffs compared to existing techniques.

These findings have the potential to improve the practicality of deploying DP in real-world machine learning applications, particularly in domains like big data analytics where randomized sketching/aggregation algorithms are commonly used. However, further research is needed to assess the broader applicability and robustness of this approach.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🌿

The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning

Yun Lu, Malik Magdon-Ismail, Yu Wei, Vassilis Zikas

Differential Privacy (DP) (and its variants) is the most common method for machine learning (ML) on privacy-sensitive data. In big data analytics, one often uses randomized sketching/aggregation algorithms to make processing high-dimensional data tractable. Intuitively, such ML algorithms should provide some inherent privacy, yet most existing DP mechanisms do not leverage or under-utilize this inherent randomness, resulting in potentially redundant noising. The motivating question of our work is: (How) can we improve the utility of DP mechanisms for randomized ML queries, by leveraging the randomness of the query itself? Towards a (positive) answer, our key contribution is (proving) what we call the NDIS theorem, a theoretical result with several practical implications. In a nutshell, NDIS is a closed-form analytic computation for the (varepsilon,delta)-indistinguishability-spectrum (IS) of two arbitrary normal distributions N1 and N2, i.e., the optimal delta (for any given varepsilon) such that N1 and N2 are (varepsilon,delta)-close according to the DP distance. The importance of the NDIS theorem lies in that (1) it yields efficient estimators for IS, and (2) it allows us to analyze DP-mechanism with normally-distributed outputs, as well as more general mechanisms by leveraging their behavior on large inputs. We apply the NDIS theorem to derive DP mechanisms for queries with normally-distributed outputs--i.e., Gaussian Random Projections (RP)--and for more general queries--i.e., Ordinary Least Squares (OLS). Compared to existing techniques, our new DP mechanisms achieve superior privacy/utility trade-offs by leveraging the randomness of the underlying algorithms. We then apply the NDIS theorem to a data-driven DP notion--in particular relative DP introduced by Lu et al. [S&P 2024]. Our method identifies the range of (varepsilon,delta) for which no additional noising is needed.

6/24/2024

🏷️

DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling

Jianxin Wei, Ergute Bao, Xiaokui Xiao, Yin Yang

Nowadays, differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. The combination of these two techniques, i.e., deep learning with differential privacy, promises the privacy-preserving release of high-utility models trained with sensitive data such as medical records. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) optimizer commonly used for DNN training. Subsequent approaches have improved various aspects of the model training process, including noise decay schedule, model architecture, feature engineering, and hyperparameter tuning. However, the core mechanism for enforcing DP in the SGD optimizer remains unchanged ever since the original DP-SGD algorithm, which has increasingly become a fundamental barrier limiting the performance of DP-compliant machine learning solutions. Motivated by this, we propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core optimizer of DP-SGD, with consistent and significant accuracy gains over the latter. The main idea is to employ importance sampling (IS) in each SGD iteration for mini-batch selection, which reduces both sampling variance and the amount of random noise injected to the gradients that is required to satisfy DP. Integrating IS into the complex mathematical machinery of DP-SGD is highly non-trivial. DPIS addresses the challenge through novel mechanism designs, fine-grained privacy analysis, efficiency enhancements, and an adaptive gradient clipping optimization. Extensive experiments on four benchmark datasets, namely MNIST, FMNIST, CIFAR-10 and IMDb, demonstrate the superior effectiveness of DPIS over existing solutions for deep learning with differential privacy.

8/2/2024

Shifted Interpolation for Differential Privacy

Jinho Bok, Weijie Su, Jason M. Altschuler

Noisy gradient descent and its variants are the predominant algorithms for differentially private machine learning. It is a fundamental question to quantify their privacy leakage, yet tight characterizations remain open even in the foundational setting of convex losses. This paper improves over previous analyses by establishing (and refining) the privacy amplification by iteration phenomenon in the unifying framework of $f$-differential privacy--which tightly captures all aspects of the privacy loss and immediately implies tighter privacy accounting in other notions of differential privacy, e.g., $(varepsilon,delta)$-DP and R'enyi DP. Our key technical insight is the construction of shifted interpolated processes that unravel the popular shifted-divergences argument, enabling generalizations beyond divergence-based relaxations of DP. Notably, this leads to the first exact privacy analysis in the foundational setting of strongly convex optimization. Our techniques extend to many settings: convex/strongly convex, constrained/unconstrained, full/cyclic/stochastic batches, and all combinations thereof. As an immediate corollary, we recover the $f$-DP characterization of the exponential mechanism for strongly convex optimization in Gopi et al. (2022), and moreover extend this result to more general settings.

6/13/2024

🤿

Generalized Rainbow Differential Privacy

Yuzhou Gu, Ziqi Zhou, Onur Gunlu, Rafael G. L. D'Oliveira, Parastoo Sadeghi, Muriel M'edard, Rafael F. Schaefer

We study a new framework for designing differentially private (DP) mechanisms via randomized graph colorings, called rainbow differential privacy. In this framework, datasets are nodes in a graph, and two neighboring datasets are connected by an edge. Each dataset in the graph has a preferential ordering for the possible outputs of the mechanism, and these orderings are called rainbows. Different rainbows partition the graph of connected datasets into different regions. We show that if a DP mechanism at the boundary of such regions is fixed and it behaves identically for all same-rainbow boundary datasets, then a unique optimal $(epsilon,delta)$-DP mechanism exists (as long as the boundary condition is valid) and can be expressed in closed-form. Our proof technique is based on an interesting relationship between dominance ordering and DP, which applies to any finite number of colors and for $(epsilon,delta)$-DP, improving upon previous results that only apply to at most three colors and for $epsilon$-DP. We justify the homogeneous boundary condition assumption by giving an example with non-homogeneous boundary condition, for which there exists no optimal DP mechanism.

4/8/2024