Differential Privacy via Distributionally Robust Optimization

2304.12681

Published 5/24/2024 by Aras Selvi, Huikang Liu, Wolfram Wiesemann

🛠️

Abstract

In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.

Create account to get full access

Overview

Differential privacy is a standard for sharing statistics of datasets while protecting the privacy of individuals involved.
This is achieved by randomly perturbing the statistics to be published, which creates a trade-off between privacy and accuracy.
Optimal mechanisms that provide the highest accuracy for a given level of privacy are of particular interest.
Previous work has focused on specifying families of perturbations and proving their optimality.
This paper develops a class of mechanisms with non-asymptotic and unconditional optimality guarantees.

Plain English Explanation

When organizations want to share statistics about a dataset, they need to balance the desire to provide accurate information with the need to protect the privacy of the individuals in the dataset. Differential privacy is a way to do this by adding a small amount of randomness or "noise" to the statistics before sharing them.

The more noise that's added, the stronger the privacy protection, but the less accurate the statistics become. Researchers are interested in finding the "optimal" mechanisms that provide the highest accuracy for a given level of privacy.

Previous work in this area has focused on identifying specific families of noise distributions and proving that they are optimal, at least in the long run as the dataset gets very large. In this paper, the authors take a different approach. They formulate the problem as an optimization problem and use advanced mathematical techniques to find new noise mechanisms that are optimal not just in the long run, but in any dataset size.

Their mechanisms can outperform the previous best results, providing more accurate statistics while still protecting privacy. This represents an important advance in the field of differential privacy and could enable organizations to share more useful information about their datasets.

Technical Explanation

The authors formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. They show that this problem has a strong dual, and they exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems.

The upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by the lower (dual) bounds. Both the upper and lower bounding problems can be solved efficiently using cutting plane techniques that exploit the inherent problem structure.

The authors demonstrate through numerical experiments that their perturbations can outperform the previously best results from the literature on both artificial and standard benchmark problems. This represents a significant advancement over prior work, which has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality.

Critical Analysis

The authors acknowledge that their approach relies on strong duality, which may not hold in all cases. They also note that their lower bounding problems may become computationally intractable for very high-dimensional problems.

Additionally, while the authors show that their mechanisms outperform previous approaches on the tested problems, it's unclear how they would scale to real-world datasets or how they would perform under different privacy constraints or accuracy requirements.

Further research may be needed to better understand the broader applicability and limitations of this approach, as well as to explore ways to improve the computational efficiency and scalability of the proposed methods. Rigorous empirical evaluations and comparisons to alternative techniques would also be valuable to assess the practical impact of this work.

Conclusion

This paper presents a novel approach to designing optimal differential privacy mechanisms, with the authors developing a class of perturbations that enjoy non-asymptotic and unconditional optimality guarantees. Their techniques outperform previous best results, suggesting an important advancement in the field of differential privacy.

While the approach has some limitations, it represents a significant step forward in balancing the need for accurate statistics with the imperative of protecting individual privacy. Further research and real-world applications will be needed to fully assess the impact and potential of this work.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Robust Constrained Consensus and Inequality-constrained Distributed Optimization with Guaranteed Differential Privacy and Accurate Convergence

Yongqiang Wang, Angelia Nedic

We address differential privacy for fully distributed optimization subject to a shared inequality constraint. By co-designing the distributed optimization mechanism and the differential-privacy noise injection mechanism, we propose the first distributed constrained optimization algorithm that can ensure both provable convergence to a global optimal solution and rigorous $epsilon$-differential privacy, even when the number of iterations tends to infinity. Our approach does not require the Lagrangian function to be strictly convex/concave, and allows the global objective function to be non-separable. As a byproduct of the co-design, we also propose a new constrained consensus algorithm that can achieve rigorous $epsilon$-differential privacy while maintaining accurate convergence, which, to our knowledge, has not been achieved before. Numerical simulation results on a demand response control problem in smart grid confirm the effectiveness of the proposed approach.

4/4/2024

cs.SY eess.SY

🎲

Robustness Implies Privacy in Statistical Estimation

Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan

We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a nearly optimal fraction of adversarially-corrupted samples.

6/18/2024

cs.DS cs.CR cs.IT stat.ML

Privacy-Aware Randomized Quantization via Linear Programming

Zhongteng Cai, Xueru Zhang, Mohammad Mahdi Khalili

Differential privacy mechanisms such as the Gaussian or Laplace mechanism have been widely used in data analytics for preserving individual privacy. However, they are mostly designed for continuous outputs and are unsuitable for scenarios where discrete values are necessary. Although various quantization mechanisms were proposed recently to generate discrete outputs under differential privacy, the outcomes are either biased or have an inferior accuracy-privacy trade-off. In this paper, we propose a family of quantization mechanisms that is unbiased and differentially private. It has a high degree of freedom and we show that some existing mechanisms can be considered as special cases of ours. To find the optimal mechanism, we formulate a linear optimization that can be solved efficiently using linear programming tools. Experiments show that our proposed mechanism can attain a better privacy-accuracy trade-off compared to baselines.

6/6/2024

cs.CR cs.AI

📊

A New Analysis of Differential Privacy's Generalization Guarantees

Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, Moshe Shenfeld

We give a new proof of the transfer theorem underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the monitor argument used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive sample-splitting baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.

6/5/2024

cs.LG cs.CR stat.ML