Centering Policy and Practice: Research Gaps around Usable Differential Privacy

2406.12103

Published 6/19/2024 by Rachel Cummings, Jayshree Sarathy


Abstract

As a mathematically rigorous framework that has amassed a rich theoretical literature, differential privacy is considered by many experts to be the gold standard for privacy-preserving data analysis. Others argue that while differential privacy is a clean formulation in theory, it poses significant challenges in practice. Both perspectives are, in our view, valid and important. To bridge the gaps between differential privacy's promises and its real-world usability, researchers and practitioners must work together to advance policy and practice of this technology. In this paper, we outline pressing open questions towards building usable differential privacy and offer recommendations for the field, such as developing risk frameworks to align with user needs, tailoring communications for different stakeholders, modeling the impact of privacy-loss parameters, investing in effective user interfaces, and facilitating algorithmic and procedural audits of differential privacy systems.


Overview

This paper examines the research gaps around making differential privacy, a technique for protecting data privacy, more usable in practical applications.
The authors identify key areas where more work is needed to bridge the gap between theoretical guarantees of differential privacy and its real-world deployment.
The paper is supported by funding from the National Science Foundation, DARPA, and Columbia University.

Plain English Explanation

Differential privacy is a mathematical technique that aims to protect the privacy of individuals in a dataset, while still allowing useful information to be extracted from that data. However, putting differential privacy into practice has proven challenging.
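For readers who want the formal statement behind this description, the standard definition of pure ε-differential privacy can be written as below. The summary does not restate it, so the symbols are the usual textbook ones and should be read as assumptions: M is a randomized mechanism, D and D' are any two datasets that differ in one individual's record, and S is any set of possible outputs.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

A smaller ε forces the two output distributions closer together, which means stronger privacy but typically noisier results.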

This paper explores the research gaps that need to be filled in order to make differential privacy more usable in real-world applications. The key idea is to "center" differential privacy, that is, to put it at the forefront of data privacy research and policymaking rather than treating it as a niche or specialized topic.

Some of the specific areas the paper highlights include:

Developing differentially private synthetic data generation methods that maintain the statistical properties of the original data.
Designing causal discovery techniques under local differential privacy constraints.
Improving the transparency and understandability of differential privacy for non-expert users.

By addressing these research gaps, the authors hope to make differential privacy a more practical and widely-adopted tool for protecting personal data.

Technical Explanation

The paper begins by arguing that differential privacy needs to be "centered" - that is, it should be a core focus of research and policy, rather than an afterthought or specialized topic. The authors identify several key research areas where more work is needed to bridge the gap between the theory of differential privacy and its real-world deployment.

One important area is the development of methods for generating differentially private synthetic data. Synthetic data can be a powerful tool for sharing data while preserving privacy, but ensuring that the statistical properties of the original data are maintained is challenging. The authors call for more research in this area.

Another focus is on causal discovery techniques under local differential privacy constraints. Causal reasoning is crucial for many data-driven applications, but performing causal discovery while preserving individual privacy remains an open problem.
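To make "local differential privacy" concrete, here is a minimal sketch of binary randomized response, a classic local mechanism in which each person perturbs their own answer before sharing it. It is not taken from the paper and does not perform causal discovery; the function names, the simulated population, and the parameter values are all illustrative assumptions.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-local DP."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    return true_bit if rng.random() < truth_probability(epsilon) else 1 - true_bit


def estimate_proportion(reports, epsilon):
    """Debias the observed fraction of 1s to estimate the true fraction."""
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[observed] = pi * (2p - 1) + (1 - p), solved here for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(n_users, true_rate, epsilon, seed=0):
    """Hypothetical population: each user holds one private bit."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < true_rate else 0 for _ in range(n_users)]
    reports = [randomized_response(b, epsilon, rng) for b in bits]
    return estimate_proportion(reports, epsilon)


if __name__ == "__main__":
    print(simulate(n_users=50_000, true_rate=0.3, epsilon=1.0))
```

The aggregate estimate is recoverable, but each individual report is noisy, and relationships between several attributes perturbed this way are weakened, which is one reason structure-learning tasks are harder in the local model.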

The paper also highlights the need for improving the transparency and understandability of differential privacy for non-expert users. Currently, the technical details of differential privacy can be opaque, which hinders its adoption.

To address these gaps, the authors propose a research agenda centered around "usable differential privacy" - developing practical techniques and tools that make differential privacy more accessible and deployable in real-world settings.

Critical Analysis

The paper makes a compelling case for the need to "center" differential privacy as a core focus of data privacy research and policy. The identified research gaps are well-justified and align with the key challenges faced in transitioning differential privacy from theory to practice.

However, the paper does not delve deeply into some of the inherent limitations and trade-offs of differential privacy. For example, the technique can introduce significant noise or distortion into the data, which may limit its utility for certain applications. The paper could have discussed these potential downsides in more detail.
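The noise-versus-utility trade-off mentioned here can be illustrated with a small sketch of the Laplace mechanism applied to a counting query. This is an illustration under stated assumptions rather than anything described in the paper: the query has sensitivity 1, the true count of 1000 is made up, and the helper names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon (sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials=20_000, seed=0):
    """Average absolute error of the noisy count over repeated releases."""
    rng = random.Random(seed)
    true_count = 1000  # made-up value for illustration
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 5.0):
        print(f"epsilon={eps}: mean |error| ~ {mean_abs_error(eps):.2f}")
```

Under these assumptions the expected absolute error is 1/epsilon, so tightening the privacy-loss parameter from 1.0 to 0.1 makes the released count roughly ten times noisier.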

Additionally, the paper does not address the broader societal and ethical implications of differential privacy. As with any privacy-preserving technology, there are concerns around its potential misuse or unintended consequences that warrant further examination.

Overall, the paper provides a solid foundation for a research agenda around usable differential privacy, but future work should also consider the broader context and potential pitfalls of these techniques.

Conclusion

This paper highlights the critical need to "center" differential privacy as a core focus of data privacy research and policy. By addressing key gaps around practical deployment, including synthetic data generation, causal discovery, and transparency, the authors aim to make differential privacy a more widely adopted and useful tool for protecting personal information.

While the paper provides a thoughtful roadmap for advancing the state of the art in usable differential privacy, it could have delved deeper into some of the inherent limitations and societal implications of these techniques. Nonetheless, the paper makes a compelling case for prioritizing differential privacy as a vital component of the data privacy landscape.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries

Rachel Cummings, Shlomi Hod, Jayshree Sarathy, Marika Swanberg

Differential Privacy (DP) is a mathematical framework that is increasingly deployed to mitigate privacy risks associated with machine learning and statistical analyses. Despite the growing adoption of DP, its technical privacy parameters do not lend themselves to an intelligible description of the real-world privacy risks associated with that deployment: the guarantee that most naturally follows from the DP definition is protection against membership inference by an adversary who knows all but one data record and has unlimited auxiliary knowledge. In many settings, this adversary is far too strong to inform how to set real-world privacy parameters. One approach for contextualizing privacy parameters is via defining and measuring the success of technical attacks, but doing so requires a systematic categorization of the relevant attack space. In this work, we offer a detailed taxonomy of attacks, showing the various dimensions of attacks and highlighting that many real-world settings have been understudied. Our taxonomy provides a roadmap for analyzing real-world deployments and developing theoretical bounds for more informative privacy attacks. We operationalize our taxonomy by using it to analyze a real-world case study, the Israeli Ministry of Health's recent release of a birth dataset using DP, showing how the taxonomy enables fine-grained threat modeling and provides insight towards making informed privacy parameter choices. Finally, we leverage the taxonomy towards defining a more realistic attack than previously considered in the literature, namely a distributional reconstruction attack: we generalize Balle et al.'s notion of reconstruction robustness to a less-informed adversary with distributional uncertainty, and extend the worst-case guarantees of DP to this average-case setting.

5/6/2024

cs.CR cs.CY


Causal Discovery Under Local Privacy

Rūta Binkytė, Carlos Pinzón, Szilvia Lestyán, Kangsoo Jung, Héber H. Arcolezi, Catuscia Palamidessi

Differential privacy is a widely adopted framework designed to safeguard the sensitive information of data providers within a data set. It is based on the application of controlled noise at the interface between the server that stores and processes the data, and the data consumers. Local differential privacy is a variant that allows data providers to apply the privatization mechanism themselves on their data individually. Therefore it provides protection also in contexts in which the server, or even the data collector, cannot be trusted. The introduction of noise, however, inevitably affects the utility of the data, particularly by distorting the correlations between individual data components. This distortion can prove detrimental to tasks such as causal discovery. In this paper, we consider various well-known locally differentially private mechanisms and compare the trade-off between the privacy they provide, and the accuracy of the causal structure produced by algorithms for causal learning when applied to data obfuscated by these mechanisms. Our analysis yields valuable insights for selecting appropriate local differentially private protocols for causal discovery tasks. We foresee that our findings will aid researchers and practitioners in conducting locally private causal discovery.

5/6/2024

cs.CR cs.AI cs.LG

From Theory to Comprehension: A Comparative Study of Differential Privacy and $k$-Anonymity

Saskia Nuñez von Voigt, Luise Mehner, Florian Tschorsch

The notion of $\varepsilon$-differential privacy is a widely used concept of providing quantifiable privacy to individuals. However, it is unclear how to explain the level of privacy protection provided by a differential privacy mechanism with a set $\varepsilon$. In this study, we focus on users' comprehension of the privacy protection provided by a differential privacy mechanism. To do so, we study three variants of explaining the privacy protection provided by differential privacy: (1) the original mathematical definition; (2) $\varepsilon$ translated into a specific privacy risk; and (3) an explanation using the randomized response technique. We compare users' comprehension of privacy protection employing these explanatory models with their comprehension of privacy protection of $k$-anonymity as baseline comprehensibility. Our findings suggest that participants' comprehension of differential privacy protection is enhanced by the privacy risk model and the randomized response-based model. Moreover, our results confirm our intuition that privacy protection provided by $k$-anonymity is more comprehensible.

4/8/2024

cs.CR cs.HC

Differentially Private Synthetic Data with Private Density Estimation

Nikolija Bojkovic, Po-Ling Loh

The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset which accurately captures characteristics of the original data. We build upon the work of Boedihardjo et al, which laid the foundations for a new optimization-based algorithm for generating private synthetic data. Importantly, we adapt their algorithm by replacing a uniform sampling step with a private distribution estimator; this allows us to obtain better computational guarantees for discrete distributions, and develop a novel algorithm suitable for continuous distributions. We also explore applications of our work to several statistical tasks.

5/9/2024

cs.CR cs.IT cs.LG stat.ML