AAA: an Adaptive Mechanism for Locally Differential Private Mean Estimation

2404.01625

Published 4/4/2024 by Fei Wei, Ergute Bao, Xiaokui Xiao, Yin Yang, Bolin Ding

🔎

Abstract

Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems. The main idea is that each individual perturbs their own data locally, and only submits the resulting noisy version to a data aggregator. Although much effort has been devoted to computing various types of aggregates and building machine learning applications under LDP, research on fundamental perturbation mechanisms has not achieved significant improvement in recent years. Towards a more refined result utility, existing works mainly focus on improving the worst-case guarantee. However, this approach does not necessarily promise a better average performance given the fact that the data in practice obey a certain distribution, which is not known beforehand. In this paper, we propose the advanced adaptive additive (AAA) mechanism, which is a distribution-aware approach that addresses the average utility and tackles the classical mean estimation problem. AAA is carried out in a two-step approach: first, as the global data distribution is not available beforehand, the data aggregator selects a random subset of individuals to compute a (noisy) quantized data descriptor; then, the data aggregator collects data from the remaining individuals, which are perturbed in a distribution-aware fashion. The perturbation involved in the latter step is obtained by solving an optimization problem, which is formulated with the data descriptor obtained in the former step and the desired properties of task-determined utilities. We provide rigorous privacy proofs, utility analyses, and extensive experiments comparing AAA with state-of-the-art mechanisms. The evaluation results demonstrate that the AAA mechanism consistently outperforms existing solutions with a clear margin in terms of result utility, on a wide range of privacy constraints and real-world and synthetic datasets.

Create account to get full access

Overview

Local differential privacy (LDP) is a strong privacy standard used by popular software systems.
The main idea is that individuals privately perturb their own data before submitting it to a data aggregator.
Prior research has focused on improving the worst-case guarantee of perturbation mechanisms, but this does not necessarily lead to better average performance.

Plain English Explanation

Local differential privacy (LDP) is a way to protect people's privacy when sharing data. Imagine you're part of a survey, and the survey organizers want to know the average age of all the participants. Rather than telling the organizers your exact age, you would add a small amount of "noise" or randomness to your age before sharing it. This way, the organizers can still calculate an approximate average age, but they can't identify your individual age.

The key benefit of LDP is that it gives individuals control over their own data. Instead of a central authority collecting and protecting everyone's sensitive information, each person decides how much to obscure their personal data before sending it in. This can help build trust and encourage more people to participate in data-driven initiatives.

However, the challenge is finding the right balance between privacy and utility. Adding too much noise can make the aggregated results less accurate and useful. The paper proposes a new approach called the Advanced Adaptive Additive (AAA) mechanism that aims to improve the average accuracy of LDP-protected data, while still preserving strong privacy guarantees.

Technical Explanation

The paper introduces the AAA mechanism, which takes a two-step approach to mean estimation under LDP. First, the data aggregator selects a random subset of individuals to compute a noisy quantized data descriptor. This provides the aggregator with some initial information about the underlying data distribution, without requiring full access to people's private data.

In the second step, the remaining individuals perturb their data in a distribution-aware fashion, by solving an optimization problem. This optimization balances the desired properties of the final estimate (e.g. accuracy) with the level of privacy protection. The perturbation process leverages the initial data descriptor to tailor the noise addition to the specific data distribution.

The paper provides formal privacy analysis and utility guarantees for the AAA mechanism. Extensive experiments on real-world and synthetic datasets demonstrate that AAA consistently outperforms state-of-the-art LDP mechanisms in terms of result utility, across a range of privacy constraints.

Critical Analysis

The AAA mechanism represents an interesting advance in LDP research, as it moves beyond the standard focus on worst-case utility guarantees. By incorporating data distribution information, the approach can potentially achieve better average-case performance without sacrificing strong privacy protections.

However, a key limitation is that the initial data descriptor relies on a random sample of participants. In practice, this subset may not be fully representative of the overall population, which could introduce biases into the distribution estimate. Additionally, the optimization-based perturbation process adds computational complexity that may be challenging to deploy at large scale.

Further research could explore alternative techniques for estimating the data distribution, or investigate the robustness of the AAA mechanism to sampling biases and other real-world effects. It would also be valuable to study the application of AAA to a wider range of data analysis tasks beyond just mean estimation.

Conclusion

The AAA mechanism proposed in this paper demonstrates how distribution-aware perturbation can improve the utility of LDP-protected data, compared to prior approaches focused on worst-case guarantees. By tailoring the noise addition to the specific data distribution, AAA is able to achieve better average-case performance while still providing strong privacy safeguards.

As privacy-preserving data sharing becomes increasingly important, innovations like AAA can help enable more accurate and useful data analysis under rigorous privacy constraints. However, practical deployment will likely require addressing challenges around representational bias and computational complexity. Overall, this work represents a promising step forward in the field of local differential privacy.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛠️

Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging

Edwige Cyffers, Mathieu Even, Aur'elien Bellet, Laurent Massouli'e

Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.

6/12/2024

cs.CR cs.DC cs.LG

🛠️

Differential Privacy via Distributionally Robust Optimization

Aras Selvi, Huikang Liu, Wolfram Wiesemann

In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.

5/24/2024

cs.CR cs.LG

Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

Mahtab Talaei, Iman Izadi

Federated learning (FL), a novel branch of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, it is still possible to access the model updates (gradient updates of deep neural networks) transferred between clients and servers, potentially revealing sensitive local information to adversaries using model inversion attacks. Differential privacy (DP) offers a promising approach to addressing this issue by adding noise to the parameters. On the other hand, heterogeneities in data structure, storage, communication, and computational capabilities of devices can cause convergence problems and delays in developing the global model. A personalized weighted averaging of local parameters based on the resources of each device can yield a better aggregated model in each round. In this paper, to efficiently preserve privacy, we propose a personalized DP framework that injects noise based on clients' relative impact factors and aggregates parameters while considering heterogeneities and adjusting properties. To fulfill the DP requirements, we first analyze the convergence boundary of the FL algorithm when impact factors are personalized and fixed throughout the learning process. We then further study the convergence property considering time-varying (adaptive) impact factors.

6/27/2024

cs.LG cs.CR cs.DC

Adaptive Online Bayesian Estimation of Frequency Distributions with Local Differential Privacy

Soner Aydin, Sinan Yildirim

We propose a novel Bayesian approach for the adaptive and online estimation of the frequency distribution of a finite number of categories under the local differential privacy (LDP) framework. The proposed algorithm performs Bayesian parameter estimation via posterior sampling and adapts the randomization mechanism for LDP based on the obtained posterior samples. We propose a randomized mechanism for LDP which uses a subset of categories as an input and whose performance depends on the selected subset and the true frequency distribution. By using the posterior sample as an estimate of the frequency distribution, the algorithm performs a computationally tractable subset selection step to maximize the utility of the privatized response of the next user. We propose several utility functions related to well-known information metrics, such as (but not limited to) Fisher information matrix, total variation distance, and information entropy. We compare each of these utility metrics in terms of their computational complexity. We employ stochastic gradient Langevin dynamics for posterior sampling, a computationally efficient approximate Markov chain Monte Carlo method. We provide a theoretical analysis showing that (i) the posterior distribution targeted by the algorithm converges to the true parameter even for approximate posterior sampling, and (ii) the algorithm selects the optimal subset with high probability if posterior sampling is performed exactly. We also provide numerical results that empirically demonstrate the estimation accuracy of our algorithm where we compare it with nonadaptive and semi-adaptive approaches under experimental settings with various combinations of privacy parameters and population distribution parameters.

5/14/2024

cs.LG cs.CR stat.ML