On Differentially Private U Statistics

Read original: arXiv:2407.04945 - Published 7/9/2024 by Kamalika Chaudhuri, Po-Ling Loh, Shourya Pandey, Purnamrita Sarkar

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

On Differentially Private U Statistics

Kamalika Chaudhuri, Po-Ling Loh, Shourya Pandey, Purnamrita Sarkar

We consider the problem of privately estimating a parameter $mathbb{E}[h(X_1,dots,X_k)]$, where $X_1$, $X_2$, $dots$, $X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, standard estimators are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and can be shown to be minimum variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $Theta(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach using emph{local H'ajek projections} to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.

7/9/2024

🎯

Nonparametric extensions of randomized response for private confidence sets

Ian Waudby-Smith, Zhiwei Steven Wu, Aaditya Ramdas

This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP). Given bounded observations $(X_1, dots, X_n)$ with mean $mu^star$ that are privatized into $(Z_1, dots, Z_n)$, we present confidence intervals (CI) and time-uniform confidence sequences (CS) for $mu^star$ when only given access to the privatized data. To achieve this, we study a nonparametric and sequentially interactive generalization of Warner's famous ``randomized response'' mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. For example, our results yield private analogues of Hoeffding's inequality in both fixed-time and time-uniform regimes. We extend these Hoeffding-type CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.

7/26/2024

🚀

Private Mean Estimation with Person-Level Differential Privacy

Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that [n = tilde Thetaleft(frac{d}{alpha^2 m} + frac{d}{alpha m^{1/2} varepsilon} + frac{d}{alpha^{k/(k-1)} m varepsilon} + frac{d}{varepsilon}right)] people are necessary and sufficient to estimate the mean up to distance $alpha$ in $ell_2$-norm under $varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.

7/22/2024

🎲

Robustness Implies Privacy in Statistical Estimation

Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan

We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a nearly optimal fraction of adversarially-corrupted samples.

6/18/2024