Learning with User-Level Local Differential Privacy

2405.17079

Published 5/28/2024 by Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu


Abstract

User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local model has received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially different. In this paper, we first analyze the mean estimation problem and then apply it to stochastic optimization, classification, and regression. In particular, we propose adaptive strategies to achieve optimal performance at all privacy levels. Moreover, we also obtain information-theoretic lower bounds, which show that the proposed methods are minimax optimal up to logarithmic factors. Unlike the central DP model, where user-level DP always leads to slower convergence, our result shows that under the local model, the convergence rates are nearly the same between user-level and item-level cases for distributions with bounded support. For heavy-tailed distributions, the user-level rate is even faster than the item-level one.


Overview

This paper studies learning under user-level local differential privacy (LDP), a setting in which each user privatizes all of their data before sharing it, so the privacy guarantee covers a user's entire contribution rather than a single record.
The research explores the challenges and tradeoffs involved in balancing privacy and accuracy when training machine learning models on privatized user data.
The authors propose new algorithms, analyze their theoretical properties, and evaluate their empirical performance on real-world datasets.

Plain English Explanation

In today's data-driven world, many applications collect and use personal information to provide customized services. However, this raises concerns about individual privacy. Local differential privacy addresses this by having each user add random noise to their data before it leaves their device, so that even the organization collecting the data cannot reliably recover any individual's true values.
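To make the idea concrete, here is a minimal sketch of randomized response, a classic local mechanism for a single yes/no value. It is not taken from the paper, and the function names are hypothetical. Each user reports their true bit with probability e^ε/(e^ε + 1) and the flipped bit otherwise, so any single report is at most e^ε times more likely under one true value than under the other. The collector never sees true bits, yet it can still estimate the overall fraction of 1s by inverting the known flipping probability.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s, computed from noisy reports only."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = p * f + (1 - p) * (1 - f); solve this for the true fraction f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Usage: 100,000 users, about 30% of whom hold a 1.
rng = random.Random(0)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(sum(true_bits) / len(true_bits), estimate_fraction(reports, epsilon=1.0))
```

Smaller values of epsilon push the reporting probability toward 1/2, which strengthens privacy but makes the estimate noisier for the same number of users.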

This paper investigates how to train accurate machine learning models on data that has been privatized in this way, focusing on the case where each user contributes many data points and all of them must be protected together. The researchers developed new algorithms that can learn from such privatized user data, and they analyzed how well these approaches work.

The key idea is to find a balance between preserving individual privacy and maintaining the accuracy of the machine learning models: more noise gives stronger privacy but less accurate models. The authors explore the theoretical properties of their algorithms and test them on real-world datasets to see how they perform.
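A standard calculation shows this trade-off in numbers. Assume, purely for illustration (the summary does not name a specific mechanism), that each of n users adds Laplace noise with scale Δ/ε to a value whose possible range has width Δ, and that the server averages the n reports:

```latex
\begin{align*}
\tilde{x}_i &= x_i + Z_i, \qquad Z_i \sim \mathrm{Laplace}(\Delta/\varepsilon), \qquad \operatorname{Var}(Z_i) = \frac{2\Delta^2}{\varepsilon^2},\\
\operatorname{sd}\Big(\frac{1}{n}\sum_{i=1}^{n} Z_i\Big) &= \sqrt{\frac{2\Delta^2}{n\varepsilon^2}} = \frac{\sqrt{2}\,\Delta}{\varepsilon\sqrt{n}},\\
\Delta = 1,\ n = 10^4:\quad \varepsilon = 1 &\Rightarrow \mathrm{sd} \approx 0.014, \qquad \varepsilon = 0.1 \Rightarrow \mathrm{sd} \approx 0.14.
\end{align*}
```

Making ε ten times smaller (stronger privacy) makes the noise in the average ten times larger, and offsetting that requires a hundred times as many users.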

The findings from this research could help enable the development of data-driven applications that respect user privacy, such as personalized recommendation systems or tools that estimate public statistics from privatized data.

Technical Explanation

The paper focuses on the problem of learning with user-level local differential privacy (ULLP). In this setting, each user holds several data points and privatizes all of them with a local privacy mechanism before sharing anything with a central server. The guarantee must hold even if all of a user's data points change at once, whereas item-level LDP protects individual data points. The goal is to train an accurate machine learning model on the privatized reports while ensuring this user-level guarantee for every participant.
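The user-level setting can be sketched for mean estimation, the simplest task the paper builds on. This is a minimal illustration under assumptions made here (scalar data, a clipping range [lo, hi] fixed in advance, the Laplace mechanism); it is not the paper's adaptive estimator, and all names are hypothetical. Each user averages all of their samples, clips the average, and sends a single noisy number, so the noise scale depends on the clipping range rather than on how many samples the user holds.

```python
import numpy as np


def privatize_user_mean(samples: np.ndarray, lo: float, hi: float,
                        epsilon: float, rng: np.random.Generator) -> float:
    """One user's single report: clipped average of ALL their samples plus Laplace noise.

    Changing every one of the user's samples moves the clipped average by at
    most (hi - lo), so Laplace noise with scale (hi - lo) / epsilon gives
    user-level epsilon-LDP for this report.
    """
    clipped = float(np.clip(samples.mean(), lo, hi))
    return clipped + rng.laplace(0.0, (hi - lo) / epsilon)


def estimate_mean(user_samples, lo: float, hi: float, epsilon: float,
                  seed: int = 0) -> float:
    """Server side: average the n noisy one-number reports."""
    rng = np.random.default_rng(seed)
    reports = [privatize_user_mean(s, lo, hi, epsilon, rng) for s in user_samples]
    return float(np.mean(reports))


# Usage: 2,000 users, each holding 50 samples from Uniform(0, 1) (true mean 0.5).
data_rng = np.random.default_rng(42)
data = [data_rng.uniform(0.0, 1.0, size=50) for _ in range(2000)]
wide = estimate_mean(data, lo=0.0, hi=1.0, epsilon=1.0)    # noise scale 1.0 per report
narrow = estimate_mean(data, lo=0.3, hi=0.7, epsilon=1.0)  # noise scale 0.4 per report
print(wide, narrow)
```

Here the narrow range yields a less noisy estimate only because it was fixed around the true mean ahead of time. Choosing such a range without leaking private information is the kind of problem an adaptive strategy has to solve, and this sketch does not attempt it.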

The authors propose new algorithms for ULLP learning, including a novel Private Gradient Descent (PGD) method. Starting from mean estimation, they extend their analysis to stochastic optimization, classification, and regression, using adaptive strategies designed to achieve optimal performance at every privacy level, and they derive bounds on the excess risk of the resulting methods. The key technical contributions include:

Introducing the ULLP framework and formalizing the problem of learning from privatized user data.
Developing new ULLP learning algorithms, such as PGD, and analyzing their theoretical properties.
Proving information-theoretic lower bounds which show that the proposed methods are minimax optimal up to logarithmic factors.
Evaluating the empirical performance of the proposed algorithms on real-world datasets, including tasks like linear regression and logistic regression.
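The general shape of a locally private gradient method can be sketched for linear regression, one of the tasks listed above. This is a generic illustration, not the authors' PGD algorithm: the L1 clipping bound, the Laplace noise, the step size, and the choice to give every round its own disjoint group of users (so each user sends exactly one report) are all assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def clip_l1(g: np.ndarray, bound: float) -> np.ndarray:
    """Rescale g so that its L1 norm is at most `bound`."""
    norm = float(np.abs(g).sum())
    return g if norm <= bound else g * (bound / norm)


def user_report(X: np.ndarray, y: np.ndarray, theta: np.ndarray,
                bound: float, epsilon: float,
                rng: np.random.Generator) -> np.ndarray:
    """One user's privatized gradient of the squared loss over ALL their samples.

    Any two clipped gradients differ by at most 2 * bound in L1 norm, so adding
    independent Laplace(2 * bound / epsilon) noise to each coordinate makes this
    single report epsilon-LDP with respect to the user's whole dataset.
    """
    grad = X.T @ (X @ theta - y) / len(y)  # average gradient of 0.5 * (x.theta - y)^2
    noise = rng.laplace(0.0, 2.0 * bound / epsilon, size=grad.shape)
    return clip_l1(grad, bound) + noise


def ldp_gradient_descent(users, dim: int, epsilon: float, bound: float = 1.0,
                         rounds: int = 20, lr: float = 0.5,
                         seed: int = 0) -> np.ndarray:
    """Hypothetical protocol: round t queries only the t-th disjoint group of users."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for group in np.array_split(np.arange(len(users)), rounds):
        reports = [user_report(*users[i], theta, bound, epsilon, rng) for i in group]
        theta = theta - lr * np.mean(reports, axis=0)  # server averages noisy gradients
    return theta


# Usage: 20,000 users, each holding 20 samples of y = x . theta_star + small noise.
data_rng = np.random.default_rng(1)
theta_star = np.array([1.0, -0.5])
users = []
for _ in range(20_000):
    X = data_rng.uniform(-1.0, 1.0, size=(20, 2))
    y = X @ theta_star + 0.1 * data_rng.standard_normal(20)
    users.append((X, y))

theta_hat = ldp_gradient_descent(users, dim=2, epsilon=1.0)
print(theta_hat)
```

Because each gradient is clipped before noise is added, the per-report noise scale 2 · bound / ε does not grow with the number of samples a user holds; the server reduces the noise only by averaging many users per round.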

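The experiments indicate that the new ULLP algorithms achieve a good balance between model accuracy and individual privacy, outperforming previous approaches in many settings. On the theoretical side, the convergence rates under user-level LDP are nearly the same as the item-level rates for distributions with bounded support, and for heavy-tailed distributions the user-level rate is even faster.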

Critical Analysis

The paper provides a comprehensive study of learning with user-level local differential privacy, addressing an important problem in the field of privacy-preserving machine learning. The authors' theoretical analysis offers valuable insights into the tradeoffs between privacy and accuracy, and the empirical evaluation on real-world datasets helps validate the practical relevance of the proposed methods.

However, the paper also acknowledges several limitations and areas for future work. For example, the current ULLP framework assumes that each user's data is independent and identically distributed, which may not hold in many real-world scenarios. Additionally, the paper does not explore the robustness of the algorithms to adversarial attacks.

Further research could investigate more realistic data distributions, study settings in which different users choose different privacy budgets, and explore techniques to improve the robustness of ULLP models. It would also be interesting to see how the proposed algorithms perform on a wider range of machine learning tasks and datasets.

Conclusion

This paper presents a detailed study of learning with user-level local differential privacy, a critical topic in the field of privacy-preserving machine learning. The authors develop new algorithms and provide a thorough theoretical and empirical analysis of their performance.

The findings from this research contribute to our understanding of the tradeoffs between privacy and accuracy in data-driven applications. The proposed ULLP methods could enable the development of privacy-preserving machine learning models for a variety of real-world use cases, such as personalized recommendations, estimating public statistics, and beyond.

As the demand for data-driven applications that respect individual privacy continues to grow, this work represents an important step towards reconciling the tension between the benefits of data-driven technologies and the need to protect user privacy.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for authoritative details.

Related Papers

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.

6/21/2024

cs.CL cs.CR cs.LG


A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results

Karima Makhlouf, Tamara Stefanovic, Heber H. Arcolezi, Catuscia Palamidessi

Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the preferred choice when the server or the data collector is not trusted. Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals, thus affecting fair decision-making. However, the results are conflicting in the sense that some studies show a positive impact of privacy on fairness while others show a negative one. In this work, we conduct a systematic and formal study of the effect of local DP on fairness. Specifically, we perform a quantitative study of how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions. In particular, we provide bounds in terms of the joint distributions and the privacy level, delimiting the extent to which local DP can impact the fairness of the model. We characterize the cases in which privacy reduces discrimination and those with the opposite effect. We validate our theoretical findings on synthetic and real-world datasets. Our results are preliminary in the sense that, for now, we study only the case of one sensitive attribute, and only statistical disparity, conditional statistical disparity, and equal opportunity difference.

5/24/2024

cs.LG cs.CR


Optimal Locally Private Nonparametric Classification with Public Data

Yuheng Ma, Hanfang Yang

In this work, we investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning with a focus on non-parametric classification. Under the posterior drift assumption, we derive, for the first time, the minimax optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and provides a fast converging estimator. Comprehensive experiments conducted on synthetic and real data sets show the superior performance of our proposed methods. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.

6/4/2024

stat.ML cs.CR cs.LG


Private Mean Estimation with Person-Level Differential Privacy

Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the user-level setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that $$n = \tilde{\Theta}\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)$$ people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.

6/3/2024

cs.DS cs.CR cs.IT cs.LG stat.ML