Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent

2405.12553

Published 5/22/2024 by Holger Dette, Carina Graw

👨‍🏫

Abstract

Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in the last years in which the focus is mainly on convergence rates and privacy guarantees. While in the non private case, uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors, these procedures cannot be transferred to differential privacy due to multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ for non-private SGD.

Create account to get full access

Overview

Stochastic Gradient Descent (SGD) is a widely used tool in machine learning.
In the context of Differential Privacy (DP), SGD has been well studied, focusing on convergence rates and privacy guarantees.
While uncertainty quantification (UQ) for SGD by bootstrap has been addressed in the non-private case, these procedures cannot be transferred to differential privacy due to multiple queries to the private data.
The paper proposes a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget.

Plain English Explanation

Stochastic Gradient Descent (SGD) is a fundamental technique used in machine learning to train models. In recent years, researchers have been studying how to use SGD while also protecting the privacy of the data being used to train the models. This is known as Differential Privacy (DP).

One important aspect of using models is being able to quantify the uncertainty in the results, which is called Uncertainty Quantification (UQ). For non-private SGD, researchers have developed techniques using something called "bootstrap" to do UQ. However, these techniques cannot be used when the data is protected by DP, because they require too many queries to the private data.

This paper proposes a new method for doing UQ for SGD under DP. The key idea is to use a "block bootstrap," which is a way of resampling the data in blocks instead of individually. This makes the method computationally efficient and doesn't require any additional adjustment to the privacy budget. The authors show that this new method works well both in theory and in practical simulations.

As a bonus, the new method can also be used to do UQ for regular, non-private SGD, providing a simple alternative to existing techniques.

Technical Explanation

The paper proposes a novel block bootstrap method for performing uncertainty quantification (UQ) on the outputs of Stochastic Gradient Descent (SGD) under local differential privacy (LDP). In the non-private setting, bootstrap-based UQ for SGD has been well-studied, but these techniques cannot be directly applied to the DP setting due to the need for multiple queries to the private data.

The key idea of the proposed method is to use a block bootstrap approach, where the data is resampled in blocks rather than individually. This makes the method computationally tractable and does not require any adjustment to the privacy budget. The authors prove the theoretical validity of this approach and demonstrate its finite-sample performance through simulation studies.

Importantly, the new block bootstrap method also provides a simple alternative numerical tool for UQ in the non-private SGD setting, complementing existing techniques. The authors position this work in the broader context of differentially private variance-reduced stochastic optimization, differentially private causal inference with clustered outcomes, differentially private graphon estimation, and the growing body of research on uncertainty quantification in deep learning.

Critical Analysis

The paper presents a valuable contribution to the field of differentially private machine learning by proposing a novel block bootstrap method for uncertainty quantification of SGD outputs. The authors carefully address the key challenge of adapting bootstrap-based UQ techniques to the DP setting, where multiple queries to the private data are not feasible.

One potential limitation of the proposed method is that it assumes the data can be partitioned into independent blocks, which may not always be the case in practice. Additionally, the authors do not provide guidance on how to choose the appropriate block size, which could impact the method's performance.

While the simulation results demonstrate the effectiveness of the block bootstrap approach, it would be helpful to see the method applied to real-world DP machine learning problems to further assess its practical utility and limitations. Additionally, the authors could explore the performance of the method under different privacy regimes (e.g., central DP, concentrated DP) and its scalability to larger datasets.

Overall, the paper makes a valuable contribution to the field of differentially private density estimation and uncertainty quantification in machine learning. The proposed block bootstrap method provides a promising alternative to existing techniques and opens up new research directions in this important area.

Conclusion

This paper presents a novel block bootstrap method for performing uncertainty quantification on the outputs of Stochastic Gradient Descent (SGD) under local differential privacy. By leveraging a block-based resampling approach, the authors develop a computationally tractable technique that does not require any additional adjustment to the privacy budget.

The proposed method addresses a key challenge in adapting bootstrap-based UQ techniques to the differentially private setting, where multiple queries to the private data are not feasible. The authors demonstrate the theoretical validity and practical effectiveness of their approach through simulation studies, and also show that it can be used as a simple alternative for UQ in the non-private SGD setting.

This work contributes to the growing body of research on differentially private machine learning and uncertainty quantification, with potential applications in a variety of domains where privacy-preserving data analysis is of paramount importance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

How Private are DP-SGD Implementations?

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

6/7/2024

cs.LG cs.CR cs.DS

👀

Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

This paper presents a nearly tight audit of the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box model. Our auditing procedure empirically estimates the privacy leakage from DP-SGD using membership inference attacks; unlike prior work, the estimates are appreciably close to the theoretical DP bounds. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained with theoretical $varepsilon=10.0$ on MNIST and CIFAR-10, our auditing procedure yields empirical estimates of $7.21$ and $6.95$, respectively, on 1,000-record samples and $6.48$ and $4.96$ on the full datasets. By contrast, previous work achieved tight audits only in stronger (i.e., less realistic) white-box models that allow the adversary to access the model's inner parameters and insert arbitrary gradients. Our auditing procedure can be used to detect bugs and DP violations more easily and offers valuable insight into how the privacy analysis of DP-SGD can be further improved.

5/24/2024

cs.CR cs.LG

QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution

Zixi Wang, M. Cenk Gursoy

Federated learning (FL) is a framework which allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while being able to keep their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, preserving the users' datasets locally is shown to be not sufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly utilized method, reducing the communication cost by transmitting compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone in providing privacy guarantees has not been extensively analyzed yet. We in this paper present a novel stochastic quantization method, utilizing a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide convergence analysis for our framework and empirically study its performance.

6/12/2024

cs.LG cs.AI

🤯

Resampling methods for Private Statistical Inference

Karan Chadha, John Duchi, Rohith Kuditipudi

We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple little bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($gtrsim 10$ times) confidence intervals than previous approaches.

6/5/2024

stat.ML cs.CR cs.LG