Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates

2311.13447

Published 4/4/2024 by Michael Menart, Enayat Ullah, Raman Arora, Raef Bassily, Crist'obal Guzm'an

🛠️

Abstract

We study private empirical risk minimization (ERM) problem for losses satisfying the $(gamma,kappa)$-Kurdyka-{L}ojasiewicz (KL) condition. The Polyak-{L}ojasiewicz (PL) condition is a special case of this condition when $kappa=2$. Specifically, we study this problem under the constraint of $rho$ zero-concentrated differential privacy (zCDP). When $kappain[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance reduced gradient descent that achieves the rate $tilde{O}big(big(frac{sqrt{d}}{nsqrt{rho}}big)^kappabig)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $kappa geq 2$ and the loss is instead Lipschitz and weakly convex, we show it is possible to achieve the rate $tilde{O}big(big(frac{sqrt{d}}{nsqrt{rho}}big)^kappabig)$ with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm achieves a rate of $tilde{O}big(big(frac{sqrt{d}}{nsqrt{rho}}big)^{frac{2kappa}{4-kappa}}big)$ adaptively, which is nearly optimal when $kappa = 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives with rate as fast as $tilde{O}big(frac{sqrt{d}}{nsqrt{rho}}big)$ and never worse than $tilde{O}big(big(frac{sqrt{d}}{nsqrt{rho}}big)^{1/2}big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.

Create account to get full access

Overview

The researchers study a type of machine learning optimization problem called private empirical risk minimization (ERM).
They focus on loss functions that satisfy a specific mathematical condition called the Kurdyka-Łojasiewicz (KL) condition.
The researchers impose a privacy constraint on the optimization problem, requiring it to satisfy zero-concentrated differential privacy (zCDP).
They develop new algorithms and analyze the convergence rates for these algorithms under different assumptions about the loss function.

Plain English Explanation

The researchers are looking at a machine learning problem where the goal is to find the best model parameters by minimizing a loss function on a dataset. However, they want to do this in a way that protects the privacy of the individuals in the dataset.

The loss function they consider satisfies a specific mathematical property called the Kurdyka-Łojasiewicz (KL) condition. This means the loss function has a certain "well-behavedness" that makes it easier to optimize. The Polyak-Łojasiewicz (PL) condition is a special case of this.

The researchers impose a privacy constraint called zero-concentrated differential privacy (zCDP). This ensures the optimization process doesn't reveal too much about any individual in the dataset.

Under different assumptions about the loss function, the researchers develop new optimization algorithms and analyze how quickly these algorithms can converge to the optimal model parameters, while still satisfying the privacy constraint. They show that their algorithms can achieve nearly optimal convergence rates in many cases.

The key ideas are balancing the competing goals of optimizing the machine learning model and preserving the privacy of the individuals in the dataset. The technical analysis provides theoretical guarantees on the performance of the new algorithms.

Technical Explanation

The researchers study the private empirical risk minimization (ERM) problem, where the goal is to find the model parameters that minimize a loss function on a dataset, subject to a zero-concentrated differential privacy (zCDP) constraint.

They consider loss functions that satisfy the $(γ,κ)$-Kurdyka-Łojasiewicz (KL) condition, which includes the Polyak-Łojasiewicz (PL) condition as a special case when $κ=2$. Under this condition, the researchers develop new algorithms and analyze their convergence rates.

When $κ \in [1,2]$ and the loss function is Lipschitz and smooth, they propose a variance-reduced gradient descent algorithm that achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^κ\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. They show this rate is nearly optimal.

When $κ \geq 2$ and the loss is Lipschitz and weakly convex, the researchers demonstrate that a private implementation of the proximal point method can achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^κ\big)$.

For the case where the KL parameters are unknown, the researchers provide a modified noisy gradient descent algorithm that adaptively achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2κ}{4-κ}}\big)$, which is nearly optimal when $κ = 2$.

Finally, the researchers show that their gradient descent algorithm can also achieve fast convergence to a stationary point, without assuming the KL condition, as long as the gradient stays sufficiently large during the optimization process. The convergence rate in this case is as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$, matching the best known rate for methods that do not rely on variance reduction.

Critical Analysis

The researchers make several important theoretical contributions in this work, providing new algorithms and convergence rate analyses for private ERM under the KL condition. The technical analysis is rigorous and the results are promising.

However, the paper does not address some practical considerations, such as the choice of the KL and zCDP parameters in real-world applications, or the implementation details of the proposed algorithms. Additionally, the paper does not include any empirical evaluation of the algorithms, which would be helpful to understand their performance in practice.

Furthermore, the assumptions made about the loss function, such as Lipschitzness and smoothness, may not hold for all machine learning problems. It would be valuable to explore the robustness of the algorithms to violations of these assumptions.

Overall, this research advances the theoretical understanding of private ERM, but further work is needed to bridge the gap between theory and practice and to explore the broader applicability of the techniques.

Conclusion

This paper presents new algorithms and analysis for the private empirical risk minimization (ERM) problem under the Kurdyka-Łojasiewicz (KL) condition, with a focus on achieving optimal convergence rates while satisfying a zero-concentrated differential privacy (zCDP) constraint.

The researchers develop several algorithms that can provably optimize a broad class of loss functions, including those satisfying the Polyak-Łojasiewicz (PL) condition as a special case. They show that their algorithms can achieve nearly optimal convergence rates, providing strong theoretical guarantees on the performance of private ERM.

These results contribute to the understanding of how to balance the competing goals of model optimization and privacy preservation in machine learning. The techniques developed in this paper could have important implications for the deployment of privacy-preserving machine learning systems in sensitive domains such as healthcare, finance, and personal data management.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛠️

Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions

Hilal Asi, Daogao Liu, Kevin Tian

We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 cdot frac 1 {sqrt n} + G_k cdot (frac{sqrt d}{nepsilon})^{1 - frac 1 k}$ under $(epsilon, delta)$-approximate differential privacy, up to a mild $textup{polylog}(frac{1}{delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{text{nd}}$ and $k^{text{th}}$ moment bounds on sample Lipschitz constants, nearly-matching a lower bound of [Lowy and Razaviyayn 2023]. We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.

6/6/2024

cs.DS cs.CR cs.LG stat.ML

🧠

The Power of Sampling: Dimension-free Risk Bounds in Private ERM

Yin Tat Lee, Daogao Liu, Zhou Lu

Differentially private empirical risk minimization (DP-ERM) is a fundamental problem in private optimization. While the theory of DP-ERM is well-studied, as large-scale models become prevalent, traditional DP-ERM methods face new challenges, including (1) the prohibitive dependence on the ambient dimension, (2) the highly non-smooth objective functions, (3) costly first-order gradient oracles. Such challenges demand rethinking existing DP-ERM methodologies. In this work, we show that the regularized exponential mechanism combined with existing samplers can address these challenges altogether: under the standard unconstrained domain and low-rank gradients assumptions, our algorithm can achieve rank-dependent risk bounds for non-smooth convex objectives using only zeroth order oracles, which was not accomplished by prior methods. This highlights the power of sampling in differential privacy. We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings. Our lower bound is derived from a general black-box reduction from unconstrained to the constrained domain and an improved lower bound in the constrained setting, which might be of independent interest.

6/5/2024

cs.LG cs.CR

🛠️

New!On Convex Optimization with Semi-Sensitive Features

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Raghu Meka, Chiyuan Zhang

We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive. This generalizes the Label DP setting where only the label is sensitive. We give improved upper and lower bounds on the excess risk for DP-ERM. In particular, we show that the error only scales polylogarithmically in terms of the sensitive domain size, improving upon previous results that scale polynomially in the sensitive domain size (Ghazi et al., 2021).

6/28/2024

cs.LG cs.CR cs.DS

🏷️

Optimal Locally Private Nonparametric Classification with Public Data

Yuheng Ma, Hanfang Yang

In this work, we investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the mini-max optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally differentially private classification tree, which attains the mini-max optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and provides a fast converging estimator. Comprehensive experiments conducted on synthetic and real data sets show the superior performance of our proposed methods. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.

6/4/2024

stat.ML cs.CR cs.LG