The Computational Complexity of Finding Stationary Points in Non-Convex Optimization

Read original: arXiv:2310.09157 - Published 9/14/2024 by Alexandros Hollender, Manolis Zampetakis

🛠️

Overview

The paper investigates the computational and query complexity of finding approximate stationary points of non-convex but smooth objective functions over unrestricted domains.
It presents several key results, including the PLS-completeness of this problem, and efficient algorithms with tight query complexity bounds for the 2D case.
The paper also establishes a connection between finding approximate stationary points in unconstrained optimization and approximate KKT points in constrained optimization.

Plain English Explanation

In the world of optimization, finding points where the gradient (the direction of steepest change) of a function is approximately zero is a fundamental challenge, especially when the function is non-convex (not bowl-shaped) but still smooth. This problem arises in many real-world applications, such as machine learning and non-linear programming.

The paper tackles this problem from a theoretical perspective, aiming to understand the inherent computational and query complexity (the number of times the function needs to be evaluated) of finding these approximate stationary points. The key findings are:

The problem of finding approximate stationary points is PLS-complete, meaning it is computationally difficult and likely takes a long time to solve in the worst case.
For 2D problems (where the domain has only two dimensions), the paper provides an efficient algorithm that can find approximate stationary points using only a small number of function evaluations.
It also shows that any algorithm needs at least a certain number of function evaluations to find approximate stationary points in 2D, which matches the upper bound of the efficient algorithm, thereby characterizing the optimal query complexity for this problem.
For constrained optimization problems in 2D, the paper provides an algorithm to find approximate KKT points, which are a generalization of stationary points, and shows that the query complexity is optimal.
Finally, the paper establishes a connection between finding approximate stationary points in unconstrained optimization and approximate KKT points in constrained optimization, showing that the former is a more general problem.

Technical Explanation

The paper first shows that the problem of finding approximate stationary points of non-convex but smooth objective functions over unrestricted domains is PLS-complete. This means that the problem is computationally difficult and likely takes a long time to solve in the worst case.

Next, the paper focuses on the 2D case (where the domain has only two dimensions) and provides a zero-order algorithm (an algorithm that only requires the function values, not the gradients) that can find ε-approximate stationary points using at most O(1/ε) function evaluations. This means that as the approximation error ε gets smaller, the algorithm needs a larger number of function evaluations to find the approximate stationary point.

The paper then proves a matching lower bound, showing that any algorithm needs at least Ω(1/ε) function evaluations and/or gradient evaluations to find ε-approximate stationary points in 2D. This means that the algorithm provided is optimal in terms of the number of function evaluations required.

For constrained optimization problems in 2D, the paper provides a zero-order algorithm to find ε-KKT points, which are a generalization of stationary points that take the constraints into account. This algorithm requires at most O(1/√ε) function evaluations, matching the lower bound shown by previous work.

Finally, the paper combines its results with a recent finding to show that the problem of finding approximate KKT points in constrained optimization is reducible to the problem of finding approximate stationary points in unconstrained optimization, but not the other way around. This establishes a connection between these two fundamental problems in optimization.

Critical Analysis

The paper provides a comprehensive understanding of the computational and query complexity of finding approximate stationary points of non-convex but smooth objective functions over unrestricted domains. The PLS-completeness result highlights the inherent difficulty of this problem, while the tight complexity bounds for the 2D case offer valuable insights.

One potential limitation of the paper is that the results are primarily focused on the 2D case, and it remains to be seen how the algorithms and complexity bounds scale to higher dimensions. Additionally, the paper does not address the practical performance of the proposed algorithms, which could be an area for future research.

Furthermore, the connection between finding approximate stationary points in unconstrained optimization and approximate KKT points in constrained optimization is an interesting theoretical insight, but its practical implications for solving real-world optimization problems could be explored further.

Conclusion

This paper makes significant contributions to the field of non-convex optimization by providing a thorough analysis of the computational and query complexity of finding approximate stationary points. The key results include the PLS-completeness of the problem, tight complexity bounds for the 2D case, and the relationship between unconstrained and constrained optimization problems.

These findings advance our understanding of the inherent challenges and limitations in this fundamental area of optimization, which has important applications in machine learning, engineering, and many other fields. The insights provided by this paper can inform the design of more efficient algorithms and guide future research in this direction.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🛠️

The Computational Complexity of Finding Stationary Points in Non-Convex Optimization

Alexandros Hollender, Manolis Zampetakis

Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions $f$ over unrestricted $d$-dimensional domains is one of the most fundamental problems in classical non-convex optimization. Nevertheless, the computational and query complexity of this problem are still not well understood when the dimension $d$ of the problem is independent of the approximation error. In this paper, we show the following computational and query complexity results: 1. The problem of finding approximate stationary points over unrestricted domains is PLS-complete. 2. For $d = 2$, we provide a zero-order algorithm for finding $varepsilon$-approximate stationary points that requires at most $O(1/varepsilon)$ value queries to the objective function. 3. We show that any algorithm needs at least $Omega(1/varepsilon)$ queries to the objective function and/or its gradient to find $varepsilon$-approximate stationary points when $d=2$. Combined with the above, this characterizes the query complexity of this problem to be $Theta(1/varepsilon)$. 4. For $d = 2$, we provide a zero-order algorithm for finding $varepsilon$-KKT points in constrained optimization problems that requires at most $O(1/sqrt{varepsilon})$ value queries to the objective function. This closes the gap between the works of Bubeck and Mikulincer [2020] and Vavasis [1993] and characterizes the query complexity of this problem to be $Theta(1/sqrt{varepsilon})$. 5. Combining our results with the recent result of Fearnley et al. [2022], we show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.

9/14/2024

🛠️

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

Zhaosong Lu, Sanyou Mei, Yifeng Xiao

In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $epsilon$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy $epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $theta geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $widetilde O(epsilon^{-max{4, 2theta}})$ for finding a stronger $epsilon$-stochastic stationary point, where the constraint violation is within $epsilon$ with certainty, and the expected violation of first-order stationarity is within $epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.

9/18/2024

🔍

An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization

Guy Kornowski, Ohad Shamir

We study the complexity of producing $(delta,epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(ddelta^{-1}epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $delta,epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.

4/16/2024

How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization

Andrew Lowy, Jonathan Ullman, Stephen J. Wright

We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to warm start another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.

8/21/2024