Spurious Stationarity and Hardness Results for Mirror Descent

Read original: arXiv:2404.08073 - Published 4/15/2024 by He Chen, Jiajin Li, Anthony Man-Cho So

Spurious Stationarity and Hardness Results for Mirror Descent

Overview

This paper examines the convergence properties of the mirror descent optimization algorithm, which is widely used in machine learning and optimization.
The authors show that mirror descent can exhibit "spurious stationarity," where the algorithm converges to a point that is not a true stationary point of the optimization problem.
They also prove hardness results, demonstrating that it is computationally difficult to determine whether a given point is a true stationary point for certain classes of optimization problems.

Plain English Explanation

In optimization, researchers often use algorithms like mirror descent to find the best solution to a problem. These algorithms work by iteratively updating a candidate solution, gradually moving it closer to the optimal point.

The authors of this paper found that the mirror descent algorithm can sometimes get "stuck" at a point that seems good, but is actually not the true optimal solution. They call this "spurious stationarity" - the algorithm thinks it has reached a good stopping point, but it's actually not the best possible solution.

The authors also showed that it's computationally very difficult to determine whether a given point is truly the optimal solution or just a "spurious" one that the algorithm got stuck at. This means that even if you run the mirror descent algorithm and it seems to have converged, it can be very hard to tell if you've actually found the best possible solution.

These findings have important implications for the use of mirror descent and other optimization algorithms in real-world applications, like machine learning and data analysis. Researchers and practitioners need to be aware of the potential for these algorithms to converge to suboptimal points, and may need to use additional techniques, such as sharpness-aware minimization, to ensure they are finding the true optimal solution.

Technical Explanation

The paper analyzes the convergence properties of the mirror descent optimization algorithm, which is widely used in machine learning and optimization. The authors show that mirror descent can exhibit "spurious stationarity," where the algorithm converges to a point that is not a true stationary point of the optimization problem.

Specifically, the authors prove that for certain classes of optimization problems, it is computationally difficult to determine whether a given point is a true stationary point or a "spurious" one that the mirror descent algorithm has converged to. This hardness result holds even for simple, convex optimization problems.

The authors' findings have important implications for the use of mirror descent and other optimization algorithms in real-world applications, such as machine learning and data analysis. Researchers and practitioners need to be aware of the potential for these algorithms to converge to suboptimal points, and may need to use additional techniques, such as sharpness-aware minimization, to ensure they are finding the true optimal solution.

Critical Analysis

The paper provides a rigorous theoretical analysis of the convergence properties of mirror descent, highlighting the important issue of "spurious stationarity." This is a valuable contribution to the optimization literature, as it cautions against blindly relying on the convergence of mirror descent to find optimal solutions.

However, the authors' hardness results are limited to certain classes of optimization problems, and it's not clear how widely applicable they are in practice. Additionally, the paper does not provide concrete strategies for addressing the issue of spurious stationarity, aside from suggesting the use of techniques like sharpness-aware minimization.

Further research is needed to understand the practical implications of these findings and to develop more robust optimization algorithms that can reliably converge to true optimal solutions, even in the presence of spurious stationary points. Comparing the performance of mirror descent to other optimization methods, such as persistent classification, could also provide valuable insights.

Conclusion

This paper uncovers a fundamental issue with the mirror descent optimization algorithm, showing that it can converge to "spurious" stationary points that are not true optimal solutions. The authors also prove that it is computationally difficult to determine whether a given point is a true stationary point or a spurious one.

These findings have important implications for the use of mirror descent and other optimization algorithms in machine learning, data analysis, and other real-world applications. Researchers and practitioners need to be aware of the potential for these algorithms to converge to suboptimal points, and may need to use additional techniques to ensure they are finding the true optimal solution.

While the paper provides a valuable theoretical analysis, more research is needed to understand the practical implications of these issues and to develop more robust optimization methods that can reliably converge to true optimal solutions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Spurious Stationarity and Hardness Results for Mirror Descent

He Chen, Jiajin Li, Anthony Man-Cho So

Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence of spurious stationary points. We further establish an algorithmic independent hardness result: Bregman proximal-type algorithms are unable to escape from a spurious stationary point in finite steps when the initial point is unfavorable, even for convex problems. Our hardness result points out the inherent distinction between Euclidean and Bregman geometries, and introduces both fundamental theoretical and numerical challenges to both machine learning and optimization communities.

4/15/2024

🛠️

The Computational Complexity of Finding Stationary Points in Non-Convex Optimization

Alexandros Hollender, Manolis Zampetakis

Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions $f$ over unrestricted $d$-dimensional domains is one of the most fundamental problems in classical non-convex optimization. Nevertheless, the computational and query complexity of this problem are still not well understood when the dimension $d$ of the problem is independent of the approximation error. In this paper, we show the following computational and query complexity results: 1. The problem of finding approximate stationary points over unrestricted domains is PLS-complete. 2. For $d = 2$, we provide a zero-order algorithm for finding $varepsilon$-approximate stationary points that requires at most $O(1/varepsilon)$ value queries to the objective function. 3. We show that any algorithm needs at least $Omega(1/varepsilon)$ queries to the objective function and/or its gradient to find $varepsilon$-approximate stationary points when $d=2$. Combined with the above, this characterizes the query complexity of this problem to be $Theta(1/varepsilon)$. 4. For $d = 2$, we provide a zero-order algorithm for finding $varepsilon$-KKT points in constrained optimization problems that requires at most $O(1/sqrt{varepsilon})$ value queries to the objective function. This closes the gap between the works of Bubeck and Mikulincer [2020] and Vavasis [1993] and characterizes the query complexity of this problem to be $Theta(1/sqrt{varepsilon})$. 5. Combining our results with the recent result of Fearnley et al. [2022], we show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.

9/14/2024

Stable Phase Retrieval with Mirror Descent

Jean-Jacques Godeme, Jalal Fadili, Claude Amra, Myriam Zerrad

In this paper, we aim to reconstruct an n-dimensional real vector from m phaseless measurements corrupted by an additive noise. We extend the noiseless framework developed in [15], based on mirror descent (or Bregman gradient descent), to deal with noisy measurements and prove that the procedure is stable to (small enough) additive noise. In the deterministic case, we show that mirror descent converges to a critical point of the phase retrieval problem, and if the algorithm is well initialized and the noise is small enough, the critical point is near the true vector up to a global sign change. When the measurements are i.i.d Gaussian and the signal-to-noise ratio is large enough, we provide global convergence guarantees that ensure that with high probability, mirror descent converges to a global minimizer near the true vector (up to a global sign change), as soon as the number of measurements m is large enough. The sample complexity bound can be improved if a spectral method is used to provide a good initial guess. We complement our theoretical study with several numerical results showing that mirror descent is both a computationally and statistically efficient scheme to solve the phase retrieval problem.

6/21/2024

🛠️

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

Zhaosong Lu, Sanyou Mei, Yifeng Xiao

In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $epsilon$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy $epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $theta geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $widetilde O(epsilon^{-max{4, 2theta}})$ for finding a stronger $epsilon$-stochastic stationary point, where the constraint violation is within $epsilon$ with certainty, and the expected violation of first-order stationarity is within $epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.

9/18/2024