Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming

Read original: arXiv:2205.13687 - Published 4/16/2024 by Sen Na, Michael W. Mahoney

🤯

Overview

This paper investigates online statistical inference for constrained stochastic nonlinear optimization problems.
The researchers apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these optimization problems.
They use an iterative sketching solver to reduce the computational cost of the StoSQP method.
The paper shows that the rescaled primal-dual sequence of the StoSQP method converges to a mean-zero Gaussian distribution.
The researchers also analyze a plug-in covariance matrix estimator to perform inference in practice.
They illustrate the asymptotic normality result on benchmark nonlinear problems and linearly/nonlinearly constrained regression problems.

Plain English Explanation

In this paper, the researchers are looking at a particular type of optimization problem that involves dealing with uncertainty and constraints. These types of problems can arise in Cyber-Physical Robotic Systems or in Gaussian Process Regression with Soft Inequality and Monotonicity Constraints.

The researchers use a method called Stochastic Sequential Quadratic Programming (StoSQP) to solve these optimization problems. This method works by breaking the problem down into a series of smaller, more manageable problems that can be solved one after the other. The key idea is to use an approximate, or "sketched," version of the original problem, which reduces the amount of computation required.

The paper shows that as the StoSQP method progresses, the solution it finds gradually gets closer to the true optimal solution. In fact, the researchers prove that the difference between the StoSQP solution and the true optimal solution follows a Gaussian, or "bell-shaped," distribution. This means that the StoSQP method not only finds a good solution, but it also provides information about how confident we can be in that solution.

The researchers also develop a way to estimate the covariance matrix of this Gaussian distribution, which can be useful for Primal-Dual iLQR and Stochastic Constrained Decentralized Optimization problems. They test their method on some standard benchmark problems, as well as on some regression problems with constraints, and show that it works well in practice.

Technical Explanation

The researchers consider online statistical inference of constrained stochastic nonlinear optimization problems. They apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems, which can be regarded as applying second-order Newton's method to the Karush-Kuhn-Tucker (KKT) conditions.

In each iteration, the StoSQP method computes the Newton direction by solving a quadratic program, and then selects a proper adaptive stepsize bar{alpha}_t to update the primal-dual iterate. To reduce the dominant computational cost of the method, the researchers inexactly solve the quadratic program in each iteration by employing an iterative sketching solver. Notably, the approximation error of the sketching solver need not vanish as iterations proceed, meaning that the per-iteration computational cost does not blow up.

For the above StoSQP method, the researchers show that under mild assumptions, the rescaled primal-dual sequence 1/sqrt{bar{alpha}_t}cdot (x_t - x^star, lambda_t - lambda^star) converges to a mean-zero Gaussian distribution with a nontrivial covariance matrix depending on the underlying sketching distribution. To perform inference in practice, they also analyze a plug-in covariance matrix estimator.

The researchers illustrate the asymptotic normality result of the method both on benchmark nonlinear problems in the CUTEst test set and on linearly/nonlinearly constrained regression problems.

Critical Analysis

The paper provides a thorough theoretical analysis of the StoSQP method and its statistical properties. The researchers make several assumptions, such as the smoothness of the objective function and the constraints, as well as the existence of a unique solution to the optimization problem. While these assumptions are reasonable, they may not hold in all practical situations, and the method's performance may degrade if these assumptions are violated.

Additionally, the paper does not address the choice of the sketching distribution or the impact of this choice on the method's performance. The researchers state that the covariance matrix of the limiting Gaussian distribution depends on the sketching distribution, but they do not provide guidance on how to select an appropriate sketching distribution in practice.

Another potential limitation is the scalability of the method, particularly for large-scale optimization problems. While the use of the sketching solver reduces the computational cost of each iteration, the overall computational burden may still be high for problems with a large number of variables and constraints.

Despite these potential limitations, the paper makes a valuable contribution to the field of stochastic optimization by providing a rigorous theoretical analysis of the StoSQP method and demonstrating its practical applicability on a range of benchmark problems. The results and insights presented in the paper can be useful for researchers and practitioners working on Stochastic Hessian Fittings on Lie Groups and other constrained optimization problems in various domains.

Conclusion

This paper introduces a novel approach to solving constrained stochastic nonlinear optimization problems using the Stochastic Sequential Quadratic Programming (StoSQP) method. The key contribution of the paper is the theoretical analysis of the statistical properties of the StoSQP method, showing that the rescaled primal-dual sequence converges to a mean-zero Gaussian distribution.

The use of an iterative sketching solver to reduce the computational cost of the method is an important practical innovation, as it allows the StoSQP method to be applied to larger-scale problems without a significant increase in computational burden. The paper also provides a way to estimate the covariance matrix of the limiting Gaussian distribution, which can be useful for Primal-Dual iLQR and Stochastic Constrained Decentralized Optimization problems.

Overall, this paper makes a significant contribution to the field of stochastic optimization and provides a valuable tool for researchers and practitioners working on Cyber-Physical Robotic Systems and Gaussian Process Regression with Soft Inequality and Monotonicity Constraints.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🤯

Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming

Sen Na, Michael W. Mahoney

We consider online statistical inference of constrained stochastic nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems, which can be regarded as applying second-order Newton's method to the Karush-Kuhn-Tucker (KKT) conditions. In each iteration, the StoSQP method computes the Newton direction by solving a quadratic program, and then selects a proper adaptive stepsize $bar{alpha}_t$ to update the primal-dual iterate. To reduce dominant computational cost of the method, we inexactly solve the quadratic program in each iteration by employing an iterative sketching solver. Notably, the approximation error of the sketching solver need not vanish as iterations proceed, meaning that the per-iteration computational cost does not blow up. For the above StoSQP method, we show that under mild assumptions, the rescaled primal-dual sequence $1/sqrt{bar{alpha}_t}cdot (x_t - x^star, lambda_t - lambda^star)$ converges to a mean-zero Gaussian distribution with a nontrivial covariance matrix depending on the underlying sketching distribution. To perform inference in practice, we also analyze a plug-in covariance matrix estimator. We illustrate the asymptotic normality result of the method both on benchmark nonlinear problems in CUTEst test set and on linearly/nonlinearly constrained regression problems.

4/16/2024

Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models

Yuchen Fang, Sen Na, Michael W. Mahoney, Mladen Kolar

In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our method utilizes a random model to represent the objective function, which is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises due to the nonlinearity of the problem constraints. Such an effect may impede the method from moving away from saddle points. Both gradient and eigen step computations leverage a novel parameter-free decomposition of the step and the trust-region radius, accounting for the proportions among the feasibility residual, optimality residual, and negative curvature. We establish global almost sure first- and second-order convergence guarantees for our method, and present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods.

9/27/2024

The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines

Di Zhang, Suvrajeet Sen

Stochastic First-Order (SFO) methods have been a cornerstone in addressing a broad spectrum of modern machine learning (ML) challenges. However, their efficacy is increasingly questioned, especially in large-scale applications where empirical evidence indicates potential performance limitations. In response, this paper proposes an innovative method specifically designed for kernel support vector machines (SVMs). This method not only achieves faster convergence per iteration but also exhibits enhanced scalability when compared to conventional SFO techniques. Diverging from traditional sample average approximation strategies that typically frame kernel SVM as an 'all-in-one' Quadratic Program (QP), our approach adopts adaptive sampling. This strategy incrementally refines approximation accuracy on an 'as-needed' basis. Crucially, this approach also inspires a decomposition-based algorithm, effectively decomposing parameter selection from error estimation, with the latter being independently determined for each data point. To exploit the quadratic nature of the kernel matrix, we introduce a stochastic conjugate subgradient method. This method preserves many benefits of first-order approaches while adeptly handling both nonlinearity and non-smooth aspects of the SVM problem. Thus, it extends beyond the capabilities of standard SFO algorithms for non-smooth convex optimization. The convergence rate of this novel method is thoroughly analyzed within this paper. Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods. Moreover, it significantly enhances both speed and accuracy of the optimization process.

8/1/2024

🛠️

Zero-Order Optimization for Gaussian Process-based Model Predictive Control

Amon Lahr, Andrea Zanelli, Andrea Carron, Melanie N. Zeilinger

By enabling constraint-aware online model adaptation, model predictive control using Gaussian process (GP) regression has exhibited impressive performance in real-world applications and received considerable attention in the learning-based control community. Yet, solving the resulting optimal control problem in real-time generally remains a major challenge, due to i) the increased number of augmented states in the optimization problem, as well as ii) computationally expensive evaluations of the posterior mean and covariance and their respective derivatives. To tackle these challenges, we employ i) a tailored Jacobian approximation in a sequential quadratic programming (SQP) approach, and combine it with ii) a parallelizable GP inference and automatic differentiation framework. Reducing the numerical complexity with respect to the state dimension $n_x$ for each SQP iteration from $mathcal{O}(n_x^6)$ to $mathcal{O}(n_x^3)$, and accelerating GP evaluations on a graphical processing unit, the proposed algorithm computes suboptimal, yet feasible solutions at drastically reduced computation times and exhibits favorable local convergence properties. Numerical experiments verify the scaling properties and investigate the runtime distribution across different parts of the algorithm.

9/17/2024