First-order methods for Stochastic Variational Inequality problems with Function Constraints

2304.04778

Published 5/27/2024 by Digvijay Boob, Qi Deng, Mohammad Khalafi

🧪

Abstract

The monotone Variational Inequality (VI) is a general model with important applications in various engineering and scientific domains. In numerous instances, the VI problems are accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute. This paper presents novel first-order methods for the function-constrained Variational Inequality (FCVI) problem in smooth or nonsmooth settings with possibly stochastic operators and constraints. We introduce the AdOpEx method, which employs an operator extrapolation on the KKT operator of the FCVI in a smooth deterministic setting. Since this operator is not uniformly Lipschitz continuous in the Lagrange multipliers, we employ an adaptive two-timescale algorithm leading to bounded multipliers and achieving the optimal $O(1/T)$ convergence rate. For the nonsmooth and stochastic VIs, we introduce design changes to the AdOpEx method and propose a novel P-OpEx method that takes partial extrapolation. It converges at the rate of $O(1/sqrt{T})$ when both the operator and constraints are stochastic or nonsmooth. This method has suboptimal dependence on the noise and Lipschitz constants of function constraints. We propose a constraint extrapolation approach leading to the OpConEx method that improves this dependence by an order of magnitude. All our algorithms easily extend to saddle point problems with function constraints that couple the primal and dual variables while maintaining the same complexity results. To the best of our knowledge, all our complexity results are new in the literature

Create account to get full access

Overview

The paper introduces novel first-order methods for solving the Function-Constrained Variational Inequality (FCVI) problem, which is a generalization of the standard Variational Inequality (VI) problem.
The FCVI problem arises when the VI is accompanied by function constraints that can be data-driven, making the standard projection operator challenging to compute.
The authors propose three new algorithms to solve FCVI problems in smooth, nonsmooth, and stochastic settings, achieving optimal or near-optimal convergence rates.
The algorithms can also be extended to saddle point problems with function constraints that couple the primal and dual variables.

Plain English Explanation

The paper focuses on a type of mathematical optimization problem called the Variational Inequality (VI) problem. VIs are general models with many applications in fields like engineering and science. However, in some cases, the VI problem can be complicated by additional constraints on the functions involved. These function constraints may be based on data, making them difficult to work with.

The authors introduce three new algorithms to solve these Function-Constrained Variational Inequality (FCVI) problems:

AdOpEx: This method uses an "operator extrapolation" technique to efficiently solve smooth, deterministic FCVI problems. It addresses the challenge that the key operator in the problem is not uniformly Lipschitz continuous.
P-OpEx: This algorithm is designed for nonsmooth and stochastic VI problems. It uses a "partial extrapolation" approach to achieve a good convergence rate, even when both the operator and constraints are uncertain or non-smooth.
OpConEx: This is an extension of P-OpEx that further improves the algorithm's dependence on the noise and Lipschitz constants of the function constraints.

All three algorithms can also be applied to more general saddle point problems with function constraints, while maintaining their strong theoretical guarantees.

Technical Explanation

The paper focuses on solving the Function-Constrained Variational Inequality (FCVI) problem, which is a generalization of the standard Variational Inequality (VI) problem. In FCVI, the VI is accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute.

The authors introduce three novel first-order methods to solve FCVI problems:

AdOpEx: This method employs an "operator extrapolation" technique on the KKT (Karush-Kuhn-Tucker) operator of the FCVI in a smooth, deterministic setting. Since the KKT operator is not uniformly Lipschitz continuous in the Lagrange multipliers, the authors use an adaptive two-timescale algorithm to achieve the optimal $O(1/T)$ convergence rate.
P-OpEx: For nonsmooth and stochastic VIs, the authors make design changes to the AdOpEx method and propose the P-OpEx algorithm, which uses a "partial extrapolation" approach. P-OpEx converges at the rate of $O(1/\sqrt{T})$ when both the operator and constraints are stochastic or nonsmooth, but has suboptimal dependence on the noise and Lipschitz constants of the function constraints.
OpConEx: To address the suboptimal dependence in P-OpEx, the authors propose the OpConEx method, which uses a "constraint extrapolation" approach. OpConEx improves the dependence on the noise and Lipschitz constants of the function constraints by an order of magnitude.

All three algorithms can be extended to solve saddle point problems with function constraints that couple the primal and dual variables, while maintaining the same complexity results.

Critical Analysis

The paper presents a comprehensive set of new algorithms for solving FCVI problems, which are a generalization of standard VI problems and have important applications in various fields. The authors have provided thorough theoretical analysis, including convergence guarantees and optimal or near-optimal complexity results.

One potential limitation is that the algorithms may still have suboptimal dependence on certain problem parameters, such as the noise and Lipschitz constants of the function constraints. The authors have acknowledged this and proposed the OpConEx method to address this issue, but there may still be room for further improvement.

Additionally, the practical implementation and performance of these algorithms on real-world problems are not extensively evaluated in the paper. It would be valuable to see how the proposed methods compare to existing approaches on a variety of benchmarks or application-specific datasets.

Nonetheless, the research presented in this paper makes a significant contribution to the field of optimization, particularly in the context of Variational Inequality and constrained optimization problems with function constraints. The algorithms developed in this work could find widespread use in domains where such problems arise, such as distributed optimization and variational inference.

Conclusion

The paper presents novel first-order methods for solving Function-Constrained Variational Inequality (FCVI) problems, which generalize standard Variational Inequality problems by incorporating data-driven function constraints. The authors introduce three algorithms - AdOpEx, P-OpEx, and OpConEx - that can handle smooth, nonsmooth, and stochastic settings, respectively, and achieve optimal or near-optimal convergence rates.

These algorithms extend to more general saddle point problems with function constraints, demonstrating their versatility and potential impact on a wide range of optimization problems in various scientific and engineering domains. While some limitations remain in terms of parameter dependence, the research makes a significant contribution to the field of constrained optimization and provides a solid foundation for further advancements in this area.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization

Wei Jiang, Sifan Yang, Wenhao Yang, Yibo Wang, Yuanyu Wan, Lijun Zhang

This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.

6/7/2024

cs.LG

A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization

Andrzej Ruszczy'nski, Shangzhe Yang

We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a {L}ojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.

5/20/2024

cs.LG stat.ML

Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization

Yue Xie, Jiawen Bi, Hongcheng Liu

When the nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. In this work, we propose dimension-insensitive stochastic first-order methods (DISFOMs) to address nonconvex optimization with expected-valued objective function. Our algorithms allow for non-Euclidean and non-smooth distance functions as the proximal terms. Under mild assumptions, we show that DISFOM using minibatches to estimate the gradient enjoys sample complexity of $ mathcal{O} ( (log d) / epsilon^4 ) $ to obtain an $epsilon$-stationary point. Furthermore, we prove that DISFOM employing variance reduction can sharpen this bound to $mathcal{O} ( (log d)^{2/3}/epsilon^{10/3} )$, which perhaps leads to the best-known sample complexity result in terms of $d$. We provide two choices of the non-smooth distance functions, both of which allow for closed-form solutions to the proximal step. Numerical experiments are conducted to illustrate the dimension insensitive property of the proposed frameworks.

7/1/2024

cs.LG

First-Order Methods for Linearly Constrained Bilevel Optimization

Guy Kornowski, Swati Padmanabhan, Kai Wang, Zhe Zhang, Suvrit Sra

Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order linearly constrained optimization methods with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $epsilon$-stationarity in $widetilde{O}(epsilon^{-2})$ gradient oracle calls, which is nearly-optimal. For linear inequality constraints, we attain $(delta,epsilon)$-Goldstein stationarity in $widetilde{O}(d{delta^{-1} epsilon^{-3}})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, we obtain for the linear inequality setting dimension-free rates of $widetilde{O}({delta^{-1} epsilon^{-4}})$ oracle complexity under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. We verify these guarantees with preliminary numerical experiments.

6/19/2024

cs.LG