Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning

2302.00997

Published 5/21/2024 by Jiashuo Jiang

🗣️

Abstract

We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms. Also, the regret bound of our algorithm cam be reduced to the regret bound of embedded adversarial learning algorithms. Based on our framework, we obtain new results under various settings. When the model parameter at each period is drawn from identical distributions, we derive textit{state-of-art} $O(sqrt{T})$ regret that improves previous bounds under special cases. Our algorithm is also robust to adversarial corruptions of model parameter realizations. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions.

Create account to get full access

Overview

This paper addresses an online two-stage stochastic optimization problem with long-term constraints over a finite horizon.
The authors develop online algorithms based on adversarial learning techniques and provide regret bounds for their algorithms.
They consider several settings, including when the model parameters are drawn from identical or non-stationary distributions, and when the parameters are subject to adversarial corruption.
The authors derive state-of-the-art regret bounds that improve upon previous results in some cases.

Plain English Explanation

The paper looks at a type of optimization problem where decisions need to be made in two stages over time. In the first stage, an initial action is taken. Then, after observing some information about the problem, a second-stage action is chosen from a set of feasible options. The goal is to minimize the cumulative objective value while ensuring the long-term average of the second-stage decisions belongs to a specified set.

The authors develop online algorithms that can solve this problem effectively. These algorithms are based on techniques from adversarial learning, which allow the algorithms to adapt to changing or unknown conditions.

Key Findings:

When the model parameters are drawn from identical distributions, the authors' algorithm achieves a state-of-the-art regret bound of O(√T), improving upon previous results.
The algorithm is also robust to adversarial corruptions of the model parameter realizations.
For the case where the model parameters come from unknown non-stationary distributions, the authors develop a new algorithm with a regret bound of O(W_T + √T), where W_T measures the inaccuracy of machine-learned predictions of the distributions.

Technical Explanation

The paper considers an online two-stage stochastic optimization problem over a finite horizon of T periods. At each period:

The decision-maker takes a first-stage action.
A model parameter realization is observed.
A second-stage action is chosen from a feasible set that depends on the first-stage decision and the model parameter.

The goal is to minimize the cumulative objective value while ensuring the long-term average of the second-stage decisions belongs to a specified set.

The authors develop online algorithms for this problem using techniques from adversarial learning. They show that the regret bounds of their algorithms can be reduced to the regret bounds of the underlying adversarial learning algorithms.

Under various settings, the authors obtain new results:

When the model parameters are drawn from identical distributions, they derive a state-of-the-art O(√T) regret bound, improving upon previous results in special cases.
The algorithm is robust to adversarial corruptions of the model parameter realizations.
For non-stationary model parameter distributions with machine-learned predictions, they develop a new algorithm with a regret bound of O(W_T + √T), where W_T measures the total inaccuracy of the predictions.

Critical Analysis

The paper presents a thorough and rigorous analysis of the online two-stage stochastic optimization problem with long-term constraints. The authors' use of adversarial learning techniques to develop their algorithms is a clever and effective approach.

One potential limitation is that the paper focuses on the theoretical analysis and does not include any empirical validation of the proposed algorithms. It would be interesting to see how the algorithms perform on real-world datasets or simulations.

Additionally, the paper does not address the computational complexity of the proposed algorithms. As the problem setting becomes more complex, the computational requirements may become a practical concern.

Furthermore, the authors do not discuss the potential challenges in obtaining accurate machine-learned predictions of the model parameter distributions, which could impact the performance of their algorithm in the non-stationary setting.

Overall, the paper makes a significant contribution to the literature on online optimization with long-term constraints and stochastic optimization in cyber-physical and robotic systems. The authors' theoretical analysis and new algorithms provide a solid foundation for further research in this area.

Conclusion

This paper addresses an important online two-stage stochastic optimization problem with long-term constraints. The authors develop novel algorithms based on adversarial learning techniques and derive state-of-the-art regret bounds for various settings, including when the model parameters are subject to adversarial corruption or drawn from non-stationary distributions.

The paper's contributions have the potential to significantly advance the field of online optimization and decision-making, with applications in areas such as resource allocation, planning, and control of cyber-physical and robotic systems. While the theoretical analysis is thorough, future empirical validation and exploration of the computational aspects could further enhance the practical relevance of the proposed approaches.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛠️

A Generalized Approach to Online Convex Optimization

Mohammad Pedramfar, Vaneet Aggarwal

In this paper, we analyze the problem of online convex optimization in different settings. We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound. We further show that algorithms that are designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms to convert first order algorithms to zeroth order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, such full-information feedback, bandit feedback, stochastic regret, adversarial regret and various forms of non-stationary regret.

5/15/2024

cs.LG stat.ML

🛠️

Online Long-run Constrained Optimization

Shijie Pan, Wenjie Huang

A novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in online manner, where the objective and constraints are arbitrarily generated and not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, to the offline oracle, and a global minimax point is searched as the solution. Based on a proposed expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to tackle a long-term (extreme value) constrained river pollutant source identification problem, validate the theoretical results and exhibit superior performance compared to existing methods.

5/14/2024

cs.LG

🛠️

Online Stackelberg Optimization via Nonlinear Control

William Brown, Christos Papadimitriou, Tim Roughgarden

In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses. We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy textit{local controllability}, with convex losses over a bounded state space which encodes agent behavior, and we introduce a unified algorithmic framework for tractable regret minimization in such cases. When the instance dynamics are known but otherwise arbitrary, we obtain oracle-efficient $O(sqrt{T})$ regret by reduction to online convex optimization, which can be made computationally efficient if dynamics are locally textit{action-linear}. In the presence of adversarial disturbances to the state, we give tight bounds in terms of either the cumulative or per-round disturbance magnitude (for textit{strongly} or textit{weakly} locally controllable dynamics, respectively). Additionally, we give sublinear regret results for the cases of unknown locally action-linear dynamics as well as for the bandit feedback setting. Finally, we demonstrate applications of our framework to well-studied problems including performative prediction, recommendations for adaptive agents, adaptive pricing of real-valued goods, and repeated gameplay against no-regret learners, directly yielding extensions beyond prior results in each case.

6/28/2024

cs.LG cs.GT

🧠

Non-stochastic Bandits With Evolving Observations

Yogev Bar-On, Yishay Mansour

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.

5/28/2024

cs.LG