Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

2310.14085

Published 4/1/2024 by Michael I. Jordan, Tianyi Lin, Zhengyuan Zhou

✨

Abstract

Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $Theta(log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $Theta(frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(log^2(T))$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs textsf{AdaOGD} in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(frac{log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.

Create account to get full access

Online gradient descent (OGD) is known to achieve optimal regret and convergence rates in strongly convex single-agent settings and strongly monotone multi-agent games, respectively. However, OGD requires prior knowledge of the strong convexity or monotonicity parameters. The paper introduces AdaOGD, a fully adaptive OGD algorithm that does not require a priori knowledge of these parameters. In the single-agent setting, AdaOGD achieves a near-optimal regret of O(log^2(T)) under strong convexity. When employed by each agent in strongly monotone games, AdaOGD ensures last-iterate convergence to a unique Nash equilibrium at a near-optimal rate of O(log^3(T)/T). The algorithms are demonstrated in a learning version of the newsvendor problem, where they provide the first feasible and near-optimal solutions for both single-retailer and multi-retailer settings. The results are also extended to the more general setting of exp-concave cost functions and games using the online Newton step (ONS) algorithm.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

✅

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou

We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of textit{smooth and strongly monotone} games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $tilde{Theta}(nsqrt{T})$ under smooth and strongly concave reward functions ($n geq 1$ is the problem dimension). We then show that if each player applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the textit{last iterate} to the unique Nash equilibrium at a rate of $tilde{Theta}(nT^{-1/2})$. Prior to our work, the best-known convergence rate in the same class of games is $tilde{O}(n^{2/3}T^{-1/3})$ (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is $Omega(nT^{-1/2})$). Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both optimal regret in the single-agent learning and optimal last-iterate convergence rate in the multi-agent learning. We also present preliminary numerical results on several application problems to demonstrate the efficacy of our algorithm in terms of iteration count.

4/1/2024

cs.LG cs.GT

📊

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou

In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Specifically, we obtain $mathcal{O}(log V_T)$, $mathcal{O}(d log V_T)$ and $hat{mathcal{O}}(sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and the $hat{mathcal{O}}(cdot)$-notation omits $log V_T$ factors. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Notably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with carefully designed surrogate losses.

4/17/2024

cs.LG stat.ML

🛠️

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

We investigate online convex optimization in non-stationary environments and choose dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path length that essentially reflects the non-stationarity of environments, the state-of-the-art dynamic regret is $mathcal{O}(sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper, we demonstrate that it is possible to further enhance the guarantee for some easy problem instances, particularly when online functions are smooth. Specifically, we introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of these two terms. These quantities are at most $mathcal{O}(T)$ while could be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and meanwhile safeguard the same rate in the worst case. Notably, our proposed algorithms can achieve favorable dynamic regret with only one gradient per iteration, sharing the same gradient query complexity as the static regret minimization methods. To accomplish this, we introduce the collaborative online ensemble framework. The proposed framework employs a two-layer online ensemble to handle non-stationarity, and uses optimistic online learning and further introduces crucial correction terms to enable effective collaboration within the meta-base two layers, thereby attaining adaptivity. We believe the framework can be useful for broader problems.

4/9/2024

cs.LG

Online Dynamic Submodular Optimization

Antoine Lesage-Landry, Julien Pallage

We propose new algorithms with provable performance for online binary optimization subject to general constraints and in dynamic settings. We consider the subset of problems in which the objective function is submodular. We propose the online submodular greedy algorithm (OSGA) which solves to optimality an approximation of the previous round loss function to avoid the NP-hardness of the original problem. We extend OSGA to a generic approximation function. We show that OSGA has a dynamic regret bound similar to the tightest bounds in online convex optimization with respect to the time horizon and the cumulative round optimum variation. For instances where no approximation exists or a computationally simpler implementation is desired, we design the online submodular projected gradient descent (OSPGD) by leveraging the Lova'sz extension. We obtain a regret bound that is akin to the conventional online gradient descent (OGD). Finally, we numerically test our algorithms in two power system applications: fast-timescale demand response and real-time distribution network reconfiguration.

5/3/2024

cs.LG