Adaptive Online Non-stochastic Control

2310.02261

Published 4/24/2024 by Naram Mhaisen, George Iosifidis

Abstract

We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effect of the online decisions and requires new tools for bounding the regret. Via new analysis techniques for NSC and FTRL integration, we obtain novel disturbance action controllers (DAC) with sub-linear data adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while staying sub-linear even in the worst case.

Create account to get full access

Overview

This paper introduces a new adaptive online non-stochastic control algorithm that can handle unknown, time-varying dynamics and disturbances.
The algorithm achieves strong theoretical guarantees on its dynamic regret, which measures the performance compared to the best fixed control policy in hindsight.
The paper builds on and extends several previous works on online optimization and adaptive control, including Optimistic Online Non-stochastic Control via FTRL, Learning Decentralized Linear Quadratic Regulator, and Adaptivity to Non-Stationarity in Problem-Dependent Dynamic Regret.

Plain English Explanation

The paper presents a new algorithm for controlling a system, like a robot or a manufacturing process, in an uncertain environment. Traditional control techniques often assume the system dynamics are known and static, but in many real-world situations, the dynamics can change over time in unpredictable ways.

The key idea of the new algorithm is to adaptively adjust the control strategy as the system's behavior changes, without requiring any prior knowledge about how the dynamics might evolve. By doing this, the algorithm can achieve strong performance guarantees, measured by a concept called "dynamic regret," which compares the algorithm's performance to the best fixed control policy that could have been chosen with perfect hindsight.

The paper builds on several previous works that have tackled similar problems in the fields of online optimization and adaptive control. The new algorithm synthesizes and extends these earlier ideas to create a more powerful and flexible control framework that can handle a wide range of challenging real-world control problems.

Technical Explanation

The paper introduces a new online non-stochastic control algorithm that can adapt to unknown, time-varying system dynamics and disturbances. The algorithm achieves problem-dependent dynamic regret bounds, meaning its performance compared to the best fixed control policy in hindsight scales with the complexity of the underlying control problem.

The key technical contributions include:

A new control update rule that combines optimistic online learning with dual averaging, inspired by the Optimistic Online Non-stochastic Control via FTRL and Learning Decentralized Linear Quadratic Regulator papers.
A dynamic regret analysis that leverages the time-varying Lyapunov function approach from Adaptivity to Non-Stationarity in Problem-Dependent Dynamic Regret and Regret of Recursive Methods in Discrete-Time Adaptive Control.
Extensions to handle output feedback and distributionally robust settings, building on ideas from Distributionally Robust Policy with Lyapunov Certificate Learning.

The technical analysis shows that the proposed algorithm can achieve problem-dependent dynamic regret bounds that adapt to the underlying complexity of the control problem, without requiring any prior knowledge about the system dynamics or disturbances.

Critical Analysis

The paper makes several important contributions to the field of adaptive online control, but it also has some limitations and open questions:

Limitations:

The analysis assumes access to full state information, which may not be realistic in many practical control problems.
The algorithm relies on certain assumptions about the system dynamics and disturbances, such as bounded gradients and Hölder continuity, which may not hold in all real-world scenarios.
The paper does not provide extensive numerical experiments to validate the algorithm's performance in realistic control settings.

Open Questions:

How well does the algorithm perform compared to other state-of-the-art adaptive control methods, both theoretically and empirically?
Can the approach be extended to handle partially observed or distributionally robust settings more generally?
What are the computational and implementation challenges in deploying the algorithm in real-world control applications?

Overall, the paper presents a promising new adaptive control algorithm with strong theoretical guarantees, but more research is needed to fully understand its practical implications and limitations.

Conclusion

This paper introduces a new adaptive online non-stochastic control algorithm that can handle unknown, time-varying system dynamics and disturbances. The key innovation is the use of optimistic online learning and dual averaging to achieve strong problem-dependent dynamic regret bounds, without requiring any prior knowledge about the control problem.

The work builds on and extends several influential previous studies in online optimization and adaptive control, synthesizing these ideas into a more powerful and flexible control framework. While the paper has some limitations and open questions, it represents an important step forward in developing robust and adaptive control algorithms for real-world applications with uncertain and time-varying dynamics.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Optimistic Online Non-stochastic Control via FTRL

Naram Mhaisen, George Iosifidis

This paper brings the concept of optimism to the new and promising framework of online Non-stochastic Control (NSC). Namely, we study how can NSC benefit from a prediction oracle of unknown quality responsible for forecasting future costs. The posed problem is first reduced to an optimistic learning with delayed feedback problem, which is handled through the Optimistic Follow the Regularized Leader (OFTRL) algorithmic family. This reduction enables the design of OptFTRL-C, the first Disturbance Action Controller (DAC) with optimistic policy regret bounds. These new bounds are commensurate with the oracle's accuracy, ranging from $mathcal{O}(1)$ for perfect predictions to the order-optimal $mathcal{O}(sqrt{T})$ even when all predictions fail. By addressing the challenge of incorporating untrusted predictions into control systems, our work contributes to the advancement of the NSC framework and paves the way towards effective and robust learning-based controllers.

4/5/2024

cs.LG

🛠️

Online Stackelberg Optimization via Nonlinear Control

William Brown, Christos Papadimitriou, Tim Roughgarden

In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses. We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy textit{local controllability}, with convex losses over a bounded state space which encodes agent behavior, and we introduce a unified algorithmic framework for tractable regret minimization in such cases. When the instance dynamics are known but otherwise arbitrary, we obtain oracle-efficient $O(sqrt{T})$ regret by reduction to online convex optimization, which can be made computationally efficient if dynamics are locally textit{action-linear}. In the presence of adversarial disturbances to the state, we give tight bounds in terms of either the cumulative or per-round disturbance magnitude (for textit{strongly} or textit{weakly} locally controllable dynamics, respectively). Additionally, we give sublinear regret results for the cases of unknown locally action-linear dynamics as well as for the bandit feedback setting. Finally, we demonstrate applications of our framework to well-studied problems including performative prediction, recommendations for adaptive agents, adaptive pricing of real-valued goods, and repeated gameplay against no-regret learners, directly yielding extensions beyond prior results in each case.

6/28/2024

cs.LG cs.GT

❗

Discounted Adaptive Online Learning: Towards Better Regularization

Zhiyu Zhang, David Bombara, Heng Yang

We study online learning in adversarial nonstationary environments. Since the future can be very different from the past, a critical challenge is to gracefully forget the history while new data comes in. To formalize this intuition, we revisit the discounted regret in online convex optimization, and propose an adaptive (i.e., instance optimal), FTRL-based algorithm that improves the widespread non-adaptive baseline -- gradient descent with a constant learning rate. From a practical perspective, this refines the classical idea of regularization in lifelong learning: we show that designing good regularizers can be guided by the principled theory of adaptive online optimization. Complementing this result, we also consider the (Gibbs and Cand`es, 2021)-style online conformal prediction problem, where the goal is to sequentially predict the uncertainty sets of a black-box machine learning model. We show that the FTRL nature of our algorithm can simplify the conventional gradient-descent-based analysis, leading to instance-dependent performance guarantees.

6/21/2024

cs.LG stat.ML

Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems

Ashwin P. Dani, Shubhendu Bhasin

In this paper, a continuous-time adaptive actor-critic reinforcement learning (RL) controller is developed for drift-free nonlinear systems. Practical examples of such systems are image-based visual servoing (IBVS) and wheeled mobile robots (WMR), where the system dynamics includes a parametric uncertainty in the control effectiveness matrix with no drift term. The uncertainty in the input term poses a challenge for developing a continuous-time RL controller using existing methods. In this paper, an actor-critic or synchronous policy iteration (PI)-based RL controller is presented with a concurrent learning (CL)-based parameter update law for estimating the unknown parameters of the control effectiveness matrix. An infinite-horizon value function minimization objective is achieved by regulating the current states to the desired with near-optimal control efforts. The proposed controller guarantees closed-loop stability and simulation results validate the proposed theory using IBVS and WMR examples.

6/14/2024

eess.SY cs.RO cs.SY