Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control

2301.07876

Published 4/10/2024 by Shengling Shi, Anastasios Tsiamis, Bart De Schutter

Abstract

In this work, we aim to analyze how the trade-off between the modeling error, the terminal value function error, and the prediction horizon affects the performance of a nominal receding-horizon linear quadratic (LQ) controller. By developing a novel perturbation result of the Riccati difference equation, a novel performance upper bound is obtained and suggests that for many cases, the prediction horizon can be either one or infinity to improve the control performance, depending on the relative difference between the modeling error and the terminal value function error. The result also shows that when an infinite horizon is desired, a finite prediction horizon that is larger than the controllability index can be sufficient for achieving a near-optimal performance, revealing a close relation between the prediction horizon and controllability. The obtained suboptimality performance bound is also applied to provide novel sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting.

Overview

This paper analyzes the trade-off between modeling error, terminal value function error, and prediction horizon in the performance of a nominal receding-horizon linear quadratic (LQ) controller.
The researchers develop a novel perturbation result of the Riccati difference equation, which provides a new performance upper bound.
The result suggests that for many cases, using a prediction horizon of either one or infinity can improve control performance, depending on the relative difference between modeling error and terminal value function error.
The paper also shows that when an infinite horizon is desired, a finite prediction horizon larger than the controllability index can be sufficient for near-optimal performance, revealing a connection between prediction horizon and controllability.
The performance bound is applied to provide novel sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting.

Plain English Explanation

The paper looks at the factors that affect the performance of a type of controller called a nominal receding-horizon linear quadratic (LQ) controller. This controller steers a system, such as a robot or a factory process, by repeatedly planning inputs over a future window using a model of the system, applying only the first planned input, and then re-planning at the next step. "Nominal" means the controller treats its estimated model as if it were exact.

The key factors the paper examines are:

Modeling error: How well the mathematical model of the system matches reality.
Terminal value function error: How accurately the controller estimates the cost that will be incurred beyond the end of its prediction window.
Prediction horizon: How far into the future the controller tries to predict.

The researchers developed a new mathematical analysis technique that yields an upper bound on how far the controller's performance can fall short of the optimal controller's, given these factors. The analysis suggests that in many cases, using either a very short (one-step) prediction horizon or an infinitely long prediction horizon can lead to the best control performance, depending on the relative sizes of the modeling error and terminal value function error.

The paper also shows that when an infinitely long prediction horizon is desired, a finite prediction horizon larger than the system's controllability index (roughly, the minimum number of steps needed to steer the state to any target) can be enough to achieve near-optimal performance. This reveals a connection between the prediction horizon and how "controllable" the system is.

Finally, the researchers apply their performance bound to provide guarantees on how well the controller can learn to control the system, based on observed data, in a "learning-based" setting.

Technical Explanation

The core of this work is a novel perturbation analysis of the Riccati difference equation, which is a key component of the nominal receding-horizon LQ control problem. This analysis yields a new upper bound on the suboptimality of the controller's performance.
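To make the object under analysis concrete, here is a minimal Python sketch that computes the first-step gain of a receding-horizon LQ controller by iterating the standard Riccati difference equation backward from a terminal matrix. The function names, the example matrices, and the convention that a horizon of N means N - 1 Riccati steps before the gain is extracted are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def riccati_step(A, B, Q, R, P):
    """One backward step of the Riccati difference equation."""
    BtP = B.T @ P
    K = np.linalg.solve(R + BtP @ B, BtP @ A)
    return Q + A.T @ P @ (A - B @ K)

def receding_horizon_gain(A, B, Q, R, P_terminal, horizon):
    """Gain K of the first planned input u = -K x for the given horizon."""
    P = P_terminal
    for _ in range(horizon - 1):
        P = riccati_step(A, B, Q, R, P)
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)

# Hypothetical nominal model (a discretized double integrator).
A_hat = np.array([[1.0, 1.0], [0.0, 1.0]])
B_hat = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

# The terminal matrix is set to Q here purely for illustration.
K_short = receding_horizon_gain(A_hat, B_hat, Q, R, Q, horizon=1)
K_long = receding_horizon_gain(A_hat, B_hat, Q, R, Q, horizon=200)
print("one-step gain:", K_short)
print("long-horizon gain:", K_long)
```

With a horizon of one, the gain depends entirely on the terminal matrix; with a long horizon, the terminal matrix is progressively "forgotten" and the gain depends almost entirely on the model.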

The key insight from this analysis is that the performance of the nominal receding-horizon LQ controller is fundamentally governed by a trade-off between the modeling error, the terminal value function error, and the prediction horizon. Specifically, the researchers show that for many cases, using either a prediction horizon of one (the shortest possible) or infinity (the longest possible) can lead to the best control performance, depending on the relative difference between the modeling error and the terminal value function error.
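The trade-off can be explored numerically. The sketch below, written under stated assumptions, designs the controller on a perturbed model and evaluates it on the true system, reporting the gap to the optimal cost for several horizons. The system matrices, the perturbation, the choice of Q as terminal matrix, and the use of the trace of the closed-loop cost matrix as the performance measure are all hypothetical choices for illustration; this is not the paper's bound, only an empirical way to look at the quantity it bounds.

```python
import numpy as np

def rh_gain(A, B, Q, R, P_terminal, horizon):
    """First-step receding-horizon gain from (horizon - 1) Riccati steps."""
    P = P_terminal
    for _ in range(horizon - 1):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def closed_loop_cost(A, B, Q, R, K):
    """trace(P_K), where P_K is the infinite-horizon cost matrix of u = -K x
    on the system (A, B); returns inf if the closed loop is unstable."""
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        return np.inf
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # Solve the Lyapunov equation P = M + A_cl' P A_cl by vectorization.
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return float(np.trace(P))

def suboptimality_gap(A, B, A_hat, B_hat, Q, R, P_terminal, horizon):
    """Cost of the nominal controller on the true system minus the optimum."""
    K_nominal = rh_gain(A_hat, B_hat, Q, R, P_terminal, horizon)
    K_optimal = rh_gain(A, B, Q, R, Q, 1000)  # long horizon approximates LQR
    return (closed_loop_cost(A, B, Q, R, K_nominal)
            - closed_loop_cost(A, B, Q, R, K_optimal))

# Hypothetical true system and a perturbed nominal model.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
A_hat = A + 0.05 * np.array([[1.0, -1.0], [0.5, 1.0]])
B_hat = B + 0.05 * np.array([[1.0], [-1.0]])
Q, R = np.eye(2), np.eye(1)

for N in (1, 2, 5, 20):
    gap = suboptimality_gap(A, B, A_hat, B_hat, Q, R, Q, N)
    print(f"horizon {N:2d}: suboptimality gap = {gap:.6f}")
```

Varying the size of the model perturbation and the quality of the terminal matrix in this script is one way to see which error source dominates at which horizon.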

Additionally, the paper demonstrates that when an infinite prediction horizon is desired, a finite prediction horizon larger than the system's controllability index can be sufficient to achieve near-optimal performance. This reveals an interesting connection between the prediction horizon and the underlying system's controllability properties.
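For reference, the controllability index of a pair (A, B) is the smallest k for which the matrix [B, AB, ..., A^(k-1)B] has full row rank. A minimal sketch of how it can be computed follows; the function name and the example systems are hypothetical, and the rank test uses NumPy's default numerical tolerance.

```python
import numpy as np

def controllability_index(A, B):
    """Smallest k with rank [B, AB, ..., A^(k-1) B] = n, or None if the
    pair (A, B) is not controllable."""
    n = A.shape[0]
    blocks = []
    block = B
    for k in range(1, n + 1):
        blocks.append(block)
        if np.linalg.matrix_rank(np.hstack(blocks)) == n:
            return k
        block = A @ block
    return None

# Double integrator with one input: two steps are needed to reach any state.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
index_single_input = controllability_index(A, B)

# Same dynamics with one actuator per state: one step suffices.
index_full_input = controllability_index(A, np.eye(2))

print(index_single_input, index_full_input)
```

Systems with more independent actuators tend to have a smaller index, which is consistent with the summary's statement that a short finite horizon can then be enough.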

The researchers then apply their novel performance bound to provide sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting, where the system model must be learned from data. This extends the applicability of their analysis to the important practical case of learning-based control.
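In a learning-based setting, the nominal model has to come from data. The sketch below assumes ordinary least squares on a single trajectory driven by random inputs, which is a common choice in this literature but is an assumption here, since this summary does not specify the paper's estimator. All names, the example system, and the noise level are hypothetical.

```python
import numpy as np

def simulate(A, B, T, noise_std, rng):
    """Roll out x[t+1] = A x[t] + B u[t] + w[t] with random Gaussian inputs."""
    n, m = B.shape
    X = np.zeros((T + 1, n))
    U = rng.standard_normal((T, m))
    for t in range(T):
        w = noise_std * rng.standard_normal(n)
        X[t + 1] = A @ X[t] + B @ U[t] + w
    return X, U

def least_squares_model(X, U):
    """Estimate (A_hat, B_hat) by regressing x[t+1] on (x[t], u[t])."""
    n = X.shape[1]
    Z = np.hstack([X[:-1], U])                       # regressors, one row per step
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)  # Z @ Theta ~ X[1:]
    Theta = Theta.T                                  # Theta is now [A_hat, B_hat]
    return Theta[:, :n], Theta[:, n:]

# Hypothetical true system, unknown to the learner.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)

for T in (20, 200, 2000):
    X, U = simulate(A, B, T, noise_std=0.1, rng=rng)
    A_hat, B_hat = least_squares_model(X, U)
    err = max(np.linalg.norm(A_hat - A, 2), np.linalg.norm(B_hat - B, 2))
    print(f"T = {T:4d}: model error = {err:.4f}")
```

The resulting estimate plays the role of the nominal model, and its error is the modeling error that a sample complexity guarantee relates to the amount of data collected.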

Critical Analysis

The paper provides a rigorous and insightful analysis of the factors affecting the performance of nominal receding-horizon LQ controllers. The novel perturbation result and the resulting performance bound represent a significant theoretical contribution to the understanding of these types of controllers.

One potential limitation is that the analysis assumes the system is linear and the cost function is quadratic, which may not always be the case in practical applications. It would be valuable to explore whether the key insights from this work extend to more general nonlinear systems and cost functions.

Additionally, the paper does not consider the effects of disturbances or uncertainties beyond the modeling error. In real-world settings, controllers must often be robust to various forms of uncertainty, and it would be interesting to see how the analysis could be extended to incorporate these considerations.

Finally, while the learning-based control guarantees are valuable, the paper does not provide any empirical validation of the performance of nominal receding-horizon LQ controllers in practical learning-based settings. Experimental studies demonstrating the applicability and benefits of the theoretical results would further strengthen the impact of this work.

Overall, the paper advances the theoretical understanding of nominal receding-horizon LQ controllers. The limitations noted above point to natural extensions: nonlinear dynamics and non-quadratic costs, richer uncertainty models, and experimental validation.

Conclusion

This paper presents a novel theoretical analysis of the factors affecting the performance of nominal receding-horizon linear quadratic (LQ) controllers. The key insights are:

The control performance is fundamentally governed by a trade-off between modeling error, terminal value function error, and prediction horizon.
In many cases, using either a very short (one-step) or infinitely long prediction horizon can lead to the best control performance, depending on the relative sizes of the modeling and terminal value function errors.
When an infinite prediction horizon is desired, a finite horizon larger than the system's controllability index can be sufficient for near-optimal performance, revealing a connection between prediction horizon and controllability.
The performance bound developed in the paper can be used to provide sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting.

These findings contribute to a deeper understanding of the theoretical limits and design principles for this important class of controllers, with potential implications for their practical application in a variety of domains, from robotics to industrial process control.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems

Ingvar Ziemann, Henrik Sandberg

We establish regret lower bounds for adaptively controlling an unknown linear Gaussian system with quadratic costs. We combine ideas from experiment design, estimation theory and a perturbation bound of certain information matrices to derive regret lower bounds exhibiting scaling on the order of magnitude $\sqrt{T}$ in the time horizon $T$. Our bounds accurately capture the role of control-theoretic parameters and we are able to show that systems that are hard to control are also hard to learn to control; when instantiated to state feedback systems we recover the dimensional dependency of earlier work but with improved scaling with system-theoretic constants such as system costs and Gramians. Furthermore, we extend our results to a class of partially observed systems and demonstrate that systems with poor observability structure also are hard to learn to control.

6/13/2024

cs.LG stat.ML

Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

Bruce D. Lee, Anders Rantzer, Nikolai Matni

The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.

5/24/2024

eess.SY cs.LG cs.SY

Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

Jafar Abbaszadeh Chekan, Cedric Langbort

The first algorithm for the Linear Quadratic (LQ) control problem with an unknown system model, featuring a regret of $\mathcal{O}(\sqrt{T})$, was introduced by Abbasi-Yadkori and Szepesvári (2011). Recognizing the computational complexity of this algorithm, subsequent efforts (see Cohen et al. (2019), Mania et al. (2019), Faradonbeh et al. (2020a), and Kargin et al. (2022)) have been dedicated to proposing algorithms that are computationally tractable while preserving this order of regret. Although successful, the existing works in the literature lack a fully adaptive exploration-exploitation trade-off adjustment and require a user-defined value, which can lead to overall regret bound growth with some factors. In this work, noticing this gap, we propose the first fully adaptive algorithm that controls the number of policy updates (i.e., tunes the exploration-exploitation trade-off) and optimizes the upper-bound of regret adaptively. Our proposed algorithm builds on the SDP-based approach of Cohen et al. (2019) and relaxes its need for a horizon-dependent warm-up phase by appropriately tuning the regularization parameter and adding an adaptive input perturbation. We further show that through careful exploration-exploitation trade-off adjustment there is no need to commit to the widely-used notion of strong sequential stability, which is restrictive and can introduce complexities in initialization.

6/13/2024

stat.ML cs.LG cs.SY eess.SY

Predictive Linear Online Tracking for Unknown Targets

Anastasios Tsiamis, Aren Karapetyan, Yueshan Li, Efe C. Balta, John Lygeros

In this paper, we study the problem of online tracking in linear control systems, where the objective is to follow a moving target. Unlike classical tracking control, the target is unknown, non-stationary, and its state is revealed sequentially, thus, fitting the framework of online non-stochastic control. We consider the case of quadratic costs and propose a new algorithm, called predictive linear online tracking (PLOT). The algorithm uses recursive least squares with exponential forgetting to learn a time-varying dynamic model of the target. The learned model is used in the optimal policy under the framework of receding horizon control. We show the dynamic regret of PLOT scales with $\mathcal{O}(\sqrt{TV_T})$, where $V_T$ is the total variation of the target dynamics and $T$ is the time horizon. Unlike prior work, our theoretical results hold for non-stationary targets. We implement PLOT on a real quadrotor and provide open-source software, thus, showcasing one of the first successful applications of online control methods on real hardware.

6/14/2024

eess.SY cs.LG cs.SY