Actor-Critic Model Predictive Control

2306.09852

Published 4/15/2024 by Angel Romero, Yunlong Song, Davide Scaramuzza

Abstract

An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out of distribution behaviour.


Overview

Introduces a novel approach called Actor-Critic Model Predictive Control (AC-MPC) that combines the benefits of Model Predictive Control (MPC) and Reinforcement Learning (RL)
MPC requires extensive task-specific engineering and tuning, while RL minimizes this effort but needs large amounts of data and lacks interpretability
AC-MPC aims to address the limitations of both approaches by leveraging the strengths of each

Plain English Explanation

MPC is a powerful technique for controlling dynamic systems, but it often requires a lot of manual effort to set up and fine-tune for a specific task. On the other hand, RL can automate this process, but it needs a lot of data to train and the resulting models can be difficult to understand.

The authors of this paper propose a hybrid approach called Actor-Critic Model Predictive Control (AC-MPC). The key idea is to combine the benefits of MPC and RL to create a system that is easier to set up and tune than traditional MPC, while being more interpretable and more sample-efficient than pure RL.

The AC-MPC approach uses an actor-critic RL architecture in which the actor is built around a differentiable MPC that computes the control actions, and a critic network learns to estimate the value function. The MPC-based actor handles short-term decisions, while the critic handles long-term prediction. This allows the system to learn from experience through trial and error while retaining the predictive properties of MPC.
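To make the critic's role concrete, here is a minimal sketch of a one-step temporal-difference (TD) update for a linear value estimate. It is an illustration only: the feature map, learning rate, and discount factor are assumptions, and this summary does not state which RL algorithm or critic architecture the authors use.

```python
from typing import List, Tuple


def features(state: float) -> List[float]:
    """Hypothetical feature map for a scalar state: bias, state, state squared."""
    return [1.0, state, state * state]


def value(weights: List[float], state: float) -> float:
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features(state)))


def td_update(
    weights: List[float],
    state: float,
    reward: float,
    next_state: float,
    gamma: float = 0.99,  # assumed discount factor
    lr: float = 0.1,      # assumed learning rate
) -> Tuple[List[float], float]:
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    td_error = reward + gamma * value(weights, next_state) - value(weights, state)
    new_weights = [w + lr * td_error * f for w, f in zip(weights, features(state))]
    return new_weights, td_error


if __name__ == "__main__":
    w, delta = td_update([0.0, 0.0, 0.0], state=1.0, reward=1.0, next_state=0.0)
    print(w, delta)
```

In actor-critic methods generally, the same TD error is commonly reused as the learning signal for the actor, so the critic's long-horizon estimate shapes the actor's short-horizon decisions.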

Technical Explanation

The key elements of the AC-MPC approach are:

Actor-Critic Architecture: The policy consists of an MPC-based actor that computes the control actions and a critic network that estimates the value function.
Model Predictive Control: A differentiable MPC is embedded in the actor, so the policy optimizes over a short prediction horizon and retains the online replanning capabilities of MPC.
Reinforcement Learning: The actor and critic are trained end to end using RL, which lets the system explore and learn complex behaviors by trial and error.
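The elements listed above can be sketched in a few lines. The code below is a toy illustration, not the authors' implementation: it assumes scalar linear dynamics x' = a*x + b*u and a quadratic cost q*x^2 + r*u^2, solves the short-horizon problem with a backward Riccati recursion, and treats q and r as the parameters an RL algorithm would tune. The class name MPCActor and the helper action_sensitivity are hypothetical, and the finite-difference gradient is a stand-in for the analytic gradients a differentiable MPC would provide.

```python
import random
from dataclasses import dataclass


@dataclass
class MPCActor:
    """Actor whose action is the first input of a short-horizon MPC solve.

    Assumed for this sketch: dynamics x' = a*x + b*u, stage cost q*x^2 + r*u^2,
    and terminal cost q*x^2. q and r stand in for learnable parameters.
    """
    a: float = 1.0
    b: float = 0.5
    q: float = 1.0
    r: float = 0.1
    horizon: int = 10

    def first_gain(self) -> float:
        """Backward Riccati recursion; returns the feedback gain for step 0."""
        p = self.q  # terminal cost weight
        gain = 0.0
        for _ in range(self.horizon):
            gain = (self.a * self.b * p) / (self.r + self.b * self.b * p)
            p = self.q + self.a * self.a * p - self.a * self.b * p * gain
        return gain

    def mean_action(self, state: float) -> float:
        """Optimal first input of the finite-horizon problem from `state`."""
        return -self.first_gain() * state

    def sample_action(self, state: float, std: float, rng: random.Random) -> float:
        """Gaussian exploration around the MPC solution."""
        return rng.gauss(self.mean_action(state), std)


def action_sensitivity(actor: MPCActor, state: float, eps: float = 1e-6) -> float:
    """Central finite difference of the action with respect to the cost weight q."""
    hi = MPCActor(actor.a, actor.b, actor.q + eps, actor.r, actor.horizon)
    lo = MPCActor(actor.a, actor.b, actor.q - eps, actor.r, actor.horizon)
    return (hi.mean_action(state) - lo.mean_action(state)) / (2.0 * eps)


if __name__ == "__main__":
    actor = MPCActor()
    print(actor.mean_action(1.0), action_sensitivity(actor, 1.0))
```

Because the optimization is re-solved from the current state at every step, the actor replans online, and the sensitivity of the action to q shows how a training signal could flow back into the MPC's cost parameters.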

The authors evaluate the AC-MPC approach in simulation and in the real world on a quadcopter platform across various high-level tasks. The results show that the architecture achieves real-time control performance, learns complex behaviors via trial and error, and retains the predictive properties of MPC, which helps it handle out-of-distribution behaviour.

Critical Analysis

The authors acknowledge several limitations and areas for future research:

The performance of AC-MPC still depends on the quality of the MPC formulation embedded in the actor. Making the system more robust to a suboptimal MPC is an important area for further research.
The AC-MPC approach assumes that the system dynamics are known or can be accurately approximated. Extending the approach to handle model uncertainty or unknown dynamics is another important challenge.
The authors note that the interpretability of the AC-MPC system is still an open question and requires further investigation, as the neural network-based components can be difficult to analyze.

Overall, the AC-MPC approach presents an interesting and promising direction for combining the strengths of MPC and RL for control and decision-making tasks. The authors have identified several important areas for further research to address the current limitations and expand the applicability of the approach.

Conclusion

The Actor-Critic Model Predictive Control (AC-MPC) approach proposed in this paper represents a novel and promising direction for combining the benefits of MPC and RL. By leveraging the strengths of both approaches, AC-MPC can provide a more efficient and adaptable control system that is also interpretable and stable.

The authors have demonstrated AC-MPC in simulation and on a real quadcopter across various high-level tasks, and have identified key areas for future research to address the current limitations of the approach. As the field of control and decision-making continues to evolve, hybrid approaches like AC-MPC may play an increasingly important role in bridging the gap between traditional control methods and data-driven, learning-based approaches.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

AC4MPC: Actor-Critic Reinforcement Learning for Nonlinear Model Predictive Control

Rudolf Reiter, Andrea Ghezzi, Katrin Baumgartner, Jasper Hoffmann, Robert D. McAllister, Moritz Diehl

Model predictive control (MPC) and reinforcement learning (RL) are two powerful control strategies with, arguably, complementary advantages. In this work, we show how actor-critic RL techniques can be leveraged to improve the performance of MPC. The RL critic is used as an approximation of the optimal value function, and an actor roll-out provides an initial guess for primal variables of the MPC. A parallel control architecture is proposed where each MPC instance is solved twice for different initial guesses. Besides the actor roll-out initialization, a shifted initialization from the previous solution is used. Thereafter, the actor and the critic are again used to approximately evaluate the infinite horizon cost of these trajectories. The control actions from the lowest-cost trajectory are applied to the system at each time step. We establish that the proposed algorithm is guaranteed to outperform the original RL policy plus an error term that depends on the accuracy of the critic and decays with the horizon length of the MPC formulation. Moreover, we do not require globally optimal solutions for these guarantees to hold. The approach is demonstrated on an illustrative toy example and an autonomous driving (AD) overtaking scenario.

6/7/2024

eess.SY cs.AI cs.SY

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Dimitri P. Bertsekas

In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.

6/12/2024

eess.SY cs.AI cs.SY

MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control

Yiwen Lu, Zishuo Li, Yihan Zhou, Na Li, Yilin Mo

In this paper, we introduce a new class of parameterized controllers, drawing inspiration from Model Predictive Control (MPC). The controller resembles a Quadratic Programming (QP) solver of a linear MPC problem, with the parameters of the controller being trained via Deep Reinforcement Learning (DRL) rather than derived from system models. This approach addresses the limitations of common controllers with Multi-Layer Perceptron (MLP) or other general neural network architecture used in DRL, in terms of verifiability and performance guarantees, and the learned controllers possess verifiable properties like persistent feasibility and asymptotic stability akin to MPC. On the other hand, numerical examples illustrate that the proposed controller empirically matches MPC and MLP controllers in terms of control performance and has superior robustness against modeling uncertainty and noises. Furthermore, the proposed controller is significantly more computationally efficient compared to MPC and requires fewer parameters to learn than MLP controllers. Real-world experiments on vehicle drift maneuvering task demonstrate the potential of these controllers for robotics and other demanding control tasks.

4/10/2024

eess.SY cs.LG cs.RO cs.SY

Model predictive control-based value estimation for efficient reinforcement learning

Qizhen Wu, Kexin Liu, Lei Chen

Reinforcement learning is limited in practice primarily by the number of interactions it requires with virtual environments. This is a challenging problem because, for many learning methods, a locally optimal strategy is unlikely to be obtained with only a few attempts. We therefore design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach. Based on the learned environment model, it performs multi-step prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergence of strategies toward the local optimal value, and less sample capacity space required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle avoidance scenario for an unmanned aerial vehicle, validate the proposed approaches.

4/12/2024

cs.LG