Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning

arXiv:2311.05780

Published 4/5/2024 by Aaryan Singhal, Daniele Gammelli, Justin Luke, Karthik Gopalakrishnan, Dominik Helmreich, Marco Pavone

eess.SY cs.LG cs.RO cs.SY

Abstract

Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3.2x.

Overview

Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets face real-time challenges like matching vehicles to ride requests, rebalancing idle vehicles, and ensuring sufficient vehicle charge.
While this problem can be formulated as a linear program optimizing flows over a space-charge-time graph, the resulting optimization problem is too large for real-time implementation.
This paper proposes a reinforcement learning-based framework using graph networks to drastically improve scalability and performance compared to heuristics.
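To make the space-charge graph mentioned above concrete, here is a minimal sketch of how such a graph could be built. It is an illustration under stated assumptions, not the paper's construction: the function name `build_space_charge_graph`, the `travel_energy` input, and the assumption that every region has a charger are all hypothetical. Each node is a (region, charge level) pair, driving edges lower the charge level, and charging edges raise it. Unrolling the same nodes over discrete time steps would give the space-charge-time graph, which is why the full problem grows so quickly.

```python
from itertools import product


def build_space_charge_graph(travel_energy, num_charge_levels, charge_step=1):
    """Build nodes (region, charge) and edges for a toy space-charge graph.

    travel_energy[(i, j)] is the number of charge levels consumed driving
    from region i to region j (hypothetical input format).
    """
    regions = sorted({r for pair in travel_energy for r in pair})
    nodes = list(product(regions, range(num_charge_levels)))
    edges = []
    # Driving edges: only feasible if the vehicle has enough charge for the trip.
    for (i, j), energy in travel_energy.items():
        for c in range(energy, num_charge_levels):
            edges.append(((i, c), (j, c - energy), "drive"))
    # Charging edges: stay in the same region and gain `charge_step` levels.
    # Assumption: every region has a charging station.
    for r in regions:
        for c in range(num_charge_levels - charge_step):
            edges.append(((r, c), (r, c + charge_step), "charge"))
    return nodes, edges


# Two regions, three charge levels (0 = empty, 2 = full), one level per trip.
nodes, edges = build_space_charge_graph({("A", "B"): 1, ("B", "A"): 1}, 3)
print(len(nodes), "nodes,", len(edges), "edges")
```

Even this toy case has 6 nodes and 8 edges for two regions. With realistic numbers of regions, charge levels, and time steps, the number of flow variables in the linear program grows multiplicatively.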

Plain English Explanation

Operating a fleet of self-driving electric vehicles for on-demand transportation involves making many decisions in real-time. Operators need to figure out which vehicles should pick up which passengers, where to send idle vehicles to meet future demand, and make sure the vehicles have enough battery charge to complete their trips.

Mathematically, this problem can be set up as a large-scale optimization problem, but the size of the resulting calculations makes it impractical to solve quickly enough for real-world use. This paper presents an alternative approach using reinforcement learning and graph neural networks, which can make decisions much faster while still performing very well.

The key idea is to use a graph network-based AI agent to propose a desired future state for the fleet (e.g., where vehicles should be located and how charged they should be). Then, the system solves a simpler optimization problem to actually transition the fleet to that desired state in a feasible way. Experiments show this approach can achieve up to 89% of the theoretical maximum profit, while running over 100 times faster than the full optimization.

Importantly, the learned AI policy can also generalize to new situations without retraining, like operating the fleet in a different city or expanding the service area. This suggests the framework is scalable and flexible. It also outperforms specialized heuristic algorithms with comparable runtimes, increasing profits by up to 3.2x.

Technical Explanation

The paper frames the E-AMoD control problem through the lens of reinforcement learning, proposing a graph network-based framework to achieve improved scalability and performance over heuristic approaches.

The core idea is a bi-level formulation, where (1) a graph network-based RL agent specifies a desired next state in the space-charge graph, and (2) more tractable linear programs are solved to best achieve that desired state while ensuring feasibility. This builds on prior work in reinforcement learning for autonomous mobility systems.
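The two levels can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: `desired_shares` is a hypothetical stand-in for the RL agent's output, the helper names are invented for this example, and the lower level is solved with a small successive-shortest-path min-cost flow routine in place of an LP solver. For this kind of transportation problem with integer supplies and demands, the LP has an integral optimal solution, so the two give the same cost.

```python
def largest_remainder(shares, total):
    """Round fractional shares of `total` vehicles to integers that sum to `total`."""
    raw = [s * total for s in shares]
    counts = [int(x) for x in raw]
    by_remainder = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[: total - sum(counts)]:
        counts[k] += 1
    return counts


def min_cost_rebalancing(current, target, cost):
    """Cheapest integer flows from surplus nodes to deficit nodes.

    Uses successive shortest paths (Bellman-Ford) as a stand-in for an LP solver.
    Returns ({(i, j): vehicles moved}, total cost).
    """
    n = len(current)
    src, snk = n, n + 1
    graph = [[] for _ in range(n + 2)]  # edge = [to, residual cap, cost, reverse index]
    moves = []  # (i, j, position of the edge in graph[i], original capacity)

    def add_edge(u, v, cap, w):
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    surplus = [c - t for c, t in zip(current, target)]
    for i in range(n):
        if surplus[i] > 0:
            add_edge(src, i, surplus[i], 0)
        elif surplus[i] < 0:
            add_edge(i, snk, -surplus[i], 0)
    for i in range(n):
        for j in range(n):
            if surplus[i] > 0 and surplus[j] < 0:
                add_edge(i, j, surplus[i], cost[i][j])
                moves.append((i, j, len(graph[i]) - 1, surplus[i]))

    total_cost = 0
    while True:
        dist = [float("inf")] * (n + 2)
        prev = [None] * (n + 2)  # (predecessor node, edge position) on the shortest path
        dist[src] = 0
        for _ in range(n + 1):  # Bellman-Ford copes with negative residual costs
            for u in range(n + 2):
                for k, (v, cap, w, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v], prev[v] = dist[u] + w, (u, k)
        if prev[snk] is None:
            break  # no augmenting path left
        path, v = [], snk
        while v != src:
            u, k = prev[v]
            path.append((u, k))
            v = u
        push = min(graph[u][k][1] for u, k in path)
        for u, k in path:
            to, _cap, w, rev = graph[u][k]
            graph[u][k][1] -= push
            graph[to][rev][1] += push
            total_cost += push * w
    flows = {(i, j): cap0 - graph[i][k][1] for i, j, k, cap0 in moves}
    return {move: f for move, f in flows.items() if f > 0}, total_cost


# Hypothetical example: 4 nodes, all 4 vehicles currently sit at nodes 0 and 1.
current = [2, 2, 0, 0]
desired_shares = [0.0, 0.0, 0.5, 0.5]  # stand-in for the RL agent's output
cost = [[0, 9, 1, 2],
        [9, 0, 1, 5],
        [1, 1, 0, 9],
        [2, 5, 9, 0]]
target = largest_remainder(desired_shares, sum(current))
flows, total_cost = min_cost_rebalancing(current, target, cost)
print(target, flows, total_cost)
```

In this example the cheapest plan sends both vehicles from node 0 to node 3 and both from node 1 to node 2, for a total cost of 6. A greedy plan that first fills node 2 from node 0 would cost 12. The point of the split is that the learned agent only has to pick a target distribution, while feasibility and cost-efficient routing are handled by a small, fast optimization step.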

Experiments using real-world data from San Francisco and New York City demonstrate the effectiveness of this approach. The proposed framework achieves up to 89% of the profits of the theoretically-optimal solution, while providing more than a 100x speedup in computational time compared to the full optimization problem.

The paper also highlights promising zero-shot transfer capabilities of the learned policy, showcasing its utility, scalability, and flexibility. The approach is shown to outperform the best domain-specific heuristics, with an increase in profits of up to 3.2x, while maintaining comparable runtimes.
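One general property of graph networks that makes this kind of transfer possible at all is that their parameters are shared across nodes, so the same trained weights can be applied to a graph of any size. The sketch below is a generic mean-aggregation message-passing layer with scalar features, not the architecture from the paper; `gnn_layer`, `w_self`, and `w_neigh` are names chosen for this illustration.

```python
def gnn_layer(features, edges, w_self, w_neigh):
    """One mean-aggregation message-passing layer with scalar node features."""
    incoming = {v: [] for v in features}
    for u, v in edges:
        incoming[v].append(u)  # messages flow along the edge direction u -> v
    out = {}
    for v, x in features.items():
        msgs = [features[u] for u in incoming[v]]
        mean = sum(msgs) / len(msgs) if msgs else 0.0
        out[v] = max(0.0, w_self * x + w_neigh * mean)  # ReLU activation
    return out


# The same two weights are reused on two "cities" with different numbers of regions.
w_self, w_neigh = 0.5, 0.5

city_a = {"a": 1.0, "b": 2.0, "c": 3.0}
edges_a = [("a", "b"), ("b", "c"), ("c", "a")]

city_b = {"p": 4.0, "q": 0.0, "r": 2.0, "s": 6.0}
edges_b = [("p", "q"), ("q", "r"), ("r", "s"), ("s", "p"), ("p", "r")]

out_a = gnn_layer(city_a, edges_a, w_self, w_neigh)
out_b = gnn_layer(city_b, edges_b, w_self, w_neigh)
print(out_a)
print(out_b)
```

Nothing in the layer depends on the number of nodes, so a policy built from such layers can be evaluated on a new city or an enlarged service area without changing its parameters. Whether it performs well there is an empirical question, which is what the zero-shot experiments test.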

Critical Analysis

The paper presents a compelling reinforcement learning-based solution to the challenging E-AMoD control problem, with strong empirical results. However, a few caveats and limitations are worth noting.

First, the paper does not delve into the specifics of the RL training process, such as the reward function design, exploration strategies, or hyperparameter tuning. Further research could explore how these factors impact the final performance.

Additionally, the paper only evaluates the framework on two specific cities, raising questions about generalizability to environments with different characteristics. Future work could assess the approach's performance in a wider range of scenarios, including suburban or rural areas.

Finally, while the zero-shot transfer results are promising, the paper does not provide a detailed analysis of the underlying reasons for this capability. Understanding the factors that enable such strong generalization would be valuable for further development and deployment of the framework.

Overall, this paper presents an innovative and effective solution to a crucial problem in the emerging field of electric autonomous mobility. The reinforcement learning approach offers significant performance improvements over traditional methods, with intriguing possibilities for real-world application and further research.

Conclusion

This paper tackles the complex challenge of real-time decision-making for operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets. By framing the problem through the lens of reinforcement learning and leveraging graph network-based techniques, the authors have developed a scalable and high-performing framework that outperforms heuristic approaches.

The key innovation is a bi-level formulation where an RL agent specifies a desired future state, and then a simpler optimization problem is solved to transition the fleet to that state. This allows for drastically faster computation while still achieving near-optimal performance.

Importantly, the learned policy also demonstrates impressive generalization capabilities, suggesting the framework's utility, scalability, and flexibility. With the rapid growth of autonomous electric mobility, this research represents an important step forward in enabling efficient and responsive real-world operations.

Related Papers

Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

5/21/2024

cs.LG cs.MA cs.SY eess.SY

Centralized vs. Decentralized Multi-Agent Reinforcement Learning for Enhanced Control of Electric Vehicle Charging Networks

Amin Shojaeighadikolaei, Zsolt Talata, Morteza Hashemi

The widespread adoption of electric vehicles (EVs) poses several challenges to power distribution networks and smart grid infrastructure due to the possibility of significantly increasing electricity demands, especially during peak hours. Furthermore, when EVs participate in demand-side management programs, charging expenses can be reduced by using optimal charging control policies that fully utilize real-time pricing schemes. However, devising optimal charging methods and control strategies for EVs is challenging due to various stochastic and uncertain environmental factors. Currently, most EV charging controllers operate based on a centralized model. In this paper, we introduce a novel approach for distributed and cooperative charging strategy using a Multi-Agent Reinforcement Learning (MARL) framework. Our method is built upon the Deep Deterministic Policy Gradient (DDPG) algorithm for a group of EVs in a residential community, where all EVs are connected to a shared transformer. This method, referred to as CTDE-DDPG, adopts a Centralized Training Decentralized Execution (CTDE) approach to establish cooperation between agents during the training phase, while ensuring a distributed and privacy-preserving operation during execution. We theoretically examine the performance of centralized and decentralized critics for the DDPG-based MARL implementation and demonstrate their trade-offs. Furthermore, we numerically explore the efficiency, scalability, and performance of centralized and decentralized critics. Our theoretical and numerical results indicate that, despite higher policy gradient variances and training complexity, the CTDE-DDPG framework significantly improves charging efficiency by reducing total variation by approximately 36% and charging cost by around 9.1% on average...

4/22/2024

cs.AI

Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control

Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer

We study a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

4/11/2024

eess.SY cs.LG cs.MA cs.SY

Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

Kathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. Bayen

In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with developing RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their application in the context of self-driving cars, discussing the developmental process from simulation to deployment in detail, from designing simulators to reward function shaping. We present the results in both simulation and deployment, discussing the flow-smoothing benefits of the RL controller. From understanding the basics of Markov decision processes to exploring advanced techniques such as deep RL, our article offers a comprehensive overview and deep dive of the theoretical foundations and practical implementations driving this rapidly evolving field. We also showcase real-world case studies and alternative research projects that highlight the impact of RL controllers in revolutionizing autonomous driving. From tackling complex urban environments to dealing with unpredictable traffic scenarios, these intelligent controllers are pushing the boundaries of what automated vehicles can achieve. Furthermore, we examine the safety considerations and hardware-focused technical details surrounding deployment of RL controllers into automated vehicles. As these algorithms learn and evolve through interactions with the environment, ensuring their behavior aligns with safety standards becomes crucial. We explore the methodologies and frameworks being developed to address these challenges, emphasizing the importance of building reliable control systems for automated vehicles.

5/15/2024

eess.SY cs.RO cs.SY