Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control

2404.06975

Published 4/11/2024 by Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer

Abstract

We study a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

Overview

The paper explores a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand (AMoD) system.
The researchers employ a multi-agent Soft Actor-Critic (SAC) algorithm combined with weighted bipartite matching to optimize the central operator's vehicle-to-request dispatching policy.
They propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions.
The algorithm is further extended to incorporate rebalancing capabilities.
Numerical experiments show that the proposed approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

Plain English Explanation

The paper focuses on optimizing the operations of an Autonomous Mobility-on-Demand (AMoD) system, where a central operator manages a fleet of autonomous vehicles to meet customer transportation requests. The key challenge is to efficiently match vehicles to requests in a way that maximizes the operator's profits.

To tackle this problem, the researchers use a multi-agent reinforcement learning approach built on the Soft Actor-Critic (SAC) algorithm. Each vehicle learns a dispatching policy from experience, so decisions can be made in real time as requests arrive instead of planning every step in advance.

The researchers also introduce a novel vehicle-based algorithm architecture and modify the SAC algorithm to better account for the global impacts of the vehicle-to-request assignments. Additionally, they extend the algorithm to include the ability to rebalance the vehicle fleet, moving vehicles to areas where more requests are expected.

Through computer simulations, the researchers demonstrate that their approach outperforms other state-of-the-art methods, improving profitability by up to 12.9% for just the dispatching task, and up to 38.9% when the rebalancing capability is included.

Technical Explanation

The researchers formulate the AMoD system as a sequential decision-making problem, where the central operator aims to maximize profits by optimizing the vehicle-to-request dispatching policy. They employ a multi-agent Soft Actor-Critic (SAC) algorithm combined with weighted bipartite matching to solve this problem.
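The matching step can be illustrated with a minimal sketch. It assumes each vehicle has already produced a numeric score for every open request (the `weights` matrix below is hypothetical illustration data, not taken from the paper) and solves the weighted bipartite matching exactly by enumeration, which is only practical for very small instances. Pairs with a non-positive score are left unmatched, which stands in for rejecting a request.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Exact maximum-weight bipartite matching by enumeration (tiny instances only).

    weights[v][r] is an assumed score for assigning vehicle v to request r.
    Pairs with non-positive weight are dropped, so the vehicle stays unassigned.
    Returns (total_weight, {vehicle_index: request_index}).
    """
    n_vehicles = len(weights)
    n_requests = len(weights[0]) if weights else 0
    # Add one "no request" slot per vehicle so any vehicle may remain unassigned.
    slots = list(range(n_requests)) + [None] * n_vehicles

    best_total, best_assignment = 0.0, {}
    for perm in permutations(slots, n_vehicles):
        # perm[v] is the request given to vehicle v; each request appears at most once.
        assignment = {
            v: r for v, r in enumerate(perm)
            if r is not None and weights[v][r] > 0
        }
        total = sum(weights[v][r] for v, r in assignment.items())
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_total, best_assignment


# Hypothetical scores: 3 vehicles (rows) x 3 requests (columns).
weights = [
    [4.0, 1.0, 0.0],
    [3.5, 2.0, -1.0],
    [0.5, 0.2, -2.0],
]
total, assignment = max_weight_matching(weights)
print(total, assignment)
```

For realistic fleet sizes an enumeration like this is far too slow; a polynomial-time assignment solver (for example the Hungarian method) would be the usual choice.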

The proposed algorithm architecture is vehicle-based, meaning each vehicle is an independent agent that learns to make its own decisions. The researchers adapt the critic's loss function to better account for the global impacts of the vehicle-to-request assignments, as opposed to just considering the local rewards for each vehicle.
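The summary does not give the adapted loss itself, so the following is only a sketch of the standard discrete-action SAC critic objective with one labeled assumption: "considering global actions" is read here as evaluating each vehicle's Q-value at the action it actually executed after matching, not at the action it proposed. All function and parameter names are hypothetical.

```python
import math


def soft_state_value(q_values, probs, alpha):
    """Discrete soft value: sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    return sum(
        p * (q - alpha * math.log(p))
        for q, p in zip(q_values, probs)
        if p > 0.0  # skip zero-probability actions to avoid log(0)
    )


def critic_loss(q_pred, executed_actions, rewards, next_q, next_probs,
                alpha=0.2, gamma=0.99):
    """Mean squared soft Bellman error over agents.

    ASSUMPTION: executed_actions[i] is the action vehicle i carried out in the
    global (post-matching) action, which may differ from its own proposal.
    """
    losses = []
    for i, action in enumerate(executed_actions):
        target = rewards[i] + gamma * soft_state_value(next_q[i], next_probs[i], alpha)
        losses.append((q_pred[i][action] - target) ** 2)
    return sum(losses) / len(losses)


# One vehicle, two actions; it executed action 1 in the global action.
loss = critic_loss(
    q_pred=[[0.0, 3.0]],
    executed_actions=[1],
    rewards=[1.0],
    next_q=[[2.0, 5.0]],
    next_probs=[[1.0, 0.0]],
    gamma=0.5,
)
print(loss)
```

In a real implementation the Q-values would come from neural networks and the target would use separate target networks; plain lists are used here only to keep the arithmetic visible.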

Furthermore, the researchers extend their algorithm to incorporate rebalancing capabilities, allowing vehicles to proactively move to areas where high demand is expected.
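How rebalancing is integrated is not described in this summary. One simple way to picture it, purely as an assumption, is to let every vehicle that received no request after matching move to the zone it scores highest. The function name, the zone scores, and the zone indices below are all hypothetical.

```python
def choose_rebalancing(assignment, zone_scores, current_zone):
    """Pick a rebalancing target for each vehicle that has no request.

    assignment: {vehicle: request} for vehicles that were matched.
    zone_scores[v][z]: assumed score of vehicle v for moving to zone z.
    current_zone[v]: zone index vehicle v is in right now.
    A vehicle whose best-scoring zone is its current zone stays put.
    Returns {vehicle: target_zone} for vehicles that should move.
    """
    moves = {}
    for vehicle, scores in enumerate(zone_scores):
        if vehicle in assignment:
            continue  # busy serving a request, no rebalancing
        target = max(range(len(scores)), key=lambda z: scores[z])
        if target != current_zone[vehicle]:
            moves[vehicle] = target
    return moves


# Vehicle 0 is matched; vehicles 1 and 2 are idle in zone 0.
moves = choose_rebalancing(
    assignment={0: 0},
    zone_scores=[[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]],
    current_zone=[0, 0, 0],
)
print(moves)
```

This greedy rule ignores how many idle vehicles head to the same zone; a coordinated variant would need to account for that.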

Through numerical experiments on a simulated AMoD system, the researchers demonstrate that their approach outperforms state-of-the-art benchmarks. The proposed method achieves up to 12.9% higher profits for dispatching alone and up to 38.9% higher profits when rebalancing is integrated.

Critical Analysis

The paper presents a novel and promising approach to optimizing the operations of an AMoD system, but it also acknowledges several limitations and areas for further research.

One key limitation is the reliance on a centralized operator model, which may not be scalable or practical in real-world deployments. The researchers suggest exploring decentralized or federated learning approaches to address this issue.

Additionally, the numerical experiments are conducted on simulated data, and the performance of the algorithm may be affected by the fidelity of the simulation model. Further validation on real-world data would be necessary to assess the algorithm's robustness and generalizability.

The paper also does not address potential ethical and societal impacts of widespread AMoD adoption, such as job displacement for traditional taxi and ride-hailing drivers. These considerations should be carefully examined in future research.

Despite these limitations, the use of multi-agent reinforcement learning and the incorporation of rebalancing capabilities advance the field of AMoD optimization. The results suggest that further refinement and real-world testing of the proposed algorithm could improve the efficiency and profitability of Autonomous Mobility-on-Demand systems.

Conclusion

This paper presents a novel approach to optimizing the operations of an Autonomous Mobility-on-Demand (AMoD) system, using a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. The researchers introduce a vehicle-based algorithm architecture and adapt the critic's loss function to better account for global impacts, while also extending the algorithm to incorporate rebalancing capabilities.

Through numerical experiments, the researchers demonstrate that their approach outperforms state-of-the-art benchmarks, improving profitability by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing. The paper acknowledges certain limitations and areas for further research. Even so, the results suggest that this work is a step forward in the optimization of AMoD systems, with the potential to support more efficient and profitable transportation services.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

5/21/2024

cs.LG cs.MA cs.SY eess.SY

Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning

Aaryan Singhal, Daniele Gammelli, Justin Luke, Karthik Gopalakrishnan, Dominik Helmreich, Marco Pavone

Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3.2x.

4/5/2024

eess.SY cs.LG cs.RO cs.SY

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac

The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable deep methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial disturbance agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.

6/11/2024

cs.LG cs.RO cs.SY eess.SY

Novel Actor-Critic Algorithm for Robust Decision Making of CAV under Delays and Loss of V2X Data

Zine el abidine Kherroubi

Current autonomous driving systems heavily rely on V2X communication data to enhance situational awareness and the cooperation between vehicles. However, a major challenge when using V2X data is that it may not be available periodically because of unpredictable delays and data loss during wireless transmission between road stations and the receiver vehicle. This issue should be considered when designing control strategies for connected and autonomous vehicles. Therefore, this paper proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in V2X environment with delayed and/or lost data. The novel algorithm incorporates three key mechanisms: a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values. To address the temporal aperiodicity problem of V2X data, we first illustrate this challenge. Then, we provide a detailed explanation of the Blind Actor-Critic algorithm where we highlight the proposed components to compensate for the temporal aperiodicity problem of V2X data. We evaluate the performance of our algorithm in a simulation environment and compare it to benchmark approaches. The results demonstrate that training metrics are improved compared to conventional actor-critic algorithms. Additionally, testing results show that our approach provides robust control, even under low V2X network reliability levels.

5/9/2024

cs.LG cs.AI