Novel Actor-Critic Algorithm for Robust Decision Making of CAV under Delays and Loss of V2X Data

Read original: arXiv:2405.05072 - Published 5/9/2024 by Zine el abidine Kherroubi

Overview

Current autonomous driving systems heavily rely on V2X (vehicle-to-everything) communication data to enhance situational awareness and cooperation between vehicles.
A major challenge with using V2X data is that it may not arrive at regular intervals, because wireless transmission introduces unpredictable delays and data loss.
This paper proposes a novel 'Blind Actor-Critic' algorithm that aims to provide robust driving performance in V2X environments with delayed and/or lost data.

Plain English Explanation

Autonomous driving systems often use information shared between vehicles and infrastructure (V2X communication) to help the vehicles better understand their surroundings and coordinate with each other. However, this V2X data can sometimes be delayed or even lost entirely during wireless transmission, which can be a problem for the control algorithms that rely on it.

To address this issue, the researchers developed a new algorithm called 'Blind Actor-Critic' that is designed to work well even when the V2X data is not consistently available. The key ideas behind this algorithm include:

Using a virtual fixed sampling period to handle the irregular timing of the V2X data.
Combining two learning techniques, Temporal-Difference and Monte Carlo, to improve the algorithm's ability to learn.
Approximating the immediate reward values numerically rather than relying on the delayed V2X data.

By incorporating these mechanisms, the Blind Actor-Critic algorithm is able to maintain robust driving performance even when the V2X network is unreliable and the data is not always available.
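To make the first idea concrete, here is a minimal sketch of a virtual fixed sampling period, assuming the controller simply holds the most recently received value at each tick of its own clock. The function name `resample_fixed_period`, the hold-last-value rule, and the `is_fresh` flag are illustrative assumptions, not details taken from the paper.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def resample_fixed_period(
    messages: List[Tuple[float, float]],
    period: float,
    horizon: float,
) -> List[Tuple[float, Optional[float], bool]]:
    """Map irregular (arrival_time, value) messages onto a fixed clock.

    Returns one (tick_time, held_value, is_fresh) triple per tick.
    held_value is the latest value that arrived at or before the tick
    (None if nothing has arrived yet). is_fresh is True only when at
    least one message arrived since the previous tick.
    Hypothetical helper: the hold-last-value rule is an assumption.
    """
    messages = sorted(messages)
    arrival_times = [t for t, _ in messages]
    n_ticks = int(round(horizon / period)) + 1
    ticks = []
    prev_count = 0
    for k in range(n_ticks):
        tick = k * period
        # Number of messages that have arrived by this tick.
        count = bisect_right(arrival_times, tick)
        value = messages[count - 1][1] if count > 0 else None
        ticks.append((tick, value, count > prev_count))
        prev_count = count
    return ticks


if __name__ == "__main__":
    # Two messages arrive close together after a long gap.
    msgs = [(0.5, 10.0), (3.2, 12.0), (3.3, 13.0)]
    for tick in resample_fixed_period(msgs, period=1.0, horizon=5.0):
        print(tick)
```

With this arrangement the learning loop always sees one observation per tick, and the freshness flag tells it whether that observation is new or merely held over from an earlier message.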

Technical Explanation

The paper first illustrates the challenge of temporal aperiodicity (unpredictable timing) in V2X data, which can disrupt the control strategies of connected and autonomous vehicles. To address this issue, the authors propose the Blind Actor-Critic algorithm, which has three key components:

Virtual Fixed Sampling Period: To handle the irregular timing of the V2X data, the algorithm uses a virtual fixed sampling period. This means it operates on a consistent internal clock, regardless of when the actual V2X data arrives.
Temporal-Difference and Monte Carlo Learning: The algorithm combines Temporal-Difference and Monte Carlo learning techniques. Temporal-Difference learning updates value estimates after each step by bootstrapping from the current estimate of the next state, so it can learn before an episode finishes. Monte Carlo learning instead uses the full observed sequence of rewards, which avoids bootstrapping bias but is only available once the episode is complete.
Numerical Approximation of Immediate Reward: Instead of relying on the potentially delayed V2X data to calculate rewards, the algorithm uses a numerical approximation to estimate the immediate reward values. This helps compensate for the lack of timely V2X information.
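As a rough illustration of the second component, the sketch below blends a one-step Temporal-Difference target with a Monte Carlo return for each step of a finished episode. The linear blend and the weight `beta` are assumptions chosen for clarity; this summary does not state how the paper actually combines the two, so treat this as one plausible form rather than the authors' method.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(
    rewards: List[float], next_values: List[float], gamma: float
) -> List[float]:
    """One-step TD targets: r_t + gamma * V(s_{t+1})."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def blended_targets(
    rewards: List[float],
    next_values: List[float],
    gamma: float,
    beta: float,
) -> List[float]:
    """Mix MC returns and TD targets; beta=1 is pure MC, beta=0 is pure TD.

    The linear mix and the name `beta` are illustrative assumptions.
    """
    mc = monte_carlo_returns(rewards, gamma)
    td = td_targets(rewards, next_values, gamma)
    return [beta * g + (1.0 - beta) * y for g, y in zip(mc, td)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [2.0, 2.0, 0.0]  # critic estimates; terminal value is 0
    print(blended_targets(rewards, next_values, gamma=0.5, beta=0.5))
```

A critic trained on such targets would lean on the observed returns where bootstrapped estimates are unreliable, and on bootstrapping where full returns are too noisy.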

The researchers evaluate the performance of the Blind Actor-Critic algorithm in a simulation environment and compare it to benchmark approaches. The results show that the training metrics are improved compared to conventional actor-critic algorithms. Furthermore, the testing results demonstrate that the Blind Actor-Critic algorithm provides robust control even under low V2X network reliability levels.
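For intuition about what a "low V2X network reliability level" could mean in a test setup, the sketch below models a channel that drops each message independently and delays the rest by a random amount. The drop model, the uniform delay, and every name here (`simulate_v2x_channel`, `reliability`, `max_delay`) are assumptions for illustration; the paper's simulator may model the network quite differently.

```python
import random
from typing import List, Tuple


def simulate_v2x_channel(
    send_times: List[float],
    reliability: float,
    max_delay: float,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return (arrival_time, send_time) pairs for delivered messages.

    Each message is delivered with probability `reliability` and, if
    delivered, delayed by a uniform random amount in [0, max_delay].
    Hypothetical channel model, not the one used in the paper.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    arrivals = []
    for t in send_times:
        if rng.random() < reliability:
            arrivals.append((t + rng.uniform(0.0, max_delay), t))
    arrivals.sort()  # delays can reorder messages
    return arrivals


if __name__ == "__main__":
    sent = [0.1 * k for k in range(20)]
    for level in (1.0, 0.6, 0.2):
        got = simulate_v2x_channel(sent, reliability=level, max_delay=0.3)
        print(f"reliability={level}: {len(got)} of {len(sent)} delivered")
```

Sweeping `reliability` downward in a harness like this is one way a trained controller could be stress-tested against increasingly sparse and late data.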

Critical Analysis

The paper addresses an important challenge in the field of autonomous driving and V2X cooperation: unreliable V2X data. The proposed Blind Actor-Critic algorithm provides a novel solution to this problem by incorporating mechanisms to handle the temporal aperiodicity of the V2X data.

One potential limitation of the research is that it is evaluated solely in a simulation environment. While the simulation results are promising, it would be valuable to see how the algorithm performs in real-world V2X-integrated autonomous driving scenarios. Additionally, the paper does not provide much detail on the specific numerical approximation used for the immediate reward values, which could be an area for further exploration and refinement.

Overall, the Blind Actor-Critic algorithm represents an important step forward in developing robust control strategies for autonomous driving systems that can operate reliably even when the supporting V2X infrastructure is not perfectly reliable.

Conclusion

This paper proposes a novel 'Blind Actor-Critic' algorithm that addresses the challenge of using V2X communication data in autonomous driving systems. The algorithm incorporates several key mechanisms to handle the temporal aperiodicity of V2X data, including a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values.

The simulation results demonstrate that the Blind Actor-Critic algorithm can provide robust driving performance even under low V2X network reliability conditions. This research represents an important advancement in the field of connected and autonomous vehicles, paving the way for more reliable and resilient control strategies that can operate effectively in real-world environments with unpredictable V2X data availability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Novel Actor-Critic Algorithm for Robust Decision Making of CAV under Delays and Loss of V2X Data

Zine el abidine Kherroubi

Current autonomous driving systems heavily rely on V2X communication data to enhance situational awareness and the cooperation between vehicles. However, a major challenge when using V2X data is that it may not be available periodically because of unpredictable delays and data loss during wireless transmission between road stations and the receiver vehicle. This issue should be considered when designing control strategies for connected and autonomous vehicles. Therefore, this paper proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in V2X environment with delayed and/or lost data. The novel algorithm incorporates three key mechanisms: a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values. To address the temporal aperiodicity problem of V2X data, we first illustrate this challenge. Then, we provide a detailed explanation of the Blind Actor-Critic algorithm where we highlight the proposed components to compensate for the temporal aperiodicity problem of V2X data. We evaluate the performance of our algorithm in a simulation environment and compare it to benchmark approaches. The results demonstrate that training metrics are improved compared to conventional actor-critic algorithms. Additionally, testing results show that our approach provides robust control, even under low V2X network reliability levels.

5/9/2024

DRL-Based RAT Selection in a Hybrid Vehicular Communication Network

Badreddine Yacine Yacheur (LaBRI), Toufik Ahmed (LaBRI), Mohamed Mosbah (LaBRI)

Cooperative intelligent transport systems rely on a set of Vehicle-to-Everything (V2X) applications to enhance road safety. Emerging new V2X applications like Advanced Driver Assistance Systems (ADASs) and Connected Autonomous Driving (CAD) applications depend on a significant amount of shared data and require high reliability, low end-to-end (E2E) latency, and high throughput. However, present V2X communication technologies such as ITS-G5 and C-V2X (Cellular V2X) cannot satisfy these requirements alone. In this paper, we propose an intelligent, scalable hybrid vehicular communication architecture that leverages the performance of multiple Radio Access Technologies (RATs) to meet the needs of these applications. Then, we propose a communication mode selection algorithm based on Deep Reinforcement Learning (DRL) to maximize the network's reliability while limiting resource consumption. Finally, we assess our work using the platooning scenario that requires high reliability. Numerical results reveal that the hybrid vehicular communication architecture has the potential to enhance the packet reception rate (PRR) by up to 30% compared to both the static RAT selection strategy and the multi-criteria decision-making (MCDM) selection algorithm. Additionally, it improves the efficiency of the redundant communication mode by 20% regarding resource consumption.

7/2/2024

Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control

Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer

We study a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

4/11/2024

End-to-End Autonomous Driving through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie

Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, as well as reproducing several benchmark methods, on the challenging DAIR-V2X, the real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.

4/23/2024