Human-compatible driving partners through data-regularized self-play reinforcement learning

Read original: arXiv:2403.19648 - Published 6/26/2024 by Daphne Cornelisse, Eugene Vinitsky

Overview

This paper proposes Human-Regularized PPO (HR-PPO), a multi-agent reinforcement learning algorithm for training driving agents that are both effective and human-like in closed-loop simulation.
The key ideas are training agents through self-play, adding a small penalty for deviating from a human reference policy, and relying on only 30 minutes of imperfect human demonstrations.
The authors evaluate the agents in a large set of multi-agent traffic scenes and report a 93% success rate, a 3.5% off-road rate, and a 3% collision rate, along with driving that resembles existing human driving logs.

Plain English Explanation

The paper is about building simulated driving agents that behave like human drivers while still driving well. Such agents matter because autonomous driving systems are trained and evaluated in simulation, where they need realistic partners to interact with. The researchers had the agents learn through a process called "self-play," in which the agents in a traffic scene practice driving alongside one another.

To keep the agents' driving human-like, the researchers added a "regularization" step: the agents receive a small penalty whenever their behavior deviates from a reference policy derived from human driving demonstrations. This means the algorithm learns not just to reach its goals, but also to stay close to the patterns and behaviors of human drivers.

By doing this, the agents learn to drive in a way that resembles human driving, rather than only optimizing for reaching their goals. The researchers tested this approach in simulation and found that the agents reached their goals in 93% of cases with low collision and off-road rates, while also improving on proxy measures for coordination with human driving.

Technical Explanation

The authors propose Human-Regularized PPO (HR-PPO), a multi-agent, data-regularized self-play reinforcement learning (RL) algorithm for building driving agents that are realistic and effective in closed-loop settings. The motivation is that simulation agents are typically developed by imitating large-scale datasets of human driving, yet pure imitation learning agents empirically show high collision rates when executed in a multi-agent closed-loop setting.

To encourage human-like behavior, agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, the approach is RL-first and uses only 30 minutes of imperfect human demonstrations.
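The idea of a PPO objective with a small penalty for deviating from a human reference policy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a KL divergence as the penalty, its direction (policy relative to human reference), the discrete action distributions, and the default values of `clip_eps` and `reg_weight` are all assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(
        pi * (math.log(max(pi, eps)) - math.log(max(qi, eps)))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def human_regularized_loss(new_logp, old_logp, advantages,
                           policy_dists, human_dists, reg_weight=0.05):
    """PPO loss plus a small penalty for deviating from a human reference.

    ASSUMPTION: the penalty is the mean KL(policy || human reference) over
    the batch, weighted by a hypothetical coefficient `reg_weight`.
    """
    ppo = ppo_clip_loss(new_logp, old_logp, advantages)
    penalty = sum(
        kl_divergence(pol, hum) for pol, hum in zip(policy_dists, human_dists)
    ) / len(policy_dists)
    return ppo + reg_weight * penalty
```

With `reg_weight` set to zero the sketch reduces to plain PPO, and larger values pull the learned policy closer to the human reference at the expense of the task reward.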

The authors evaluate agents in a large set of multi-agent traffic scenes. HR-PPO agents achieve a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%, while driving in a human-like manner as measured by their similarity to existing human driving logs. They also show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios.
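As a rough illustration of how such numbers could be computed, the sketch below aggregates per-agent episode outcomes into success, off-road, and collision rates, and scores similarity to a logged human trajectory with an average displacement error. The `EpisodeOutcome` record, the function names, and the choice of average displacement error as the similarity measure are assumptions for illustration, not the paper's evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    """Hypothetical per-agent result of one simulated episode."""
    reached_goal: bool
    collided: bool
    went_off_road: bool


def summarize(outcomes):
    """Return the fraction of episodes ending in each outcome."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one episode outcome")
    return {
        "success_rate": sum(o.reached_goal for o in outcomes) / n,
        "collision_rate": sum(o.collided for o in outcomes) / n,
        "off_road_rate": sum(o.went_off_road for o in outcomes) / n,
    }


def average_displacement_error(agent_xy, human_xy):
    """Mean Euclidean distance between agent and logged human positions.

    ASSUMPTION: one plausible way to quantify similarity to human driving
    logs; both trajectories must be sampled at the same time steps.
    """
    if len(agent_xy) != len(human_xy) or not agent_xy:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [
        math.hypot(ax - hx, ay - hy)
        for (ax, ay), (hx, hy) in zip(agent_xy, human_xy)
    ]
    return sum(dists) / len(dists)
```

Under these definitions the three rates need not sum to one, since an episode can end without the agent reaching its goal, colliding, or leaving the road.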

Critical Analysis

The authors evaluate their approach thoroughly, but several limitations remain. The study is conducted entirely in simulation, and coordination with human drivers is assessed only through proxy measures, so it remains to be seen how well the trained agents would perform alongside real drivers in real-world conditions.

Additionally, the approach still depends on human driving data to build the reference policy, although the authors report using only 30 minutes of imperfect demonstrations. There are also open questions around how best to design the reward function and the regularization term to achieve the desired human-compatible behaviors.

Further research could explore techniques for online adaptation and learning from limited real-world data, as well as investigating the ethical and social implications of deploying such human-compatible autonomous driving systems at scale.

Conclusion

This paper presents an approach to training driving agents that prioritizes compatibility with human drivers. By leveraging data-regularized self-play reinforcement learning, the authors obtain agents that reach their goals reliably, with low collision and off-road rates, while driving in a human-like manner.

While the simulation-based results are promising, there remain important practical and ethical considerations to address before such systems can be deployed in the real world. Nonetheless, this work represents a valuable step forward in developing autonomous driving technologies that can seamlessly and safely integrate with human-driven vehicles.

Related Papers

Human-compatible driving partners through data-regularized self-play reinforcement learning

Daphne Cornelisse, Eugene Vinitsky

A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.

6/26/2024

Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

Zilin Huang, Zihao Sheng, Sikai Chen

In the field of autonomous driving, developing safe and trustworthy autonomous driving policies remains a significant challenge. Recently, Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency. Nevertheless, existing RLHF-enabled methods often falter when faced with imperfect human demonstrations, potentially leading to training oscillations or even worse performance than rule-based approaches. Inspired by the human learning process, we propose Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF). This novel framework synergistically integrates human feedback (e.g., human intervention and demonstration) and physics knowledge (e.g., traffic flow model) into the training loop of reinforcement learning. The key advantage of PE-RLHF is its guarantee that the learned policy will perform at least as well as the given physics-based policy, even when human feedback quality deteriorates, thus ensuring trustworthy safety improvements. PE-RLHF introduces a Physics-enhanced Human-AI (PE-HAI) collaborative paradigm for dynamic action selection between human and physics-based actions, employs a reward-free approach with a proxy value function to capture human preferences, and incorporates a minimal intervention mechanism to reduce the cognitive load on human mentors. Extensive experiments across diverse driving scenarios demonstrate that PE-RLHF significantly outperforms traditional methods, achieving state-of-the-art (SOTA) performance in safety, efficiency, and generalizability, even with varying quality of human feedback. The philosophy behind PE-RLHF not only advances autonomous driving technology but can also offer valuable insights for other safety-critical domains. Demo video and code are available at: https://zilin-huang.github.io/PE-RLHF-website/

9/6/2024

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu, Tianyi Shen, Cole Gulino, Ari Seff, Justin Fu

A major challenge in autonomous vehicle research is modeling agent behaviors, which has critical applications including constructing realistic and reliable simulations for off-board evaluation and forecasting traffic agents motion for onboard planning. While supervised learning has shown success in modeling agents across various domains, these models can suffer from distribution shift when deployed at test-time. In this work, we improve the reliability of agent behaviors by closed-loop fine-tuning of behavior models with reinforcement learning. Our method demonstrates improved overall performance, as well as improved targeted metrics such as collision rate, on the Waymo Open Sim Agents challenge. Additionally, we present a novel policy evaluation benchmark to directly assess the ability of simulated agents to measure the quality of autonomous vehicle planners and demonstrate the effectiveness of our approach on this new benchmark.

9/30/2024

Hybrid Imitation-Learning Motion Planner for Urban Driving

Cristian Gariboldi, Matteo Corno, Beng Jin

With the release of open source datasets such as nuPlan and Argoverse, the research around learning-based planners has spread a lot in the last years. Existing systems have shown excellent capabilities in imitating the human driver behaviour, but they struggle to guarantee safe closed-loop driving. Conversely, optimization-based planners offer greater security in short-term planning scenarios. To confront this challenge, in this paper we propose a novel hybrid motion planner that integrates both learning-based and optimization-based techniques. Initially, a multilayer perceptron (MLP) generates a human-like trajectory, which is then refined by an optimization-based component. This component not only minimizes tracking errors but also computes a trajectory that is both kinematically feasible and collision-free with obstacles and road boundaries. Our model effectively balances safety and human-likeness, mitigating the trade-off inherent in these objectives. We validate our approach through simulation experiments and further demonstrate its efficacy by deploying it in real-world self-driving vehicles.

9/5/2024