AD-H: Autonomous Driving with Hierarchical Agents

2406.03474

Published 6/6/2024 by Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

cs.CV

AD-H: Autonomous Driving with Hierarchical Agents

Abstract

Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers. As a result, the generalizability of these methods is highly restricted by autonomous driving datasets used during fine-tuning. To tackle this challenge, we propose to connect high-level instructions and low-level control signals with mid-level language-driven commands, which are more fine-grained than high-level instructions but more universal and explainable than control signals, and thus can effectively bridge the gap in between. We implement this idea through a hierarchical multi-agent driving system named AD-H, including a MLLM planner for high-level reasoning and a lightweight controller for low-level execution. The hierarchical design liberates the MLLM from low-level control signal decoding and therefore fully releases their emergent capability in high-level perception, reasoning, and planning. We build a new dataset with action hierarchy annotations. Comprehensive closed-loop evaluations demonstrate several key advantages of our proposed AD-H system. First, AD-H can notably outperform state-of-the-art methods in achieving exceptional driving performance, even exhibiting self-correction capabilities during vehicle operation, a scenario not encountered in the training dataset. Second, AD-H demonstrates superior generalization under long-horizon instructions and novel environmental conditions, significantly surpassing current state-of-the-art methods. We will make our data and code publicly accessible at https://github.com/zhangzaibin/AD-H

Create account to get full access

Overview

This paper presents a hierarchical approach for autonomous driving, called AD-H, which combines a high-level decision-making module with low-level controllers to navigate complex driving scenarios.
The key idea is to divide the autonomous driving task into a hierarchy of agents, each responsible for different aspects of decision-making and control.
The authors demonstrate the effectiveness of their approach through simulation experiments, showing improved performance compared to a monolithic end-to-end model.

Plain English Explanation

The paper describes a new way to build autonomous driving systems, called AD-H, that uses a hierarchical approach. Instead of having a single model that tries to handle all aspects of driving, AD-H splits the task into different "agents" or modules, each responsible for a specific part of the problem.

At the top level, there is a decision-making module that decides on high-level actions, like whether to change lanes or turn. Below that, there are lower-level controllers that handle the detailed controls, like steering, acceleration, and braking. By dividing the problem this way, the authors believe they can create a more efficient and effective autonomous driving system.

The key advantage of this hierarchical approach is that it allows the system to focus on specific sub-tasks, rather than trying to learn everything at once. For example, the high-level decision-maker can focus on navigating the overall route, while the lower-level controllers can specialize in the precise control of the vehicle.

The authors test their AD-H system in simulation and show that it outperforms a more traditional "end-to-end" autonomous driving model, where a single neural network tries to handle all aspects of the driving task. This suggests that the hierarchical approach may be a promising direction for building robust and capable autonomous driving systems.

Technical Explanation

The paper presents a hierarchical approach for autonomous driving, called AD-H, which combines a high-level decision-making module with low-level controllers to navigate complex driving scenarios. The key idea is to divide the autonomous driving task into a hierarchy of agents, each responsible for different aspects of decision-making and control.

At the top level, the high-level decision-making module is responsible for strategic planning, such as deciding on the overall route, lane changes, and turns. This module uses a neural network to process sensor inputs and environmental information to generate high-level actions.

Beneath the decision-making module are the low-level controllers, which are responsible for the precise control of the vehicle, such as steering, acceleration, and braking. These controllers use traditional control theory techniques to execute the high-level actions generated by the decision-making module.

The authors demonstrate the effectiveness of their AD-H approach through simulation experiments, comparing its performance to a monolithic end-to-end autonomous driving model. The results show that the hierarchical approach outperforms the end-to-end model in terms of task completion, safety, and other metrics.

The key advantage of the hierarchical approach is that it allows the system to focus on specific sub-tasks, rather than trying to learn everything at once. This can lead to more efficient and robust performance, as each module can specialize in its own domain.

Critical Analysis

The authors acknowledge several limitations and areas for further research in their paper. For example, they note that their simulation experiments do not fully capture the complexities of real-world driving, and that further testing in physical environments would be needed to fully validate the approach.

Additionally, the authors mention that their current implementation assumes perfect communication and coordination between the high-level decision-making module and the low-level controllers. In a real-world setting, there may be latency, errors, or other issues that could disrupt this coordination, and the authors suggest that addressing these challenges could be an important area for future work.

Another potential concern is the interpretability and transparency of the hierarchical approach. While the division into specialized modules may improve performance, it could also make it more difficult to understand and debug the system's decision-making process. The authors do not address this issue in depth, and it could be an important consideration for real-world deployment.

Overall, the AD-H approach presented in this paper is a promising direction for autonomous driving, but there are still several challenges and areas for further research that would need to be addressed before it could be deployed in real-world settings.

Conclusion

This paper presents a hierarchical approach to autonomous driving, called AD-H, which divides the driving task into a hierarchy of specialized agents. The authors demonstrate the effectiveness of this approach through simulation experiments, showing that it outperforms a more traditional end-to-end autonomous driving model.

While the authors acknowledge several limitations and areas for further research, the AD-H approach represents a promising direction for building more capable and reliable autonomous driving systems. As the field of autonomous driving continues to evolve, approaches like this that leverage specialized modules and hierarchical decision-making may play an important role in advancing the state of the art.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

In-context Learning for Automated Driving Scenarios

Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis

One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic environment descriptions are input into the LLM. The LLM then utilizes this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also reaches better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.

5/8/2024

cs.AI

GAD-Generative Learning for HD Map-Free Autonomous Driving

Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li

Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This results in a performance bottleneck for autonomous driving systems in that corner cases simply cannot be solved by enumerating hand-crafted rules. We present a deep-learning-based approach that brings prediction, decision, and planning modules together with the attempt to overcome the rule-based methods' deficiency in real-world applications of autonomous driving, especially for urban scenes. The DNN model we proposed is solely trained with 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. This method is deployed onto a Jiyue test car with no modification to its factory-ready sensor set and compute platform. the feasibility, usability, and commercial potential are demonstrated in this article.

6/3/2024

cs.RO cs.CV

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. Specifically, LeapAD emulates human attention by selecting critical objects relevant to driving decisions, simplifying environmental interpretation, and mitigating decision-making complexities. Additionally, LeapAD incorporates an innovative dual-process decision-making module, which consists of an Analytic Process (System-II) for thorough analysis and reasoning, along with a Heuristic Process (System-I) for swift and empirical processing. The Analytic Process leverages its logical reasoning to accumulate linguistic driving experience, which is then transferred to the Heuristic Process by supervised fine-tuning. Through reflection mechanisms and a growing memory bank, LeapAD continuously improves itself from past mistakes in a closed-loop environment. Closed-loop testing in CARLA shows that LeapAD outperforms all methods relying solely on camera input, requiring 1-2 orders of magnitude less labeled data. Experiments also demonstrate that as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4 powered Analytic Process and achieve continuous performance improvement. Code will be released at https://github.com/PJLab-ADG/LeapAD.

5/27/2024

cs.RO cs.AI cs.CV

💬

HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model

Mustafa Yildirim, Barkin Dagda, Saber Fallah

Autonomous driving is a complex task which requires advanced decision making and control algorithms. Understanding the rationale behind the autonomous vehicles' decision is crucial to ensure their safe and effective operation on highway driving. This study presents a novel approach, HighwayLLM, which harnesses the reasoning capabilities of large language models (LLMs) to predict the future waypoints for ego-vehicle's navigation. Our approach also utilizes a pre-trained Reinforcement Learning (RL) model to serve as a high-level planner, making decisions on appropriate meta-level actions. The HighwayLLM combines the output from the RL model and the current state information to make safe, collision-free, and explainable predictions for the next states, thereby constructing a trajectory for the ego-vehicle. Subsequently, a PID-based controller guides the vehicle to the waypoints predicted by the LLM agent. This integration of LLM with RL and PID enhances the decision-making process and provides interpretability for highway autonomous driving.

5/24/2024

cs.RO cs.AI