Hierarchical Generative Adversarial Imitation Learning with Mid-level Input Generation for Autonomous Driving on Urban Environments

2302.04823

Published 4/3/2024 by Gustavo Claudio Karl Couto, Eric Aislan Antonelo


Abstract

Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches are based exclusively on engineered rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive, such as autonomous driving. However, training deep networks directly from raw images on RL tasks is known to be unstable and troublesome. To deal with that, this work proposes a hierarchical GAIL-based architecture (hGAIL) which decouples representation learning from the driving task to solve the autonomous navigation of a vehicle. The proposed architecture consists of two modules: a GAN (Generative Adversarial Net) which generates an abstract mid-level input representation, which is the Bird's-Eye View (BEV) from the surroundings of the vehicle; and the GAIL which learns to control the vehicle based on the BEV predictions from the GAN as input. hGAIL is able to learn both the policy and the mid-level representation simultaneously as the agent interacts with the environment. Our experiments made in the CARLA simulation environment have shown that GAIL exclusively from cameras (without BEV) fails to even learn the task, while hGAIL, after training exclusively on one city, was able to autonomously navigate successfully in 98% of the intersections of a new city not used in the training phase.


Overview

Deriving robust control policies for autonomous vehicle navigation in urban environments is challenging
Current end-to-end approaches that map camera images directly to low-level vehicle actions struggle with instability and unreliability
This paper proposes a hierarchical architecture that decouples representation learning from the driving task, using a Generative Adversarial Network (GAN) to generate a bird's-eye view (BEV) representation that is then used as input to a Generative Adversarial Imitation Learning (GAIL) module to learn the driving policy

Plain English Explanation

Autonomous vehicles need to be able to navigate complex urban environments safely and reliably. This is a difficult problem because the vehicle needs to make quick decisions based on the information it sees through its cameras. Directly translating those camera images into low-level actions like steering and throttle has proven to be unstable and prone to failure.

The key insight of this paper is to break the problem into two parts. First, a Generative Adversarial Network (GAN) is used to generate a more abstract, bird's-eye view representation of the vehicle's surroundings. This mid-level representation captures the essential elements the vehicle needs to navigate, without getting bogged down in the raw camera data.

Then, a Generative Adversarial Imitation Learning (GAIL) module uses this bird's-eye view as input to learn how to control the vehicle. By learning from expert demonstrations while interacting with the environment, GAIL can pick up on the nuances of good driving behavior without needing to explicitly define all the rules.

This hierarchical approach allows the system to learn both the representation and the control policy in a stable and reliable way, overcoming the limitations of prior end-to-end approaches. In simulation, the researchers showed that their method, which they call hGAIL, after training in a single city, successfully navigated 98% of the intersections in a new city it had never seen, demonstrating the potential for robust autonomous driving in complex urban environments.

Technical Explanation

The paper proposes a hierarchical Generative Adversarial Imitation Learning (hGAIL) architecture to address the challenges of autonomous navigation in urban environments. The key components are:

GAN for Representation Learning: A Generative Adversarial Network (GAN) is used to learn an abstract, bird's-eye view (BEV) representation of the vehicle's surroundings from the raw camera images. This mid-level representation captures the essential spatial and semantic information needed for navigation, without the instability of directly mapping images to actions.
GAIL for Policy Learning: The BEV representation generated by the GAN is then used as input to a Generative Adversarial Imitation Learning (GAIL) module. GAIL learns the driving policy by imitating expert demonstrations while interacting with the environment. This allows the system to learn nuanced driving behavior without an explicitly defined reward function.
Simultaneous Representation and Policy Learning: The hGAIL architecture trains both the GAN and GAIL modules concurrently, with the GAN providing the necessary input representation for the GAIL policy learner. This allows the system to learn both the mid-level representation and the driving policy at the same time, as the agent interacts with the environment, while keeping the two learning problems decoupled.
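The data flow described by these three components can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the random linear layers stand in for trained deep networks, the dimensions and all function names (generate_bev, policy, discriminator, surrogate_reward, rollout_step) are hypothetical, and the reward uses one common GAIL convention, r = -log(1 - D), where D near 1 means "expert-like". The paper's actual networks and losses may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
IMG_DIM = 48   # flattened camera image
BEV_DIM = 16   # flattened bird's-eye view (BEV)
ACT_DIM = 2    # steering, throttle


def random_layer(in_dim, out_dim):
    """Random linear map standing in for a trained deep network."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))


W_gen = random_layer(IMG_DIM, BEV_DIM)        # GAN generator: image -> BEV
W_pi = random_layer(BEV_DIM, ACT_DIM)         # policy: BEV -> action
W_disc = random_layer(BEV_DIM + ACT_DIM, 1)   # GAIL discriminator: (BEV, action) -> score


def generate_bev(images):
    """Mid-level module: predict a BEV representation from camera images."""
    return np.tanh(images @ W_gen)


def policy(bev):
    """Driving module: map the predicted BEV to actions in [-1, 1]."""
    return np.tanh(bev @ W_pi)


def discriminator(bev, actions):
    """Probability that a (BEV, action) pair looks like expert driving."""
    logits = np.concatenate([bev, actions], axis=-1) @ W_disc
    return 1.0 / (1.0 + np.exp(-logits))


def surrogate_reward(bev, actions, eps=1e-8):
    """Assumed GAIL-style reward: larger when the discriminator is fooled."""
    d = discriminator(bev, actions)
    return -np.log(1.0 - d + eps)


def rollout_step(images):
    """One forward pass: image -> BEV -> action -> surrogate reward."""
    bev = generate_bev(images)
    actions = policy(bev)
    rewards = surrogate_reward(bev, actions)
    return bev, actions, rewards


images = rng.normal(size=(4, IMG_DIM))   # a batch of 4 fake camera frames
bev, actions, rewards = rollout_step(images)
print(bev.shape, actions.shape, rewards.shape)
```

The point of the sketch is the interface: the policy and the discriminator never see raw pixels, only the BEV predicted by the generator, which is what decouples representation learning from the driving task. In a real training loop, the generator, the discriminator, and the policy would each be updated from data collected during the same environment interaction.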

The researchers evaluated their hGAIL approach in the CARLA simulation environment. They found that GAIL alone, with only camera images as input and no BEV, failed to learn the task at all. In contrast, hGAIL, after training exclusively on one city, successfully navigated 98% of the intersections in a new city that was not used during training, which supports the robustness of the hierarchical approach.

Critical Analysis

The paper presents a compelling approach to address the challenges of autonomous navigation in complex urban environments. The use of a hierarchical architecture that separates representation learning from policy learning is a clever way to overcome the instability and unreliability of end-to-end approaches.

One potential limitation is that the evaluation was conducted entirely in simulation, without any real-world testing. While the CARLA environment is a widely used platform for autonomous driving research, the fidelity of the simulation to real-world conditions remains an open question.

Additionally, the paper does not explore the potential for further optimization or refinement of the hGAIL architecture. For example, it may be possible to incorporate additional sensor modalities, such as LIDAR or radar, to enhance the vehicle's perception capabilities. The researchers also do not discuss the computational complexity or inference latency of the proposed system, which would be important considerations for real-world deployment.

Overall, the hGAIL approach represents a promising step forward in the development of robust and reliable autonomous driving systems. However, further research and real-world validation would be necessary to fully assess the practical implications and limitations of this work.

Conclusion

This paper presents a hierarchical Generative Adversarial Imitation Learning (hGAIL) architecture that decouples representation learning from policy learning to tackle the challenges of autonomous navigation in complex urban environments. By using a Generative Adversarial Network (GAN) to generate a bird's-eye view representation and then training a GAIL module to learn the driving policy from that input, the system is able to overcome the instability and unreliability of previous end-to-end approaches.

The researchers demonstrate the effectiveness of their hGAIL approach through simulation experiments in CARLA, showing that, after training on a single city, it successfully navigates 98% of the intersections in a new city that was not used during training. This is a promising result for autonomous driving, with the potential to enable more robust and reliable navigation in real-world urban settings. However, further research and validation, including testing in the physical world, would be needed to fully assess the practicality and limitations of this technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
