UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios

2405.03807

Published 5/8/2024 by Reza Mahjourian, Rongbing Mu, Valerii Likhosherstov, Paul Mougin, Xiukun Huang, Joao Messias, Shimon Whiteson

cs.RO cs.LG

Abstract

This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.

Overview

The paper proposes a unified model called UniGen for generating autonomous driving scenarios by modeling the initial states and trajectories of agents (vehicles, pedestrians, etc.).
UniGen aims to capture the complex interactions and behaviors of agents in diverse traffic situations, enabling the generation of realistic and diverse scenarios for autonomous driving development and testing.
The paper introduces a novel generative model that can jointly model the initial state and trajectory of each agent, accounting for their interactions and environmental factors.
Experiments on the Waymo Open Motion Dataset show that UniGen outperforms prior state-of-the-art methods, producing realistic scenarios with low collision rates.

Plain English Explanation

The paper introduces a new model called UniGen that can generate realistic and diverse scenarios for autonomous driving. These scenarios are created by modeling the initial states (positions, speeds, etc.) and trajectories of different agents, such as vehicles and pedestrians, in a unified way.

The key idea behind UniGen is to capture the complex interactions and behaviors of these agents as they navigate through traffic. For example, a car may need to slow down to avoid a pedestrian crossing the road, or two cars may need to coordinate their movements to safely merge into the same lane.

By modeling these interactions, UniGen can create a wide variety of scenarios that autonomous driving systems can use for development and testing. This is important because real-world traffic situations can be very unpredictable and challenging, and autonomous vehicles need to be extensively tested in a wide range of scenarios to ensure they can handle them safely.

The paper reports that UniGen generates realistic scenarios with low collision rates and outperforms prior state-of-the-art methods on the Waymo Open Motion Dataset. This could help accelerate the development and deployment of safer and more capable autonomous driving systems.

Technical Explanation

The paper proposes a unified generative model called UniGen for creating autonomous driving scenarios by jointly modeling the initial states and trajectories of different agents. UniGen builds on previous work in GenAD, Trajeglish, and PregSU, which have addressed various aspects of traffic scenario generation.

The key innovation of UniGen is that it models the dependencies between agents as they move through a traffic scene. A single model predicts the distributions over each new agent's position, initial state, and future trajectory from a shared global scenario embedding. New agents are injected autoregressively, so the placement and trajectory of each new agent are conditioned on all agents already in the scene and their trajectories.
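The autoregressive injection idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: `encode_scene`, `sample_agent`, and `inject_agents` are hypothetical names, the "embedding" is a hand-written centroid summary rather than a learned representation, and the sampling distributions are placeholders. The only point it shows is the control flow, in which the scene is re-encoded after every injected agent so that each new agent depends on all previous ones.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    x: float
    y: float
    heading: float  # radians
    speed: float    # metres per second
    trajectory: list = field(default_factory=list)  # future (x, y) waypoints


def encode_scene(agents):
    """Hypothetical stand-in for a learned global scenario embedding.

    Summarises the scene as (centroid_x, centroid_y, agent_count).
    """
    if not agents:
        return (0.0, 0.0, 0)
    cx = sum(a.x for a in agents) / len(agents)
    cy = sum(a.y for a in agents) / len(agents)
    return (cx, cy, len(agents))


def sample_agent(embedding, rng, horizon=5, dt=1.0):
    """Sample position, initial state, and trajectory from one shared embedding.

    The distributions here are placeholders, not learned model heads.
    """
    cx, cy, _ = embedding
    x = rng.gauss(cx, 20.0)
    y = rng.gauss(cy, 20.0)
    heading = rng.uniform(-math.pi, math.pi)
    speed = rng.uniform(0.0, 15.0)
    # Constant-velocity rollout as a placeholder for a predicted trajectory.
    trajectory = [
        (x + speed * math.cos(heading) * dt * k,
         y + speed * math.sin(heading) * dt * k)
        for k in range(1, horizon + 1)
    ]
    return Agent(x, y, heading, speed, trajectory)


def inject_agents(existing, num_new, seed=0):
    """Add agents one at a time, re-encoding the scene before each sample."""
    rng = random.Random(seed)
    agents = list(existing)
    for _ in range(num_new):
        embedding = encode_scene(agents)  # conditions on every agent so far
        agents.append(sample_agent(embedding, rng))
    return agents


scene = inject_agents([Agent(0.0, 0.0, 0.0, 5.0)], num_new=3)
print(len(scene))
```

In a learned model the embedding and the three sampling steps would be produced by a neural network; the loop structure is the part that corresponds to the description above.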

The authors evaluate UniGen on the Waymo Open Motion Dataset and compare its performance to existing methods, such as Versatile and AutoGenesisAgent. The reported results show that UniGen outperforms the prior state of the art, producing realistic scenarios with low collision rates.
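The abstract cites low collision rates as evidence of realism. This summary does not give the paper's exact metric definition, so the following Python sketch uses an assumed one: agents are treated as circles of a fixed radius, and the collision rate is the fraction of agents that overlap another agent at any shared timestep. The function name `collision_rate` and the `radius` parameter are illustrative choices.

```python
import math


def collision_rate(trajectories, radius=1.0):
    """Fraction of agents that collide with at least one other agent.

    trajectories: one list of (x, y) waypoints per agent, sampled at the
    same timesteps. Agents are approximated as circles of the given radius
    (an assumption made for this sketch).
    """
    n = len(trajectories)
    if n == 0:
        return 0.0
    collided = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Compare the two agents at each shared timestep.
            for (xi, yi), (xj, yj) in zip(trajectories[i], trajectories[j]):
                if math.hypot(xi - xj, yi - yj) < 2 * radius:
                    collided[i] = collided[j] = True
                    break
    return sum(collided) / n


a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0), (5.0, 0.0), (2.5, 0.0)]   # overlaps a at the last step
c = [(100.0, 100.0)] * 3                    # far from both
print(collision_rate([a, b, c]))
```

A benchmark implementation would typically use oriented bounding boxes rather than circles; the circle test keeps the sketch short.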

Critical Analysis

The paper presents a compelling approach to generating realistic and diverse autonomous driving scenarios. By jointly modeling the initial states and trajectories of agents, UniGen captures the complex interactions and interdependencies that are central to realistic traffic situations.

One potential limitation of the work is that it relies on the availability of high-quality datasets to train the generative model. The performance of UniGen may be affected by the quality and diversity of the training data, and further research may be needed to address scenarios that are underrepresented in existing datasets.

Additionally, while the paper demonstrates the effectiveness of UniGen in generating challenging scenarios, it does not provide a comprehensive analysis of the suitability of these scenarios for autonomous driving development and testing. Further research may be needed to understand how well the generated scenarios translate to real-world performance and safety improvements for autonomous vehicles.

Overall, the UniGen model represents a significant step forward in the field of autonomous driving scenario generation, and the authors' work provides a strong foundation for future research in this area.

Conclusion

The UniGen paper presents a novel unified model for generating diverse and realistic autonomous driving scenarios by jointly modeling the initial states and trajectories of different agents. By capturing the complex interactions and behaviors of agents in traffic, UniGen can create scenarios that are more representative of real-world conditions, which is critical for the development and testing of safe and capable autonomous driving systems.

The paper's experimental results demonstrate the effectiveness of UniGen compared to existing methods, suggesting that it could be a valuable tool for accelerating the progress of autonomous driving technology. While the work has some limitations, it represents an important contribution to the field and opens up new avenues for further research and development in this area.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent

Yi Xu, Yun Fu

Understanding multi-agent behavior is critical across various fields. The conventional approach involves analyzing agent movements through three primary tasks: trajectory prediction, imputation, and spatial-temporal recovery. Considering the unique input formulation and constraint of these tasks, most existing methods are tailored to address only one specific task. However, in real-world applications, these scenarios frequently occur simultaneously. Consequently, methods designed for one task often fail to adapt to others, resulting in performance drops. To overcome this limitation, we propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs, adaptable to diverse scenarios. Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction. We further extend recent successful State Space Models (SSMs), particularly the Mamba model, into a Bidirectional Temporal Mamba to effectively capture temporal dependencies. Additionally, we incorporate a Bidirectional Temporal Scaled (BTS) module to comprehensively scan trajectories while maintaining the temporal missing relationships within the sequence. We curate and benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation. Extensive experiments demonstrate the superior performance of our model. To the best of our knowledge, this is the first work that addresses this unified problem through a versatile generative framework, thereby enhancing our understanding of multi-agent movement. Our datasets, code, and model weights are available at https://github.com/colorfulfuture/UniTraj-pytorch.

5/29/2024

cs.CV

GenAD: Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen

Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic evolution process, e.g., the future interaction between the ego car and other traffic participants and the structural trajectory prior. In this paper, we explore a new paradigm for end-to-end autonomous driving, where the key is to predict how the ego car and the surroundings evolve given past scenes. We propose GenAD, a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling. We further adopt a temporal model to capture the agent and ego movements in the latent space to generate more effective future trajectories. GenAD finally simultaneously performs motion prediction and planning by sampling distributions in the learned structural latent space conditioned on the instance tokens and using the learned temporal model to generate futures. Extensive experiments on the widely used nuScenes benchmark show that the proposed GenAD achieves state-of-the-art performance on vision-centric end-to-end autonomous driving with high efficiency. Code: https://github.com/wzzheng/GenAD.

4/9/2024

cs.CV

Trajeglish: Traffic Modeling as Next-Token Prediction

Jonah Philion, Xue Bin Peng, Sanja Fidler

A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.

4/16/2024

cs.LG cs.RO

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan

We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly using or adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates corresponding simulation environments by populating pertinent objects and assets with proper spatial configurations. Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.

6/18/2024

cs.RO cs.AI cs.CV cs.LG