Trajeglish: Traffic Modeling as Next-Token Prediction

arXiv:2312.04535

Published 4/16/2024 by Jonah Philion, Xue Bin Peng, Sanja Fidler

Abstract

A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.

Overview

Introduces "Trajeglish," an approach that treats traffic modeling as next-token prediction over discretized agent trajectories learned from recorded driving logs
Aims to enable more realistic and diverse traffic simulation for autonomous driving applications
Proposes an end-to-end framework that can generate diverse traffic scenarios by learning the "grammar" of driving behaviors

Plain English Explanation

The paper presents a new method called "Trajeglish" that aims to capture the "language" of driving behaviors from real-world traffic data. The goal is to enable more realistic and varied traffic simulations, which are crucial for testing and training autonomous driving systems.

Traditionally, traffic simulations have relied on manually defined rules and models, which can struggle to capture the full complexity and diversity of real-world driving. The researchers behind Trajeglish hypothesize that driving behaviors have an underlying "grammar" that can be learned from observational data, much as language models learn the grammar of human language by predicting the next token.

By learning this "grammar of driving," the Trajeglish framework can then generate new, diverse traffic scenarios that mimic the patterns and interactions observed in the real world. This could lead to more comprehensive and representative testing environments for autonomous vehicles, helping to improve their safety and performance in the real world.

Technical Explanation

The Trajeglish framework consists of several key components:

Trajectory Tokenization: The researchers first discretize the trajectories of vehicles, pedestrians and cyclists into motion tokens using a simple data-driven tokenization scheme. The scheme reaches centimeter-level resolution with a small vocabulary.
Interaction Modeling: The multi-agent sequence of motion tokens is modeled with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. This is how the model learns the "grammar" of how agents respond to one another.
Scenario Generation: New traffic scenarios are sampled from the model, seeded from recorded driving logs, so that the generated scenes stay consistent with observed driving behaviors.
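
To make the idea of discretizing trajectories into motion tokens concrete, here is a minimal sketch. It is not the paper's tokenizer: the grid vocabulary built by `make_vocabulary`, the squared-error matching cost, and all function names are assumptions chosen for illustration. Each token is a small (dx, dy, dyaw) motion template in the agent's own frame, and each step is encoded as the template that lands closest to the next recorded state.

```python
import itertools
import numpy as np

def make_vocabulary():
    """Hypothetical vocabulary: a fixed grid of (dx, dy, dyaw) templates in the agent frame."""
    dxs = np.linspace(0.0, 3.0, 7)      # forward displacement per step (meters)
    dys = np.linspace(-0.5, 0.5, 5)     # lateral displacement per step (meters)
    dyaws = np.linspace(-0.2, 0.2, 5)   # heading change per step (radians)
    return np.array(list(itertools.product(dxs, dys, dyaws)))

def step(state, templates):
    """Apply one or many templates to a state (x, y, yaw); returns an array of shape (K, 3)."""
    x, y, yaw = state
    t = np.atleast_2d(templates)
    c, s = np.cos(yaw), np.sin(yaw)
    return np.stack([x + c * t[:, 0] - s * t[:, 1],
                     y + s * t[:, 0] + c * t[:, 1],
                     yaw + t[:, 2]], axis=1)

def tokenize(trajectory, vocab):
    """Greedily encode a (T, 3) trajectory as T-1 token indices.

    Each step is matched from the *reconstructed* state rather than the recorded
    one, so rounding errors are corrected at the next step instead of piling up.
    """
    state = np.asarray(trajectory[0], dtype=float)
    tokens = []
    for target in trajectory[1:]:
        candidates = step(state, vocab)
        diff = candidates - target
        diff[:, 2] = (diff[:, 2] + np.pi) % (2 * np.pi) - np.pi  # wrap heading error
        # Simplification: meters and radians are mixed in one unweighted cost.
        k = int(np.argmin((diff ** 2).sum(axis=1)))
        tokens.append(k)
        state = candidates[k]
    return tokens

def detokenize(start, tokens, vocab):
    """Replay token indices from a start state to recover a (T, 3) trajectory."""
    states = [np.asarray(start, dtype=float)]
    for k in tokens:
        states.append(step(states[-1], vocab[k])[0])
    return np.array(states)
```

Calling `tokenize(trajectory, make_vocabulary())` on an array of (x, y, yaw) rows yields one integer per timestep, and `detokenize` replays those integers into an approximate trajectory. A finer or data-derived vocabulary would trade a larger token set for lower reconstruction error.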

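The researchers evaluate Trajeglish on the Waymo Sim Agents Benchmark, where it surpasses prior work by 3.3% on the realism meta metric and by 9.9% on the interaction metric. They also ablate the modeling choices in full autonomy and partial autonomy settings, and show that the learned representations can quickly be adapted to improve performance on nuScenes.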
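
The abstract describes a model that is autoregressive in time and also accounts for interaction between agents within a timestep. The sampling order this implies can be sketched as follows. This is an illustrative sketch only: `sample_scenario` and `toy_logits` are hypothetical names, and `toy_logits` is a hand-written stand-in for a trained network, not the paper's model.

```python
import numpy as np

def sample_scenario(next_token_logits, n_agents, n_steps, rng):
    """Sample a (n_steps, n_agents) grid of motion tokens.

    next_token_logits(history, current) must return unnormalized scores over the
    vocabulary. `history` holds the tokens of all agents at earlier timesteps;
    `current` holds tokens already chosen for earlier agents at this timestep.
    """
    history = []
    for _ in range(n_steps):
        current = []
        for _ in range(n_agents):
            logits = np.asarray(next_token_logits(history, current), dtype=float)
            probs = np.exp(logits - logits.max())  # numerically stable softmax
            probs /= probs.sum()
            current.append(int(rng.choice(len(probs), p=probs)))
        history.append(current)
    return np.array(history)

def toy_logits(history, current, vocab_size=8):
    """Hypothetical stand-in for a trained model (assumption, for illustration)."""
    logits = np.zeros(vocab_size)
    if history:
        # Favor repeating this agent's own token from the previous timestep.
        logits[history[-1][len(current)]] += 2.0
    if current:
        # Favor matching the agent sampled just before, within the same timestep.
        logits[current[-1]] += 2.0
    return logits

scenario = sample_scenario(toy_logits, n_agents=3, n_steps=5,
                           rng=np.random.default_rng(0))
```

Because each agent's token is drawn after the tokens of earlier agents at the same timestep, the joint sample can reflect dependencies between agents that independent per-agent sampling would miss.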

Critical Analysis

The Trajeglish framework presents a promising approach to learning the underlying "language" of driving behaviors from observational data. By capturing the complex interactions and patterns in real-world traffic, the system can generate more realistic and varied traffic simulations, which is a crucial need for the development of autonomous driving technologies.

However, the approach has some potential limitations. For example, the framework may struggle to capture rare or extreme driving behaviors that are not well represented in the training data. Additionally, because the model is a GPT-like sequence model that generates scenarios autoregressively, it may be sensitive to hyperparameter tuning and could be computationally intensive to train and deploy.

The paper evaluates how the model scales with parameter count and dataset size, but further research is needed to explore the robustness of the Trajeglish framework and its ability to generalize to diverse driving scenarios and environments. Incorporating domain knowledge or safety constraints into the scenario generation process could also be an interesting area for future work.

Conclusion

The Trajeglish framework represents an innovative approach to traffic simulation that seeks to learn the underlying "grammar" of driving behaviors from observational data. By modeling the interactions between agents as sequences of discrete motion tokens, the system can generate more diverse and realistic traffic scenarios, a crucial need for the development of autonomous driving technologies.

While the paper presents promising results, further research is needed to address potential limitations and explore the broader applicability of the approach. Nonetheless, the Trajeglish framework is a compelling step towards more comprehensive and representative testing environments for autonomous vehicles, with the potential to contribute to the safe and responsible deployment of these transformative technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction

Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan

Data-driven autonomous driving motion generation tasks are frequently impacted by the limitations of dataset size and the domain gap between datasets, which precludes their extensive application in real-world scenarios. To address this issue, we introduce SMART, a novel autonomous driving motion generation paradigm that models vectorized map and agent trajectory data into discrete sequence tokens. These tokens are then processed through a decoder-only transformer architecture to train for the next token prediction task across spatial-temporal series. This GPT-style method allows the model to learn the motion distribution in real driving scenarios. SMART achieves state-of-the-art performance across most of the metrics on the generative Sim Agents challenge, ranking 1st on the leaderboards of Waymo Open Motion Dataset (WOMD), demonstrating remarkable inference speed. Moreover, SMART represents the generative model in the autonomous driving motion domain, exhibiting zero-shot generalization capabilities: Using only the NuPlan dataset for training and WOMD for validation, SMART achieved a competitive score of 0.71 on the Sim Agents challenge. Lastly, we have collected over 1 billion motion tokens from multiple datasets, validating the model's scalability. These results suggest that SMART has initially emulated two important properties: scalability and zero-shot generalization, and preliminarily meets the needs of large-scale real-time simulation applications. We have released all the code to promote the exploration of models for motion generation in the autonomous driving field.

5/27/2024

cs.RO cs.CV

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between history and future, treating each time step as the current one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. We outperformed state-of-the-art models with a realism score of 0.741 and improved the minADE metric to 1.540, with an approximately 91.6% reduction in model parameters.

5/28/2024

cs.AI cs.LG cs.RO

TSDiT: Traffic Scene Diffusion Models With Transformers

Chen Yang, Tianyu Shi

In this paper, we introduce a novel approach to trajectory generation for autonomous driving, combining the strengths of Diffusion models and Transformers. First, we use the historical trajectory data for efficient preprocessing and generate action latent using a diffusion model with DiT (Diffusion with Transformers) Blocks to increase scene diversity and stochasticity of agent actions. Then, we combine action latent, historical trajectories and HD Map features and put them into different transformer blocks. Finally, we use a trajectory decoder to generate future trajectories of agents in the traffic scene. The method exhibits superior performance in generating smooth turning trajectories, enhancing the model's capability to fit complex steering patterns. The experimental results demonstrate the effectiveness of our method in producing realistic and diverse trajectories, showcasing its potential for application in autonomous vehicle navigation systems.

5/7/2024

cs.RO

Language-Driven Interactive Traffic Trajectory Generation

Junkai Xia, Chenxin Xu, Qingyao Xu, Chen Xie, Yanfeng Wang, Siheng Chen

Realistic trajectory generation with natural language control is pivotal for advancing autonomous vehicle technology. However, previous methods focus on individual traffic participant trajectory generation, thus failing to account for the complexity of interactive traffic dynamics. In this work, we propose InteractTraj, the first language-driven traffic trajectory generator that can generate interactive traffic trajectories. InteractTraj interprets abstract trajectory descriptions into concrete formatted interaction-aware numerical codes and learns a mapping between these formatted codes and the final interactive trajectories. To interpret language descriptions, we propose a language-to-code encoder with a novel interaction-aware encoding strategy. To produce interactive traffic trajectories, we propose a code-to-trajectory decoder with interaction-aware feature aggregation that synergizes vehicle interactions with the environmental map and the vehicle moves. Extensive experiments show our method demonstrates superior performance over previous SoTA methods, offering a more realistic generation of interactive traffic trajectories with high controllability via diverse natural language commands. Our code is available at https://github.com/X1a-jk/InteractTraj.git

5/27/2024

cs.AI cs.RO