ControlMTR: Control-Guided Motion Transformer with Scene-Compliant Intention Points for Feasible Motion Prediction

2404.10295

Published 4/17/2024 by Jiawei Sun, Chengran Yuan, Shuo Sun, Shanze Wang, Yuhang Han, Shuailei Ma, Zefan Huang, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

cs.RO

Abstract

The ability to accurately predict feasible multimodal future trajectories of surrounding traffic participants is crucial for behavior planning in autonomous vehicles. The Motion Transformer (MTR), a state-of-the-art motion prediction method, alleviated mode collapse and instability during training and enhanced overall prediction performance by replacing conventional dense future endpoints with a small set of fixed prior motion intention points. However, the fixed prior intention points make the MTR multi-modal prediction distribution over-scattered and infeasible in many scenarios. In this paper, we propose the ControlMTR framework to tackle the aforementioned issues by generating scene-compliant intention points and additionally predicting driving control commands, which are then converted into trajectories by a simple kinematic model with soft constraints. These control-generated trajectories will guide the directly predicted trajectories by an auxiliary loss function. Together with our proposed scene-compliant intention points, they can effectively restrict the prediction distribution within the road boundaries and suppress infeasible off-road predictions while enhancing prediction performance. Remarkably, without resorting to additional model ensemble techniques, our method surpasses the baseline MTR model across all performance metrics, achieving notable improvements of 5.22% in SoftmAP and a 4.15% reduction in MissRate. Our approach notably results in a 41.85% reduction in the cross-boundary rate of the MTR, effectively ensuring that the prediction distribution is confined within the drivable area.

Overview

The paper presents ControlMTR, a motion prediction framework that extends the Motion Transformer (MTR) to predict feasible multimodal future trajectories of surrounding traffic participants for autonomous driving.
ControlMTR replaces MTR's fixed prior intention points with scene-compliant intention points and additionally predicts driving control commands, which a simple kinematic model with soft constraints converts into trajectories.
Without model ensembling, the method surpasses the baseline MTR across all reported performance metrics, with a 5.22% improvement in SoftmAP, a 4.15% reduction in MissRate, and a 41.85% reduction in cross-boundary rate.

Plain English Explanation

The researchers have developed a new AI system called ControlMTR that predicts how nearby road users will move. This matters for self-driving cars, which must anticipate where surrounding traffic participants are heading in order to plan their own behavior.

ControlMTR builds on the Motion Transformer (MTR), a transformer model that predicts several possible futures by anchoring them to a small set of fixed "intention points" - candidate locations an agent might be heading toward. Because those points are fixed in advance, MTR's predictions can end up over-scattered and infeasible in many scenarios. ControlMTR instead generates intention points that comply with the scene, and it also predicts driving control commands that a simple kinematic model turns into trajectories.

These control-generated trajectories guide the directly predicted ones through an auxiliary loss. Combined with the scene-compliant intention points, this keeps predictions inside the road boundaries and suppresses off-road ones. According to the abstract, ControlMTR beats the baseline MTR on every reported performance metric.

This work could help make motion prediction more reliable for autonomous driving. By confining predictions to the drivable area, ControlMTR gives a behavior planner a set of possible futures that are feasible rather than merely likely-looking.

Technical Explanation

The key innovation in ControlMTR is twofold. First, it replaces MTR's fixed prior intention points with scene-compliant intention points, so that the candidate goals anchoring each predicted mode are consistent with the road layout. Second, it predicts driving control commands alongside the trajectories. Together, these keep the multimodal prediction distribution within the road boundaries instead of scattering it across infeasible regions.
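As a rough illustration of what "scene-compliant" could mean for intention points, the sketch below keeps only candidate points that lie close to a lane centerline. This is an assumption made for illustration: the summary does not describe how ControlMTR actually generates its intention points, and `filter_intention_points`, `max_dist`, and the sample data are all hypothetical.

```python
import math

def filter_intention_points(points, centerline, max_dist):
    """Keep candidate points within max_dist of any centerline sample.

    Hypothetical helper: a simple distance filter, not the paper's procedure.
    """
    kept = []
    for px, py in points:
        nearest = min(math.hypot(px - cx, py - cy) for cx, cy in centerline)
        if nearest <= max_dist:
            kept.append((px, py))
    return kept

# Hypothetical scene: a straight lane along the x-axis, sampled every 5 m.
centerline = [(float(x), 0.0) for x in range(0, 55, 5)]

# Hypothetical fixed prior intention points; two of them are far off the lane.
candidates = [(10.0, 0.5), (20.0, 8.0), (40.0, -1.5), (30.0, -12.0)]

compliant = filter_intention_points(candidates, centerline, max_dist=2.0)
print(compliant)  # [(10.0, 0.5), (40.0, -1.5)]
```

A filter like this only removes off-road candidates. It does not add new ones, so a real system would also need a way to cover the reachable lanes densely enough.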

ControlMTR builds on the Motion Transformer (MTR), a transformer-based model that predicts multiple future trajectories conditioned on agents' past trajectories and static road layout features. In addition to predicting future trajectories directly, ControlMTR predicts driving control commands, which a simple kinematic model with soft constraints converts into trajectories.
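The abstract mentions converting predicted control commands into trajectories with "a simple kinematic model with soft constraints" but gives no equations. The sketch below shows one way such a rollout might look, assuming a unicycle model driven by acceleration and yaw-rate commands, with `tanh` squashing standing in for the soft constraints. The model choice, the limits, and the names `soft_limit` and `rollout` are assumptions, not the paper's formulation.

```python
import math

def soft_limit(value, limit):
    """Smoothly squash value into [-limit, limit] with tanh (assumed soft constraint)."""
    return limit * math.tanh(value / limit)

def rollout(x, y, heading, speed, controls, dt=0.1,
            max_accel=4.0, max_yaw_rate=0.5):
    """Integrate (acceleration, yaw_rate) commands with a unicycle model.

    All limits and the time step are illustrative values, not from the paper.
    """
    trajectory = []
    for accel, yaw_rate in controls:
        accel = soft_limit(accel, max_accel)
        yaw_rate = soft_limit(yaw_rate, max_yaw_rate)
        speed = max(0.0, speed + accel * dt)  # assumption: no reversing
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        trajectory.append((x, y))
    return trajectory

# Constant speed, no steering: the agent moves straight along the x-axis.
straight = rollout(0.0, 0.0, 0.0, 10.0, [(0.0, 0.0)] * 10)

# A small positive yaw rate bends the path to the left (positive y).
left_turn = rollout(0.0, 0.0, 0.0, 10.0, [(0.0, 0.2)] * 10)
print(straight[-1], left_turn[-1])
```

Because every position comes from integrating bounded controls, a trajectory produced this way cannot jump sideways or change speed abruptly, which is the intuition behind using it as a feasibility guide.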

The control-generated trajectories guide the directly predicted trajectories through an auxiliary loss function. Combined with the scene-compliant intention points, this restricts the prediction distribution to the drivable area and suppresses infeasible off-road predictions while also improving prediction performance.
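According to the abstract, trajectories generated from predicted control commands guide the directly predicted trajectories through an auxiliary loss. A minimal sketch of that idea follows, assuming both terms are mean point-wise displacements and that the auxiliary term has a fixed weight. The actual loss form and weighting in ControlMTR are not given in this summary, so `total_loss` and `aux_weight` are hypothetical.

```python
import math

def mean_displacement(traj_a, traj_b):
    """Average Euclidean distance between corresponding points of two trajectories."""
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (ax, ay), (bx, by) in zip(traj_a, traj_b):
        total += math.hypot(ax - bx, ay - by)
    return total / len(traj_a)

def total_loss(direct, control_generated, ground_truth, aux_weight=0.5):
    """Regression loss plus an auxiliary term pulling the direct prediction
    toward the control-generated trajectory (assumed form and weight)."""
    regression = mean_displacement(direct, ground_truth)
    auxiliary = mean_displacement(direct, control_generated)
    return regression + aux_weight * auxiliary

# Hypothetical two-step example: the direct prediction drifts sideways
# at the second step, while the control-generated one stays on course.
ground_truth = [(1.0, 0.0), (2.0, 0.0)]
direct = [(1.0, 0.0), (2.0, 1.0)]
control_generated = [(1.0, 0.0), (2.0, 0.0)]

loss = total_loss(direct, control_generated, ground_truth)
print(loss)  # 0.75
```

The auxiliary term penalizes the direct prediction whenever it departs from what the kinematic rollout says is reachable, even in cases where the ground-truth term alone would give a weak signal.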

Without resorting to additional model ensemble techniques, ControlMTR surpasses the baseline MTR across all performance metrics, with a 5.22% improvement in SoftmAP and a 4.15% reduction in MissRate. It also reduces MTR's cross-boundary rate by 41.85%, which indicates that the prediction distribution stays largely within the drivable area. Related work on transformer-based motion and trajectory modeling includes [transfer-learning-study-motion-transformer-based-trajectory], [trailblazer-trajectory-control-diffusion-based-video-generation], [robust-human-motion-forecasting-using-transformer-based], and [trajeglish-traffic-modeling-as-next-token-prediction].
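The abstract reports its gains as percentage changes in metrics such as MissRate. To make such a metric concrete, the sketch below computes a simplified miss rate, where a scenario counts as a miss if none of the predicted endpoints lands within a fixed distance of the true endpoint, and then a relative reduction between two rates. Both the single-threshold definition and the reading of the reported percentages as relative changes are assumptions, and all numbers are made up for illustration.

```python
import math

def is_miss(predicted_endpoints, true_endpoint, threshold=2.0):
    """True if every predicted endpoint is farther than threshold from the truth."""
    tx, ty = true_endpoint
    return all(math.hypot(px - tx, py - ty) > threshold
               for px, py in predicted_endpoints)

def miss_rate(scenarios, threshold=2.0):
    """Fraction of scenarios in which no predicted mode ends near the truth."""
    misses = sum(is_miss(preds, truth, threshold) for preds, truth in scenarios)
    return misses / len(scenarios)

def relative_reduction(baseline, improved):
    """Percentage reduction of improved relative to baseline."""
    return (baseline - improved) / baseline * 100.0

# Hypothetical scenarios: (predicted endpoints for each mode, true endpoint).
scenarios = [
    ([(10.0, 0.0), (9.0, 3.0)], (10.5, 0.2)),  # hit: first mode is close
    ([(0.0, 5.0), (1.0, 6.0)], (0.0, 10.0)),   # miss: both modes are far
    ([(20.0, 1.0)], (21.0, 1.0)),              # hit
    ([(5.0, 5.0), (5.0, -5.0)], (5.0, 4.0)),   # hit: first mode is close
]

rate = miss_rate(scenarios)
print(rate)  # 0.25

# Made-up rates, not results from the paper.
print(round(relative_reduction(0.20, 0.15), 2))  # 25.0
```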

Critical Analysis

One potential limitation of ControlMTR is that scene-compliant intention points and boundary-aware prediction depend on accurate information about the road layout, such as road boundaries and the drivable area. In real-world deployments, this information may not always be available or reliable. Techniques that infer the scene layout from sensor data, such as [model-predictive-trajectory-generation-autonomous-aerial-search], could help address this challenge.

Additionally, while the reported metrics show that ControlMTR produces more feasible trajectories, staying within the drivable area is not the same as matching how road users actually behave. Further research may be needed to validate the realism of the model's predictions, especially in complex, crowded traffic scenes.

Overall, ControlMTR is a useful advance in motion prediction, showing that scene-compliant intention points and control guidance yield more grounded and practical forecasts. As the approach is refined and validated, it could benefit autonomous driving systems that depend on accurate prediction of surrounding traffic participants.

Conclusion

The ControlMTR framework presented in this paper predicts the future motion of surrounding traffic participants while taking the road layout into account. By generating scene-compliant intention points and guiding its direct predictions with trajectories derived from predicted control commands, ControlMTR produces more feasible predictions than the baseline MTR.

This work highlights the importance of incorporating scene context and kinematic feasibility when predicting the behavior of road users. By confining the prediction distribution to the drivable area, ControlMTR makes motion prediction more robust and more applicable to real-world driving scenarios.

As the approach is refined and built upon, it could have important implications for behavior planning in autonomous vehicles. Accurately forecasting feasible, scene-compliant trajectories is a critical capability for safe interaction between autonomous vehicles and other traffic participants.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Model Predictive Simulation Using Structured Graphical Models and Transformers

Xinghua Lou, Meet Dave, Shrinu Kushagra, Miguel Lazaro-Gredilla, Kevin Murphy

We propose an approach to simulating trajectories of multiple interacting agents (road users) based on transformers and probabilistic graphical models (PGMs), and apply it to the Waymo SimAgents challenge. The transformer baseline is based on the MTR model, which predicts multiple future trajectories conditioned on the past trajectories and static road layout features. We then improve upon these generated trajectories using a PGM, which contains factors which encode prior knowledge, such as a preference for smooth trajectories, and avoidance of collisions with static obstacles and other moving agents. We perform (approximate) MAP inference in this PGM using the Gauss-Newton method. Finally we sample $K=32$ trajectories for each of the $N \sim 100$ agents for the next $T=8\Delta$ time steps, where $\Delta=10$ is the sampling rate per second. Following the Model Predictive Control (MPC) paradigm, we only return the first element of our forecasted trajectories at each step, and then we replan, so that the simulation can constantly adapt to its changing environment. We therefore call our approach Model Predictive Simulation or MPS. We show that MPS improves upon the MTR baseline, especially in safety critical metrics such as collision rate. Furthermore, our approach is compatible with any underlying forecasting model, and does not require extra training, so we believe it is a valuable contribution to the community.

7/1/2024

cs.LG cs.CV

Transfer Learning Study of Motion Transformer-based Trajectory Predictions

Lars Ullrich, Alex McMaster, Knut Graichen

Trajectory planning in autonomous driving is highly dependent on predicting the emergent behavior of other road users. Learning-based methods are currently showing impressive results in simulation-based challenges, with transformer-based architectures technologically leading the way. Ultimately, however, predictions are needed in the real world. In addition to the shifts from simulation to the real world, many vehicle- and country-specific shifts, i.e. differences in sensor systems, fusion and perception algorithms as well as traffic rules and laws, are on the agenda. Since models that can cover all system setups and design domains at once are not yet foreseeable, model adaptation plays a central role. Therefore, a simulation-based study on transfer learning techniques is conducted on the basis of a transformer-based model. Furthermore, the study aims to provide insights into possible trade-offs between computational time and performance to support effective transfers into the real world.

4/15/2024

cs.LG cs.RO

MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving

Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu

This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph convolutional network captures both positional and behavioral features of road users, preserving spatial-temporal intricacies. Enhanced by a linear attention mechanism, the model achieves computational efficiency and reduced parameter overhead. Evaluations on the Argoverse, NGSIM, HighD, and MoCAD datasets underscore MFTraj's robustness and adaptability, outperforming numerous benchmarks even in data-challenged scenarios without the need for additional information such as HD maps or vectorized maps. Importantly, it maintains competitive performance even in scenarios with substantial missing data, on par with most existing state-of-the-art models. The results and methodology suggest a significant advancement in autonomous driving trajectory prediction, paving the way for safer and more efficient autonomous systems.

5/3/2024

cs.RO cs.AI

Motion Planning under Uncertainty: Integrating Learning-Based Multi-Modal Predictors into Branch Model Predictive Control

Mohamed-Khalil Bouzidi, Bojan Derajic, Daniel Goehring, Joerg Reichardt

In complex traffic environments, autonomous vehicles face multi-modal uncertainty about other agents' future behavior. To address this, recent advancements in learning-based motion predictors output multi-modal predictions. We present our novel framework that leverages Branch Model Predictive Control (BMPC) to account for these predictions. The framework includes an online scenario-selection process guided by topology and collision risk criteria. This efficiently selects a minimal set of predictions, rendering the BMPC real-time capable. Additionally, we introduce an adaptive decision postponing strategy that delays the planner's commitment to a single scenario until the uncertainty is resolved. Our comprehensive evaluations in traffic intersection and random highway merging scenarios demonstrate enhanced comfort and safety through our method.

5/7/2024

cs.RO cs.SY eess.SY