Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention Prediction

Read original: arXiv:2401.06757 - Published 6/18/2024 by Muhammad Naveed Riaz, Maciej Wielgosz, Abel Garcia Romera, Antonio M. Lopez

Introduction

This research paper presents a synthetic data generation framework, a dataset, and an efficient deep model for predicting pedestrian intentions. The goal is to forecast accurately what pedestrians will do next, in particular whether they will cross in front of the vehicle, which is crucial for developing safe and reliable autonomous vehicles.

Related Work

Pedestrian intention prediction

Predicting pedestrian intentions is a complex task that has been the focus of extensive research. Existing approaches include Ki-GAN (Knowledge-Informed Generative Adversarial Networks), Attention-Aware Social Graph Transformer Networks, and studies that evaluate pedestrian trajectory prediction methods. These methods use contextual information, social dynamics, and other relevant cues to forecast pedestrian movements.

Technical Explanation

The researchers develop a synthetic data generation framework, named ARCANE in the paper's abstract, that programmatically generates datasets of crossing and non-crossing (C/NC) video clips. They use it to create PedSynth, a large and diverse dataset intended to complement widely used real-world datasets such as JAAD and PIE. The generated dataset is used to train an efficient deep learning model for predicting pedestrian intentions.
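The abstract reproduced under Related Papers says the framework, ARCANE, generates crossing/non-crossing (C/NC) clips programmatically, but this summary does not describe its interface. The sketch below is therefore only an illustration of the general idea: sampling reproducible clip specifications whose C/NC label is known by construction. Every name, field, and value range in it (`ClipSpec`, `generate_clip_specs`, the weather and speed options) is a hypothetical assumption, not part of ARCANE.

```python
import random
from dataclasses import dataclass

# Illustrative assumptions only: these options are not taken from the paper.
WEATHER = ["clear", "rain", "fog"]
TIME_OF_DAY = ["day", "dusk", "night"]


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical description of one synthetic clip to be rendered."""
    clip_id: int
    crossing: bool              # C/NC label, known by construction
    weather: str
    time_of_day: str
    pedestrian_speed_mps: float
    ego_speed_kmh: float


def generate_clip_specs(num_clips, seed=0):
    """Return a reproducible list of clip specs with balanced C/NC labels."""
    rng = random.Random(seed)   # seeded so the same dataset can be regenerated
    specs = []
    for clip_id in range(num_clips):
        specs.append(ClipSpec(
            clip_id=clip_id,
            crossing=(clip_id % 2 == 0),  # alternate labels to balance classes
            weather=rng.choice(WEATHER),
            time_of_day=rng.choice(TIME_OF_DAY),
            pedestrian_speed_mps=round(rng.uniform(0.8, 2.0), 2),
            ego_speed_kmh=round(rng.uniform(10.0, 50.0), 1),
        ))
    return specs


specs = generate_clip_specs(100, seed=42)
num_crossing = sum(spec.crossing for spec in specs)
```

In a real pipeline each specification would be handed to a simulator that renders the clip. The point of the sketch is only that labels and scene diversity are controlled by code rather than collected and annotated by hand.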

According to the paper's abstract, the proposed model, PedGNN, is based on a GNN-GRU architecture that takes a sequence of pedestrian skeletons as input and predicts crossing intention. The model is designed to be fast and to have a very low memory footprint, making it suitable for onboard, real-time use in autonomous vehicles.
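The abstract reproduced under Related Papers describes the model, PedGNN, as a GNN-GRU architecture over sequences of pedestrian skeletons. The following is a minimal NumPy sketch of that general pattern, not the authors' implementation: one graph convolution per frame over the skeleton joints, mean pooling, a GRU across frames, and a sigmoid output read as a crossing probability. The 5-joint skeleton, layer sizes, and random untrained weights are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 5-joint skeleton; edges are (joint, joint) pairs. Illustrative only.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SkeletonGnnGru:
    """Untrained toy model: one graph convolution per frame, then a GRU over time."""

    def __init__(self, in_dim=2, gnn_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.a_hat = normalized_adjacency(NUM_JOINTS, EDGES)
        self.w_gnn = rng.normal(0, 0.1, (in_dim, gnn_dim))
        # GRU weights stacked for the update (0), reset (1), and candidate (2) terms.
        self.w_x = rng.normal(0, 0.1, (3, gnn_dim, hidden_dim))
        self.w_h = rng.normal(0, 0.1, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.w_out = rng.normal(0, 0.1, hidden_dim)
        self.hidden_dim = hidden_dim

    def frame_embedding(self, joints):
        """joints: (NUM_JOINTS, in_dim) -> (gnn_dim,) via graph conv and mean pooling."""
        h = np.maximum(self.a_hat @ joints @ self.w_gnn, 0.0)  # ReLU
        return h.mean(axis=0)

    def gru_step(self, x, h):
        z = sigmoid(x @ self.w_x[0] + h @ self.w_h[0] + self.b[0])  # update gate
        r = sigmoid(x @ self.w_x[1] + h @ self.w_h[1] + self.b[1])  # reset gate
        n = np.tanh(x @ self.w_x[2] + (r * h) @ self.w_h[2] + self.b[2])
        return (1.0 - z) * n + z * h

    def predict_crossing_probability(self, skeleton_sequence):
        """skeleton_sequence: (T, NUM_JOINTS, in_dim) -> probability in (0, 1)."""
        h = np.zeros(self.hidden_dim)
        for joints in skeleton_sequence:
            h = self.gru_step(self.frame_embedding(joints), h)
        return float(sigmoid(h @ self.w_out))


model = SkeletonGnnGru()
clip = np.random.default_rng(1).normal(size=(12, NUM_JOINTS, 2))  # 12 frames of 2D joints
probability = model.predict_crossing_probability(clip)
```

Because the input is a handful of joint coordinates per frame rather than full images, a model of this shape has few parameters, which is consistent with the low memory footprint the abstract emphasizes. The weights here are random, so the output is not a meaningful prediction.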

Critical Analysis

The authors acknowledge the limitations of their approach, noting that the synthetic dataset may not fully capture the complexities of real-world pedestrian behavior. Additionally, the generalizability of the model to different environments and scenarios requires further evaluation.

While the proposed framework and dataset represent a valuable contribution to the field, there is still room for improvement in incorporating more realistic pedestrian behaviors and in extending the model to more diverse scenarios, such as those considered in work on generative end-to-end autonomous driving.

Conclusion

This research paper presents a framework for generating synthetic data and an efficient deep learning model for pedestrian intention prediction, a critical problem in autonomous vehicle development. The approach holds promise for enhancing the safety and reliability of self-driving cars, but further research is needed to refine the techniques and extend them to real-world scenarios.

Related Papers

Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention Prediction

Muhammad Naveed Riaz, Maciej Wielgosz, Abel Garcia Romera, Antonio M. Lopez

Pedestrian intention prediction is crucial for autonomous driving. In particular, knowing if pedestrians are going to cross in front of the ego-vehicle is core to performing safe and comfortable maneuvers. Creating accurate and fast models that predict such intentions from sequential images is challenging. A factor contributing to this is the lack of datasets with diverse crossing and non-crossing (C/NC) scenarios. We address this scarcity by introducing a framework, named ARCANE, which allows programmatically generating synthetic datasets consisting of C/NC video clip samples. As an example, we use ARCANE to generate a large and diverse dataset named PedSynth. We will show how PedSynth complements widely used real-world datasets such as JAAD and PIE, thereby enabling more accurate models for C/NC prediction. Considering the onboard deployment of C/NC prediction models, we also propose a deep model named PedGNN, which is fast and has a very low memory footprint. PedGNN is based on a GNN-GRU architecture that takes a sequence of pedestrian skeletons as input to predict crossing intentions.

6/18/2024

Gating Syn-to-Real Knowledge for Pedestrian Crossing Prediction in Safe Driving

Jie Bai, Jianwu Fang, Yisheng Lv, Chen Lv, Jianru Xue, Zhengguo Li

Pedestrian Crossing Prediction (PCP) in driving scenes plays a critical role in ensuring the safe operation of intelligent vehicles. Because observations of pedestrian crossing behavior in typical situations are limited, recent studies have begun to leverage synthetic data with flexible variation to boost prediction performance, employing domain adaptation frameworks. However, different kinds of domain knowledge have distinct cross-domain distribution gaps, which calls for domain adaptation methods suited to each kind in PCP tasks. In this work, we propose a Gated Syn-to-Real Knowledge transfer approach for PCP (Gated-S2R-PCP), which has two aims: 1) designing suitable domain adaptation methods for different kinds of cross-domain knowledge, and 2) transferring suitable knowledge for specific situations with gated knowledge fusion. Specifically, we design a framework that contains three domain adaptation methods (style transfer, distribution approximation, and knowledge distillation) for various kinds of information, such as visual, semantic, depth, and location. A Learnable Gated Unit (LGU) is employed to fuse suitable cross-domain knowledge to boost pedestrian crossing prediction. We construct a new synthetic benchmark, S2R-PCP-3181, with 3181 sequences (489,740 frames), which contains pedestrian locations, RGB frames, semantic images, and depth images. With the synthetic S2R-PCP-3181, we transfer knowledge to two challenging real datasets, PIE and JAAD, and obtain PCP performance superior to state-of-the-art methods.

9/12/2024

Context-aware Multi-task Learning for Pedestrian Intent and Trajectory Prediction

Farzeen Munir, Tomasz Piotr Kucner

The advancement of socially-aware autonomous vehicles hinges on precise modeling of human behavior. Within this broad paradigm, the specific challenge lies in accurately predicting pedestrian's trajectory and intention. Traditional methodologies have leaned heavily on historical trajectory data, frequently overlooking vital contextual cues such as pedestrian-specific traits and environmental factors. Furthermore, there's a notable knowledge gap as trajectory and intention prediction have largely been approached as separate problems, despite their mutual dependence. To bridge this gap, we introduce PTINet (Pedestrian Trajectory and Intention Prediction Network), which jointly learns the trajectory and intention prediction by combining past trajectory observations, local contextual features (individual pedestrian behaviors), and global features (signs, markings etc.). The efficacy of our approach is evaluated on widely used public datasets: JAAD and PIE, where it has demonstrated superior performance over existing state-of-the-art models in trajectory and intention prediction. The results from our experiments and ablation studies robustly validate PTINet's effectiveness in jointly exploring intention and trajectory prediction for pedestrian behaviour modelling. The experimental evaluation indicates the advantage of using global and local contextual features for pedestrian trajectory and intention prediction. The effectiveness of PTINet in predicting pedestrian behavior paves the way for the development of automated systems capable of seamlessly interacting with pedestrians in urban settings.

7/25/2024

Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network

Kailai Sun, Xinwei Wang, Shaobo Liu, Qianchuan Zhao, Gao Huang, Chang Liu

Pedestrian detection and tracking in crowded video sequences have a wide range of applications, including autonomous driving, robot navigation and pedestrian flow surveillance. However, detecting and tracking pedestrians in high-density crowds face many challenges, including intra-class occlusions, complex motions, and diverse poses. Although deep learning models have achieved remarkable progress in head detection, head tracking datasets and methods are extremely lacking. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference). It is of great importance to develop new head tracking datasets and methods. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-Source Information Fusion Network (MIFN). Our dataset has features that are of considerable interest, including 10 diverse scenes of 50,528 frames with over 2,366,249 heads and 2,358 tracks annotated. Our dataset contains diverse human moving speeds, directions, and complex crowd pedestrian flows with collision avoidance behaviors. We provide a comprehensive analysis and comparison with existing state-of-the-art (SOTA) algorithms. Moreover, our MIFN is the first end-to-end CNN-based head detection and tracking network that jointly trains RGB frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps in videos. Compared with SOTA pedestrian detection and tracking methods, MIFN achieves superior performance on our Cchead dataset. We believe our datasets and baseline will become valuable resources towards developing pedestrian tracking in dense crowds.

8/13/2024