Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

Read original: arXiv:2407.15199 - Published 7/23/2024 by Jingwei Guo, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

Overview

This paper presents a system for detecting and tracking multiple objects in panoramic videos, with a focus on analyzing cycling safety.
The system uses computer vision techniques to identify and follow different objects, such as cyclists, pedestrians, and vehicles, within 360-degree camera footage.
The goal is to provide insights into traffic patterns, interactions, and potential safety risks for cyclists.

Plain English Explanation

The researchers developed a computer system that can analyze 360-degree video footage to detect and track different objects, like cyclists, pedestrians, and cars. The idea is to use this technology to study cycling safety by understanding how these objects interact with each other in the environment.

For example, the system could identify where cyclists pass most often, or where they tend to have close encounters with vehicles. This information could help urban planners and transportation officials identify potential safety issues and make improvements, such as adding bike lanes or adjusting traffic signals.

By using panoramic, or 360-degree, video, the researchers are able to capture a much wider view of the surroundings compared to traditional cameras. This gives the system a more complete picture of the traffic and interactions happening in the area.

The key benefit of this technology is that it allows for a more thorough and data-driven analysis of cycling safety, beyond what could be achieved through manual observation or limited camera coverage. With this information, communities can make more informed decisions to create safer environments for cyclists.

Technical Explanation

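The paper describes a computer vision system that detects and tracks multiple objects in panoramic video footage. According to the abstract, the system first improves the predictions of pre-trained object detection models (YOLO v5m6 and Faster RCNN-FPN) on panoramic data by projecting each original image into 4 perspective sub-images. The detectors then identify road users such as cyclists, pedestrians, and vehicles within the 360-degree video frames.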
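The paper's abstract (quoted under Related Papers) mentions projecting each panoramic frame into 4 perspective sub-images before detection. The sketch below shows one way such a projection could work, assuming an equirectangular input, a 90-degree field of view, and views centred at yaw 0, 90, 180 and 270 degrees. The function names, the field of view and the yaw angles are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def perspective_view(pano, yaw_deg, fov_deg=90.0, size=256):
    """Sample a square pinhole view from an equirectangular panorama.

    Uses nearest-neighbour sampling. All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    h, w = pano.shape[:2]
    # Focal length in pixels for the requested horizontal field of view.
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)
    # Output pixel grid, centred on the optical axis.
    coords = np.arange(size) - (size - 1) / 2
    x, y = np.meshgrid(coords, coords)
    z = np.full_like(x, f)
    # Rotate the viewing rays around the vertical axis by the yaw angle.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    # Convert rays to longitude/latitude (latitude is positive downwards).
    lon = np.arctan2(xr, zr)
    lat = np.arctan2(y, np.hypot(xr, zr))
    # Map angles to panorama pixels; longitude wraps around the seam.
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[v, u]

def four_views(pano, fov_deg=90.0, size=256):
    """Return four sub-images that together cover the full horizon."""
    return [perspective_view(pano, yaw, fov_deg, size) for yaw in (0, 90, 180, 270)]
```

Each sub-image can then be passed to an ordinary detector, and the resulting boxes would need to be mapped back to panorama coordinates before tracking.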

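It then tracks the detected objects across the video sequence with DeepSORT, a commonly used multiple object tracking model, which the authors extend with support for boundary continuity and category information and pair with the improved detector. This allows the researchers to analyze the trajectories of the various road users and the interactions between them.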
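The abstract lists boundary continuity as one of the challenges of panoramic data: an object that leaves the right edge of the frame reappears at the left edge. One hypothetical way to make a tracker's box matching aware of this is an intersection-over-union that treats the image as horizontally circular. The function below is a sketch under that assumption and is not taken from the paper's code; the box format and the name `wrap_iou` are illustrative.

```python
def wrap_iou(a, b, width):
    """IoU of two boxes (x1, y1, x2, y2) on a panorama whose left and right edges join.

    A box crossing the seam is written with x2 > width, for example
    x1=350, x2=370 on a 360-pixel-wide image. Boxes are assumed to be
    no wider than the panorama itself.
    """
    def overlap_1d(a1, a2, b1, b2):
        return max(0.0, min(a2, b2) - max(a1, b1))

    # Horizontal overlap: compare box a with box b in place and with b
    # shifted by one panorama width in either direction.
    ox = sum(overlap_1d(a[0], a[2], b[0] + s, b[2] + s) for s in (-width, 0, width))
    oy = overlap_1d(a[1], a[3], b[1], b[3])
    inter = ox * oy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With a plain IoU, a box at x = 350 to 370 and a box at x = 0 to 20 on a 360-pixel-wide frame would not overlap at all, so a track crossing the seam would be split into two identities; the wrapped version sees the shared 10-pixel strip.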

The panoramic nature of the video provides a more comprehensive view of the environment compared to traditional cameras with limited fields of view. This helps the system capture a fuller picture of the traffic and potential safety issues facing cyclists.

The researchers tested their system on a panoramic cycling dataset built by the project, made up of 360-degree videos recorded from the perspective of cyclists. Using the tracking results, they developed an application that detects overtaking by surrounding vehicles, which achieved an F-score of 0.88 on the test videos.
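The abstract states that the tracking results feed an application for detecting overtaking by surrounding vehicles, but this summary does not give the rule used. The sketch below is a purely hypothetical heuristic to illustrate the idea: a tracked vehicle counts as overtaking if it is first seen behind the cyclist, last seen ahead, and stays on one side throughout. It assumes the centre column of the panorama points in the direction of travel; the function names and thresholds are illustrative.

```python
def bearing_deg(cx, width):
    """Horizontal angle of a box centre: 0 is straight ahead, +/-180 is directly behind.

    Assumes the panorama's centre column faces the cyclist's direction of travel.
    """
    return (cx / width - 0.5) * 360.0

def is_overtake(centres_x, width):
    """Hypothetical rule, not the paper's method.

    The track starts behind the cyclist (|bearing| > 90), ends ahead
    (|bearing| < 90) and stays on the same side the whole time.
    """
    bearings = [bearing_deg(cx, width) for cx in centres_x]
    if len(bearings) < 2:
        return False
    starts_behind = abs(bearings[0]) > 90
    ends_ahead = abs(bearings[-1]) < 90
    same_side = all(b > 0 for b in bearings) or all(b < 0 for b in bearings)
    return starts_behind and ends_ahead and same_side
```

For a 3600-pixel-wide frame, a track with box centres at x = 3500, 3200, 2800, 2400, 2000 moves from a bearing of about 170 degrees to about 20 degrees on the same side, so the rule flags it. A track moving the other way, such as an oncoming vehicle, is not flagged.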

Critical Analysis

The paper presents a promising approach for using computer vision and 360-degree video to analyze cycling safety. However, the system's performance and reliability may depend on factors such as video quality, lighting conditions, and the complexity of the traffic environment.

The abstract reports gains in average precision for both detectors, increases of 7.6% in MOTA and 9.7% in IDF1 for DeepSORT, and an F-score of 0.88 for overtake detection, all measured on the dataset built by the project. Further testing on other datasets and conditions would be needed to fully assess the reliability and practical applications of this technology.
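For reference, the metrics named in the paper's abstract (MOTA, IDF1 and the F-score) have standard definitions, written here in LaTeX. The notation is the conventional one rather than taken from the paper, and the F-score is assumed to be the balanced F1.

```latex
\begin{align*}
\mathrm{MOTA} &= 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t} \\
\mathrm{IDF1} &= \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}} \\
F_1 &= \frac{2PR}{P + R}
\end{align*}
```

Here FN, FP and IDSW are the per-frame counts of missed objects, false positives and identity switches, GT is the number of ground-truth objects, IDTP, IDFP and IDFN count identity-level true positives, false positives and false negatives, and P and R are precision and recall.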

It would also be valuable to explore how this system could be integrated with other data sources, such as infrastructure sensors or crowdsourced reports, to provide a more comprehensive understanding of cycling safety issues. Combining multiple data streams could lead to more robust insights and more effective interventions.

Conclusion

This research demonstrates the potential of using advanced computer vision and 360-degree video technology to gain valuable insights into cycling safety. By detecting and tracking multiple objects in panoramic footage, the system can provide a more holistic view of traffic patterns and interactions that impact the safety of cyclists.

While further development and evaluation are needed, this work represents an important step towards creating data-driven solutions to improve cycling infrastructure and make roads safer for all users. As communities continue to invest in sustainable transportation, tools like this could play a crucial role in supporting evidence-based decision-making and creating safer, more accessible environments for cyclists.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

Jingwei Guo, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

Panoramic cycling videos can record 360° views around the cyclists. Thus, it is essential to conduct automatic road user analysis on them using computer vision models to provide data for studies on cycling safety. However, the features of panoramic data such as severe distortions, large number of small objects and boundary continuity have brought great challenges to the existing CV models, including poor performance and evaluation methods that are no longer applicable. In addition, due to the lack of data with annotations, it is not easy to re-train the models. In response to these problems, the project proposed and implemented a three-step methodology: (1) improve the prediction performance of the pre-trained object detection models on panoramic data by projecting the original image into 4 perspective sub-images; (2) introduce supports for boundary continuity and category information into DeepSORT, a commonly used multiple object tracking model, and set an improved detection model as its detector; (3) using the tracking results, develop an application for detecting the overtaking behaviour of the surrounding vehicles. Evaluated on the panoramic cycling dataset built by the project, the proposed methodology improves the average precision of YOLO v5m6 and Faster RCNN-FPN under any input resolution setting. In addition, it raises MOTA and IDF1 of DeepSORT by 7.6% and 9.7% respectively. When detecting the overtakes in the test videos, it achieves the F-score of 0.88. The code is available on GitHub at github.com/cuppp1998/360_object_tracking to ensure the reproducibility and further improvements of results.

7/23/2024

360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidirectional visual object tracking and segmentation tasks. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset with 170 sequences and a testing subset with 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset. Homepage: https://360vots.hkustvgd.com/

4/23/2024

Open Panoramic Segmentation

Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a close-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation (OPS), where models are trained with FoV-restricted pinhole images in the source domain in an open-vocabulary setting while evaluated with FoV-open panoramic images in the target domain, enabling the zero-shot open panoramic semantic segmentation ability of models. Moreover, we propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves zero-shot panoramic semantic segmentation performance. To further enhance the distortion-aware modeling ability from the pinhole source domain, we propose a novel data augmentation method called Random Equirectangular Projection (RERP) which is specifically designed to address object deformations in advance. Surpassing other state-of-the-art open-vocabulary semantic segmentation approaches, a remarkable performance boost on three panoramic datasets, WildPASS, Stanford2D3D, and Matterport3D, proves the effectiveness of our proposed OOOPS model with RERP on the OPS task, especially +2.2% on outdoor WildPASS and +2.4% mIoU on indoor Stanford2D3D. The source code is publicly available at https://junweizheng93.github.io/publications/OPS/OPS.html.

7/15/2024

360 in the Wild: Dataset for Depth Prediction and View Synthesis

Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale 360° videos dataset in the wild. This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide. Hence, this dataset exhibits very diversified environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely, single image depth estimation and view synthesis.

7/8/2024