360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

Read original: arXiv:2404.13953 - Published 4/23/2024 by Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

Overview

• This research paper introduces a novel dataset called "360VOTS" for visual object tracking and segmentation in omnidirectional videos.
• The dataset contains high-quality annotations for object tracking and segmentation in 360-degree videos, addressing the lack of such resources for omnidirectional vision tasks.
• The paper also presents benchmark results for various state-of-the-art algorithms on the 360VOTS dataset, providing a comprehensive evaluation of the current capabilities in this domain.

Plain English Explanation

• The research team has created a new dataset called "360VOTS" that contains 360-degree videos with detailed annotations for objects within the videos. This allows researchers and developers to test and improve algorithms for tracking and segmenting objects in omnidirectional videos.
• Omnidirectional videos, also known as 360-degree videos, capture a full 360-degree view around the camera. This is different from traditional videos, which only show a limited field of view.
• Tracking and segmenting objects in 360-degree videos is more challenging than in regular videos, as the objects can appear distorted and the camera's movement can be more complex.
• The 360VOTS dataset provides high-quality annotations for objects in these 360-degree videos, allowing researchers to develop and evaluate algorithms that can effectively track and segment objects in omnidirectional video settings.
• By establishing this new dataset and benchmark, the researchers hope to drive progress in the field of omnidirectional vision and enable better applications, such as 360-degree video editing, 360-degree object reconstruction, and 360-degree object detection.
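To make the distortion and wrap-around issue concrete, here is a minimal sketch. It assumes the frames are stored in equirectangular projection, which is a common layout for 360-degree video but is an assumption here, not something the summary states. The function names (`pixel_to_lonlat`, `wrapped_dx`, `great_circle_angle`) are hypothetical helpers written for illustration.

```python
import math

def pixel_to_lonlat(u, v, width, height):
    """Map an equirectangular pixel (u, v) to (longitude, latitude) in radians.

    Assumed convention: u=0 is longitude -pi, v=0 is the north pole (+pi/2).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return lon, lat

def wrapped_dx(u1, u2, width):
    """Horizontal pixel distance that respects the left/right image seam."""
    d = abs(u1 - u2) % width
    return min(d, width - d)

def great_circle_angle(p1, p2, width, height):
    """Angle in radians between two pixels measured on the viewing sphere."""
    lon1, lat1 = pixel_to_lonlat(*p1, width, height)
    lon2, lat2 = pixel_to_lonlat(*p2, width, height)
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))
```

Two points near opposite image borders, such as `u=10` and `u=1910` in a 1920-pixel-wide frame, are 1900 pixels apart in the flat image but only 20 pixels apart across the seam. A tracker that measures distance in the flat image would treat a target crossing the seam as a huge jump, which is one reason planar methods struggle on omnidirectional video.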

Technical Explanation

• According to the paper's abstract, 360VOTS builds on the authors' earlier omnidirectional tracking benchmark (360VOT) and adds a new segmentation component, 360VOS. The 360VOS dataset includes 290 sequences with dense pixel-wise masks, divided into a training subset of 170 sequences and a testing subset of 120 sequences.
• The paper introduces a target representation called extended bounding field-of-view (eBFoV) and uses it as the foundation of a general 360 tracking framework that applies to both omnidirectional tracking and segmentation.
• The dataset covers a broad range of target categories, making it a valuable resource for evaluating algorithms in challenging omnidirectional video settings.
• The paper presents benchmark results for state-of-the-art tracking and segmentation approaches on the dataset, alongside the authors' proposed 360 tracking framework.
• The evaluation metrics used in the benchmarks include standard measures such as intersection-over-union (IoU) for object segmentation and center location error (CLE) for object tracking, as well as metrics tailored to the unique challenges of omnidirectional video analysis.
• The results demonstrate the capabilities and limitations of current state-of-the-art methods, providing valuable insights into the current state of the field and guiding future research directions.
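The two standard measures named above can be sketched as follows. This is a minimal illustration of the usual planar definitions of IoU and center location error, not the paper's tailored omnidirectional metrics. The function names and the `(x, y, w, h)` box layout are assumptions made for the example.

```python
from math import hypot

def mask_iou(pred, gt):
    """IoU of two binary masks given as equal-sized lists of rows of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0

def box_center(box):
    """Center of an axis-aligned box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def center_location_error(pred_box, gt_box):
    """Euclidean pixel distance between predicted and ground-truth centers."""
    (px, py), (gx, gy) = box_center(pred_box), box_center(gt_box)
    return hypot(px - gx, py - gy)
```

For example, `mask_iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])` counts 2 overlapping pixels out of 4 covered pixels, and `center_location_error((0, 0, 10, 10), (3, 4, 10, 10))` compares centers `(5, 5)` and `(8, 9)`. Both measures assume a flat image, so neither accounts for spherical distortion or a target crossing the image seam, which is the gap that metrics tailored to omnidirectional video are meant to close.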

Critical Analysis

• The 360VOTS dataset represents a significant contribution to the field of omnidirectional vision, as it addresses the lack of high-quality datasets for evaluating object tracking and segmentation algorithms in 360-degree video settings.
• However, the dataset is limited to a specific set of scenes and object types, and it remains to be seen how well the algorithms will generalize to a wider range of real-world omnidirectional video scenarios.
• The paper also does not provide a detailed analysis of the challenges and sources of errors encountered by the evaluated algorithms, which could limit the insights gained from the benchmark results.
• Furthermore, the paper does not discuss the potential ethical implications of using these technologies, such as privacy concerns or the potential for misuse in surveillance applications. It would be valuable for future research to consider these important factors.
• Despite these limitations, the 360VOTS dataset and the benchmark results presented in this paper represent an important step forward in advancing the field of omnidirectional vision and enabling better applications, such as 360-degree object detection and 360-degree object reconstruction.

Conclusion

• The 360VOTS dataset and benchmark presented in this paper provide a valuable resource for researchers and developers working on visual object tracking and segmentation in omnidirectional video settings.
• The dataset's comprehensive annotations and the benchmark's evaluation of state-of-the-art algorithms offer important insights into the current capabilities and limitations in this domain, paving the way for future advancements.
• By addressing the lack of high-quality datasets for omnidirectional vision tasks, this research significantly contributes to the development of more robust and effective algorithms, which can enable a wide range of applications, from 360-degree video editing to 360-degree object detection and 360-degree object reconstruction.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidirectional visual object tracking and segmentation tasks. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset with 170 sequences and a testing subset with 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset. Homepage: https://360vots.hkustvgd.com/

4/23/2024

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung

Portable 360° cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground truth poses for visual localization. We present a practical implementation of 360° mapping combining 360° images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries.

6/3/2024

360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation

Wenxuan Lu, Mengshun Hu, Yansheng Qiu, Liang Liao, Zheng Wang

Head-mounted 360° displays and portable 360° cameras have significantly progressed, providing viewers a realistic and immersive experience. However, many omnidirectional videos have low frame rates that can lead to visual fatigue, and the prevailing plane frame interpolation methodologies are unsuitable for omnidirectional video interpolation because they are designed solely for traditional videos. This paper introduces the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation. We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions. Specifically, we propose a pyramid distortion-sensitive feature extractor that uses the unique characteristics of equirectangular projection (ERP) format as prior information. Moreover, we devise a decoder that uses an affine transformation to further facilitate the synthesis of intermediate frames. 360VFI is the first dataset and benchmark that explores the challenge of Omnidirectional Video Frame Interpolation. Through our benchmark analysis, we present four different distortion condition scenes in the proposed 360VFI dataset to evaluate the challenges triggered by distortion during interpolation. Besides, experimental results demonstrate that Omnidirectional Video Interpolation can be effectively improved by modeling for omnidirectional distortion.

9/10/2024

Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

Jingwei Guo, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

Panoramic cycling videos can record 360° views around the cyclists. Thus, it is essential to conduct automatic road user analysis on them using computer vision models to provide data for studies on cycling safety. However, the features of panoramic data such as severe distortions, large number of small objects and boundary continuity have brought great challenges to the existing CV models, including poor performance and evaluation methods that are no longer applicable. In addition, due to the lack of data with annotations, it is not easy to re-train the models. In response to these problems, the project proposed and implemented a three-step methodology: (1) improve the prediction performance of the pre-trained object detection models on panoramic data by projecting the original image into 4 perspective sub-images; (2) introduce supports for boundary continuity and category information into DeepSORT, a commonly used multiple object tracking model, and set an improved detection model as its detector; (3) using the tracking results, develop an application for detecting the overtaking behaviour of the surrounding vehicles. Evaluated on the panoramic cycling dataset built by the project, the proposed methodology improves the average precision of YOLO v5m6 and Faster RCNN-FPN under any input resolution setting. In addition, it raises MOTA and IDF1 of DeepSORT by 7.6% and 9.7% respectively. When detecting the overtakes in the test videos, it achieves the F-score of 0.88. The code is available on GitHub at github.com/cuppp1998/360_object_tracking to ensure the reproducibility and further improvements of results.

7/23/2024