360 in the Wild: Dataset for Depth Prediction and View Synthesis

2406.18898

Published 6/28/2024 by Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

Abstract

The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale 360° videos dataset in the wild. This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide. Hence, this dataset exhibits very diversified environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely, single image depth estimation and view synthesis.

Overview

This paper presents a new dataset called "360° in the Wild" for depth prediction and view synthesis in 360° panoramic images.
The dataset contains 25K real-world 360° images, each provided with its depth map and camera pose.
The authors benchmark several state-of-the-art depth estimation and view synthesis models on this dataset, providing a comprehensive evaluation.
The dataset and evaluation framework are designed to advance research in omnidirectional visual perception and content generation.

Plain English Explanation

The researchers have created a new dataset of 360-degree panoramic images that can be used to train and test AI models for depth prediction and view synthesis. Depth prediction is the task of estimating how far away objects are in an image, while view synthesis is the ability to generate new views of a scene from different perspectives.

This dataset is special because it pairs real-world 360-degree images with depth maps and camera poses. Earlier panoramic datasets that include this information were mostly built from synthetic scenes, so the new "360° in the Wild" dataset provides a richer and more realistic benchmark for evaluating the latest AI models.

The authors have also used this dataset to test several state-of-the-art depth estimation and view synthesis models. By comparing how well these models perform on the dataset, they can identify the strengths and weaknesses of different approaches and help guide future research in this area.

Overall, this new dataset and evaluation framework are important contributions that will advance the field of omnidirectional visual perception - the ability of AI systems to understand and interact with 360-degree environments. This has many potential applications, from robotics and virtual reality to autonomous driving and immersive media.

Technical Explanation

The paper introduces a large-scale dataset called "360° in the Wild" for depth prediction and view synthesis in 360° panoramic images. The dataset contains 25K images taken from 360° videos, each provided with its camera pose and depth map. The videos were scraped from the Internet and captured at locations worldwide, covering indoor and outdoor scenes with and without moving objects, which makes the benchmark more challenging and realistic than the mostly synthetic panoramic datasets that preceded it.
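To make the role of per-image depth and pose concrete, here is a minimal sketch of how one pixel of a panorama could be lifted to a 3D point, which is the basic operation behind reprojecting a view. It assumes an equirectangular projection, a y-up and z-forward axis convention, depth stored as distance along the viewing ray, and a camera-to-world pose. None of these conventions is confirmed by the source, and the names `pixel_to_ray` and `backproject` are hypothetical.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Unit viewing ray for pixel (u, v) of an equirectangular image.

    Assumed convention (not taken from the dataset): longitude 0 is the
    image centre, the top row looks up, x is right, y is up, z is forward.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi    # in [-pi, pi)
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi   # in (-pi/2, pi/2)
    return (
        math.cos(lat) * math.sin(lon),  # x
        math.sin(lat),                  # y
        math.cos(lat) * math.cos(lon),  # z
    )


def backproject(u, v, depth, width, height, rotation, translation):
    """Lift one pixel to a world-space point.

    depth       -- distance along the ray (assumed radial depth)
    rotation    -- 3x3 camera-to-world rotation as nested lists (assumed)
    translation -- camera centre in world coordinates (assumed)
    """
    ray = pixel_to_ray(u, v, width, height)
    cam_point = [depth * c for c in ray]
    return tuple(
        sum(rotation[i][j] * cam_point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A pixel near the image centre, 2 units away, seen from the world origin.
    print(backproject(512, 256, 2.0, 1024, 512, identity, [0.0, 0.0, 0.0]))
```

Applying this to every pixel yields a point cloud per frame. A view synthesis pipeline could then project those points into a second camera using that camera's pose, which is why a dataset supplying both depth and pose per image is useful for the task.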

To demonstrate the value of the dataset, the authors use it for two tasks: single-image depth estimation and view synthesis. The abstract does not name the specific models or metrics involved, so those details should be taken from the paper itself.
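As an illustration of what a depth estimation comparison typically involves, the sketch below computes three error measures that are widely used in the depth estimation literature: absolute relative error, root mean squared error, and the fraction of pixels within a 1.25 ratio of the ground truth. Whether the paper reports these exact measures is an assumption, and the function name `depth_metrics` and the `min_depth` threshold are hypothetical.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compare predicted and ground-truth depths given as flat sequences.

    Pixels whose ground truth is at or below min_depth are treated as
    invalid and skipped; predictions are clamped to min_depth so the
    ratio test never divides by zero. Both choices are assumptions.
    """
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > min_depth]
    if not pairs:
        raise ValueError("no valid ground-truth depths to evaluate")

    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Share of pixels whose prediction is within a factor of 1.25 of the truth.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0, 5.0]
    measured = [1.0, 2.0, 2.0, 0.0]  # last pixel has no valid ground truth
    print(depth_metrics(predicted, measured))
```

One design point specific to panoramas: in an equirectangular image, rows near the poles cover far less of the sphere than rows near the equator, so a plain per-pixel average over-counts polar regions. Some panoramic evaluations therefore weight each pixel by the cosine of its latitude. The sketch above omits that weighting for brevity.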

The dataset and evaluation framework are designed to advance research in omnidirectional visual perception - the ability of AI systems to understand and interact with 360-degree environments. This has many potential applications, from robotics and virtual reality to autonomous driving and immersive media.

Critical Analysis

The "360° in the Wild" dataset and evaluation framework presented in this paper are valuable contributions to the field. By providing a large-scale, diverse, and realistic benchmark, the authors enable more rigorous and meaningful comparisons of depth estimation and view synthesis models.

However, some limitations deserve attention. The abstract does not explain how depth maps and camera poses were obtained for videos scraped from the Internet; if they are estimated rather than measured by a sensor, they may contain inaccuracies. In addition, the dataset includes scenes with moving objects, which makes pose and depth harder to recover reliably and complicates view synthesis methods that assume a static scene.

Furthermore, the paper's discussion of the benchmarking results could be more in-depth. While the authors highlight the challenges faced by current state-of-the-art models, they do not provide a nuanced analysis of the strengths and weaknesses of different approaches. A more thorough examination of the results could yield additional insights to guide future research.

Finally, the potential applications of this work, such as in robotics and autonomous driving, are not explored in detail. The paper could benefit from a more extensive discussion of the real-world implications and societal impact of this research.

Conclusion

In short, "360° in the Wild" pairs 25K real-world panoramic images with per-image depth maps and camera poses, filling a gap left by omnidirectional datasets that are mostly synthetic. This gives depth estimation and view synthesis models a more realistic basis for training and comparison.

The benchmarking results highlight the significant challenges that current state-of-the-art models face in these tasks, underscoring the need for continued research and innovation. As the field of omnidirectional visual perception advances, the applications of this work in areas like robotics, virtual reality, and autonomous driving could have a profound impact on how we interact with and understand our 360-degree environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung

Portable 360° cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground truth poses for visual localization. We present a practical implementation of 360° mapping combining 360° images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries.

6/3/2024

cs.CV

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we presented 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset could broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.

4/9/2024

cs.CV cs.AI cs.MM cs.SD eess.AS

Towards Long-term Robotics in the Wild

Stephen Hausler, Ethan Griffiths, Milad Ramezani, Peyman Moghadam

In this paper, we emphasise the critical importance of large-scale datasets for advancing field robotics capabilities, particularly in natural environments. While numerous datasets exist for urban and suburban settings, those tailored to natural environments are scarce. Our recent benchmarks WildPlaces and WildScenes address this gap by providing synchronised image, lidar, semantic and accurate 6-DoF pose information in forest-type environments. We highlight the multi-modal nature of this dataset and discuss and demonstrate its utility in various downstream tasks, such as place recognition and 2D and 3D semantic segmentation tasks.

4/30/2024

cs.RO

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Ning-Hsu Wang, Yu-Lun Liu

Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images due to different camera projections and distortions, whereas 360-degree methods perform inferior due to the lack of labeled data pairs. We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This method leverages the increasing availability of large datasets. Our approach includes two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360 monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types. See our project page for results: https://albert100121.github.io/Depth-Anywhere/

6/19/2024

cs.CV