Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

arXiv: 2406.10373

Published 6/18/2024 by Jiacong Xu, Yiqun Mei, Vishal M. Patel

Abstract

Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployments. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits. Wild-GS determines the appearance of each 3D Gaussian by their inherent material attributes, global illumination and camera properties per image, and point-level local variance of reflectance. Unlike previous methods that model reference features in image space, Wild-GS explicitly aligns the pixel appearance features to the corresponding local Gaussians by sampling the triplane extracted from the reference image. This novel design effectively transfers the high-frequency detailed appearance of the reference view to 3D space and significantly expedites the training process. Furthermore, 2D visibility maps and depth regularization are leveraged to mitigate the transient effects and constrain the geometry, respectively. Extensive experiments demonstrate that Wild-GS achieves state-of-the-art rendering performance and the highest efficiency in both training and inference among all the existing techniques.

Overview

This paper presents a novel method called "Wild-GS" for real-time novel view synthesis from unconstrained photo collections.
The method adapts 3D Gaussian Splatting (3DGS), an explicit scene representation built from 3D Gaussians, to model and render complex 3D scenes efficiently.
It enables high-quality view synthesis in real time, even for challenging scenes with transient occlusions and lighting variations.
The approach is designed to work with "in-the-wild" photo collections, without requiring specialized camera setups or carefully controlled image capture.

Plain English Explanation

The researchers have developed a new system called "Wild-GS" that can take a collection of ordinary photographs and use them to generate realistic new views of a 3D scene in real-time. This is useful for applications like virtual tourism, where you could explore a location by seamlessly transitioning between different perspectives, even if the original photos were taken under varying conditions.

The key idea is to represent the 3D scene as a large set of 3D Gaussians, the representation behind 3D Gaussian Splatting, and to adapt it to the problems typical of real-world photo collections: lighting that changes from photo to photo, and people or objects that temporarily block part of the view. The end result is a system that can generate smooth, high-quality novel views very quickly, without requiring specialized camera equipment or carefully curated image data.

This work builds on previous research in view synthesis, 3D scene modeling, and sparse view synthesis. The Gaussian representation itself comes from 3D Gaussian Splatting; the contribution here is adapting it to "in-the-wild" photo collections, which makes the system more practical and widely applicable.

Technical Explanation

The core of the "Wild-GS" approach is 3D Gaussian Splatting (3DGS), which models the scene as a set of 3D Gaussians. Wild-GS extends this representation to cope with the appearance variations and transient occlusions that are common challenges when working with unconstrained photo collections.
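For background, 3D Gaussian Splatting colors each pixel by alpha-blending the depth-sorted Gaussians that overlap it, front to back. The sketch below shows only that blending step for a single pixel, assuming the per-Gaussian colors and opacities at the pixel have already been computed. The function name and inputs are illustrative and are not taken from the Wild-GS code.

```python
import numpy as np

def composite_front_to_back(colors, alphas):
    """Blend depth-sorted splat contributions for one pixel.

    colors: (N, 3) RGB contribution of each Gaussian, nearest first.
    alphas: (N,) opacity of each Gaussian at this pixel, in [0, 1].
    Returns the blended RGB color as a length-3 array.
    """
    colors = np.asarray(colors, dtype=np.float64)
    alphas = np.asarray(alphas, dtype=np.float64)
    # Transmittance reaching splat i is the product of (1 - alpha)
    # over all splats in front of it; the first splat sees 1.0.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    return (weights[:, None] * colors).sum(axis=0)
```

For example, a half-transparent red splat in front of a half-transparent green one blends to (0.5, 0.25, 0): the red splat contributes half of its color, and the green splat contributes half of the remaining half.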

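According to the abstract, Wild-GS determines the appearance of each 3D Gaussian from three components: its intrinsic material attributes, the global illumination and camera properties of each image, and a point-level local variance of reflectance. Rather than modeling reference features in image space, it extracts a triplane from the reference image and samples it at the Gaussians' locations, so that pixel-level appearance features are aligned with the corresponding local Gaussians. Novel views are then rendered by projecting ("splatting") the Gaussians onto the image plane and blending them, as in 3DGS, rather than by querying a neural network along every ray as NeRF does.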
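The abstract describes sampling a triplane extracted from the reference image to obtain appearance features for the local Gaussians. A minimal sketch of the sampling step alone follows. It assumes three axis-aligned feature planes, Gaussian centers normalized to the unit cube, bilinear interpolation, and concatenation of the three samples. All of these are illustrative choices, and the way Wild-GS actually builds and fuses its triplane is not specified here.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane.

    u, v: (N,) continuous coordinates in [0, 1] along width and height.
    Returns (N, C) interpolated features.
    """
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    # Clamp the upper neighbors so samples on the border stay in range.
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - fx) + plane[y0, x1] * fx
    bottom = plane[y1, x0] * (1 - fx) + plane[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_triplane(planes, points):
    """Look up per-point features from three axis-aligned planes.

    planes: dict with keys "xy", "xz", "yz", each a (H, W, C) array
            (hypothetical layout chosen for this sketch).
    points: (N, 3) Gaussian centers normalized to the unit cube.
    Returns (N, 3 * C) features: the three plane samples concatenated.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feats = [
        bilinear_sample(planes["xy"], x, y),
        bilinear_sample(planes["xz"], x, z),
        bilinear_sample(planes["yz"], y, z),
    ]
    return np.concatenate(feats, axis=1)
```

Because each Gaussian reads its features at its own 3D position, the reference image's detail is attached to points in space rather than to pixels, which is the alignment the abstract emphasizes.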

Key innovations in this work include:

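Per-Gaussian appearance modeling that combines intrinsic material attributes, per-image global illumination and camera properties, and point-level local variance of reflectance
Triplane sampling that aligns appearance features from the reference image with the corresponding local Gaussians in 3D
2D visibility maps that reduce the influence of transient objects, together with depth regularization that constrains the geometry
Training and rendering efficiency inherited from 3DGS, which keeps novel view rendering real-time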
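The abstract mentions 2D visibility maps for mitigating transient effects. One common way to use such a map, sketched below, is to down-weight the photometric error at pixels marked as transient. This is an assumption about the general technique, not the loss used in Wild-GS: the function name, the plain L1 error, and the normalization are all illustrative.

```python
import numpy as np

def masked_l1_loss(rendered, target, visibility):
    """L1 photometric loss weighted by a per-pixel visibility map.

    rendered, target: (H, W, 3) images with values in [0, 1].
    visibility: (H, W) weights in [0, 1]; 1 marks static scene content,
                0 marks pixels believed to show a transient object.
    """
    weights = visibility[..., None]            # broadcast over RGB
    abs_err = np.abs(rendered - target) * weights
    # Normalize by the total weight so the loss scale does not depend
    # on how many pixels are masked out.
    denom = weights.sum() * rendered.shape[-1]
    return abs_err.sum() / max(denom, 1e-8)
```

When the map is learned jointly with the scene, it usually needs an extra regularizer so that it does not collapse to zero everywhere; that term is omitted from this sketch.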

The authors evaluate their approach on challenging "in-the-wild" photo collections and report state-of-the-art rendering quality together with the highest training and inference efficiency among existing techniques.
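This summary does not say which metrics the evaluation uses. As an illustration of how rendering quality is commonly scored in novel view synthesis, the sketch below computes PSNR between a rendered image and a held-out photo, assuming pixel values in [0, 1].

```python
import numpy as np

def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((rendered - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher values mean the rendering is closer to the ground-truth photo; a uniform error of 0.1 on a [0, 1] image corresponds to 20 dB.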

Critical Analysis

The "Wild-GS" approach represents a significant advance in the field of view synthesis, particularly in its ability to handle unconstrained photo collections. Building on 3D Gaussian Splatting gives it fast training and rendering, while the added appearance modeling and visibility maps address lighting variations and transient occlusions.

That said, the paper does acknowledge some limitations of the current approach. For example, the system may struggle with very sparse or highly repetitive scene elements, and the quality of the novel views can degrade for significant changes in viewpoint. Additionally, the authors note that the current implementation relies on GPU acceleration, which could limit its deployment on resource-constrained devices.

Further research could explore ways to address these limitations, such as self-calibrating techniques to handle more extreme viewpoint changes, or the development of more efficient neural network architectures for CPU-based inference.

Overall, the "Wild-GS" method represents an impressive step forward in the field of view synthesis, demonstrating the potential for high-quality, real-time novel view generation from unconstrained photo collections. The system's ability to handle challenging "in-the-wild" scenarios makes it a valuable tool for a wide range of applications, from virtual tourism to immersive visual experiences.

Conclusion

The "Wild-GS" method presented in this paper is a significant advancement in the field of novel view synthesis. By adapting 3D Gaussian Splatting to handle appearance variation and transient occluders, the system can generate high-quality novel views in real time, even for challenging "in-the-wild" photo collections.

This work builds on and extends previous research in areas like view synthesis, 3D scene modeling, and sparse view synthesis. The key innovations are the per-Gaussian appearance modeling, the triplane-based alignment of reference image features, and the resulting ability to handle real-world photo collections with varying lighting, viewpoints, and occlusions.

The potential applications of this technology are wide-ranging, from virtual tourism and immersive gaming to telepresence and 3D content creation. As the authors continue to refine and expand the capabilities of "Wild-GS", it could become an increasingly valuable tool for a variety of industries and applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

Yuze Wang, Junyi Wang, Yue Qi

Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamless integration into various 3DGS applications. Extensive experiments on diverse datasets show our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high convergence and rendering speed.

6/5/2024

cs.CV

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi

The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360-degree scenes from sparse training views. We integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by 6.4% in LPIPS and by 12.2% in PSNR, and NeRF-based methods by at least 17.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost.

5/14/2024

cs.CV cs.LG eess.IV

From Chaos to Clarity: 3DGS in the Dark

Zhihao Li, Yufei Wang, Alex Kot, Bihan Wen

Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shapes that overfit the noise, thereby significantly degrading reconstruction quality and reducing inference speed, especially in scenarios with limited views. To address these issues, we introduce a novel self-supervised learning framework designed to reconstruct HDR 3DGS from a limited number of noisy raw images. This framework enhances 3DGS by integrating a noise extractor and employing a noise-robust reconstruction loss that leverages a noise distribution prior. Experimental results show that our method outperforms LDR/HDR 3DGS and previous state-of-the-art (SOTA) self-supervised and supervised pre-trained models in both reconstruction quality and inference speed on the RawNeRF dataset across a broad range of training views. Code can be found at https://lizhihao6.github.io/Raw3DGS.

6/13/2024

eess.IV cs.CV

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang

Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. Project website: https://zehaozhu.github.io/FSGS/.

6/18/2024

cs.CV