SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length

Read original: arXiv:2409.07759 - Published 9/14/2024 by Bangya Liu, Suman Banerjee


Overview

3D Gaussian Splatting (3DGS) has gained significant attention in computer vision and graphics due to its high rendering speed and quality.
Previous research has tried to extend 3DGS from static to dynamic scenes, but those efforts have faced challenges such as large model sizes, constraints on video duration, and content deviation.
These limitations have restricted the use of dynamic 3D Gaussian models in applications like volumetric video, autonomous vehicles, and immersive technologies.

Plain English Explanation

The paper introduces a new framework called SwinGS that addresses the limitations of previous dynamic 3D Gaussian splatting approaches. SwinGS integrates spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to fit various 3D scenes across frames. It also uses a sliding window to capture Gaussian snapshots for each frame in an accumulative way.

This allows SwinGS to stream volumetric video in real time while reducing transmission costs by 83.6% compared to previous work, with a negligible loss in PSNR. Additionally, SwinGS can scale to long video sequences without quality degradation.
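To make the 83.6% figure concrete, the short sketch below works through the arithmetic. The baseline of 10 MB per second is a made-up number chosen only for illustration; the summary reports the relative reduction, not absolute payload sizes.

```python
# Hypothetical baseline; the source reports only the relative reduction.
BASELINE_MB_PER_SEC = 10.0
REDUCTION = 0.836  # 83.6% lower transmission cost, as reported


def reduced_cost(baseline: float, reduction: float) -> float:
    """Return the cost that remains after a fractional reduction."""
    return baseline * (1.0 - reduction)


remaining = reduced_cost(BASELINE_MB_PER_SEC, REDUCTION)
shrink_factor = 1.0 / (1.0 - REDUCTION)
print(f"{remaining:.2f} MB/s remains, about {shrink_factor:.1f}x smaller")
```

In other words, an 83.6% reduction leaves roughly one sixth of the original payload, whatever the real baseline happens to be.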

The paper also presents an interactive WebGL viewer that enables real-time playback of volumetric video on a variety of devices, including smartphones and tablets.

Technical Explanation

The core idea behind SwinGS is to address the limitations of previous dynamic 3D Gaussian splatting approaches by:

Integrating Spacetime Gaussian with MCMC: This allows the model to adapt to fit various 3D scenes across frames, enhancing its streamability.
Using a Sliding Window: This captures Gaussian snapshots for each frame in an accumulative way, further improving the streaming capabilities of the system.
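One plausible reading of the accumulative sliding window can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Gaussian` record, the fixed `WINDOW` lifespan, and the `snapshot` and `delta` helpers are all hypothetical names introduced here. The assumption is that each Gaussian is active for a fixed number of consecutive frames, so a client that already holds the previous frame's snapshot only needs the Gaussians that start at the current frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    gid: int    # hypothetical identifier
    start: int  # first frame in which this Gaussian is active


WINDOW = 3  # hypothetical lifespan, in frames


def snapshot(gaussians, frame, window=WINDOW):
    """Gaussians whose lifespan [start, start + window) covers `frame`."""
    return [g for g in gaussians if g.start <= frame < g.start + window]


def delta(gaussians, frame):
    """Gaussians that first appear at `frame`: the only ones a client
    holding the previous snapshot would still need to download."""
    return [g for g in gaussians if g.start == frame]


# Toy scene: two Gaussians are introduced at every frame 0..4.
scene = [Gaussian(gid=2 * f + k, start=f) for f in range(5) for k in range(2)]

client = []
for frame in range(5):
    # Drop expired Gaussians, then add the newly streamed ones.
    client = [g for g in client if frame < g.start + WINDOW] + delta(scene, frame)
    # The incrementally built set matches the full snapshot for this frame.
    assert sorted(g.gid for g in client) == sorted(g.gid for g in snapshot(scene, frame))
    print(frame, len(delta(scene, frame)), len(client))
```

In this toy setup the client ends up rendering six Gaussians per frame once the window has filled, while downloading only two per frame, which is the kind of saving an accumulative window is meant to provide.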

The paper implements a prototype of SwinGS and demonstrates its performance across various datasets and scenes. The authors also develop an interactive WebGL viewer that enables real-time playback of volumetric video on a wide range of devices.

Critical Analysis

The paper presents a novel approach to address the limitations of previous dynamic 3D Gaussian splatting techniques, which is a significant contribution to the field. However, the authors do not discuss potential caveats or limitations of the SwinGS framework.

For example, the paper does not explore the impact of the sliding window size on the overall performance and quality of the system. Additionally, the scalability of SwinGS to extremely long video sequences or high-resolution 3D scenes is not thoroughly investigated.
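A toy model hints at why the window size matters. It assumes, purely for illustration, a fixed budget of active Gaussians per frame and a fixed lifespan of `window` frames for each Gaussian; neither the budget value nor the `new_per_frame` helper comes from the paper.

```python
ACTIVE_BUDGET = 120_000  # hypothetical number of Gaussians rendered per frame


def new_per_frame(active_budget: int, window: int) -> float:
    """Steady-state Gaussians to stream per frame if each lives `window` frames."""
    if window < 1:
        raise ValueError("window must be at least 1 frame")
    return active_budget / window


for window in (1, 5, 10, 30):
    print(window, new_per_frame(ACTIVE_BUDGET, window))
```

Under these assumptions a longer window lowers the per-frame download, but each Gaussian then has to remain useful across more frames, which may cost quality in fast-moving scenes. Measuring that trade-off is what a window-size ablation would need to do.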

Further research could also explore the integration of SwinGS with other compression techniques or machine learning models to enhance its efficiency and applicability in a broader range of scenarios.

Conclusion

The SwinGS framework introduced in this paper represents a significant advancement in the field of dynamic 3D Gaussian splatting. By integrating spacetime Gaussian with MCMC and using a sliding window, the system is able to provide real-time streaming of volumetric video with reduced transmission costs and minimal quality compromise.

The interactive WebGL viewer developed by the authors further enhances the accessibility and usability of the technology, paving the way for its adoption in various applications, such as volumetric video, autonomous vehicles, and immersive experiences. Although the paper does not address all the potential limitations, the core ideas and the demonstrated performance of SwinGS make it a promising step forward in the evolution of real-time 3D rendering and streaming technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length

Bangya Liu, Suman Banerjee

Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant attention in computer vision and computer graphics due to its high rendering speed and remarkable quality. While extant research has endeavored to extend the application of 3DGS from static to dynamic scenes, such efforts have been consistently impeded by excessive model sizes, constraints on video duration, and content deviation. These limitations significantly compromise the streamability of dynamic 3D Gaussian models, thereby restricting their utility in downstream applications, including volumetric video, autonomous vehicles, and immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and rendering volumetric video in a real-time streaming fashion. To address the aforementioned challenges and enhance streamability, SwinGS integrates spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to fit various 3D scenes across frames, while employing a sliding window that captures Gaussian snapshots for each frame in an accumulative way. We implement a prototype of SwinGS and demonstrate its streamability across various datasets and scenes. Additionally, we develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers, including smartphones and tablets. Experimental results show that SwinGS reduces transmission costs by 83.6% compared to previous work with negligible compromise in PSNR. Moreover, SwinGS easily scales to long video sequences without compromising quality.

9/14/2024

SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero

Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer.

7/19/2024

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.

9/16/2024

Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

Yan Song Hu, Dayou Mao, Yuhao Chen, John Zelek

Initial applications of 3D Gaussian Splatting (3DGS) in Visual Simultaneous Localization and Mapping (VSLAM) demonstrate the generation of high-quality volumetric reconstructions from monocular video streams. However, despite these promising advancements, current 3DGS integrations have reduced tracking performance and lower operating speeds compared to traditional VSLAM. To address these issues, we propose integrating 3DGS with Direct Sparse Odometry, a monocular photometric SLAM system. We have done preliminary experiments showing that using Direct Sparse Odometry point cloud outputs, as opposed to standard structure-from-motion methods, significantly shortens the training time needed to achieve high-quality renders. Reducing 3DGS training time enables the development of 3DGS-integrated SLAM systems that operate in real-time on mobile hardware. These promising initial findings suggest further exploration is warranted in combining traditional VSLAM systems with 3DGS.

8/9/2024