Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Read original: arXiv:2409.08353 - Published 9/16/2024 by Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu

Overview

Presents DualGS, a Gaussian-based method for real-time, high-fidelity playback of human-centric volumetric videos
Represents motion and appearance separately, using joint Gaussians for motion and skin Gaussians for appearance
Trains frame by frame with a coarse-to-fine strategy, then compresses the result by up to 120 times, to roughly 350KB per frame

Plain English Explanation

The paper describes a new way to create volumetric videos of people that play back in real time, look realistic, and take up far less storage. The key idea, which the authors call DualGS, is to use two sets of 3D Gaussians instead of one: "joint" Gaussians that track how the performer moves, and "skin" Gaussians that capture how the performer looks. Keeping motion and appearance separate removes redundant motion data and makes the result more stable from frame to frame.

The paper also introduces a coarse-to-fine training strategy that first predicts the overall motion in each frame and then refines it for robust tracking and high-fidelity rendering. A dedicated compression scheme shrinks the result by up to 120 times, to roughly 350KB per frame, which makes photorealistic, free-viewpoint playback practical on VR headsets.

Technical Explanation

The core of the method is a dual Gaussian representation (DualGS) that models motion and appearance separately: joint Gaussians carry the motion, and skin Gaussians carry the appearance. At the first frame, the system initializes both sets and anchors the skin Gaussians to the joint Gaussians. This explicit disentanglement reduces motion redundancy and improves temporal coherence.
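
The paper's abstract describes anchoring skin Gaussians (appearance) to joint Gaussians (motion) at the first frame. The sketch below is one way to picture that relationship, not the authors' implementation. It assumes each skin Gaussian is tied to its k nearest joint Gaussians with inverse-distance weights and simply follows a weighted blend of their translations. The class names, the helpers `anchor_skin` and `move_skin`, and the weighting scheme are all hypothetical; the paper's actual anchoring and interpolation may differ.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class JointGaussian:
    """Hypothetical motion carrier: only its position is tracked here."""
    position: Vec3


@dataclass
class SkinGaussian:
    """Hypothetical appearance carrier anchored to a few joint Gaussians."""
    position: Vec3          # position at the first frame
    color: Vec3             # appearance attribute (placeholder)
    anchors: list[int]      # indices of the joint Gaussians it follows
    weights: list[float]    # blend weights, summing to 1


def anchor_skin(skin_pos: Vec3, joints: list[JointGaussian], k: int = 2):
    """Pick the k nearest joints and assign inverse-distance weights (an assumption)."""
    nearest = sorted(
        (math.dist(skin_pos, j.position), i) for i, j in enumerate(joints)
    )[:k]
    inv = [1.0 / (d + 1e-8) for d, _ in nearest]
    total = sum(inv)
    return [i for _, i in nearest], [w / total for w in inv]


def move_skin(skin: SkinGaussian,
              joints_first: list[JointGaussian],
              joints_now: list[JointGaussian]) -> Vec3:
    """Move a skin Gaussian by the weighted translation of its anchor joints."""
    offset = [0.0, 0.0, 0.0]
    for idx, w in zip(skin.anchors, skin.weights):
        for axis in range(3):
            delta = joints_now[idx].position[axis] - joints_first[idx].position[axis]
            offset[axis] += w * delta
    return tuple(p + o for p, o in zip(skin.position, offset))


# Usage: two joints, one skin Gaussian halfway between them.
first = [JointGaussian((0.0, 0.0, 0.0)), JointGaussian((1.0, 0.0, 0.0))]
anchors, weights = anchor_skin((0.5, 0.0, 0.0), first)
skin = SkinGaussian((0.5, 0.0, 0.0), (1.0, 1.0, 1.0), anchors, weights)
later = [JointGaussian((0.0, 1.0, 0.0)), JointGaussian((1.0, 1.0, 0.0))]
print(move_skin(skin, first, later))
```

In this toy setup only the joint positions change between frames, which illustrates why storing motion on the joint set alone can avoid repeating motion data for every skin Gaussian.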

To track the performance robustly, the authors train the representation frame by frame with a coarse-to-fine strategy. A coarse alignment phase predicts the overall motion, and a fine-grained optimization phase then refines the result for robust tracking and high-fidelity rendering.
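
The abstract describes training as coarse-to-fine: a coarse alignment for overall motion, followed by a fine-grained optimization. The toy example below illustrates only that general pattern. It assumes known 3D target positions for each point, a single shared translation as the coarse step, and plain gradient descent on squared distance as the fine step. None of these choices come from the paper, which optimizes against captured images; the function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def coarse_align(points: list[Vec3], targets: list[Vec3]) -> list[Vec3]:
    """Coarse step (assumption): apply one shared translation, the mean offset."""
    n = len(points)
    shift = tuple(
        sum(t[a] - p[a] for p, t in zip(points, targets)) / n for a in range(3)
    )
    return [tuple(p[a] + shift[a] for a in range(3)) for p in points]


def fine_refine(points: list[Vec3], targets: list[Vec3],
                steps: int = 10, lr: float = 0.5) -> list[Vec3]:
    """Fine step (assumption): per-point gradient descent on 0.5 * ||p - t||^2."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        for p, t in zip(pts, targets):
            for a in range(3):
                p[a] -= lr * (p[a] - t[a])  # gradient of the squared error
    return [tuple(p) for p in pts]


def mean_error(points: list[Vec3], targets: list[Vec3]) -> float:
    """Average Euclidean distance between points and their targets."""
    return sum(math.dist(p, t) for p, t in zip(points, targets)) / len(points)


# Usage: positions from the previous frame and toy targets for the next frame.
prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.1, 1.0, 0.0), (0.9, 1.0, 0.0)]
coarse = coarse_align(prev, target)
fine = fine_refine(coarse, target)
print(mean_error(prev, target), mean_error(coarse, target), mean_error(fine, target))
```

The coarse step removes the large shared motion cheaply, so the fine step only has to correct small residuals, which is the usual motivation for this kind of two-stage schedule.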

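The resulting representation is then compressed for playback: motion is compressed with entropy encoding, and appearance with codec compression coupled with a persistent codebook. The authors report a compression ratio of up to 120 times, at approximately 350KB of storage per frame, which allows real-time, temporally coherent volumetric video playback on VR headsets.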
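
According to the paper's abstract, motion is compressed with entropy encoding. The sketch below shows why that can pay off when frame-to-frame motion is small: after quantization, most motion deltas fall into the same few bins, and the Shannon entropy gives a lower bound on the average bits per symbol an ideal entropy coder would need. The quantization step, the sample values, and the helper names are assumptions for illustration, not details from the paper.

```python
import math
from collections import Counter


def quantize(values: list[float], step: float) -> list[int]:
    """Map each value to an integer bin of width `step` (assumed quantizer)."""
    return [round(v / step) for v in values]


def entropy_bits_per_symbol(symbols: list[int]) -> float:
    """Shannon entropy of the empirical symbol distribution, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Usage: small frame-to-frame deltas collapse into one bin after quantization,
# while widely spread deltas need more bits per symbol.
small_deltas = [0.001, -0.002, 0.0005, 0.0]
spread_deltas = [0.0, 0.01, 0.02, 0.03]
print(entropy_bits_per_symbol(quantize(small_deltas, 0.01)))
print(entropy_bits_per_symbol(quantize(spread_deltas, 0.01)))
```

A lower entropy means fewer bits per stored motion value, so a representation that keeps motion smooth and low in redundancy is easier to compress.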

Critical Analysis

The paper presents a compelling approach for creating immersive volumetric videos of humans, with several clever technical innovations. However, the authors acknowledge some limitations:

The method currently requires a multi-camera setup to capture the 3D geometry, which may limit its real-world applicability
The optimization process can be computationally intensive, potentially making it challenging to scale to large scenes
The paper focuses on human performances, so additional work may be needed to extend the approach to general dynamic scenes

Further research could explore ways to address these limitations, such as incorporating more efficient reconstruction techniques or extending the approach to handle more complex scenes and motions.

Conclusion

Overall, this paper makes an important contribution to the field of human performance capture and neural rendering. By developing a robust Gaussian splatting technique, the authors have demonstrated how to generate highly realistic and immersive volumetric videos of people. This technology has the potential to enable new applications in virtual reality, mixed reality, and other interactive experiences where human-centric content is crucial.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.

9/16/2024

SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length

Bangya Liu, Suman Banerjee

Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant attention in computer vision and computer graphics due to its high rendering speed and remarkable quality. While extant research has endeavored to extend the application of 3DGS from static to dynamic scenes, such efforts have been consistently impeded by excessive model sizes, constraints on video duration, and content deviation. These limitations significantly compromise the streamability of dynamic 3D Gaussian models, thereby restricting their utility in downstream applications, including volumetric video, autonomous vehicles, and immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and rendering volumetric video in a real-time streaming fashion. To address the aforementioned challenges and enhance streamability, SwinGS integrates spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to fit various 3D scenes across frames, while employing a sliding window that captures Gaussian snapshots for each frame in an accumulative way. We implement a prototype of SwinGS and demonstrate its streamability across various datasets and scenes. Additionally, we develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers, including smartphones and tablets. Experimental results show that SwinGS reduces transmission costs by 83.6% compared to previous work with negligible compromise in PSNR. Moreover, SwinGS easily scales to long video sequences without compromising quality.

9/14/2024

SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting

Haoyu Zhao, Chen Yang, Hao Wang, Xingyue Zhao, Wei Shen

Reconstructing photo-realistic animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recently, methods using 3D Gaussians to represent the human body have emerged, offering faster optimization and real-time rendering. However, due to ignoring the crucial role of human body semantic information which represents the intrinsic structure and connections within the human body, they fail to achieve fine-detail reconstruction of dynamic human avatars. To address this issue, we propose SG-GS, which uses semantics-embedded 3D Gaussians, skeleton-driven rigid deformation, and non-rigid cloth dynamics deformation to create photo-realistic animatable human avatars from monocular videos. We then design a Semantic Human-Body Annotator (SHA) which utilizes SMPL's semantic prior for efficient body part semantic labeling. The generated labels are used to guide the optimization of Gaussian semantic attributes. To address the limited receptive field of point-level MLPs for local features, we also propose a 3D network that integrates geometric and semantic associations for human avatar deformation. We further implement three key strategies to enhance the semantic accuracy of 3D Gaussians and rendering quality: semantic projection with 2D regularization, semantic-guided density regularization and semantic-aware regularization with neighborhood consistency. Extensive experiments demonstrate that SG-GS achieves state-of-the-art geometry and appearance reconstruction performance.

8/20/2024

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS for each individual frame. In 4D-GS, a novel explicit representation containing both 3D Gaussians and 4D neural voxels is proposed. A decomposed neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build Gaussian features from 4D neural voxels and then a lightweight MLP is applied to predict Gaussian deformations at novel timestamps. Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800×800 resolution on an RTX 3090 GPU while maintaining comparable or better quality than previous state-of-the-art methods. More demos and code are available at https://guanjunwu.github.io/4dgs/.

7/16/2024