OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

2404.08449

Published 4/16/2024 by Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu

Abstract

Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, while in real-life scenarios various objects may occlude body parts. A previous method utilized NeRF for surface rendering to recover the occluded areas, but it requires more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings at up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform an occlusion feature query at occluded regions, where the aggregated pixel-aligned feature is extracted to compensate for the missing information. Then we use a Gaussian Feature MLP to further process the feature, along with occlusion-aware loss functions, to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method, and we improve training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.

Overview

  • This paper proposes "OccGaussian", a method based on 3D Gaussian Splatting for rendering humans from monocular video in which body parts are occluded.
  • The key idea is to initialize 3D Gaussians in a canonical space and, for occluded regions, query aggregated pixel-aligned features that compensate for the missing information.
  • The paper reports training within 6 minutes and rendering at up to 160 FPS, with quality comparable or superior to the prior state-of-the-art method.

Plain English Explanation

The paper introduces a new way to render a moving person from a single-camera video, even when parts of the body are hidden behind other objects. The person is represented as a collection of 3D Gaussians, which are soft, ellipsoid-like blobs that can be drawn and blended very quickly.

Where body parts are blocked, the method looks up image features for the occluded regions and uses them to fill in the missing information. A small network (the Gaussian Feature MLP) and occlusion-aware loss functions then help the model reason about the hidden areas, so that visible and occluded regions blend into a natural-looking result.

Training takes about 6 minutes and rendering runs at up to 160 FPS, which makes the method well suited to interactive applications like video games and augmented reality, where human figures need to be rendered quickly as the scene changes.

Technical Explanation

The core of the OccGaussian method is a representation of the human body as a set of 3D Gaussians initialized in a canonical space. Because 3D Gaussian Splatting rasterizes these primitives directly instead of marching rays through a neural field, the body can be rendered efficiently even when parts of it are occluded in the input video.
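
As a rough illustration of what a single 3D Gaussian primitive stores, the sketch below builds a covariance matrix from a per-axis scale and a rotation quaternion, using the factorization Sigma = R S S^T R^T that is standard in 3D Gaussian Splatting. The function names are hypothetical, and this is not taken from the OccGaussian code, which the source does not show.

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize so R is a true rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def gaussian_covariance(scale, quat):
    """Return Sigma = R S S^T R^T, where S = diag(scale)."""
    R = quat_to_rotation(quat)
    # Sigma[i][j] = sum_k R[i][k] * scale[k]^2 * R[j][k]
    return [
        [sum(R[i][k] * scale[k] ** 2 * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]

# Hypothetical example: an axis-aligned Gaussian stretched along z.
cov = gaussian_covariance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))
print(cov)
```

Storing scale and rotation separately, instead of a raw 3x3 matrix, keeps the covariance symmetric and positive semi-definite no matter what values an optimizer assigns.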

The input is a monocular video in which objects may block parts of the body. For the occluded regions, the method performs an occlusion feature query and extracts an aggregated pixel-aligned feature to compensate for the missing information. A Gaussian Feature MLP then processes this feature further.
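
The abstract mentions pixel-aligned features without giving details. As a minimal sketch of the general idea only, the code below projects a camera-space 3D point into an image with a pinhole model and bilinearly samples a feature map at that location. The function names, the intrinsics (`fx`, `fy`, `cx`, `cy`) and the feature-map layout are all assumptions for illustration; the paper's actual occlusion feature query and aggregation step are not described in the source and are not reproduced here.

```python
import math

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z) to pixel coordinates."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy

def bilinear_sample(feature_map, u, v):
    """Sample feature_map[row][col] -> list of channels at continuous pixel (u, v)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid pixel range so out-of-image points reuse border values.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    du, dv = u - x0, v - y0
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - du) * feature_map[y0][x0][c] + du * feature_map[y0][x1][c]
        bottom = (1 - du) * feature_map[y1][x0][c] + du * feature_map[y1][x1][c]
        out.append((1 - dv) * top + dv * bottom)
    return out

def pixel_aligned_feature(point, feature_map, fx, fy, cx, cy):
    """Look up the image feature at the pixel a 3D point projects to."""
    u, v = project(point, fx, fy, cx, cy)
    return bilinear_sample(feature_map, u, v)

# Hypothetical 2x2 single-channel feature map and unit intrinsics.
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(pixel_aligned_feature((0.5, 0.5, 1.0), fmap, 1.0, 1.0, 0.0, 0.0))
```

In a full pipeline, features sampled this way from several frames could be combined (for example averaged) before being passed to a network, but how OccGaussian aggregates them is not stated in the source.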

Training uses occlusion-aware loss functions so that the model better perceives the occluded area. At render time, the Gaussians are splatted and composited as in standard 3D Gaussian Splatting, and the paper reports high-quality results on both simulated and real-world occlusions.
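
For readers unfamiliar with how splatted Gaussians are combined into a pixel, the sketch below shows the front-to-back alpha blending used by 3D Gaussian Splatting renderers in general: contributions are sorted by depth, and each one is weighted by its opacity times the transmittance left over from the splats in front of it. The data layout (a list of depth, color, alpha tuples for one pixel) is a simplification chosen for illustration and is not taken from the OccGaussian implementation.

```python
def composite(splats):
    """Blend per-pixel splat contributions front to back.

    splats: list of (depth, (r, g, b), alpha) tuples for a single pixel.
    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _depth, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return color, transmittance

# Hypothetical pixel: a half-transparent red splat in front of an opaque blue one.
pixel, remaining = composite([
    (2.0, (0.0, 0.0, 1.0), 1.0),
    (1.0, (1.0, 0.0, 0.0), 0.5),
])
print(pixel, remaining)  # [0.5, 0.0, 0.5] 0.0
```

Real renderers perform this per tile on the GPU and stop early once the transmittance is close to zero.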

The real-time performance comes from the efficiency of Gaussian splatting on the GPU. The paper reports training within 6 minutes and rendering at up to 160 FPS, which the authors state is 250x and 800x faster, respectively, than the prior NeRF-based method. This makes the approach suitable for interactive applications, such as video games and augmented reality, where human figures need to be rendered quickly.
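
The speedup figures in the abstract can be sanity-checked with simple arithmetic. The sketch below takes "more than one day" as exactly 24 hours, which gives a lower bound on the training speedup, and assumes 5 seconds per frame as a stand-in for "several seconds" of NeRF rendering. That 5-second value is an assumption chosen for illustration, not a number from the source.

```python
def training_speedup(baseline_minutes, ours_minutes):
    """Ratio of baseline training time to the new training time."""
    return baseline_minutes / ours_minutes

def rendering_speedup(baseline_seconds_per_frame, ours_fps):
    """Baseline time per frame divided by the new time per frame (1 / fps)."""
    return baseline_seconds_per_frame * ours_fps

# "More than one day" vs. 6 minutes: at least 1440 / 6.
print(training_speedup(24 * 60, 6))   # 240.0

# Assumed 5 s per frame vs. 160 FPS.
print(rendering_speedup(5.0, 160))    # 800.0
```

Both results are in line with the 250x and 800x figures quoted in the abstract, given that the baseline trains for somewhat more than one day.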

Critical Analysis

The OccGaussian method is a notable advance in rendering occluded humans. According to the paper, it achieves comparable or even superior quality to the state-of-the-art NeRF-based method while training and rendering far faster.

One potential limitation of the method is its dependence on accurate human pose and shape estimates, as errors in that step could lead to inaccuracies in the final rendered result. Future work could focus on improving the robustness of the pose and shape estimation, perhaps by integrating it more tightly with the rendering process.

Another area for further research could be the extension of the OccGaussian method to handle more complex occlusion scenarios, such as when multiple human figures are present in the scene and occlude each other. The 3D geometry-aware nature of the current approach suggests that it could be possible to handle such cases, but additional algorithmic developments may be required.

Overall, the OccGaussian method represents a significant step forward in the field of human rendering under occlusion, and the authors demonstrate its effectiveness through experiments on both simulated and real-world occlusions.

Conclusion

The OccGaussian method introduced in this paper provides a novel approach to rendering occluded humans in 3D scenes. By representing the human body as a set of 3D Gaussian distributions, the system is able to efficiently handle complex occlusions and produce realistic results in real-time.

This work has important implications for a wide range of applications, from video games and augmented reality to robotics and virtual cinematography. By enabling the seamless integration of visible and occluded human figures, the OccGaussian method can help create more immersive and believable 3D experiences.

While the current method has some limitations, there are promising directions for future research, such as improving the pose and shape estimation and extending the approach to handle more complex occlusion scenarios. As the field of human rendering continues to evolve, the OccGaussian method stands as a contribution that could help drive further advances in computer graphics and vision.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, Andrew J. Davison

We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstruction of tiny and even transparent objects.

4/16/2024

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu

We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and regress directly Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.

4/17/2024

SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians

Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou

Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency.

4/8/2024

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses

Inhee Lee, Byungjun Kim, Hanbyul Joo

In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, enabling to conveniently and efficiently compose and render them together. In particular, we address the scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach to optimize the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, where we leverage a pre-trained 2D diffusion model to synthesize unseen views while keeping the consistency with the observed 2D appearances. We demonstrate our method can reconstruct high-quality animatable 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot, and extremely sparse observations. After reconstruction, our method is capable of not only rendering the scene in any novel views at arbitrary time instances, but also editing the 3D scene by removing individual humans or applying different motions for each human. Through various experiments, we demonstrate the quality and efficiency of our methods over alternative existing approaches.

4/23/2024