GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

Read original: arXiv:2406.02495 - Published 6/5/2024 by Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang
Total Score

0

GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

Sign in to get full access

or

If you already have an account, we'll log you in

Overview

  • This paper presents a novel neural network architecture called GenS (Generalizable Neural Surface Reconstruction) that can reconstruct 3D surfaces from multi-view images.
  • GenS aims to overcome the limitations of previous methods that struggled with generalization to new scenes and objects.
  • The key innovation of GenS is its ability to learn a generalizable representation of 3D geometry that can be applied to diverse scenes and objects.

Plain English Explanation

GenS: Generalizable Neural Surface Reconstruction from Multi-View Images is a new deep learning method that can create 3D models from multiple camera views of an object or scene. Previous techniques had difficulty working well across a wide range of different objects and environments, but GenS has been designed to be more flexible and adaptable.

The core idea behind GenS is to learn a general representation of 3D shape that can be applied to many different situations, rather than relying on models trained on specific datasets. This allows the system to reconstruct 3D surfaces from images, even for objects or scenes it hasn't seen before. The researchers achieve this by using novel network architecture and training strategies that enable the model to grasp the underlying geometry in a more universal way.

UForeCon: Generalizable Sparse-View Surface Reconstruction from Unconstrained Multi-View Images and NC-SDF: Enhancing Indoor Scene Reconstruction Using Normal Constraints are some related techniques that also aim to improve the generalization of 3D reconstruction from images. GenS builds upon insights from these prior works.

Technical Explanation

The core of the GenS approach is a neural network that takes in multi-view images of an object or scene and outputs a 3D surface representation. A key innovation is the use of a novel "geometry-aware" network architecture that helps the model learn a more generalizable 3D representation.

This architecture includes components like a feature extraction backbone, a 3D-aware feature fusion module, and a surface prediction head. The 3D-aware fusion module is designed to effectively combine 2D features from different viewpoints into a coherent 3D understanding.

The training process also incorporates techniques like multi-scale supervision and data augmentation to further boost the model's generalization capabilities. GenS is evaluated on several 3D reconstruction benchmarks and demonstrates strong performance, exceeding prior state-of-the-art methods.

Critical Analysis

The authors acknowledge that GenS, like other 3D reconstruction techniques, may struggle with challenging cases like thin or transparent objects. They also note that the current implementation is limited to fixed-camera scenarios, and extending it to handle freely moving cameras would be an important area for future work.

Additionally, while GenS shows impressive generalization, there may still be room for improvement, especially for highly diverse or unusual object/scene categories that differ significantly from the training data. Exploring ways to make the underlying 3D representation even more adaptable could be a fruitful direction.

Overall, Geometry-Aware Reconstruction Fusion for Refined Rendering and Generalizable Implicit Surface Reconstruction and GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance represent important steps forward in making 3D reconstruction systems more versatile and applicable to a broader range of real-world scenarios.

Conclusion

The GenS method introduces a novel neural network architecture and training approach that enables more generalizable 3D surface reconstruction from multi-view images. By learning a more robust and adaptable 3D representation, GenS can be applied to a wider variety of objects and scenes compared to previous techniques.

This work represents a significant advance in the field of 3D reconstruction, with potential applications in areas like robotics, augmented reality, and digital content creation. The authors have made an important contribution to improving the flexibility and real-world applicability of neural 3D reconstruction systems.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

GenS: Generalizable Neural Surface Reconstruction from Multi-View Images
Total Score

0

GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang

Combining the signed distance function (SDF) and differentiable volume rendering has emerged as a powerful paradigm for surface reconstruction from multi-view images without 3D supervision. However, current methods are impeded by requiring long-time per-scene optimizations and cannot generalize to new scenes. In this paper, we present GenS, an end-to-end generalizable neural surface reconstruction model. Unlike coordinate-based methods that train a separate network for each scene, we construct a generalized multi-scale volume to directly encode all scenes. Compared with existing solutions, our representation is more powerful, which can recover high-frequency details while maintaining global smoothness. Meanwhile, we introduce a multi-scale feature-metric consistency to impose the multi-view consistency in a more discriminative multi-scale feature space, which is robust to the failures of the photometric consistency. And the learnable feature can be self-enhanced to continuously improve the matching accuracy and mitigate aggregation ambiguity. Furthermore, we design a view contrast loss to force the model to be robust to those regions covered by few viewpoints through distilling the geometric prior from dense input to sparse input. Extensive experiments on popular benchmarks show that our model can generalize well to new scenes and outperform existing state-of-the-art methods even those employing ground-truth depth supervision. Code is available at https://github.com/prstrive/GenS.

Read more

6/5/2024

GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions
Total Score

0

GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions

Salvatore Esposito, Qingshan Xu, Kacper Kania, Charlie Hewitt, Octave Mariotti, Lohit Petikam, Julien Valentin, Arno Onken, Oisin Mac Aodha

We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner. Initially, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, those priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered depth map to be consistent with the zero-level set of the SDF. Through the lens of adversarial training, we encourage the network to produce higher fidelity details on the output meshes. For evaluation, we introduce a synthetic dataset of human avatars captured from 360-degree camera angles, to overcome the challenges presented by real-world datasets, which often lack 3D consistency and do not cover all camera angles. Our experiments on multiple datasets show that GeoGen produces visually and quantitatively better geometry than the previous generative models based on neural radiance fields.

Read more

6/17/2024

Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
Total Score

0

Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.

Read more

7/16/2024

Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
Total Score

0

Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction

Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang

Reconstructing the high-fidelity surface from multi-view images, especially sparse images, is a critical and practical task that has attracted widespread attention in recent years. However, existing methods are impeded by the memory constraint or the requirement of ground-truth depths and cannot recover satisfactory geometric details. To this end, we propose SuRF, a new Surface-centric framework that incorporates a new Region sparsification based on a matching Field, achieving good trade-offs between performance, efficiency and scalability. To our knowledge, this is the first unsupervised method achieving end-to-end sparsification powered by the introduced matching field, which leverages the weight distribution to efficiently locate the boundary regions containing surface. Instead of predicting an SDF value for each voxel, we present a new region sparsification approach to sparse the volume by judging whether the voxel is inside the surface region. In this way, our model can exploit higher frequency features around the surface with less memory and computational consumption. Extensive experiments on multiple benchmarks containing complex large-scale scenes show that our reconstructions exhibit high-quality details and achieve new state-of-the-art performance, i.e., 46% improvements with 80% less memory consumption. Code is available at https://github.com/prstrive/SuRF.

Read more

9/6/2024