GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Read original: arXiv:2404.07992 - Published 4/12/2024 by Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang


Overview

The paper proposes a new method called GoMVS (Geometrically Consistent Cost Aggregation for Multi-View Stereo) to improve the accuracy of 3D reconstruction from multiple images.
The key idea is to make cost aggregation geometrically consistent: before combining the matching costs of neighboring pixels, GoMVS uses surface normals to align them with the depth hypotheses of the pixel being estimated, leading to more consistent and accurate depth estimates.
GoMVS reports new state-of-the-art results on the DTU, Tanks and Temples, and ETH3D benchmarks.

Plain English Explanation

The paper presents GoMVS, a technique for building 3D models from multiple calibrated camera views more accurately than previous learning-based methods. The core insight is that neighboring pixels usually lie on the same smooth surface, so when the network pools matching evidence from a pixel's neighbors, it should first account for how that surface is oriented.

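Typical multi-view stereo (MVS) methods estimate depth by testing many candidate depths for each pixel and scoring how well the images agree at each candidate. These scores, called matching costs, are then smoothed by combining them with the scores of neighboring pixels. The problem is that, on a slanted or curved surface, a neighbor's candidate depth does not describe the same local geometry as the reference pixel's candidate, so naive combining mixes inconsistent information. GoMVS uses surface normals to translate each neighbor's scores into the reference pixel's depth terms before combining them, so the pooled evidence stays geometrically consistent.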

By incorporating these geometric constraints, GoMVS produces 3D reconstructions that are more accurate and more faithful to the real-world scene than those of other state-of-the-art learning-based MVS approaches. This could have important applications in areas like 3D mapping, virtual/augmented reality, and digital content creation.

Technical Explanation

The paper introduces GoMVS, a learning-based multi-view stereo (MVS) method whose key innovation is geometrically consistent cost aggregation: matching costs from neighboring pixels are made consistent with the reference pixel's geometry before they are aggregated, leading to more accurate and coherent depth estimates.

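Typical learning-based MVS methods [1,2,3] build a cost volume for a reference view: for each pixel they test a set of depth hypotheses, compare features from the source views at each hypothesis, and regress depth from the resulting matching costs. Before regression, the costs are aggregated over neighboring pixels, usually by combining the costs of adjacent pixels at the same depth hypothesis. According to the authors, this direct aggregation is suboptimal because of local geometric inconsistency: on a slanted or curved surface, an adjacent pixel's hypothesis does not describe the same local geometry as the reference pixel's hypothesis. Related methods either aggregate selectively or refine the aggregated depth in 2D space, which, the authors argue, does not handle the inconsistency within the cost volume itself.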
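As a rough illustration of the baseline pipeline that learning-based MVS builds on, the sketch below smooths a per-pixel cost vector by averaging its 3x3 neighbours at the same hypothesis index, which ignores surface geometry, and then regresses depth with a soft argmin (the probability-weighted mean of the hypotheses). It is plain Python rather than a deep-learning framework; the function names and the nested-list cost-volume layout are hypothetical, real networks replace the plain average with learned 3D convolutions, and this is not GoMVS's code.

```python
import math


def naive_aggregate(cost_volume, row, col):
    """Average the 3x3 neighbourhood of (row, col) at the SAME hypothesis index.

    cost_volume[r][c] is a list of matching costs, one per depth hypothesis.
    Geometry is ignored: a neighbour's k-th hypothesis is assumed to match ours.
    """
    height, width = len(cost_volume), len(cost_volume[0])
    num_hyps = len(cost_volume[0][0])
    total = [0.0] * num_hyps
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:  # skip out-of-image neighbours
                for k in range(num_hyps):
                    total[k] += cost_volume[r][c][k]
                count += 1
    return [t / count for t in total]


def soft_argmin_depth(costs, hypotheses):
    """Regress depth as the expectation over hypotheses under softmax(-cost)."""
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best)) for c in costs]
    z = sum(weights)
    return sum(w * d for w, d in zip(weights, hypotheses)) / z


# Usage: a uniform 4x4 cost volume whose lowest cost sits at the middle hypothesis.
hyps = [1.0, 2.0, 3.0]
volume = [[[1.0, 0.0, 1.0] for _ in range(4)] for _ in range(4)]
agg = naive_aggregate(volume, 0, 0)
print(agg)                           # [1.0, 0.0, 1.0]
print(soft_argmin_depth(agg, hyps))  # approximately 2.0
```

The same-index averaging is exactly the step the authors identify as geometrically inconsistent on slanted surfaces.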

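Specifically, GoMVS aggregates geometrically consistent costs through a geometric consistent propagation (GCP) module. Exploiting local geometric smoothness, GCP uses surface normals to compute a correspondence from an adjacent pixel's depth hypothesis space to the reference pixel's depth space. It then uses this correspondence to propagate the adjacent costs to the reference geometry, and a convolution aggregates the propagated costs. In this way, costs are combined only after they describe consistent local geometry.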
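The GCP step described in the paper's abstract (propagating adjacent costs to the reference pixel using surface normals) can be sketched in plain Python under explicit assumptions: a pinhole reference camera with hypothetical intrinsics (fx, fy, cx, cy), z-depth hypotheses, and a locally planar surface through the adjacent pixel's 3D point, oriented by the adjacent pixel's normal n. Intersecting the reference pixel's ray with that plane gives d_ref = d_adj · (n·K⁻¹q̃) / (n·K⁻¹p̃), where q̃ and p̃ are the homogeneous adjacent and reference pixel coordinates. The adjacent costs are then linearly resampled onto the reference hypotheses. All function names are hypothetical, the plain mean in aggregate() stands in for the learned convolution, and the paper's exact formulation (normal estimation, sampling scheme, handling of invalid correspondences) may differ.

```python
from bisect import bisect_left


def pixel_ray(u, v, fx, fy, cx, cy):
    """K^-1 [u, v, 1]: a camera-frame ray with z = 1, so z-depth * ray = 3D point."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def depth_on_reference_ray(d_adj, adj_px, ref_px, normal, intr):
    """Depth where the reference pixel's ray meets the plane through the adjacent
    pixel's 3D point (at depth d_adj) with the given surface normal."""
    r_adj, r_ref = pixel_ray(*adj_px, *intr), pixel_ray(*ref_px, *intr)
    denom = dot(normal, r_ref)
    if abs(denom) < 1e-8:
        return None  # reference ray (nearly) parallel to the local plane
    return d_adj * dot(normal, r_adj) / denom


def interp(xs, ys, x):
    """Linear interpolation on increasing xs; None outside [xs[0], xs[-1]]."""
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def propagate_costs(adj_costs, adj_hyps, ref_hyps, adj_px, ref_px, normal, intr):
    """Re-express an adjacent pixel's costs on the reference pixel's hypotheses.
    Hypotheses with no valid correspondence get None."""
    mapped = [depth_on_reference_ray(d, adj_px, ref_px, normal, intr) for d in adj_hyps]
    if any(m is None for m in mapped):
        return [None] * len(ref_hyps)
    pairs = sorted(zip(mapped, adj_costs))  # keep xs increasing for interp
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    return [interp(xs, ys, d) for d in ref_hyps]


def aggregate(cost_lists):
    """Placeholder for the learned convolution: per-hypothesis mean of valid costs."""
    out = []
    for values in zip(*cost_lists):
        valid = [v for v in values if v is not None]
        out.append(sum(valid) / len(valid) if valid else None)
    return out


# Usage with hypothetical intrinsics and a slanted plane (normal not parallel to z).
intr = (100.0, 100.0, 0.0, 0.0)
hyps = [1.0, 2.0, 3.0]
prop = propagate_costs([0.0, 1.0, 2.0], hyps, hyps, (10, 0), (0, 0), (1.0, 0.0, 1.0), intr)
print(prop)  # approximately [None, 0.818, 1.727]
```

With a fronto-parallel normal (0, 0, 1) the correspondence reduces to the identity, so the sketch falls back to same-index aggregation; on the slanted plane above, the neighbour's hypotheses land at 1.1x their depth on the reference ray, which is the kind of misalignment the authors attribute to direct aggregation.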

The authors evaluate GoMVS on the DTU, Tanks and Temples, and ETH3D benchmarks, where it achieves new state-of-the-art performance; notably, it ranks first on the Tanks and Temples Advanced benchmark.
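For readers unfamiliar with how these benchmarks score reconstructions, the sketch below shows the two metric families in simplified form. DTU reports accuracy (mean distance from reconstructed points to the ground truth), completeness (the reverse direction) and their mean, all lower-is-better; Tanks and Temples reports an F-score, the harmonic mean of precision and recall at a distance threshold tau, and ETH3D likewise reports F1 scores at fixed thresholds. The sketch uses brute-force nearest neighbours on tiny point lists; the official evaluation scripts add steps such as masking, downsampling and per-scene thresholds that are omitted here, and the function names are hypothetical.

```python
import math


def nearest_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def dtu_style_scores(recon, gt):
    """Return (accuracy, completeness, overall) as mean distances; lower is better."""
    acc = sum(nearest_dists(recon, gt)) / len(recon)
    comp = sum(nearest_dists(gt, recon)) / len(gt)
    return acc, comp, (acc + comp) / 2


def fscore(recon, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(d < tau for d in nearest_dists(recon, gt)) / len(recon)
    recall = sum(d < tau for d in nearest_dists(gt, recon)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # one good point, one outlier
print(dtu_style_scores(recon, gt))  # (2.0, 0.5, 1.25)
print(fscore(recon, gt, 0.5))       # 0.5
```

The example shows why both directions matter: the outlier hurts accuracy and precision, while the missed ground-truth point hurts completeness and recall.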

Critical Analysis

The paper presents a compelling approach to improving the accuracy of multi-view stereo reconstruction by leveraging geometric constraints. The authors demonstrate impressive results on standard benchmarks, suggesting that the GoMVS method offers tangible benefits over previous techniques.

However, the paper does not address several potential limitations and areas for future work. For example, the method assumes that the camera poses and intrinsic parameters are known a priori, which may not always be the case in real-world scenarios. It would be interesting to see how GoMVS could be extended to handle cases with unknown or noisy camera calibration.

Additionally, the paper focuses on evaluating GoMVS on offline benchmarks, but does not discuss its feasibility for real-time applications or large-scale 3D reconstruction tasks. Exploring the computational efficiency and scalability of the method could be a valuable direction for further research.
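To make the scalability concern concrete, a back-of-the-envelope estimate of the memory taken by a single dense cost volume helps. The channel count, number of depth hypotheses and resolution below are hypothetical illustration values, not GoMVS's configuration, and the estimate ignores activations, gradients and intermediate feature maps.

```python
def cost_volume_bytes(channels, num_hypotheses, height, width, bytes_per_value=4):
    """Memory of one dense C x D x H x W cost volume (float32 by default)."""
    return channels * num_hypotheses * height * width * bytes_per_value


# Hypothetical setting: 8 channels, 192 hypotheses, 1/4 resolution of a 640x512 image.
size = cost_volume_bytes(8, 192, 512 // 4, 640 // 4)
print(size / 2**20, "MiB")  # 120.0 MiB
```

Because memory grows linearly with the number of hypotheses and with the pixel count, doubling the resolution in both dimensions quadruples it, which is one reason many learning-based MVS networks operate at reduced resolution or in coarse-to-fine stages.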

It would also be interesting to see how GoMVS compares with other recent advances in multi-view stereo, such as alternative cost aggregation schemes or depth fusion techniques. A more comprehensive comparison with the state of the art could provide deeper insight into the strengths and limitations of the proposed approach.

Conclusion

The GoMVS method presented in this paper offers a promising new direction for improving the accuracy of multi-view stereo reconstruction. By making cost aggregation geometrically consistent, the technique produces 3D models that are more faithful to the underlying scene structure than those of previous learning-based approaches.

The reported benchmark gains suggest that GoMVS could have meaningful practical applications in areas like 3D mapping, virtual/augmented reality, and digital content creation. Further research addressing the limitations above, such as noisy calibration and computational cost, could make the method useful in an even wider range of 3D reconstruction settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve aggregated depth in the 2D space, and both are unable to handle geometric inconsistency in the cost volume effectively. In this paper, we propose GoMVS to aggregate geometrically consistent costs, yielding better utilization of adjacent geometries. More specifically, we correspond and propagate adjacent costs to the reference pixel by leveraging the local geometric smoothness in conjunction with surface normals. We achieve this by the geometric consistent propagation (GCP) module. It computes the correspondence from the adjacent depth hypothesis space to the reference depth space using surface normals, then uses the correspondence to propagate adjacent costs to the reference geometry, followed by a convolution for aggregation. Our method achieves new state-of-the-art performance on DTU, Tanks & Temples, and ETH3D datasets. Notably, our method ranks 1st on the Tanks & Temples Advanced benchmark.

4/12/2024


Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks

Xingguang Jiang, Xiaofeng Bian, Chenggang Guo

Depth estimation based on stereo matching is a classic but popular computer vision problem, which has a wide range of real-world applications. Current stereo matching methods generally adopt the deep Siamese neural network architecture, and have achieved impressive performance by constructing feature matching cost volumes and using 3D convolutions for cost aggregation. However, most existing methods suffer from a large number of parameters and slow running time due to the sequential use of 3D convolutions. In this paper, we propose Ghost-Stereo, a novel end-to-end stereo matching network. The feature extraction part of the network uses the GhostNet to form a U-shaped structure. The core of Ghost-Stereo is a GhostNet feature-based cost volume enhancement (Ghost-CVE) module and a GhostNet-inspired lightweight cost volume aggregation (Ghost-CVA) module. For the Ghost-CVE part, cost volumes are constructed and fused by the GhostNet-based features to enhance the spatial context awareness. For the Ghost-CVA part, a lightweight 3D convolution bottleneck block based on the GhostNet is proposed to reduce the computational complexity in this module. By combining with the context and geometry fusion module, a classical hourglass-shaped cost volume aggregation structure is constructed. Ghost-Stereo achieves performance comparable to state-of-the-art real-time methods on several public benchmarks, and shows a better generalization ability.

5/24/2024

MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo

Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang

Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within fixed patches. Although certain methods employ patch deformation to expand the receptive field, their patches mistakenly skip depth edges to calculate areas with depth discontinuity, thereby causing ambiguity. Consequently, we introduce Multi-granularity Segmentation Prior Multi-View Stereo (MSP-MVS). Specifically, we first propose multi-granularity segmentation prior by integrating multi-granularity depth edges to restrict patch deformation within homogeneous areas. Moreover, we present anchor equidistribution, which provides deformed patches with more uniformly distributed anchors to ensure adequate coverage of their own homogeneous areas. Furthermore, we introduce iterative local search optimization to represent larger patches with sparse representative candidates, significantly boosting the expressive capacity for each patch. The state-of-the-art results on ETH3D and Tanks & Temples benchmarks demonstrate the effectiveness and robust generalization ability of our proposed method.

9/17/2024

MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu

In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and the calculation of depth through the accumulation of opacity can compromise the detail of mesh extraction. To address these issues, we introduce MVG-Splatting, a solution guided by Multi-View considerations. Specifically, we integrate an optimized method for calculating normals, which, combined with image gradients, helps rectify inconsistencies in the original depth computations. Additionally, utilizing projection strategies akin to those in Multi-View Stereo (MVS), we propose an adaptive quantile-based method that dynamically determines the level of additional densification guided by depth maps, from coarse to fine detail. Experimental evidence demonstrates that our method not only resolves the issues of rendering quality degradation caused by depth discrepancies but also facilitates direct mesh extraction from dense Gaussian point clouds using the Marching Cubes algorithm. This approach significantly enhances the overall fidelity and accuracy of the 3D reconstruction process, preserving both geometric detail and visual quality.

7/17/2024