Double-Shot 3D Shape Measurement with a Dual-Branch Network

Read original: arXiv:2407.14198 - Published 7/22/2024 by Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

🌐

Overview

Structured light (SL)-based 3D measurement techniques using deep learning, like speckle projection profilometry (SPP) and fringe projection profilometry (FPP), are widely studied.
These methods typically use a single projection pattern for reconstruction, leading to fringe order ambiguity or poor reconstruction accuracy.
To address these issues, the paper proposes a parallel dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) that leverages convolutional operations and self-attention mechanisms for processing different SL modalities.

Plain English Explanation

The paper describes a new approach for accurately measuring 3D shapes using structured light techniques and deep learning. Structured light methods project patterns of light onto an object and analyze the distortion of these patterns to reconstruct the 3D shape.

Two common structured light techniques are speckle projection profilometry (SPP) and fringe projection profilometry (FPP). These methods typically use a single projection pattern, which can lead to ambiguity in the measurements or inaccurate reconstruction of the 3D shape, especially near object boundaries.

To address these limitations, the researchers propose a new Parallel Dual-branch CNN-Transformer Network (PDCNet). This network combines the strengths of convolutional neural networks (CNNs), which are good at capturing local details, and transformers, which excel at modeling global relationships.

Within PDCNet, the Transformer branch is used to capture the global structure and patterns in the fringe images, while the CNN branch focuses on collecting local details from the speckle images. The network also includes a Double-stream Attention Aggregation Module (DAAM) that dynamically combines the local and global representations to optimize the 3D reconstruction.

Additionally, the paper introduces an Adaptive Mixture Density Head that can better handle object boundaries and discontinuities compared to standard regression-based approaches. This helps improve the overall accuracy of the 3D measurements.

Technical Explanation

The key elements of the proposed PDCNet architecture are:

Dual-branch Design: The network has two parallel branches - a CNN branch for processing speckle images and a Transformer branch for processing fringe images. This allows the model to leverage the complementary strengths of these two types of deep learning architectures.
Double-stream Attention Aggregation Module (DAAM): This module dynamically fuses the local and global features from the CNN and Transformer branches, respectively. It uses a parallel attention subnetwork to selectively combine multi-scale spatial information.
Adaptive Mixture Density Head: Instead of a standard regression-based output, the network uses an adaptive mixture density head with a bimodal Gaussian distribution. This allows the model to better represent discontinuities and object boundaries, improving the overall 3D reconstruction accuracy.

The researchers evaluate their PDCNet approach on a self-made dataset and demonstrate that it can reduce fringe order ambiguity while producing high-accuracy 3D reconstructions. They also show that the proposed architecture has potential applications in infrared-visible image fusion tasks.

Critical Analysis

The paper presents a well-designed and innovative approach to address the limitations of existing structured light 3D measurement techniques. The use of a parallel CNN-Transformer architecture and the adaptive mixture density head are particularly interesting contributions.

However, the paper does not provide much discussion on the potential limitations or caveats of the proposed method. For example, it would be helpful to know how the method performs on challenging scenarios, such as highly reflective or transparent surfaces, or how it scales to larger and more complex scenes.

Additionally, the paper could have explored the trade-offs between the increased complexity of the PDCNet architecture and the improvements in 3D reconstruction accuracy. It would be valuable to understand the computational and memory requirements of the proposed model and how they compare to simpler, single-branch approaches.

Further research could also investigate the generalizability of the PDCNet architecture to other 3D sensing modalities beyond structured light, such as time-of-flight or stereo vision. Exploring the potential of the adaptive mixture density head in these other contexts could also be an interesting avenue for future work.

Conclusion

The proposed Parallel Dual-branch CNN-Transformer Network (PDCNet) is a innovative approach to improve the accuracy of 3D reconstruction using structured light techniques and deep learning. By leveraging the strengths of both convolutional neural networks and transformers, and introducing an adaptive mixture density head, the method can effectively reduce fringe order ambiguity and produce high-quality 3D measurements.

The paper demonstrates the potential of this architecture not only for structured light 3D sensing, but also for broader applications in multimodal fusion, such as infrared-visible image processing. While the paper could have provided more discussion on the limitations and trade-offs of the approach, the overall contributions represent an important step forward in the field of 3D computer vision.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🌐

Double-Shot 3D Shape Measurement with a Dual-Branch Network

Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

The structured light (SL)-based 3D measurement techniques with deep learning have been widely studied, among which speckle projection profilometry (SPP) and fringe projection profilometry (FPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we propose a parallel dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet), to take advantage of convolutional operations and self-attention mechanisms for processing different SL modalities. Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images. To fully integrate complementary features, we design a double-stream attention aggregation module (DAAM) that consist of a parallel attention subnetwork for aggregating multi-scale spatial structure information. This module can dynamically retain local and global representations to the maximum extent. Moreover, an adaptive mixture density head with bimodal Gaussian distribution is proposed for learning a representation that is precise near discontinuities. Compared to the standard disparity regression strategy, this adaptive mixture head can effectively improves performance at object boundaries. Extensive experiments demonstrate that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset. We also show that the proposed architecture reveals the potential in infrared-visible image fusion task.

7/22/2024

↗️

Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

Gang Qu, Ping Wang, Xin Yuan

Single-pixel imaging (SPI) is a potential computational imaging technique which produces image by solving an illposed reconstruction problem from few measurements captured by a single-pixel detector. Deep learning has achieved impressive success on SPI reconstruction. However, previous poor reconstruction performance and impractical imaging model limit its real-world applications. In this paper, we propose a deep unfolding network with hybrid-attention Transformer on Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras. Specifically, we unfold the computation graph of the iterative shrinkagethresholding algorithm (ISTA) into two alternative modules: efficient tensor gradient descent and hybrid-attention multiscale denoising. By virtue of Kronecker SPI, the gradient descent module can avoid high computational overheads rooted in previous gradient descent modules based on vectorized SPI. The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration. Moreover, we build a SPI prototype to verify the effectiveness of the proposed method. Extensive experiments on synthetic and real data demonstrate that our method achieves the state-of-the-art performance. The source code and pre-trained models are available at https://github.com/Gang-Qu/HATNet-SPI.

4/9/2024

Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C. L. Philip Chen

Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.

9/9/2024

Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer

Li Ke, Liu Yukai

The single image super-resolution(SISR) algorithms under deep learning currently have two main models, one based on convolutional neural networks and the other based on Transformer. The former uses the stacking of convolutional layers with different convolutional kernel sizes to design the model, which enables the model to better extract the local features of the image; the latter uses the self-attention mechanism to design the model, which allows the model to establish long-distance dependencies between image pixel points through the self-attention mechanism and then better extract the global features of the image. However, both of the above methods face their problems. Based on this, this paper proposes a new lightweight multi-scale feature fusion network model based on two-way complementary convolutional and Transformer, which integrates the respective features of Transformer and convolutional neural networks through a two-branch network architecture, to realize the mutual fusion of global and local information. Meanwhile, considering the partial loss of information caused by the low-pixel images trained by the deep neural network, this paper designs a modular connection method of multi-stage feature supplementation to fuse the feature maps extracted from the shallow stage of the model with those extracted from the deep stage of the model, to minimize the loss of the information in the feature images that is beneficial to the image restoration as much as possible, to facilitate the obtaining of a higher-quality restored image. The practical results finally show that the model proposed in this paper is optimal in image recovery performance when compared with other lightweight models with the same amount of parameters.

9/11/2024