DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion

Read original: arXiv:2409.11642 - Published 9/19/2024 by Jian Xu, Xin He

DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion

Overview

Proposes a Dual-Branch Feature Decomposition Fusion Network (DAF-Net) for fusing infrared and visible images
Employs a dual-branch architecture with feature decomposition and domain adaptation
Aims to enhance the quality and clarity of fused images by leveraging complementary information from infrared and visible spectra

Plain English Explanation

The paper introduces a novel Dual-Branch Feature Decomposition Fusion Network (DAF-Net) for fusing infrared and visible images. Infrared and visible images capture different types of information - infrared images are sensitive to heat sources, while visible images capture color and texture details. By fusing these two types of images, the resulting fused image can provide a more comprehensive and informative representation of the scene.

The key innovation of DAF-Net is its dual-branch architecture, which allows the network to separately process and decompose the features from the infrared and visible inputs. This feature decomposition helps to preserve the unique characteristics of each input modality. Additionally, the network employs a domain adaptation module to align the feature distributions between the infrared and visible branches, enabling more effective feature fusion.

The goal of this approach is to enhance the quality and clarity of the fused image, making it more useful for applications like surveillance, navigation, and target detection, where both thermal and visual information are important.

Technical Explanation

The DAF-Net architecture consists of two main components:

Dual-Branch Feature Decomposition: The network has two parallel branches, one for processing the infrared input and one for the visible input. Each branch uses a series of convolutional layers to extract features from the respective input modality. Importantly, the features from the two branches are kept separate, rather than being immediately combined.
Domain Adaptive Fusion: After the feature extraction, the network employs a domain adaptation module to align the feature distributions between the infrared and visible branches. This is achieved using a multi-kernel maximum mean discrepancy (MK-MMD) loss, which helps to reduce the domain gap between the two modalities. The adapted features are then fused using a hybrid kernel function.

The network is trained end-to-end using a combination of reconstruction loss, perceptual loss, and the domain adaptation loss. This encourages the model to learn features that capture the complementary information from the infrared and visible inputs, while also aligning the feature distributions for more effective fusion.

Critical Analysis

The DAF-Net paper presents a well-designed and thorough approach to infrared-visible image fusion. The use of a dual-branch architecture with feature decomposition is a thoughtful strategy to preserve the unique characteristics of each input modality. The incorporation of domain adaptation is also a valuable contribution, as it helps to bridge the gap between the different input modalities.

One potential limitation of the work is the reliance on the MK-MMD loss for domain adaptation. While effective, this loss function may be computationally intensive and could potentially limit the scalability of the approach. Additionally, the paper does not provide a detailed analysis of the runtime or memory requirements of the proposed network, which would be useful for understanding its practical feasibility.

Furthermore, the paper could have benefited from a more comprehensive evaluation, including comparisons to a wider range of state-of-the-art fusion methods and a deeper exploration of the qualitative and quantitative trade-offs between the proposed approach and alternative techniques.

Conclusion

The DAF-Net paper presents an innovative approach to infrared-visible image fusion, leveraging a dual-branch architecture with feature decomposition and domain adaptation. This strategy helps to preserve the unique characteristics of the input modalities and enhance the quality and clarity of the fused output. The technical implementation is well-designed, though some areas for further investigation, such as computational complexity and a more extensive evaluation, could strengthen the work. Overall, the proposed DAF-Net offers a compelling solution for applications that require the synthesis of thermal and visual information.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion

Jian Xu, Xin He

Infrared and visible image fusion aims to combine complementary information from both modalities to provide a more comprehensive scene understanding. However, due to the significant differences between the two modalities, preserving key features during the fusion process remains a challenge. To address this issue, we propose a dual-branch feature decomposition fusion network (DAF-Net) with domain adaptive, which introduces Multi-Kernel Maximum Mean Discrepancy (MK-MMD) into the base encoder and designs a hybrid kernel function suitable for infrared and visible image fusion. The base encoder built on the Restormer network captures global structural information while the detail encoder based on Invertible Neural Networks (INN) focuses on extracting detail texture information. By incorporating MK-MMD, the DAF-Net effectively aligns the latent feature spaces of visible and infrared images, thereby improving the quality of the fused images. Experimental results demonstrate that the proposed method outperforms existing techniques across multiple datasets, significantly enhancing both visual quality and fusion performance. The related Python code is available at https://github.com/xujian000/DAF-Net.

9/19/2024

VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

Image dehazing poses significant challenges in environmental perception. Recent research mainly focus on deep learning-based methods with single modality, while they may result in severe information loss especially in dense-haze scenarios. The infrared image exhibits robustness to the haze, however, existing methods have primarily treated the infrared modality as auxiliary information, failing to fully explore its rich information in dehazing. To address this challenge, the key insight of this study is to design a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency weighted fusion strategy to merge the two modalities by leveraging the more reliable information. To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments performed on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing.

4/12/2024

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. The transformer with Multi-Dconv Transposed Attention and Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the three parallel branches encoder, Cross Attention and Invertible Block (CAI) enables to extract local features and preserve high-frequency texture details. Base feature extraction module (BFE) with residual connections can capture long-range dependency and enhance shared-modality expression capabilities. Graph Reasoning Module (GR) is introduced to reason high-level cross-modality relations and extract low-level details features as CAI's specific-modality complementary information simultaneously. Experiments demonstrate that our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods in terms of subsequent tasks, averagely scoring 9.78% [email protected] higher in object detection and 6.46% mIoU higher in semantic segmentation.

7/9/2024

🌐

Multi-Scale Direction-Aware Network for Infrared Small Target Detection

Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on appearance features and ignore high-frequency directional features. Therefore, we propose a multi-scale direction-aware network (MSDA-Net), which is the first attempt to integrate the high-frequency directional features of infrared small targets as domain prior knowledge into neural networks. Specifically, an innovative multi-directional feature awareness (MDFA) module is constructed, which fully utilizes the prior knowledge of targets and emphasizes the focus on high-frequency directional features. On this basis, combined with the multi-scale local relation learning (MLRL) module, a multi-scale direction-aware (MSDA) module is further constructed. The MSDA module promotes the full extraction of local relations at different scales and the full perception of key features in different directions. Meanwhile, a high-frequency direction injection (HFDI) module without training parameters is constructed to inject the high-frequency directional information of the original image into the network. This helps guide the network to pay attention to detailed information such as target edges and shapes. In addition, we propose a feature aggregation (FA) structure that aggregates multi-level features to solve the problem of small targets disappearing in deep feature maps. Furthermore, a lightweight feature alignment fusion (FAF) module is constructed, which can effectively alleviate the pixel offset existing in multi-level feature map fusion. Extensive experimental results show that our MSDA-Net achieves state-of-the-art (SOTA) results on the public NUDT-SIRST, SIRST and IRSTD-1k datasets.

6/5/2024