VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Read original: arXiv:2404.07790 - Published 4/12/2024 by Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

Overview

Presents an end-to-end visible-infrared fusion network called VIFNet for image dehazing
Leverages the complementary information from visible and infrared images to improve dehazing performance
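Proposes an attention-based design for fusing visible and infrared features: per the abstract, a multi-scale Deep Structure Feature Extraction (DSFE) module with a Channel-Pixel Attention Block (CPAB), plus an inconsistency weighted fusion strategy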

Plain English Explanation

The paper introduces a deep learning system called VIFNet for removing haze or fog from images. Haze hides scene detail, which is a problem in applications such as autonomous driving and surveillance.

VIFNet takes advantage of two different types of images - visible (regular color) and infrared. Infrared images can capture information that is not visible to the human eye, and the researchers found that combining these two image modalities can lead to better dehazing results compared to using just one type of image.

The key innovation in VIFNet is a "cross-modal attention" module that learns how to effectively fuse the features extracted from the visible and infrared images. This allows the network to intelligently combine the complementary information from both inputs to produce a clearer, less hazy output image.

By leveraging the strengths of both visible and infrared imaging, VIFNet represents a promising approach to improving image dehazing, which could have important applications in fields like transportation, security, and more.

Technical Explanation

The paper proposes an end-to-end visible-infrared fusion network (VIFNet) for image dehazing. The system takes a visible image and a corresponding infrared image as input and outputs a dehazed visible image.

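The core of VIFNet is attention-guided fusion of the two modalities. According to the abstract, a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates a Channel-Pixel Attention Block (CPAB), restores spatial and marginal information in the deep structural features, and an inconsistency weighted fusion strategy merges the visible and infrared features by relying on whichever carries the more reliable information. This lets the network use complementary information that a single modality would miss, which matters most in dense haze.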
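
As a rough illustration of fusing two modalities by favoring the more reliable one, the sketch below blends visible and infrared feature values with per-pixel softmax weights. It is not the paper's formulation: the function name `fuse_pixelwise` and the reliability scores `vis_score` and `ir_score` are hypothetical, and in a trained network such scores would be predicted by learned layers rather than supplied by hand. Plain Python lists are used so the example runs without dependencies.

```python
import math
from typing import List


def fuse_pixelwise(
    vis: List[float],
    ir: List[float],
    vis_score: List[float],
    ir_score: List[float],
) -> List[float]:
    """Blend two flattened feature maps with per-pixel softmax weights.

    vis_score and ir_score are hypothetical reliability scores (higher means
    more trusted). This is an illustrative assumption, not the paper's method.
    """
    if not (len(vis) == len(ir) == len(vis_score) == len(ir_score)):
        raise ValueError("all inputs must have the same length")

    fused = []
    for v, r, sv, sr in zip(vis, ir, vis_score, ir_score):
        # Two-way softmax, shifted by the larger score for numerical stability.
        m = max(sv, sr)
        ev, er = math.exp(sv - m), math.exp(sr - m)
        w_vis = ev / (ev + er)
        # Convex combination: the weights for the two modalities sum to 1.
        fused.append(w_vis * v + (1.0 - w_vis) * r)
    return fused


# Pixel 0: equal scores, so the result is the plain average.
# Pixel 1: infrared is scored as far more reliable, so it dominates.
fused = fuse_pixelwise(
    vis=[0.2, 0.8],
    ir=[0.6, 0.4],
    vis_score=[0.0, 0.0],
    ir_score=[0.0, 5.0],
)
```

Because the weights form a convex combination, every fused value stays between the two input values, so an unreliable modality can be down-weighted without being discarded outright.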

The overall VIFNet architecture consists of a feature extraction backbone, the cross-modal attention module, and a reconstruction head. The feature extractor encodes the visible and infrared inputs into compact feature representations. The cross-modal attention module then dynamically weights and combines these features to capture the most salient information for dehazing. Finally, the reconstruction head takes the fused features and outputs the final dehazed visible image.
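
The three-stage data flow described above (encode each modality, fuse, reconstruct) can be sketched as a small pipeline of pluggable stages. Everything here is a stand-in: the class name `FusionDehazer` and the toy stages `identity`, `average`, and `clamp01` are hypothetical placeholders chosen so the sketch runs, whereas the real stages would be learned network layers operating on tensors.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[float]]  # H x W values; stands in for an image or feature map


@dataclass
class FusionDehazer:
    """Data flow only: encode each modality, fuse the features, reconstruct."""

    encode_visible: Callable[[Grid], Grid]
    encode_infrared: Callable[[Grid], Grid]
    fuse: Callable[[Grid, Grid], Grid]
    reconstruct: Callable[[Grid], Grid]

    def __call__(self, visible: Grid, infrared: Grid) -> Grid:
        # The two inputs are assumed to be pixel-aligned and equally sized.
        if len(visible) != len(infrared) or any(
            len(a) != len(b) for a, b in zip(visible, infrared)
        ):
            raise ValueError("visible and infrared inputs must share a shape")
        f_vis = self.encode_visible(visible)
        f_ir = self.encode_infrared(infrared)
        return self.reconstruct(self.fuse(f_vis, f_ir))


# Toy stand-ins so the sketch runs; they do not perform any real dehazing.
def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def average(a: Grid, b: Grid) -> Grid:
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def clamp01(grid: Grid) -> Grid:
    return [[min(1.0, max(0.0, x)) for x in row] for row in grid]


model = FusionDehazer(identity, identity, average, clamp01)
output = model([[0.2, 0.9]], [[0.6, 1.3]])
```

Keeping the stages as separate callables makes it easy to swap the fusion step while leaving the encoders and the reconstruction head unchanged.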

According to the abstract, the researchers evaluated VIFNet on challenging real and simulated image datasets and report that it can outperform many state-of-the-art competing methods. To validate the approach they also built a visible-infrared multimodal dataset, AirSim-VID, on the AirSim simulation platform. The results support the value of fusing visible and infrared inputs rather than relying on a single modality.
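
This summary does not say which metrics the paper reports. As an assumption for illustration only, the sketch below computes peak signal-to-noise ratio (PSNR), a full-reference measure commonly used to compare a restored image against a haze-free ground truth. The function name `psnr` and the sample pixel values are hypothetical.

```python
import math
from typing import Sequence


def psnr(
    reference: Sequence[float],
    estimate: Sequence[float],
    max_value: float = 1.0,
) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Pixel values are assumed to lie in [0, max_value]. Higher is better;
    identical images give infinity.
    """
    if len(reference) == 0 or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and the same length")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value**2 / mse)


# Hypothetical values: compare two candidate outputs against one ground truth.
ground_truth = [0.10, 0.50, 0.90, 0.30]
output_a = [0.12, 0.48, 0.88, 0.33]  # close to the ground truth
output_b = [0.30, 0.60, 0.70, 0.50]  # further from the ground truth
better_is_a = psnr(ground_truth, output_a) > psnr(ground_truth, output_b)
```

PSNR needs a haze-free reference image, which is one reason simulated datasets with known ground truth are convenient for this kind of comparison.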

Critical Analysis

The paper provides a solid technical contribution by introducing a novel visible-infrared fusion network for image dehazing. The key strength of the work is the cross-modal attention module, which appears to effectively leverage the complementary information in the two input modalities.

However, the paper does not deeply explore the limitations of the approach. For example, it is unclear how VIFNet would perform in real-world scenarios where the visible and infrared images may not be perfectly aligned or synchronized. There is also no discussion of the computational or memory requirements of the network, which could be an important practical consideration.

Additionally, while the results on benchmark datasets are promising, further evaluation on more diverse and challenging real-world hazy images would help strengthen the claims about the method's effectiveness. Comparing VIFNet to multi-view fusion approaches could also provide additional insights.

Overall, the paper presents a compelling visible-infrared fusion architecture for image dehazing, but additional research is needed to fully understand the strengths, weaknesses, and practical applicability of the proposed VIFNet system.

Conclusion

This paper introduces VIFNet, an end-to-end visible-infrared fusion network for image dehazing. By leveraging a novel cross-modal attention module to effectively combine features from visible and infrared inputs, VIFNet is able to outperform state-of-the-art dehazing methods that use a single modality.

The work demonstrates the potential benefits of multimodal fusion for computer vision tasks like image dehazing, which could have important real-world applications in areas such as autonomous driving and surveillance. Further research is needed to fully explore the practical limitations and deployment considerations of the VIFNet approach, but this paper represents a promising step forward in visible-infrared image fusion for enhanced dehazing performance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

Image dehazing poses significant challenges in environmental perception. Recent research mainly focuses on deep learning-based methods with a single modality, which may result in severe information loss, especially in dense-haze scenarios. The infrared image exhibits robustness to haze; however, existing methods have primarily treated the infrared modality as auxiliary information, failing to fully explore its rich information in dehazing. To address this challenge, the key insight of this study is to design a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency weighted fusion strategy to merge the two modalities by leveraging the more reliable information. To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments performed on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing.

4/12/2024

IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network

Qiao Yang, Yu Zhang, Zijing Zhao, Jian Zhang, Shunli Zhang

Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks. However, current methods rarely consider the illumination condition in low-light environments, and the targets in the fused images are often not prominent. To address the above issues, we propose an Illumination-Aware Infrared and Visible Image Fusion Network, named as IAIFNet. In our framework, an illumination enhancement network first estimates the incident illumination maps of input images. Afterwards, with the help of proposed adaptive differential fusion module (ADFM) and salient target aware module (STAM), an image fusion network effectively integrates the salient features of the illumination-enhanced infrared and visible images into a fusion image of high visual quality. Extensive experimental results verify that our method outperforms five state-of-the-art methods of fusing infrared and visible images.

5/28/2024

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. The transformer with Multi-Dconv Transposed Attention and Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the three parallel branches encoder, Cross Attention and Invertible Block (CAI) enables to extract local features and preserve high-frequency texture details. Base feature extraction module (BFE) with residual connections can capture long-range dependency and enhance shared-modality expression capabilities. Graph Reasoning Module (GR) is introduced to reason high-level cross-modality relations and extract low-level details features as CAI's specific-modality complementary information simultaneously. Experiments demonstrate that our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods in terms of subsequent tasks, averagely scoring 9.78% mAP@.5 higher in object detection and 6.46% mIoU higher in semantic segmentation.

7/9/2024

AHDGAN: An Attention-Based Generator and Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion

Guosheng Lu, Zile Fang, Jiaju Tian, Haowen Huang, Yuelong Xu, Zhuolin Han, Yaoming Kang, Can Feng, Zhigang Zhao

Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images. Thermal radiation information is mainly expressed through image intensities, while texture details are typically expressed through image gradients. However, existing dual-discriminator generative adversarial networks (GANs) often rely on two structurally identical discriminators for learning, which do not fully account for the distinct learning needs of infrared and visible image information. To this end, this paper proposes a novel GAN with a heterogeneous dual-discriminator network and an attention-based fusion strategy (GAN-HA). Specifically, recognizing the intrinsic differences between infrared and visible images, we propose, for the first time, a novel heterogeneous dual-discriminator network to simultaneously capture thermal radiation information and texture details. The two discriminators in this network are structurally different, including a salient discriminator for infrared images and a detailed discriminator for visible images. They are able to learn rich image intensity information and image gradient information, respectively. In addition, a new attention-based fusion strategy is designed in the generator to appropriately emphasize the learned information from different source images, thereby improving the information representation ability of the fusion result. In this way, the fused images generated by GAN-HA can more effectively maintain both the salience of thermal targets and the sharpness of textures. Extensive experiments on various public datasets demonstrate the superiority of GAN-HA over other state-of-the-art (SOTA) algorithms while showcasing its higher potential for practical applications.

9/5/2024