Searching a Compact Architecture for Robust Multi-Exposure Image Fusion

Read original: arXiv:2305.12236 - Published 8/27/2024 by Zhu Liu, Jinyuan Liu, Guanyao Wu, Zihang Chen, Xin Fan, Risheng Liu


Overview

Learning-based methods have made significant progress in multi-exposure image fusion.
However, two major challenges hinder further development: pixel misalignment and inefficient inference.
Existing methods assume aligned image pairs, which makes them susceptible to artifacts caused by camera (device) motion between exposures.
Current techniques often use complex, handcrafted architectures with redundant parameters, reducing inference efficiency and flexibility.

Plain English Explanation

To overcome the limitations of existing methods, this study introduces a new approach that combines architecture search with self-alignment and detail repletion modules.

The self-alignment module addresses pixel misalignment. Because the input images can differ drastically in exposure, it first uses scene relighting to constrain their illumination, which prepares them for the subsequent alignment and feature extraction steps.

The detail repletion module enhances the texture details of the scenes, complementing the alignment process.

Finally, the researchers incorporated a hardware-sensitive constraint into the architecture search to explore compact and efficient networks for the fusion task. This aims to improve the inference speed and flexibility of the final model.

Overall, this work addresses both key challenges in multi-exposure image fusion and reports gains in fusion quality and efficiency over existing approaches: a 3.19% PSNR improvement in general scenarios, a 23.5% improvement in misaligned scenarios, and a 69.1% reduction in inference time.

Technical Explanation

The proposed method is an architecture search-based paradigm that incorporates self-alignment and detail repletion modules to address both pixel misalignment and inefficient inference.

The self-alignment module is designed to handle the extreme discrepancy in exposure between the input images. It leverages scene relighting to constrain the illumination degree, which helps to align the images and extract robust features.
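The summary does not say how relighting or alignment are implemented inside the network. As a rough, non-learned illustration of the ordering it describes (normalize illumination first, then align), the sketch below swaps in two classical stand-ins: a global gamma correction that pulls each exposure's mean intensity toward a common target, and phase correlation to estimate an integer translation between the relit frames. The names `relight` and `estimate_shift`, the target mean of 0.5, and the translation-only motion model are assumptions for illustration, not the authors' module.

```python
import numpy as np


def relight(img, target_mean=0.5, eps=1e-6):
    """Hypothetical stand-in for scene relighting: apply a global gamma so the
    image's mean intensity (values in [0, 1]) maps to roughly `target_mean`."""
    img = np.clip(img, eps, 1.0 - eps)            # avoid log(0) and log(1)
    gamma = np.log(target_mean) / np.log(img.mean())
    return img ** gamma


def estimate_shift(ref, mov):
    """Integer (dy, dx) such that np.roll(mov, (dy, dx), axis=(0, 1)) ~ ref,
    estimated by phase correlation (translation-only motion assumed)."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
    cross /= np.abs(cross) + 1e-12                # keep phase, drop magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


# Toy example: an under-exposed, translated copy of a reference frame.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
mov = 0.3 * np.roll(ref, (3, -5), axis=(0, 1))   # darker and shifted
shift = estimate_shift(relight(ref), relight(mov))
aligned = np.roll(mov, shift, axis=(0, 1))
print(shift)
```

Applying `np.roll(frame, shift, axis=(0, 1))` with the returned shift moves the darker frame back onto the reference grid; a learned module would additionally handle non-rigid motion, which this translation-only sketch cannot.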

The detail repletion module is introduced to enhance the texture details of the scenes, complementing the alignment process and helping preserve important visual information.
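The summary describes detail repletion only at a high level. A classical analogue of "putting texture back" is unsharp masking: subtract a blurred base layer to isolate high-frequency detail, then add a scaled copy of that detail back. The sketch below uses it purely as an illustrative stand-in; `box_blur`, `replete_details`, `radius`, and `strength` are hypothetical names, and the paper's module is part of the searched network rather than a fixed filter like this one.

```python
import numpy as np


def box_blur(img, radius=1):
    """Mean filter over a (2*radius+1)^2 window with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def replete_details(img, strength=0.5, radius=1):
    """Hypothetical detail boost via unsharp masking (not the paper's module)."""
    base = box_blur(img, radius)        # low-frequency structure
    detail = img - base                 # high-frequency texture layer
    return np.clip(img + strength * detail, 0.0, 1.0)


# A hard step edge: the boosted version has a steeper transition at the edge.
step = np.full((8, 8), 0.2)
step[:, 4:] = 0.8
boosted = replete_details(step)
print(boosted[0, 2:6])
```

Flat regions pass through unchanged because their detail layer is zero; only locations with local structure are amplified.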

Additionally, the researchers present a fusion-oriented architecture search that incorporates a hardware-sensitive constraint. This lets the search explore compact and efficient network architectures for multi-exposure image fusion, reducing inference time and improving flexibility.
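The summary does not specify how the hardware-sensitive constraint is formulated. A common approach in latency-aware architecture search is to relax the choice among candidate operations with a softmax over architecture weights and add the resulting expected latency to the task loss. The toy sketch below does this for a single cell; the operation names, per-op losses, latencies, and the weight `lam` are made-up illustrative values, and holding each operation's task loss fixed is a simplification (in a real search it comes from training the network).

```python
import numpy as np

# Hypothetical search space for one cell. Task losses and latencies are
# made-up illustrative numbers, not measurements from the paper.
OPS = ["conv3x3", "conv5x5", "dilated3x3", "skip"]
TASK_LOSS = np.array([0.20, 0.18, 0.19, 0.40])
LATENCY_MS = np.array([1.0, 2.5, 1.2, 0.05])


def softmax(x):
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()


def search_cell(lam, steps=2000, lr=0.5):
    """Minimise E_p[task] + lam * E_p[latency] over architecture weights alpha,
    where p = softmax(alpha); returns (chosen op, p, expected latency in ms)."""
    alpha = np.zeros(len(OPS))
    cost = TASK_LOSS + lam * LATENCY_MS   # per-op cost including the hardware term
    for _ in range(steps):
        p = softmax(alpha)
        expected = p @ cost                # relaxed, differentiable objective
        grad = p * (cost - expected)       # exact gradient of `expected` w.r.t. alpha
        alpha -= lr * grad
    p = softmax(alpha)
    return OPS[int(np.argmax(alpha))], p, float(p @ LATENCY_MS)


for lam in (0.0, 0.1, 1.0):
    op, p, latency = search_cell(lam)
    print(f"lam={lam}: keep {op} (expected latency {latency:.2f} ms)")
```

With these illustrative numbers the selected operation moves from `conv5x5` (lam = 0) to `conv3x3` (lam = 0.1) to `skip` (lam = 1.0): raising the hardware weight trades a little task loss for a cheaper network, which is the trade-off such a constraint is meant to expose.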

The proposed method outperforms a range of competing methods, achieving a 3.19% PSNR improvement in general scenarios and a 23.5% improvement in misaligned scenarios, while reducing inference time by 69.1%.
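These gains are relative percentages, and the summary gives neither the absolute PSNR values nor the exact baseline each percentage is measured against. The sketch below shows how such figures are typically computed, assuming each percentage is relative to the compared method's value; the 20 dB and 100 ms baselines are hypothetical. It also makes explicit that a 69.1% reduction in inference time corresponds to roughly a 3.2x speedup, since 1 / (1 - 0.691) is about 3.24.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


def relative_change(new, old):
    """Percentage change of `new` with respect to `old`."""
    return 100.0 * (new - old) / old


# Hypothetical baselines, chosen only to make the percentages concrete.
baseline_psnr_db = 20.0
baseline_time_ms = 100.0

general_psnr_db = baseline_psnr_db * (1 + 0.0319)   # +3.19% PSNR (general scenes)
fused_time_ms = baseline_time_ms * (1 - 0.691)      # -69.1% inference time
speedup = baseline_time_ms / fused_time_ms

print(f"{general_psnr_db:.3f} dB vs {baseline_psnr_db:.1f} dB")
print(f"{fused_time_ms:.1f} ms, {speedup:.2f}x faster")
```

Because PSNR is logarithmic, a 3.19% relative gain on a 20 dB baseline is only about 0.64 dB in absolute terms, so the absolute values reported in the paper are needed to judge the practical size of the improvement.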

Critical Analysis

The authors acknowledge that their method relies on the effectiveness of the architecture search process, which can be computationally intensive. Additionally, the performance of the self-alignment module may be influenced by the accuracy of the scene relighting algorithm, which could be an area for further improvement.

While the proposed method demonstrates substantial gains in both performance and efficiency, it would be valuable to investigate its robustness to more diverse and challenging multi-exposure scenarios, such as those with complex lighting conditions or significant occlusions.

Furthermore, the authors could explore the integration of semantic-aware or equivariant fusion techniques to further enhance the method's ability to preserve meaningful scene details and structures.

Conclusion

This research presents a novel approach to multi-exposure image fusion that addresses the key challenges of pixel misalignment and inefficient inference. By incorporating self-alignment, detail repletion, and hardware-sensitive architecture search, the proposed method demonstrates significant improvements in both performance and efficiency compared to existing techniques.

The findings of this study could benefit applications such as computational photography, image enhancement, and visual analysis, where robust and efficient multi-exposure fusion is crucial. The authors state that the code will be made available at https://github.com/LiuZhu-CV/CRMEF, which should let the research community further explore and refine the proposed ideas.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Searching a Compact Architecture for Robust Multi-Exposure Image Fusion

Zhu Liu, Jinyuan Liu, Guanyao Wu, Zihang Chen, Xin Fan, Risheng Liu

In recent years, learning-based methods have achieved significant advancements in multi-exposure image fusion. However, two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. Reliance on aligned image pairs in existing methods causes susceptibility to artifacts due to device motion. Additionally, existing techniques often rely on handcrafted architectures with huge network engineering, resulting in redundant parameters, adversely impacting inference efficiency and flexibility. To mitigate these limitations, this study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. Specifically, targeting the extreme discrepancy of exposure, we propose the self-alignment module, leveraging scene relighting to constrain the illumination degree for following alignment and feature extraction. Detail repletion is proposed to enhance the texture details of scenes. Additionally, incorporating a hardware-sensitive constraint, we present the fusion-oriented architecture search to explore compact and efficient networks for fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios. Moreover, it significantly reduces inference time by 69.1%. The code will be available at https://github.com/LiuZhu-CV/CRMEF.

8/27/2024

MobileMEF: Fast and Efficient Method for Multi-Exposure Fusion

Lucas Nedel Kirsten, Zhicheng Fu, Nikhil Ambha Madhusudhana

Recent advances in camera design and imaging technology have enabled the capture of high-quality images using smartphones. However, due to the limited dynamic range of digital cameras, the quality of photographs captured in environments with highly imbalanced lighting often results in poor-quality images. To address this issue, most devices capture multi-exposure frames and then use some multi-exposure fusion method to merge those frames into a final fused image. Nevertheless, most traditional and current deep learning approaches are unsuitable for real-time applications on mobile devices due to their heavy computational and memory requirements. We propose a new method for multi-exposure fusion based on an encoder-decoder deep learning architecture with efficient building blocks tailored for mobile devices. This efficient design makes our model capable of processing 4K resolution images in less than 2 seconds on mid-range smartphones. Our method outperforms state-of-the-art techniques regarding full-reference quality measures and computational efficiency (runtime and memory usage), making it ideal for real-time applications on hardware-constrained devices. Our code is available at: https://github.com/LucasKirsten/MobileMEF.

8/16/2024

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. The transformer with Multi-Dconv Transposed Attention and Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the three-branch parallel encoder, the Cross Attention and Invertible Block (CAI) extracts local features and preserves high-frequency texture details. Base feature extraction module (BFE) with residual connections can capture long-range dependency and enhance shared-modality expression capabilities. Graph Reasoning Module (GR) is introduced to reason high-level cross-modality relations and simultaneously extract low-level detail features as CAI's specific-modality complementary information. Experiments demonstrate that our method obtains competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods on downstream tasks, scoring on average 9.78% higher mAP in object detection and 6.46% higher mIoU in semantic segmentation.

7/9/2024

Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation Network

Yanhua Zhang, Ke Zhang, Jingyu Wang, Yulin Wu, Wuwei Wang

Real-time semantic segmentation is a crucial research area for real-world applications. However, many methods lay particular emphasis on reducing the computational complexity and model size, while largely sacrificing accuracy. To tackle this problem, we propose a parallel inference network customized for semantic segmentation tasks to achieve a good trade-off between speed and accuracy. We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy. Specifically, we first design a dual-pyramidal path architecture (Multi-level Feature Aggregation Module, MFAM) to aggregate multi-level features from the encoder to each scale, providing hierarchical clues for subsequent spatial alignment and corresponding in-network inference. Then, we build Recursive Alignment Module (RAM) by combining the flow-based alignment module with recursive upsampling architecture for accurate spatial alignment between multi-scale feature maps with half the computational complexity of the straightforward alignment method. Finally, we perform independent parallel inference on the aligned features to obtain multi-scale scores, and adaptively fuse them through an attention-based Adaptive Scores Fusion Module (ASFM) so that the final prediction can favor objects of multiple scales. Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets. We also conducted systematic ablation studies to gain insight into our motivation and architectural design. Code is available at: https://github.com/Yanhua-Zhang/MFARANet.

4/19/2024