DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

Read original: arXiv:2309.06941 - Published 9/10/2024 by Xiangchen Yin, Zhenda Yu, Xin Gao, Xiao Sun

🖼️

Overview

This paper introduces a new approach for enhancing low-light images by incorporating frequency-domain information.
The proposed "DCT-driven Enhancement Transformer (DEFormer)" framework includes a learnable frequency branch that uses discrete cosine transform (DCT) processing and curvature-based frequency enhancement.
The framework also includes a cross-domain fusion module to bridge the gap between the RGB and frequency domains.
The methods demonstrated improved performance on standard low-light image enhancement benchmarks compared to prior work.

Plain English Explanation

Low-light images often lack color and detail due to the limited light available when the photo was taken. DEFormer aims to restore the lost information and improve the overall visual quality of these types of images.

Rather than relying solely on the standard RGB color channels, the key innovation in this approach is incorporating information from the frequency domain. The frequency domain represents the underlying patterns and structures in the image, which can provide additional clues for enhancing the details.

Specifically, DEFormer includes a "learnable frequency branch" that processes the image using a mathematical technique called discrete cosine transform (DCT). This allows the network to learn how to best utilize the frequency information to restore the lost details in dark image regions.

Additionally, DEFormer employs a "cross-domain fusion" module to effectively combine the RGB and frequency-domain representations, ensuring the final enhanced image benefits from both sources of information.

The researchers showed that DEFormer outperformed previous low-light image enhancement methods on standard benchmarks, demonstrating the value of incorporating frequency-domain clues. This advance could lead to improved performance for downstream computer vision tasks that rely on high-quality images, such as object detection in dark environments.

Technical Explanation

The key technical components of the DEFormer framework are:

Learnable Frequency Branch (LFB): This module takes the input image and processes it in the frequency domain using DCT. It then applies a "curvature-based frequency enhancement" technique to emphasize important frequency features. The LFB allows the network to learn how to best leverage the frequency information for enhancement.
Cross-Domain Fusion (CDF): This module aims to bridge the gap between the RGB and frequency domains. It fuses the features from the two domains to ensure the final enhanced image benefits from both sources of information.

The overall DEFormer architecture consists of an encoder-decoder structure, with the LFB and CDF modules integrated into the network. The encoder extracts features from the input image, the LFB processes the frequency information, the CDF fuses the domains, and the decoder generates the final enhanced output.

The researchers evaluated DEFormer on two standard low-light image enhancement benchmarks: the LOL dataset and the MIT-Adobe FiveK dataset. They reported improvements in both perceptual and task-specific metrics compared to prior state-of-the-art methods. For example, DEFormer demonstrated stronger performance on a dark object detection task, indicating the enhanced images better preserved important visual details.

Critical Analysis

The paper provides a solid technical contribution by introducing a novel approach to leverage frequency-domain information for low-light image enhancement. The researchers carefully designed the LFB and CDF modules and demonstrated their effectiveness through extensive experiments.

However, the paper does not extensively discuss potential limitations or areas for future work. For example, it would be interesting to know how DEFormer performs on more challenging low-light scenarios, such as extreme underexposure or highly non-uniform lighting conditions. Additionally, the researchers could explore the generalization of the approach to other image enhancement tasks beyond low-light, such as underwater image enhancement or image deblurring.

Furthermore, while the paper demonstrates improved performance on a dark object detection task, it would be valuable to investigate the impact of DEFormer on a wider range of high-level computer vision applications, such as segmentation or classification in low-light environments.

Overall, the DEFormer framework represents a promising step forward in leveraging frequency-domain information for image enhancement, and the paper provides a solid technical foundation for future research in this area.

Conclusion

This paper introduces the DCT-driven Enhancement Transformer (DEFormer), a novel framework for enhancing low-light images by incorporating frequency-domain information. The key technical contributions include a learnable frequency branch that processes the image in the frequency domain and a cross-domain fusion module that effectively combines the RGB and frequency representations.

The researchers demonstrated that DEFormer outperforms previous state-of-the-art low-light image enhancement methods on standard benchmarks, including improvements in perceptual quality and performance on a dark object detection task. This suggests that leveraging frequency-domain information can be a valuable approach for restoring lost details and improving the overall visual quality of low-light images.

The DEFormer framework and its underlying techniques could have broader implications for various image enhancement and computer vision applications that rely on high-quality input data, particularly in challenging lighting conditions. Further research exploring the generalization of this approach and its impact on a wider range of tasks would be valuable for advancing the field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🖼️

DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

Xiangchen Yin, Zhenda Yu, Xin Gao, Xiao Sun

Low-light image enhancement restores colors and details of single image and improves high-level visual tasks. However, restoring the lost details in the dark area is a challenge by only relying on the RGB domain. In this paper, we introduce frequency as a new clue into the network and propose a DCT-driven enhancement transformer (DEFormer) framework. First, we propose a learnable frequency branch (LFB) for frequency enhancement contains DCT processing and curvature-based frequency enhancement (CFE) to represent frequency features. In addition, we propose a cross domain fusion (CDF) for reducing the differences between the RGB domain and the frequency domain. Our DEFormer has achieved advanced results in both the LOL and MIT-Adobe FiveK datasets and improved the performance of dark detection.

9/10/2024

Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C. L. Philip Chen

Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.

9/9/2024

LoFormer: Local Frequency Transformer for Image Deblurring

Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing fine-grained details, we introduce a novel approach termed Local Frequency Transformer (LoFormer). Within each unit of LoFormer, we incorporate a Local Channel-wise SA in the frequency domain (Freq-LC) to simultaneously capture cross-covariance within low- and high-frequency local windows. These operations offer the advantage of (1) ensuring equitable learning opportunities for both coarse-grained structures and fine-grained details, and (2) exploring a broader range of representational properties compared to coarse-grained global SA methods. Additionally, we introduce an MLP Gating mechanism complementary to Freq-LC, which serves to filter out irrelevant features while enhancing global learning capabilities. Our experiments demonstrate that LoFormer significantly improves performance in the image deblurring task, achieving a PSNR of 34.09 dB on the GoPro dataset with 126G FLOPs. https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur

7/25/2024

F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring

Subhajit Paul, Sahil Kumawat, Ashutosh Gupta, Deepak Mishra

Recent progress in image deblurring techniques focuses mainly on operating in both frequency and spatial domains using the Fourier transform (FT) properties. However, their performance is limited due to the dependency of FT on stationary signals and its lack of capability to extract spatial-frequency properties. In this paper, we propose a novel approach based on the Fractional Fourier Transform (FRFT), a unified spatial-frequency representation leveraging both spatial and frequency components simultaneously, making it ideal for processing non-stationary signals like images. Specifically, we introduce a Fractional Fourier Transformer (F2former), where we combine the classical fractional Fourier based Wiener deconvolution (F2WD) as well as a multi-branch encoder-decoder transformer based on a new fractional frequency aware transformer block (F2TB). We design F2TB consisting of a fractional frequency aware self-attention (F2SA) to estimate element-wise product attention based on important frequency components and a novel feed-forward network based on frequency division multiplexing (FM-FFN) to refine high and low frequency features separately for efficient latent clear image restoration. Experimental results for the cases of both motion deblurring as well as defocus deblurring show that the performance of our proposed method is superior to other state-of-the-art (SOTA) approaches.

9/4/2024