Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

2405.04964

Published 5/9/2024 by Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

Abstract

Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore the spatial and frequency correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms the state-of-the-art Transformer-based method HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% and 19.08% of its memory consumption and complexity, respectively.

Overview

This paper introduces a novel approach called Frequency-assisted Mamba (FMSR) for remote sensing image super-resolution (RSI-SR) using deep neural networks.
Existing super-resolution methods often suffer from either limited receptive fields or high computational costs, leading to sub-optimal global representation and performance.
The paper proposes integrating the Vision State Space Model (Mamba) to process large-scale RSI by capturing long-range dependencies with linear complexity.
To further improve super-resolution reconstruction, the paper introduces the FMSR framework, which combines the strengths of spatial and frequency-domain processing.

Plain English Explanation

In the field of remote sensing, super-resolution techniques are used to enhance the quality of low-resolution satellite or aerial images. Current deep learning-based super-resolution methods often struggle with two key issues:

Limited Receptive Field: Some methods have a limited ability to capture the broader context and long-range dependencies in large-scale remote sensing images, leading to sub-optimal global representation.
High Computational Costs: Other methods require high computational resources, making them impractical for processing large-scale remote sensing data.

To address these challenges, the researchers developed a new approach called Frequency-assisted Mamba (FMSR). FMSR builds upon the Vision State Space Model (Mamba), which captures long-range dependencies with linear computational complexity and is therefore well suited to large-scale remote sensing images.
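To make the gap between quadratic and linear cost concrete, the sketch below counts operations for a square feature map. It is only back-of-the-envelope arithmetic, not a measurement of FMSR: the function names are hypothetical, and the default of four scan directions is an assumption borrowed from common vision state space designs rather than a figure taken from the paper.

```python
def attention_pair_count(height: int, width: int) -> int:
    """Token pairs compared by global self-attention: N^2 for N = H * W tokens."""
    n = height * width
    return n * n


def scan_step_count(height: int, width: int, directions: int = 4) -> int:
    """Recurrent steps for a state space scan: one per token per scan direction.

    directions=4 is an illustrative assumption, not a value from the paper.
    """
    return height * width * directions


if __name__ == "__main__":
    for side in (64, 128, 256):
        pairs = attention_pair_count(side, side)
        steps = scan_step_count(side, side)
        print(f"{side}x{side}: attention pairs={pairs:,}  scan steps={steps:,}")
```

Doubling the side length multiplies the attention count by 16 but the scan count by only 4, which is why linear-complexity models become more attractive as remote sensing tiles grow.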

To further enhance super-resolution performance, FMSR incorporates a multi-level fusion architecture that combines the strengths of spatial and frequency-domain processing. This includes:

Frequency Selection Module (FSM): Extracts and fuses relevant frequency information.
Vision State Space Module (VSSM): Captures long-range spatial dependencies efficiently.
Hybrid Gate Module (HGM): Intelligently combines the spatial and frequency-domain features.

By integrating these specialized modules, FMSR is able to achieve better super-resolution results compared to state-of-the-art methods, while requiring significantly less memory and computational resources.

Technical Explanation

The paper proposes the Frequency-assisted Mamba (FMSR) framework for remote sensing image super-resolution (RSI-SR). FMSR builds upon the Vision State Space Model (Mamba), whose cost grows linearly with the number of image tokens while still capturing long-range dependencies, in contrast to the quadratic cost of Transformer self-attention.
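The idea of a linear-cost scan can be illustrated with the basic discrete state space recurrence, h_t = a * h_(t-1) + b * x_t and y_t = c * h_t. The sketch below is a deliberately simplified assumption: it uses fixed scalar parameters on a 1-D sequence, whereas Mamba makes its parameters depend on the input and vision variants scan 2-D feature maps. The function name ssm_scan is hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over a 1-D sequence.

    One pass over the input, so the cost is linear in the sequence length.
    Fixed scalars a, b, c are a simplification; Mamba's selective scan
    computes its parameters from the input at each step.
    """
    h = 0.0
    outputs = []
    for x_t in x:
        h = a * h + b * x_t  # update the hidden state
        outputs.append(c * h)  # read out from the state
    return outputs


if __name__ == "__main__":
    # An impulse at the first position keeps influencing later outputs,
    # decaying by the factor a at every step.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

Because the hidden state carries information forward, an early token can affect every later output without any pairwise comparison between tokens.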

To achieve better super-resolution reconstruction, the FMSR framework introduces a multi-level fusion architecture that combines spatial and frequency-domain processing. Specifically, it includes the following key components:

Frequency Selection Module (FSM): This module extracts and fuses relevant frequency information to enhance the super-resolution performance.
Vision State Space Module (VSSM): This module, based on the Mamba architecture, efficiently captures long-range spatial dependencies in the remote sensing images.
Hybrid Gate Module (HGM): This module intelligently combines the spatial and frequency-domain features to further improve the super-resolution reconstruction.
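The summary does not spell out how the FSM or HGM are built, so the following is only a rough sketch of the two underlying ideas: selecting part of the frequency spectrum, and blending two feature maps with a gate. Everything here is an assumption made for illustration. A fixed radial high-pass mask stands in for the learned frequency selection, the gate logits are passed in rather than produced by a learned layer, and the function names are hypothetical. It uses NumPy.

```python
import numpy as np


def frequency_select(feat: np.ndarray, cutoff_radius: float) -> np.ndarray:
    """Keep only frequencies farther than cutoff_radius from the DC term.

    A fixed high-pass mask is used here as a stand-in for a learned selection.
    """
    h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat))  # move DC to the centre
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h // 2, xx - w // 2)  # distance from the DC term
    mask = dist > cutoff_radius
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))


def gated_fusion(spatial: np.ndarray, freq: np.ndarray,
                 gate_logits: np.ndarray) -> np.ndarray:
    """Blend two feature maps with a per-pixel sigmoid gate in [0, 1]."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return gate * spatial + (1.0 - gate) * freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8))
    high = frequency_select(feat, cutoff_radius=2.0)
    fused = gated_fusion(feat, high, gate_logits=np.zeros_like(feat))
    print(fused.shape)
```

With all gate logits at zero the gate equals 0.5 everywhere, so the output is a plain average of the two branches; a trained gate would instead weight each location differently.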

Additionally, the paper recognizes that global and local dependencies are complementary and both beneficial for super-resolution. To leverage this, the multi-level features are recalibrated using learnable scaling adaptors for accurate feature fusion.
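One simple way to picture a learnable scaling adaptor is as a pair of weights that rescale the global and local features before they are summed. The sketch below assumes exactly that form with two scalar weights; the paper's adaptors may be per-channel or structured differently, and the names recalibrate, alpha, and beta are hypothetical.

```python
from typing import List


def recalibrate(global_feat: List[float], local_feat: List[float],
                alpha: float, beta: float) -> List[float]:
    """Fuse two feature vectors as alpha * global + beta * local.

    In a trained network alpha and beta would be learned parameters;
    here they are plain arguments (an illustrative assumption).
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("feature vectors must have the same length")
    return [alpha * g + beta * l for g, l in zip(global_feat, local_feat)]


if __name__ == "__main__":
    print(recalibrate([1.0, 2.0], [10.0, 20.0], alpha=0.5, beta=0.25))
```

Letting the network learn these weights means it can decide how much global context and how much local detail to keep at each fusion level, instead of fixing the balance by hand.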

The proposed FMSR framework is evaluated on three remote sensing image benchmarks: AID, DOTA, and DIOR. The results show that FMSR outperforms the state-of-the-art Transformer-based method HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% of HAT-L's memory and 19.08% of its computational complexity.
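For readers unfamiliar with the metric, PSNR is defined as 10 * log10(MAX^2 / MSE), so a higher value means a lower mean squared error against the reference image. The sketch below computes it for flat lists of pixel values and converts a dB gain into the corresponding MSE ratio. The function names and the 8-bit default peak of 255 are assumptions for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE of the better method divided by MSE of the baseline for a PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([100.0, 150.0, 200.0], [101.0, 149.0, 202.0]))
    print(mse_ratio_for_gain(0.11))
```

By this definition, a 0.11 dB gain corresponds to roughly 2.5% lower mean squared error at the same peak value.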

Critical Analysis

The paper presents a compelling approach to address the limitations of existing remote sensing image super-resolution methods. By integrating the Vision State Space Model (Mamba) and combining spatial and frequency-domain processing, the proposed FMSR framework achieves impressive performance improvements while significantly reducing computational and memory requirements.

One potential area for further research is the generalization of FMSR beyond the evaluated benchmarks, since the results summarized here cover only AID, DOTA, and DIOR. It would also be useful to see a closer examination of the trade-offs between the individual modules (FSM, VSSM, HGM) and how much each contributes to the overall performance.

While the paper demonstrates the effectiveness of the FMSR approach, it would be valuable to understand the limitations or weaknesses of the method, such as potential performance degradation in specific scenarios or edge cases. Addressing these aspects could further strengthen the research and provide a more comprehensive understanding of the FMSR framework.

Conclusion

This paper presents a novel Frequency-assisted Mamba (FMSR) framework for remote sensing image super-resolution that addresses the limitations of existing methods. By integrating the Vision State Space Model (Mamba) and combining spatial and frequency-domain processing, FMSR achieves state-of-the-art performance while significantly reducing computational and memory requirements.

The key innovation of FMSR lies in its multi-level fusion architecture, which includes specialized modules like the Frequency Selection Module, Vision State Space Module, and Hybrid Gate Module. This integration allows FMSR to effectively capture both global and local dependencies, leading to improved super-resolution reconstruction.

The demonstrated performance improvements and efficiency gains of FMSR on benchmark remote sensing image datasets suggest that this approach could have a significant impact on real-world applications, enabling more accurate and scalable super-resolution of large-scale remote sensing data. Further research exploring the generalization and limitations of FMSR could provide valuable insights and drive continued advancements in this important field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

Xiaoyan Lei, Wenlong Zhang, Weifeng Cao

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSB), each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To achieve efficiency improvement while maintaining comparable performance, we apply a distillation strategy to the vision Mamba network for superior performance. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git

5/14/2024

eess.IV cs.CV cs.LG

Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With quicker inference speed and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.

6/26/2024

eess.IV cs.CV

LFMamba: Light Field Image Super-Resolution with State Space Model

Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.

6/19/2024

cs.CV eess.IV

RS-Mamba for Large Remote Sensing Image Dense Prediction

Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional practice of cropping large images into smaller patches results in a notable loss of contextual information. To address these issues, we propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity, facilitating the effective processing of large VHR images. Considering that the land covers in remote sensing images are distributed in arbitrary spatial directions due to characteristics of remote sensing over-head imaging, the RSM incorporates an omnidirectional selective scan module to globally model the context of images in multiple directions, capturing large spatial features from various directions. Extensive experiments on semantic segmentation and change detection tasks across various land covers demonstrate the effectiveness of the proposed RSM. We designed simple yet effective models based on RSM, achieving state-of-the-art performance on dense prediction tasks in VHR remote sensing images without fancy training strategies. Leveraging the linear complexity and global modeling capabilities, RSM achieves better efficiency and accuracy than transformer-based models on large remote sensing images. Interestingly, we also demonstrated that our model generally performs better with a larger image size on dense prediction tasks. Our code is available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.

4/11/2024

cs.CV