MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 FLOPs

Read original: arXiv:2404.13884 - Published 5/27/2024 by Zhihao Chen, Yiyuan Ge

Overview

Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering.
Recent approaches have explored Convolutional Neural Network (CNN)-based methods, Transformer-based methods, and hybrids of the two that combine global and local information for enhancement.
However, these approaches are still limited by the quadratic complexity of the Transformer and cannot maximize performance.
The state-space model (SSM)-based architecture Mamba has been proposed, which excels at modeling long-range dependencies while maintaining linear complexity.
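
To make the quadratic-versus-linear contrast concrete, the short sketch below counts the work for each approach as image size grows. It assumes every pixel is treated as one token, which is a simplification for illustration only; real models usually work on patches or windows, and the function names are hypothetical.

```python
def attention_pair_count(height: int, width: int) -> int:
    """Number of query-key score entries for full self-attention over all pixels."""
    n = height * width
    return n * n


def scan_step_count(height: int, width: int) -> int:
    """Number of recurrence steps for one linear-time scan over all pixels."""
    return height * width


for side in (64, 128, 256):
    pairs = attention_pair_count(side, side)
    steps = scan_step_count(side, side)
    print(f"{side}x{side}: attention pairs={pairs:,}  scan steps={steps:,}")
```

Doubling the image side multiplies the scan steps by 4 but the attention pairs by 16, which is why the gap widens quickly at higher resolutions.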

Plain English Explanation

Underwater images often suffer from poor quality due to the way light behaves in water. Light gets absorbed and scattered, making the images look dull and distorted. Researchers have been exploring different ways to enhance these underwater images and improve their quality.

Recently, they've been trying two main approaches: Convolutional Neural Networks (CNNs) and Transformer-based models. CNNs are good at capturing local details, while Transformers can capture global context. Combining these two approaches can help enhance both the local and global information in the images.

However, the cost of the Transformer part grows quadratically with the number of image tokens, which limits how far these hybrid models can be pushed. To address this, researchers have now explored a state-space model (SSM) architecture called Mamba. This SSM-based model handles long-range dependencies while its cost grows only linearly with sequence length.
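
As a rough illustration of why a state-space model scales linearly, here is a minimal sketch of a discrete linear SSM with a diagonal state matrix: `h_t = a * h_{t-1} + b * x_t`, `y_t = c . h_t`. This is a simplification, not Mamba itself: Mamba makes its parameters depend on the input (the "selective" part) and uses an optimized scan, neither of which is shown here. The function name and parameter values are assumptions chosen for the example.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c . h_t with a diagonal state."""
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    y = []
    for x_t in x:  # one pass over the sequence: cost is linear in len(x)
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


# An impulse at the first position keeps influencing later outputs,
# decaying by the factor a = 0.5 at every step.
signal = [1.0, 0.0, 0.0, 0.0]
outputs = ssm_scan(signal, a=[0.5], b=[1.0], c=[1.0])
print(outputs)  # [1.0, 0.5, 0.25, 0.125]
```

Each position is visited once and only the fixed-size state `h` is carried forward, so earlier inputs can affect later outputs without comparing every pair of positions.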

Technical Explanation

The researchers explore the potential of the SSM-based Mamba model for UIE, from both efficiency and effectiveness perspectives. However, they find that directly applying Mamba leads to poor performance, as it cannot fully utilize the local fine-grained features that are crucial for image enhancement.

To address this, the researchers propose a customized MambaUIE architecture for efficient UIE. They introduce Visual State Space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. They also propose a Dynamic Interaction Block (DIB) and a Spatial feed-forward Network (SGFN) for intra-block feature aggregation of these two types of information.
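
The idea of aggregating a global view and a local view of the same features can be sketched on a 1D sequence as below. This is not the paper's DIB or SGFN, whose internals are not described in this summary; it is a generic gated blend, and every function name, the 3-tap window, and the decay value are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def local_branch(x: Sequence[float]) -> List[float]:
    """3-tap average: each output sees only its immediate neighbours (edges clamped)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def global_branch(x: Sequence[float], decay: float = 0.9) -> List[float]:
    """Running scan: each output depends on every earlier position."""
    h, out = 0.0, []
    for x_t in x:
        h = decay * h + (1.0 - decay) * x_t
        out.append(h)
    return out


def gated_fusion(x: Sequence[float]) -> List[float]:
    """Blend the two branches with a per-position sigmoid gate computed from the input."""
    loc, glo = local_branch(x), global_branch(x)
    gates = [1.0 / (1.0 + math.exp(-x_t)) for x_t in x]
    return [g * l + (1.0 - g) * gl for g, l, gl in zip(gates, loc, glo)]


features = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
print(gated_fusion(features))
```

In this toy, the local branch responds only around the spike at index 2, while the global branch still carries a trace of it at the last position; the gate decides, per position, how much of each to keep.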

This allows MambaUIE to efficiently synthesize global and local information while maintaining a very small number of parameters and high accuracy. Experiments on the UIEB dataset show that MambaUIE reduces GFLOPs (a measure of computational complexity) by 67.4% (2.715G) compared to the state-of-the-art method.
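
The phrase "by 67.4% (2.715G)" can be read two ways, and this summary does not say which is meant. The arithmetic below works out the implied baseline cost under each reading; both readings are assumptions, and the original paper should be checked for the actual figures.

```python
def baseline_if_result(result_gflops: float, reduction: float) -> float:
    """Baseline cost if `result_gflops` is what remains after the reduction."""
    return result_gflops / (1.0 - reduction)


def baseline_if_saving(saved_gflops: float, reduction: float) -> float:
    """Baseline cost if `saved_gflops` is the amount removed by the reduction."""
    return saved_gflops / reduction


reduction = 0.674  # 67.4%, as quoted in the summary
gflops = 2.715     # the "2.715G" figure, as quoted in the summary

# Reading A: 2.715G is the cost of MambaUIE itself.
print(f"reading A baseline: {baseline_if_result(gflops, reduction):.2f} GFLOPs")
# Reading B: 2.715G is the amount saved relative to the baseline.
print(f"reading B baseline: {baseline_if_saving(gflops, reduction):.2f} GFLOPs")
```

Under reading A the compared method would cost roughly 8.33 GFLOPs; under reading B it would cost roughly 4.03 GFLOPs, leaving about 1.31 GFLOPs for MambaUIE.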

Critical Analysis

The paper introduces a novel approach to underwater image enhancement by leveraging the efficient Mamba architecture. However, the researchers acknowledge that directly applying Mamba leads to poor performance because it cannot fully utilize the local fine-grained features, which are crucial for image enhancement.

To address this, the researchers propose customizations to the Mamba architecture, such as the Visual State Space (VSS) blocks, Dynamic Interaction Block (DIB), and Spatial feed-forward Network (SGFN). These customizations allow MambaUIE to capture both global and local information; the reported outcome is a 67.4% (2.715G) reduction in GFLOPs on the UIEB dataset relative to the state-of-the-art method, with high accuracy and a very small parameter count.

While the results are promising, the paper does not explore the potential limitations or drawbacks of the MambaUIE approach. For example, it would be interesting to understand how the model performs on a wider range of underwater image datasets, or how it compares to other state-of-the-art methods in terms of qualitative image enhancement.

Conclusion

This paper presents a novel approach to underwater image enhancement using the efficient Mamba architecture. By customizing the Mamba model with specialized blocks, the researchers capture both global and local information and report a 67.4% reduction in GFLOPs on the UIEB dataset relative to the state-of-the-art method while maintaining high accuracy.

This work demonstrates the potential of state-space models, like Mamba, for tackling complex computer vision problems, such as underwater image enhancement. The MambaUIE model's ability to efficiently synthesize global and local information while maintaining a small number of parameters could have broader implications for developing high-performance, resource-efficient computer vision systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 FLOPs

Zhihao Chen, Yiyuan Ge

Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNN and Transformer can effectively merge global and local information for enhancement. However, this approach is still affected by the quadratic complexity of the Transformer and cannot maximize performance. Recently, the state-space model (SSM)-based architecture Mamba has been proposed, which excels at modeling long-range dependencies while maintaining linear complexity. This paper explores the potential of this SSM-based model for UIE from both efficiency and effectiveness perspectives. However, the performance of directly applying Mamba is poor because local fine-grained features, which are crucial for image enhancement, cannot be fully utilized. To address this, we customize the MambaUIE architecture for efficient UIE. Specifically, we introduce visual state space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. Also, for these two kinds of information, we propose a Dynamic Interaction Block (DIB) and Spatial feed-forward Network (SGFN) for intra-block feature aggregation. MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy. Experiments on the UIEB dataset show that our method reduces GFLOPs by 67.4% (2.715G) relative to the SOTA method. To the best of our knowledge, this is the first UIE model constructed based on SSM that breaks the limitation of FLOPs on accuracy in UIE. The official repository of MambaUIE is at https://github.com/1024AILab/MambaUIE.

5/27/2024

Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint

Song Zhang, Yuqing Duan, Daoliang Li, Ran Zhao

In underwater image enhancement (UIE), convolutional neural networks (CNN) have inherent limitations in modeling long-range dependencies and are less effective in recovering global features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity with increasing image resolution presents significant efficiency challenges. Additionally, most supervised learning methods lack effective physical model constraint, which can lead to insufficient realism and overfitting in generated images. To address these issues, we propose a physical model constraint-based underwater image enhancement framework, Mamba-UIE. Specifically, we decompose the input image into four components: underwater scene radiance, direct transmission map, backscatter transmission map, and global background light. These components are reassembled according to the revised underwater image formation model, and the reconstruction consistency constraint is applied between the reconstructed image and the original image, thereby achieving effective physical constraint on the underwater image enhancement process. To tackle the quadratic computational complexity of Transformers when handling long sequences, we introduce the Mamba-UIE network based on linear complexity state space models. By incorporating the Mamba in Convolution block, long-range dependencies are modeled at both the channel and spatial levels, while the CNN backbone is retained to recover local features and details. Extensive experiments on three public datasets demonstrate that our proposed Mamba-UIE outperforms existing state-of-the-art methods, achieving a PSNR of 27.13 and an SSIM of 0.93 on the UIEB dataset. Our method is available at https://github.com/zhangsong1213/Mamba-UIE.

8/1/2024

WaterMamba: Visual State Space Model for Underwater Image Enhancement

Meisheng Guan, Haiyong Xu, Gangyi Jiang, Mei Yu, Yeyao Chen, Ting Luo, Yang Song

Underwater imaging often suffers from low quality due to factors affecting light propagation and absorption in water. To improve image quality, some underwater image enhancement (UIE) methods based on convolutional neural networks (CNN) and Transformer have been proposed. However, CNN-based UIE methods are limited in modeling long-range dependencies, and Transformer-based methods involve a large number of parameters and complex self-attention mechanisms, posing efficiency challenges. Considering computational complexity and severe underwater image degradation, a state space model (SSM) with linear computational complexity for UIE, named WaterMamba, is proposed. We propose spatial-channel omnidirectional selective scan (SCOSS) blocks comprising spatial-channel coordinate omnidirectional selective scan (SCCOSS) modules and a multi-scale feedforward network (MSFFN). The SCOSS block models pixel and channel information flow, addressing dependencies. The MSFFN facilitates information flow adjustment and promotes synchronized operations within SCCOSS modules. Extensive experiments showcase WaterMamba's cutting-edge performance with reduced parameters and computational resources, outperforming state-of-the-art methods on various datasets, validating its effectiveness and generalizability. The code will be released on GitHub after acceptance.

5/15/2024

PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

Wei-Tung Lin, Yong-Xiang Lin, Jyun-Wei Chen, Kai-Lung Hua

Underwater Image Enhancement (UIE) is critical for marine research and exploration but hindered by complex color distortions and severe blurring. Recent deep learning-based methods have achieved remarkable results, yet these methods struggle with high computational costs and insufficient global modeling, resulting in locally under- or over- adjusted regions. We present PixMamba, a novel architecture, designed to overcome these challenges by leveraging State Space Models (SSMs) for efficient global dependency modeling. Unlike convolutional neural networks (CNNs) with limited receptive fields and transformer networks with high computational costs, PixMamba efficiently captures global contextual information while maintaining computational efficiency. Our dual-level strategy features the patch-level Efficient Mamba Net (EMNet) for reconstructing enhanced image feature and the pixel-level PixMamba Net (PixNet) to ensure fine-grained feature capturing and global consistency of enhanced image that were previously difficult to obtain. PixMamba achieves state-of-the-art performance across various underwater image datasets and delivers visually superior results. Code is available at: https://github.com/weitunglin/pixmamba.

6/13/2024