Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Read original: arXiv:2404.00358 - Published 5/24/2024 by Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang

Overview

• This paper introduces a novel radial strip transformer architecture for image deblurring tasks.

• The key idea is to leverage motion information encoded in the polar coordinate system to improve deblurring performance.

• The proposed model, the Radial Strip Transformer (RST), which this summary refers to as Spread Your Wings (SYW) after the paper's title, is reported to outperform state-of-the-art methods on standard benchmarks.

Plain English Explanation

Image deblurring is the process of recovering a sharp image from a blurry one. The task is challenging because blur has many possible causes, such as camera shake, object motion, or atmospheric disturbance.

The authors of this paper have developed a new type of neural network, called the Spread Your Wings (SYW) model, that is particularly well-suited for image deblurring. The core innovation is the use of a "radial strip" transformer, which processes the image data in a radial (circular) fashion rather than the typical Cartesian (grid-like) approach.

The key insight is that motion blur in an image can often be characterized by the direction and speed of movement, which are naturally captured in a polar coordinate system. By leveraging this motion information, the SYW model is able to more effectively remove blur and restore the original image.
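As a rough illustration of why polar coordinates suit this kind of motion, the sketch below converts a pixel position to (radius, angle) about an image centre and checks that a rotation about that centre changes only the angle. The function names `to_polar` and `rotate` and the example coordinates are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math


def to_polar(x, y, cx, cy):
    """Convert pixel (x, y) to (radius, angle) about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)  # radians, in (-pi, pi]
    return radius, angle


def rotate(x, y, cx, cy, phi):
    """Rotate pixel (x, y) by phi radians about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(phi) - dy * math.sin(phi),
        cy + dx * math.sin(phi) + dy * math.cos(phi),
    )


# Hypothetical example: a point 4 pixels to the right of the centre (8, 8).
r0, a0 = to_polar(12.0, 8.0, 8.0, 8.0)
x1, y1 = rotate(12.0, 8.0, 8.0, 8.0, math.pi / 6)
r1, a1 = to_polar(x1, y1, 8.0, 8.0)

# The radius is unchanged; only the angle moves, by exactly the rotation amount.
print(r0, r1, a1 - a0)
```

In Cartesian coordinates the same rotation changes both x and y in a position-dependent way, whereas in polar coordinates it is a constant shift along the angle axis.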

This builds on prior work in deblurring using transformer networks and exploiting the geometry of the problem.

Technical Explanation

The SYW model consists of several key components:

• Radial Strip Transformer: Instead of processing the image in a standard grid-like fashion, SYW uses a radial strip transformer that operates on image patches arranged in a circular manner. This allows the model to better capture the directionality and speed of motion blur.

• Motion-Aware Attention: The radial strip transformer employs a specialized attention mechanism that is designed to focus on the motion information encoded in the polar coordinate representation of the image.

• Multi-Scale Fusion: SYW combines features extracted at multiple scales to capture both local and global context, further enhancing deblurring performance.
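To make the idea of windows arranged by angle and radius more concrete, here is a minimal sketch that assigns every pixel of a small image to a window indexed by an azimuth bin and a radius bin. This is only an illustration under stated assumptions: the helper names `radial_window_index` and `partition`, the choice of the image centre as origin, and the uniform binning are all hypothetical and are not the paper's attention module.

```python
import math


def radial_window_index(x, y, cx, cy, n_angle, n_radius, max_radius):
    """Return (angle_bin, radius_bin) for pixel (x, y) about centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi]
    # Clamp so values on the upper boundary fall into the last bin.
    a_bin = min(int(angle / (2 * math.pi) * n_angle), n_angle - 1)
    r_bin = min(int(radius / max_radius * n_radius), n_radius - 1)
    return a_bin, r_bin


def partition(height, width, n_angle, n_radius):
    """Group all pixel coordinates of an image by their radial window."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    max_radius = math.hypot(cx, cy) or 1.0  # avoid division by zero for 1x1
    windows = {}
    for y in range(height):
        for x in range(width):
            key = radial_window_index(x, y, cx, cy, n_angle, n_radius, max_radius)
            windows.setdefault(key, []).append((x, y))
    return windows


# Hypothetical example: an 8x8 image split into 4 angular sectors and 2 rings.
windows = partition(8, 8, n_angle=4, n_radius=2)
for key in sorted(windows):
    print(key, len(windows[key]))
```

In a window-based transformer, attention would then be computed among the pixels (or patches) that share a window; here the grouping follows angle and radius instead of rows and columns.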

The authors evaluate SYW on several standard image deblurring benchmarks and show that it outperforms state-of-the-art methods, including Parallel Cross-Strip Attention Network and DRCT, by a significant margin.

Critical Analysis

The authors provide a thorough evaluation of SYW and carefully discuss its limitations and potential areas for future research. One key caveat is that the performance gains of SYW are most pronounced on images with relatively simple, directional blur, whereas its advantages may be less pronounced for more complex blur patterns.

Additionally, the authors note that the radial strip transformer introduces a higher computational cost compared to standard grid-based transformers. Further research may be needed to optimize the efficiency of the proposed architecture.

Overall, the Spread Your Wings model represents an interesting and promising direction in image deblurring research, leveraging the geometry of motion blur in a novel way. The authors have made their code publicly available, which should facilitate further exploration and refinement of this approach.

Conclusion

This paper introduces the Spread Your Wings (SYW) model, a radial strip transformer architecture for image deblurring. By exploiting motion information encoded in the polar coordinate system, SYW is able to outperform state-of-the-art methods on standard benchmarks.

The key innovation of SYW is the use of a specialized radial transformer and motion-aware attention mechanism, which allows the model to more effectively capture and utilize the directional and speed characteristics of blur. While the approach introduces some computational overhead, the significant performance gains suggest that this is a fruitful direction for further research in image deblurring.

Overall, the Spread Your Wings model represents an exciting advancement in the field, demonstrating the potential benefits of incorporating geometric and motion-related insights into neural network architectures.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang

Exploring motion information is important for the motion deblurring task. Recently, window-based transformer approaches have achieved decent performance in image deblurring. Note that the motion causing blurry results is usually composed of translation and rotation movements, and the window-shift operation in the Cartesian coordinate system used by window-based transformer approaches only directly explores translation motion in orthogonal directions. Thus, these methods are limited in modeling the rotation part. To alleviate this problem, we introduce a polar coordinate-based transformer, which uses angles and distance to explore rotation motion and translation information together. In this paper, we propose a Radial Strip Transformer (RST), a transformer-based architecture that restores blurry images in a polar coordinate system instead of a Cartesian one. RST contains a dynamic radial embedding module (DRE) to extract shallow features by a radial deformable convolution. We design a polar mask layer to generate the offsets for the deformable convolution, which can reshape the convolution kernel along the radius to better capture rotation motion information. Furthermore, we propose a radial strip attention solver (RSAS) for deep feature extraction, where the relationship of windows is organized by azimuth and radius. This attention module contains radial strip windows to reweight image features in polar coordinates, which preserves more useful information on rotation and translation motion together for better recovery of sharp images. Experimental results on six synthetic and real-world datasets show that our method performs favorably against other SOTA methods for the image deblurring task.

5/24/2024

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Huicong Zhang, Haozhe Xie, Hongxun Yao

Video deblurring relies on leveraging information from other frames in the video sequence to restore the blurred regions in the current frame. Mainstream approaches employ bidirectional feature propagation, spatio-temporal transformers, or a combination of both to extract information from the video sequence. However, limitations in memory and computational resources constrain the temporal window length of the spatio-temporal transformer, preventing the extraction of longer temporal contextual information from the video sequence. Additionally, bidirectional feature propagation is highly sensitive to inaccurate optical flow in blurry frames, leading to error accumulation during the propagation process. To address these issues, we propose BSSTNet, the Blur-aware Spatio-temporal Sparse Transformer Network. It introduces the blur map, which converts the originally dense attention into a sparse form, enabling a more extensive utilization of information throughout the entire video sequence. Specifically, BSSTNet (1) uses a longer temporal window in the transformer, leveraging information from more distant frames to restore the blurry pixels in the current frame, and (2) introduces bidirectional feature propagation guided by blur maps, which reduces error accumulation caused by blurry frames. The experimental results demonstrate that the proposed BSSTNet outperforms the state-of-the-art methods on the GoPro and DVD datasets.

6/12/2024

DarSwin: Distortion Aware Radial Swin Transformer

Akshaya Athwale, Arman Afrasiyabi, Justin Lague, Ichrak Shili, Ola Ahmad, Jean-François Lalonde

Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets with significant gains when trained on bounded levels of distortions (very low, low, medium, and high) and tested on all, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin, to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/

7/25/2024

Efficient Image Deblurring Networks based on Diffusion Models

Kang Chen, Yuanjie Liu

This article presents a sliding window model for defocus deblurring, named Swintormer, which achieves the best performance to date with remarkably low memory usage. This method utilizes a diffusion model to generate latent prior features, aiding in the restoration of more detailed images. Additionally, by adapting the sliding window strategy, it incorporates specialized Transformer blocks to enhance inference efficiency. The adoption of this new approach has led to a substantial reduction in Multiply-Accumulate Operations (MACs) per iteration, drastically cutting down memory requirements. In comparison to the currently leading GRL method, our Swintormer model significantly reduces the computational load that must depend on memory capacity, from 140.35 GMACs to 8.02 GMACs, while improving the Peak Signal-to-Noise Ratio (PSNR) for defocus deblurring from 27.04 dB to 27.07 dB. This innovative technique enables the processing of higher resolution images on memory-limited devices, vastly broadening potential application scenarios. The article wraps up with an ablation study, offering a comprehensive examination of how each network module contributes to the final performance. The source code and model will be available at the following website: https://github.com/bnm6900030/swintormer.

5/30/2024