Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal

Read original: arXiv:2408.03734 - Published 8/9/2024 by Eirini Cholopoulou, Dimitrios E. Diamantis, Dimitra-Christina C. Koutsiou, Dimitris K. Iakovidis


Overview

Removing shadows from images is important for various applications like computer vision and photography.
Past methods have struggled to capture complex shadow patterns due to limited model assumptions and lack of diverse datasets.
This study proposes a new deep learning model and a new synthetic dataset to address these limitations.

Plain English Explanation

Shadows can make images look less clear and appealing. Removing shadows is important for many computer applications, like computer vision and digital photography. Previous methods for removing shadows have had trouble dealing with complex shadow patterns, often because the models were too simple or the training data didn't include enough variety.

This new study tackles these problems in two ways. First, it introduces a deep learning model called Soft-Hard Attention U-net (SHAU) that is specially designed to handle shadows of different sizes and intensities. Second, the researchers created a new dataset called Multiscale Shadow Removal Dataset (MSRD) that has a wide range of complex shadow patterns, including shadows from multiple objects. This dataset can help train better shadow removal models without privacy concerns.

The results show that SHAU outperforms other state-of-the-art shadow removal methods, improving two standard image-quality measures for the shadow area, Peak Signal-to-Noise Ratio and Root Mean Square Error, by 25.1% and 61.3%, respectively. This suggests the new model and dataset are valuable contributions to the field of shadow removal and could lead to better image quality in many applications.

Technical Explanation

The key components of the proposed Soft-Hard Attention U-net (SHAU) architecture are the soft and hard attention modules, which work together with multiscale feature extraction blocks to enable effective removal of shadows at different scales and intensities.

The soft attention module learns to highlight the most relevant features for shadow removal, while the hard attention module helps the model focus on specific shadow regions. By combining these attention mechanisms with a U-net-based structure that captures features at multiple scales, SHAU is able to handle the complex shadow patterns often found in natural scenes, including urban environments.
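The difference between the two attention styles can be illustrated with a small sketch. This is a generic illustration of soft (continuous reweighting) versus hard (discrete selection) attention, not the SHAU modules themselves; the function names, the sigmoid scoring, and the 0.5 threshold are all assumptions made for the example.

```python
import math

def sigmoid(x):
    """Map a raw attention score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def soft_attention(features, scores):
    """Soft attention: scale every feature by a continuous weight."""
    return [[f * sigmoid(s) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(features, scores)]

def hard_attention(features, scores, threshold=0.5):
    """Hard attention: keep a feature only where its weight exceeds
    the threshold (an assumed value), and zero it out elsewhere."""
    return [[f if sigmoid(s) > threshold else 0.0
             for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(features, scores)]

# Toy 2x2 feature map with hypothetical attention scores.
features = [[1.0, 2.0], [3.0, 4.0]]
scores = [[-4.0, 0.0], [2.0, 6.0]]

soft = soft_attention(features, scores)  # every value is attenuated, none removed
hard = hard_attention(features, scores)  # low-scoring positions are dropped entirely
```

Soft weighting keeps the operation differentiable everywhere, while the hard variant commits to a region. How SHAU actually computes and combines the two is described in the original paper.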

To train and evaluate shadow removal models, the researchers also introduced the Multiscale Shadow Removal Dataset (MSRD). This synthetic dataset contains a diverse set of shadow patterns across varying scales and intensities, going beyond the simple, uniform shadows typically found in existing benchmarks. MSRD provides both the original images with shadows and the corresponding shadow-free ground truth, allowing for comprehensive evaluation of shadow removal algorithms.
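To make the idea of a paired sample concrete, the sketch below builds one shadow / shadow-free pair by darkening a clean image with a soft matte. This is a deliberately simple darkening model chosen for illustration; it is not the procedure used to generate MSRD, and `PairedSample`, `cast_shadow`, and the `strength` parameter are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class PairedSample:
    """One training pair: the shadowed input and its shadow-free target."""
    shadow: list
    shadow_free: list
    matte: list  # per-pixel shadow coverage in [0, 1]

def cast_shadow(shadow_free, matte, strength=0.5):
    """Darken each pixel in proportion to the matte (assumed model):
    shadow = shadow_free * (1 - strength * matte)."""
    return [[px * (1.0 - strength * m) for px, m in zip(px_row, m_row)]
            for px_row, m_row in zip(shadow_free, matte)]

# Toy 2x2 grayscale image: no shadow, a penumbra pixel, and two umbra pixels.
clean = [[200.0, 200.0], [200.0, 200.0]]
matte = [[0.0, 0.5], [1.0, 1.0]]

sample = PairedSample(shadow=cast_shadow(clean, matte),
                      shadow_free=clean,
                      matte=matte)
```

Because the clean image is kept alongside the shadowed one, a model's output can be compared pixel by pixel against a known ground truth.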

Experiments show that SHAU outperforms other state-of-the-art shadow removal methods on multiple benchmark datasets. Specifically, SHAU improves the Peak Signal-to-Noise Ratio and Root Mean Square Error for the shadow area by 25.1% and 61.3%, respectively, demonstrating the effectiveness of the proposed architecture and the value of the new MSRD dataset.
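For readers unfamiliar with the two metrics, here is a minimal sketch of how PSNR and RMSE can be restricted to the shadow area using a binary mask. It uses the standard textbook definitions; the color space, value range, and exact evaluation protocol of the paper are not stated in this summary, so those details (including `max_value=255.0` and all function names) are assumptions.

```python
import math

def masked_mse(pred, target, mask):
    """Mean squared error over pixels where mask is non-zero."""
    errs = [(p - t) ** 2
            for p_row, t_row, m_row in zip(pred, target, mask)
            for p, t, m in zip(p_row, t_row, m_row) if m]
    if not errs:
        raise ValueError("mask selects no pixels")
    return sum(errs) / len(errs)

def masked_rmse(pred, target, mask):
    """Root mean square error inside the mask (lower is better)."""
    return math.sqrt(masked_mse(pred, target, mask))

def masked_psnr(pred, target, mask, max_value=255.0):
    """Peak signal-to-noise ratio in dB inside the mask (higher is better)."""
    mse = masked_mse(pred, target, mask)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

def relative_change(baseline, new):
    """Percentage change of a metric relative to a baseline value."""
    return 100.0 * (new - baseline) / baseline

# Toy example: only the top row is marked as shadow, so the large
# error in the bottom-right pixel does not affect the result.
pred = [[110.0, 90.0], [50.0, 60.0]]
target = [[100.0, 100.0], [50.0, 0.0]]
mask = [[1, 1], [0, 0]]

rmse = masked_rmse(pred, target, mask)
psnr = masked_psnr(pred, target, mask)
```

Note that the two metrics move in opposite directions: an improvement means a higher PSNR but a lower RMSE, so a reported percentage gain for RMSE corresponds to a reduction in error.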

Critical Analysis

While the proposed SHAU model and MSRD dataset represent significant advancements in shadow removal research, a few potential limitations and areas for further exploration are worth noting:

The MSRD dataset, although more diverse than existing benchmarks, is still a synthetic dataset. Validating the model's performance on real-world, natural images would be an important next step.
The study focuses on static image-based shadow removal, but extending the approach to video-based or interactive shadow removal could further broaden the practical applications.
Investigating the model's robustness to different lighting conditions, camera angles, and scene complexities would help assess its real-world reliability.

Overall, this research makes valuable contributions to the field of shadow removal by introducing a novel deep learning architecture and a more diverse dataset. Further refinement and validation of the approach could lead to even more impactful applications in computer vision and digital imaging.

Conclusion

This study presents a significant advancement in shadow removal technology by proposing a new deep learning model, SHAU, and a new synthetic dataset, MSRD, to address the limitations of past approaches. SHAU's unique attention-based architecture and ability to handle shadows at multiple scales have demonstrated impressive performance gains over existing state-of-the-art methods.

The availability of the MSRD dataset, with its diverse and complex shadow patterns, is also a crucial contribution that can facilitate the development of more robust and comprehensive shadow removal algorithms. As these technologies continue to mature, we can expect to see tangible improvements in the visual quality of images across a wide range of applications, from computer vision to digital photography.



Related Papers


Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal

Eirini Cholopoulou, Dimitrios E. Diamantis, Dimitra-Christina C. Koutsiou, Dimitris K. Iakovidis

Effective shadow removal is pivotal in enhancing the visual quality of images in various applications, ranging from computer vision to digital photography. During the last decades, physics- and machine learning-based methodologies have been proposed; however, most of them have limited capacity in capturing complex shadow patterns due to restrictive model assumptions, neglecting the fact that shadows usually appear at different scales. Also, current datasets used for benchmarking shadow removal are composed of a limited number of images with simple scenes containing mainly uniform shadows cast by single objects, whereas only a few of them include both manual shadow annotations and paired shadow-free images. Aiming to address all these limitations in the context of natural scene imaging, including urban environments with complex scenes, the contribution of this study is twofold: a) it proposes a novel deep learning architecture, named Soft-Hard Attention U-net (SHAU), focusing on multiscale shadow removal; b) it provides a novel synthetic dataset, named Multiscale Shadow Removal Dataset (MSRD), containing complex shadow patterns of multiple scales, aiming to serve as a privacy-preserving dataset for a more comprehensive benchmarking of future shadow removal methodologies. Key architectural components of SHAU are the soft and hard attention modules, which along with multiscale feature extraction blocks enable effective shadow removal of different scales and intensities. The results demonstrate the effectiveness of SHAU over the relevant state-of-the-art shadow removal methods across various benchmark datasets, improving the Peak Signal-to-Noise Ratio and Root Mean Square Error for the shadow area by 25.1% and 61.3%, respectively.

8/9/2024


High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

Zinuo Li, Xuhang Chen, Chi-Man Pun, Xiaodong Cun

Shadows often occur when we capture the documents with casual equipment, which influences the visual quality and readability of the digital copies. Different from the algorithms for natural shadow removal, the algorithms in document shadow removal need to preserve the details of fonts and figures in high-resolution input. Previous works ignore this problem and remove the shadows via approximate attention and small datasets, which might not work in real-world situations. We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully designed frequency-aware network. As for the dataset, we acquire over 7k couples of high-resolution (2462 x 3699) images of real-world document pairs with various samples under different lighting circumstances, which is 10 times larger than existing datasets. As for the design of the network, we decouple the high-resolution images in the frequency domain, where the low-frequency details and high-frequency boundaries can be effectively learned via the carefully designed network structure. Powered by our network and dataset, the proposed method clearly shows a better performance than previous methods in terms of visual quality and numerical results. The code, models, and dataset are available at: https://github.com/CXH-Research/DocShadow-SD7K

6/19/2024

SoftShadow: Leveraging Penumbra-Aware Soft Masks for Shadow Removal

Xinrui Wang, Lanqing Guo, Xiyu Wang, Siyu Huang, Bihan Wen

Recent advancements in deep learning have yielded promising results for the image shadow removal task. However, most existing methods rely on binary pre-generated shadow masks. The binary nature of such masks could potentially lead to artifacts near the boundary between shadow and non-shadow areas. In view of this, inspired by the physical model of shadow formation, we introduce novel soft shadow masks specifically designed for shadow removal. To achieve such soft masks, we propose a SoftShadow framework by leveraging the prior knowledge of pretrained SAM and integrating physical constraints. Specifically, we jointly tune the SAM and the subsequent shadow removal network using penumbra formation constraint loss and shadow removal loss. This framework enables accurate predictions of penumbra (partially shaded regions) and umbra (fully shaded regions) areas while simultaneously facilitating end-to-end shadow removal. Through extensive experiments on popular datasets, we found that our SoftShadow framework, which generates soft masks, can better restore boundary artifacts, achieve state-of-the-art performance, and demonstrate superior generalizability.

9/12/2024

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Jinting Luo, Ru Li, Chengzhi Jiang, Mingyan Han, Xiaoming Zhang, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

We propose Diff-Shadow, a global-guided diffusion model for high-quality shadow removal. Previous transformer-based approaches can utilize global information to relate shadow and non-shadow regions but are limited in their synthesis ability and recover images with obvious boundaries. In contrast, diffusion-based methods can generate better content but ignore global information, resulting in inconsistent illumination. In this work, we combine the advantages of diffusion models and global guidance to realize shadow-free restoration. Specifically, we propose a parallel UNets architecture: 1) the local branch performs the patch-based noise estimation in the diffusion process, and 2) the global branch recovers the low-resolution shadow-free images. A Reweight Cross Attention (RCA) module is designed to integrate global contextual information of non-shadow regions into the local branch. We further design a Global-guided Sampling Strategy (GSS) that mitigates patch boundary issues and ensures consistent illumination across shaded and unshaded regions in the recovered image. Comprehensive experiments on three standard public datasets (ISTD, ISTD+, and SRD) have demonstrated the effectiveness of Diff-Shadow. Compared to state-of-the-art methods, our method achieves a significant improvement in terms of PSNR, increasing from 32.33dB to 33.69dB on the SRD dataset. Codes will be released.

7/24/2024