Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Read original: arXiv:2407.16214 - Published 7/24/2024 by Jinting Luo, Ru Li, Chengzhi Jiang, Mingyan Han, Xiaoming Zhang, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Overview

Proposes a global-guided diffusion model called Diff-Shadow for effective shadow removal from images
Leverages the strengths of diffusion models to learn a mapping from input images with shadows to shadow-free outputs
Incorporates a global guidance mechanism to capture high-level semantic information and improve shadow removal quality

Plain English Explanation

The paper introduces Diff-Shadow, a new approach to removing shadows from images using a type of AI model called a diffusion model. Diffusion models have shown great potential for various image-to-image translation tasks, and the researchers wanted to harness their power for the specific problem of shadow removal.

The key idea behind Diff-Shadow is to guide the diffusion model using global information about the image. Instead of just focusing on the local details around the shadow regions, the model also considers the overall semantic context of the image. This global guidance helps the model better understand the scene and make more informed decisions about where and how to remove the shadows.

The researchers demonstrate that this global-guided approach leads to more accurate and realistic shadow removal results compared to previous methods. By considering the big picture, the model can better differentiate between actual shadows and other dark regions in the image, resulting in more natural-looking outputs.

Technical Explanation

The paper presents Diff-Shadow, a diffusion-based model for shadow removal that incorporates a global guidance mechanism. Diffusion models have shown promising results for various image-to-image translation tasks, and the researchers leverage their strengths to tackle the problem of shadow removal.

The core of Diff-Shadow is a U-Net-based diffusion model that learns to map input images with shadows to their shadow-free counterparts. To capture high-level semantic information and improve the shadow removal quality, the model is augmented with a global guidance module. This module takes the entire input image and extracts global features, which are then incorporated into the diffusion process to provide context-aware guidance.

The researchers design a multi-stage training process to effectively learn the mapping from shadowed to shadow-free images. First, they pretrain the diffusion model on a large-scale shadow dataset to learn the general shadow removal task. Then, they introduce the global guidance module and fine-tune the entire model end-to-end, allowing the global features to further refine the shadow removal outputs.

Extensive experiments on several shadow removal benchmarks demonstrate the effectiveness of Diff-Shadow. The global-guided approach outperforms state-of-the-art shadow removal methods, producing visually appealing and semantically consistent results. The researchers attribute this success to the model's ability to leverage both local and global information to accurately identify and remove shadows.

Critical Analysis

The paper presents a well-designed and thorough study on using diffusion models for shadow removal, a challenging yet important problem in computer vision. The incorporation of global guidance is a key contribution, as it allows the model to better understand the overall context of the scene and make more informed decisions about shadow removal.

One potential limitation mentioned in the paper is the computational complexity of the diffusion process, which may hinder real-time applications. The researchers suggest exploring ways to reduce the number of diffusion steps or accelerate the inference process as future work.

Additionally, while the paper demonstrates impressive results on standard shadow removal benchmarks, it would be valuable to see the model's performance on more diverse and challenging real-world scenarios. Evaluating the robustness of Diff-Shadow to variations in lighting, scene complexity, and shadow characteristics could provide further insights into its practical applicability.

Another area for further research could be exploring the interpretability of the global guidance mechanism. Understanding how the model leverages the global features to guide the shadow removal process could lead to valuable insights and potentially enable more targeted improvements.

Conclusion

The Diff-Shadow paper presents a novel approach to shadow removal that harnesses the power of diffusion models and incorporates a global guidance mechanism. By considering both local and global information, the model is able to produce high-quality, semantically consistent shadow removal results, outperforming state-of-the-art methods.

This research demonstrates the potential of diffusion models for challenging image-to-image translation tasks and highlights the importance of incorporating contextual understanding to improve the performance of such models. As the field of computer vision continues to advance, techniques like Diff-Shadow could have far-reaching applications in various domains, from computational photography to augmented reality and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Jinting Luo, Ru Li, Chengzhi Jiang, Mingyan Han, Xiaoming Zhang, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

We propose Diff-Shadow, a global-guided diffusion model for high-quality shadow removal. Previous transformer-based approaches can utilize global information to relate shadow and non-shadow regions but are limited in their synthesis ability and recover images with obvious boundaries. In contrast, diffusion-based methods can generate better content but ignore global information, resulting in inconsistent illumination. In this work, we combine the advantages of diffusion models and global guidance to realize shadow-free restoration. Specifically, we propose a parallel UNets architecture: 1) the local branch performs the patch-based noise estimation in the diffusion process, and 2) the global branch recovers the low-resolution shadow-free images. A Reweight Cross Attention (RCA) module is designed to integrate global contextural information of non-shadow regions into the local branch. We further design a Global-guided Sampling Strategy (GSS) that mitigates patch boundary issues and ensures consistent illumination across shaded and unshaded regions in the recovered image. Comprehensive experiments on three publicly standard datasets ISTD, ISTD+, and SRD have demonstrated the effectiveness of Diff-Shadow. Compared to state-of-the-art methods, our method achieves a significant improvement in terms of PSNR, increasing from 32.33dB to 33.69dB on the SRD dataset. Codes will be released.

7/24/2024

Semantic-guided Adversarial Diffusion Model for Self-supervised Shadow Removal

Ziqi Zeng, Chen Zhao, Weiling Cai, Chenyu Dong

Existing unsupervised methods have addressed the challenges of inconsistent paired data and tedious acquisition of ground-truth labels in shadow removal tasks. However, GAN-based training often faces issues such as mode collapse and unstable optimization. Furthermore, due to the complex mapping between shadow and shadow-free domains, merely relying on adversarial learning is not enough to capture the underlying relationship between two domains, resulting in low quality of the generated images. To address these problems, we propose a semantic-guided adversarial diffusion framework for self-supervised shadow removal, which consists of two stages. At first stage a semantic-guided generative adversarial network (SG-GAN) is proposed to carry out a coarse result and construct paired synthetic data through a cycle-consistent structure. Then the coarse result is refined with a diffusion-based restoration module (DBRM) to enhance the texture details and edge artifact at second stage. Meanwhile, we propose a multi-modal semantic prompter (MSP) that aids in extracting accurate semantic information from real images and text, guiding the shadow removal network to restore images better in SG-GAN. We conduct experiments on multiple public datasets, and the experimental results demonstrate the effectiveness of our method.

7/2/2024

📈

Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal

Eirini Cholopoulou, Dimitrios E. Diamantis, Dimitra-Christina C. Koutsiou, Dimitris K. Iakovidis

Effective shadow removal is pivotal in enhancing the visual quality of images in various applications, ranging from computer vision to digital photography. During the last decades physics and machine learning -based methodologies have been proposed; however, most of them have limited capacity in capturing complex shadow patterns due to restrictive model assumptions, neglecting the fact that shadows usually appear at different scales. Also, current datasets used for benchmarking shadow removal are composed of a limited number of images with simple scenes containing mainly uniform shadows cast by single objects, whereas only a few of them include both manual shadow annotations and paired shadow-free images. Aiming to address all these limitations in the context of natural scene imaging, including urban environments with complex scenes, the contribution of this study is twofold: a) it proposes a novel deep learning architecture, named Soft-Hard Attention U-net (SHAU), focusing on multiscale shadow removal; b) it provides a novel synthetic dataset, named Multiscale Shadow Removal Dataset (MSRD), containing complex shadow patterns of multiple scales, aiming to serve as a privacy-preserving dataset for a more comprehensive benchmarking of future shadow removal methodologies. Key architectural components of SHAU are the soft and hard attention modules, which along with multiscale feature extraction blocks enable effective shadow removal of different scales and intensities. The results demonstrate the effectiveness of SHAU over the relevant state-of-the-art shadow removal methods across various benchmark datasets, improving the Peak Signal-to-Noise Ratio and Root Mean Square Error for the shadow area by 25.1% and 61.3%, respectively.

8/9/2024

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, the reflectance map of the low-light image and the illumination map of the normal-light image are taken as input to the diffusion model for unsupervised restoration with the guidance of the low-light feature, where a self-constrained consistency loss is further proposed to eliminate the interference of normal-light content on the restored results to improve overall visual quality. Extensive experiments on publicly available real-world benchmarks show that the proposed LightenDiffusion outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Our code is available at https://github.com/JianghaiSCU/LightenDiffusion.

7/15/2024