High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

Read original: arXiv:2308.14221 - Published 6/19/2024 by Zinuo Li, Xuhang Chen, Chi-Man Pun, Xiaodong Cun

Overview

The paper focuses on removing shadows from high-resolution document images. Such shadows often appear when documents are captured with casual equipment, and they degrade the visual quality and readability of the digital copies.
Previous document shadow removal algorithms did not address the need to preserve details such as fonts and figures at high resolution, and they relied on approximate attention and small datasets, so they may not hold up in real-world situations.
The authors introduce a larger-scale real-world dataset and a carefully designed frequency-aware network to directly handle high-resolution document shadow removal.

Plain English Explanation

Shadows can often appear in digital copies of documents when they are captured using basic equipment. This can make the documents look less clear and harder to read. Unlike algorithms designed to remove shadows from natural images, document shadow removal algorithms need to be able to preserve the fine details of things like text and graphics in high-resolution input.

Previous research in this area has overlooked this issue, relying on approximate attention mechanisms and small datasets to remove shadows. Such methods may not work well in real-world situations with diverse document types and lighting conditions.

To address this, the authors of the paper created a much larger dataset of over 7,000 high-resolution document image pairs with shadows. They also developed a new network architecture that is designed to effectively learn and preserve the low-frequency details and high-frequency boundaries in these high-resolution images by working in the frequency domain.

The authors report that their approach, powered by this new dataset and network design, outperforms previous methods on high-resolution document shadow removal in both visual quality and numerical metrics.

Technical Explanation

The key technical elements of the paper include:

Dataset: The authors created a dataset of over 7,000 high-resolution (2462 x 3699) document image pairs, with one image containing shadows and the other being the corresponding shadow-free version. This is about 10 times larger than previous datasets used for document shadow removal.
Network Architecture: The network decouples each high-resolution image in the frequency domain, so that low-frequency details and high-frequency boundaries, both important for the quality of document text and graphics, can be learned effectively. This contrasts with previous approaches, which relied on approximate attention mechanisms.
Performance Evaluation: The authors report that their approach, enabled by the larger dataset and the frequency-aware network design, outperforms previous methods in both visual quality and numerical metrics on high-resolution document shadow removal.
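
The frequency split and the numerical comparison mentioned above can be illustrated with a small toy example. This is a minimal sketch, not the authors' network: the summary does not describe the actual decomposition or name the metrics, so the box blur used as a low-pass filter, the choice of PSNR as the metric, and all function names (`box_blur`, `split_frequencies`, `psnr`) are assumptions made for illustration. Pure Python is used for portability; NumPy or OpenCV would be the usual choice in practice.

```python
import math


def box_blur(image, radius=1):
    """Mean filter with clamped edges: a crude stand-in for a low-pass filter."""
    height, width = len(image), len(image[0])
    window = (2 * radius + 1) ** 2
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp to the image
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            row.append(total / window)
        blurred.append(row)
    return blurred


def split_frequencies(image, radius=1):
    """Return (low, high) bands such that low + high equals image up to rounding."""
    low = box_blur(image, radius)
    high = [[p - l for p, l in zip(image_row, low_row)]
            for image_row, low_row in zip(image, low)]
    return low, high


def psnr(prediction, target, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    squared = [(p - t) ** 2
               for p_row, t_row in zip(prediction, target)
               for p, t in zip(p_row, t_row)]
    mse = sum(squared) / len(squared)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Toy 6x6 "page": white paper (1.0) with one dark vertical stroke (0.0).
clean = [[0.0 if x == 2 else 1.0 for x in range(6)] for _ in range(6)]
# Toy "shadow" (an assumption): uniformly darken every pixel by 20%.
shadowed = [[0.8 * p for p in row] for row in clean]

low, high = split_frequencies(shadowed)
rebuilt = [[l + h for l, h in zip(low_row, high_row)]
           for low_row, high_row in zip(low, high)]
max_error = max(abs(r - s)
                for rebuilt_row, shadowed_row in zip(rebuilt, shadowed)
                for r, s in zip(rebuilt_row, shadowed_row))

print(f"max reconstruction error: {max_error:.2e}")
print(f"high band on flat paper: {high[0][5]:.2e}, on the stroke: {high[0][2]:.3f}")
print(f"PSNR of shadowed vs. clean page: {psnr(shadowed, clean):.2f} dB")
```

In this toy, flat paper regions produce a near-zero high band while the stroke produces a large response, which is the intuition behind handling smooth content and sharp boundaries in separate bands. The PSNR line only measures how far the shadowed page is from the clean one; it says nothing about the results reported in the paper.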

Critical Analysis

The paper addresses an important practical problem in document digitization and clearly demonstrates the value of a larger, more representative dataset and a carefully designed network architecture. However, some potential limitations and areas for further research include:

The dataset, while larger than previous efforts, may still not capture the full diversity of real-world document types and lighting conditions. Continued dataset expansion and curation could further improve the robustness of the approach.
The frequency-domain approach, while effective, may not be the only way to preserve critical document details during shadow removal. Alternative techniques like those used in Shadow Refiner could also be investigated.
The authors do not provide much analysis on the computational efficiency of their approach, which may be an important consideration for real-world deployment, especially on resource-constrained devices.
Integrating the document shadow removal capabilities with end-to-end document digitization pipelines could further enhance the practical value of this research.
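
To make the computational-efficiency concern above more concrete, the following back-of-the-envelope calculation estimates the size of a single image at the resolution quoted in the summary. It is only a rough sketch: the three-channel layout and 32-bit float storage are assumptions not stated in the summary, and the figure covers one input tensor only, not a network's intermediate activations.

```python
# Resolution quoted in the summary; the pixel count is the same
# whichever of the two numbers is the width.
SIDE_A, SIDE_B = 2462, 3699
CHANNELS = 3          # assumption: RGB input
BYTES_PER_VALUE = 4   # assumption: 32-bit floats

pixels = SIDE_A * SIDE_B
tensor_bytes = pixels * CHANNELS * BYTES_PER_VALUE

print(f"{pixels:,} pixels (~{pixels / 1e6:.1f} megapixels)")
# -> 9,106,938 pixels (~9.1 megapixels)
print(f"{tensor_bytes / 2**20:.1f} MiB per float32 RGB tensor")
# -> 104.2 MiB per float32 RGB tensor
```

At roughly 100 MiB for the raw input alone under these assumptions, memory use at full resolution is a plausible concern for resource-constrained devices.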

Conclusion

The paper advances high-resolution document shadow removal by introducing a larger-scale real-world dataset and a carefully designed frequency-aware network. Together these preserve critical document details like text and graphics more effectively during shadow removal, and the authors report better results than previous methods. The work can help improve the quality and readability of digitized documents, with potential applications in digital archiving and computational photography.

Related Papers

High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

Zinuo Li, Xuhang Chen, Chi-Man Pun, Xiaodong Cun

Shadows often occur when we capture the documents with casual equipment, which influences the visual quality and readability of the digital copies. Different from the algorithms for natural shadow removal, the algorithms in document shadow removal need to preserve the details of fonts and figures in high-resolution input. Previous works ignore this problem and remove the shadows via approximate attention and small datasets, which might not work in real-world situations. We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully designed frequency-aware network. As for the dataset, we acquire over 7k couples of high-resolution (2462 x 3699) images of real-world document pairs with various samples under different lighting circumstances, which is 10 times larger than existing datasets. As for the design of the network, we decouple the high-resolution images in the frequency domain, where the low-frequency details and high-frequency boundaries can be effectively learned via the carefully designed network structure. Powered by our network and dataset, the proposed method clearly shows a better performance than previous methods in terms of visual quality and numerical results. The code, models, and dataset are available at: https://github.com/CXH-Research/DocShadow-SD7K

6/19/2024

DocDeshadower: Frequency-Aware Transformer for Document Shadow Removal

Ziyang Zhou, Yingtie Lei, Xuhang Chen, Shenghong Luo, Wenjun Zhang, Chi-Man Pun, Zhen Wang

Shadows in scanned documents pose significant challenges for document analysis and recognition tasks due to their negative impact on visual quality and readability. Current shadow removal techniques, including traditional methods and deep learning approaches, face limitations in handling varying shadow intensities and preserving document details. To address these issues, we propose DocDeshadower, a novel multi-frequency Transformer-based model built upon the Laplacian Pyramid. By decomposing the shadow image into multiple frequency bands and employing two critical modules, the Attention-Aggregation Network for low-frequency shadow removal and the Gated Multi-scale Fusion Transformer for global refinement, DocDeshadower effectively removes shadows at different scales while preserving document content. Extensive experiments demonstrate DocDeshadower's superior performance compared to state-of-the-art methods, highlighting its potential to significantly improve document shadow removal techniques. The code is available at https://github.com/leiyingtie/DocDeshadower.

7/31/2024

Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal

Eirini Cholopoulou, Dimitrios E. Diamantis, Dimitra-Christina C. Koutsiou, Dimitris K. Iakovidis

Effective shadow removal is pivotal in enhancing the visual quality of images in various applications, ranging from computer vision to digital photography. Over the last decades, physics-based and machine learning-based methodologies have been proposed; however, most of them have limited capacity in capturing complex shadow patterns due to restrictive model assumptions, neglecting the fact that shadows usually appear at different scales. Also, current datasets used for benchmarking shadow removal are composed of a limited number of images with simple scenes containing mainly uniform shadows cast by single objects, whereas only a few of them include both manual shadow annotations and paired shadow-free images. Aiming to address all these limitations in the context of natural scene imaging, including urban environments with complex scenes, the contribution of this study is twofold: a) it proposes a novel deep learning architecture, named Soft-Hard Attention U-net (SHAU), focusing on multiscale shadow removal; b) it provides a novel synthetic dataset, named Multiscale Shadow Removal Dataset (MSRD), containing complex shadow patterns of multiple scales, aiming to serve as a privacy-preserving dataset for a more comprehensive benchmarking of future shadow removal methodologies. Key architectural components of SHAU are the soft and hard attention modules, which along with multiscale feature extraction blocks enable effective removal of shadows of different scales and intensities. The results demonstrate the effectiveness of SHAU over the relevant state-of-the-art shadow removal methods across various benchmark datasets, improving the Peak Signal-to-Noise Ratio and Root Mean Square Error for the shadow area by 25.1% and 61.3%, respectively.

8/9/2024

Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning

Xiaowei Hu, Zhenghao Xing, Tianyu Wang, Chi-Wing Fu, Pheng-Ann Heng

Shadows are formed when light encounters obstacles, leading to areas of diminished illumination. In computer vision, shadow detection, removal, and generation are crucial for enhancing scene understanding, refining image quality, ensuring visual consistency in video editing, and improving virtual environments. This paper presents a comprehensive survey of shadow detection, removal, and generation in images and videos within the deep learning landscape over the past decade, covering tasks, deep models, datasets, and evaluation metrics. Our key contributions include a comprehensive survey of shadow analysis, standardization of experimental comparisons, exploration of the relationships among model size, speed, and performance, a cross-dataset generalization study, identification of open issues and future directions, and provision of publicly available resources to support further research.

9/4/2024