PromptCIR: Blind Compressed Image Restoration with Prompt Learning

Read original: arXiv:2404.17433 - Published 4/29/2024 by Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen


Overview

This paper focuses on the problem of blind compressed image restoration (CIR), which aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs.
Existing approaches often rely on a quality factor prediction network to facilitate their restoration network, but this numerical quality factor lacks spatial information, limiting the network's adaptability to image content.
The authors propose a prompt-learning-based compressed image restoration network, dubbed PromptCIR, which can effectively restore images from various compression levels.

Plain English Explanation

The paper addresses blind compressed image restoration (CIR): repairing digital images that have been degraded by compression, particularly with the common JPEG format, when the compression level is not known in advance. Compression can introduce visible artifacts or distortions. Existing methods often rely on a separate network to predict the image's compression level and then use that prediction to guide restoration.

However, the authors argue that the numerical compression level information alone is not enough. The compression artifacts can vary depending on the actual content of the image, so a more sophisticated approach is needed. The researchers propose a new method called PromptCIR that uses prompt learning to encode the compression information in a more flexible and adaptive way.
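To make the "single number" point concrete, here is a minimal sketch of how a JPEG quality factor scales quantization steps. It follows the quality-scaling convention used by libjpeg and uses only the first row of the example luminance table from the JPEG standard. The function name `scale_quant_row` is hypothetical, and other encoders may map quality to quantization differently.

```python
# First row of the example luminance quantization table from the JPEG standard.
BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]


def scale_quant_row(base_row, quality):
    """Scale quantization steps following libjpeg's quality convention."""
    quality = max(1, min(100, quality))
    # libjpeg maps quality to a percentage scale factor.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Round, then clamp each step to the baseline JPEG range [1, 255].
    return [max(1, min(255, (step * scale + 50) // 100)) for step in base_row]


for q in (10, 50, 90):
    print(q, scale_quant_row(BASE_LUMA_ROW, q))
```

The same scaled table is applied to every 8x8 block of the image, so the quality factor by itself says nothing about where artifacts will be most visible. That is the gap the content-aware guidance is meant to fill.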

The key idea is to use prompts, which here are a small set of learnable parameters rather than text descriptions. The prompts interact with weights computed from the visual features of the image to provide dynamic, content-aware guidance for the restoration process. This allows the network to better adapt to different compression levels and image contents, without the need for a separate quality prediction network.

The authors report that PromptCIR is effective in extensive experiments, and it won first place in the blind compressed image enhancement track of the NTIRE 2024 challenge. Overall, this research demonstrates how prompt learning can be applied to solve practical image restoration problems.

Technical Explanation

The paper proposes a prompt-learning-based compressed image restoration network, dubbed PromptCIR, to effectively restore images from various compression levels.

The core innovation is the use of prompts to encode compression information implicitly. These prompts directly interact with soft weights generated from the image features, providing dynamic content-aware and distortion-aware guidance for the restoration process. This approach allows PromptCIR to adapt to different compression levels without relying on a separate quality factor prediction network.
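The interaction described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a small bank of learnable prompt vectors (`prompt_bases`), per-pixel soft weights obtained by a hypothetical 1x1 projection (`weight_proj`) followed by a softmax, and fusion by channel concatenation. All names, shapes, and the random initialization are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dynamic_prompt(features, prompt_bases, weight_proj):
    """Combine prompt bases using soft weights generated from image features.

    features:     (C, H, W) feature map from the backbone
    prompt_bases: (N, C) learnable prompt vectors (assumed layout)
    weight_proj:  (N, C) hypothetical 1x1 projection producing per-pixel logits
    """
    # One logit per prompt base at every spatial position.
    logits = np.einsum("nc,chw->nhw", weight_proj, features)
    # Soft weights sum to 1 over the N bases at each pixel.
    weights = softmax(logits, axis=0)
    # Content-dependent prompt: weighted sum of the bases at each pixel.
    prompt = np.einsum("nhw,nc->chw", weights, prompt_bases)
    return prompt, weights


rng = np.random.default_rng(0)
C, H, W, N = 8, 4, 4, 5
features = rng.standard_normal((C, H, W))
prompt_bases = rng.standard_normal((N, C))
weight_proj = rng.standard_normal((N, C))

prompt, weights = dynamic_prompt(features, prompt_bases, weight_proj)
# Assumed fusion step: concatenate the prompt with the features along channels.
guided = np.concatenate([features, prompt], axis=0)
print(prompt.shape, weights.shape, guided.shape)
```

Because the weights are computed per pixel from the features, two regions of the same image can draw on different mixtures of prompt bases, which a single predicted quality factor cannot express.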

Specifically, the PromptCIR architecture pairs a transformer-based backbone with a dynamic prompt module. Soft weights generated from the image features interact with the prompts, so the resulting guidance varies with both image content and distortion. This enables the restoration network to adapt to different compression levels. The lightweight prompts introduce minimal parameter overhead.

The authors demonstrate the effectiveness of PromptCIR through extensive experiments, and the method won first place in the blind compressed image enhancement track of the NTIRE 2024 challenge. According to this summary, PromptCIR outperforms previous state-of-the-art methods both quantitatively and qualitatively.
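This summary does not name the quantitative metrics used. As a hedged illustration, peak signal-to-noise ratio (PSNR) is a common fidelity metric for compression artifact removal, and a minimal version over flat lists of 8-bit pixel values looks like this. The function name and the flat-list input format are assumptions made for the example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the reference and the restored pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))
```

Higher PSNR means the restored image is numerically closer to the uncompressed reference. It does not always track perceived quality, which is why qualitative comparisons are usually reported alongside it.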

Critical Analysis

The paper presents a compelling approach to the problem of blind compressed image restoration, leveraging the power of prompt learning to address the limitations of existing methods.

One potential limitation, not discussed in the paper, is the reliance on the availability of high-quality training data with diverse compression levels. The performance of PromptCIR may be affected by the representativeness and quality of the training data, which can be challenging to obtain in real-world scenarios.

Additionally, the paper does not provide a detailed analysis of the computational and memory efficiency of PromptCIR compared to other approaches. As image restoration tasks can be computationally intensive, the scalability and deployability of the proposed method on various hardware platforms could be an important consideration.

Further research could explore the generalization of PromptCIR to other types of image degradation, such as noise, blur, or mixed degradations, to expand its applicability. Investigating the interpretability and explainability of the prompt-based restoration process could also provide valuable insights into the inner workings of the network.

Conclusion

The paper presents a novel prompt-learning-based approach, PromptCIR, for effectively restoring images compressed with unknown quality factors. By leveraging prompts to encode compression information implicitly, PromptCIR can dynamically adapt to different compression levels without relying on a separate quality prediction network.

The authors have demonstrated the effectiveness of PromptCIR through extensive experiments, and the method won first place in the blind compressed image enhancement track of the NTIRE 2024 challenge. This research showcases the potential of prompt learning to solve practical image restoration problems and improve the quality of compressed digital imagery.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


PromptCIR: Blind Compressed Image Restoration with Prompt Learning

Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often rely on a quality factor prediction network to help their network restore compressed images. However, the predicted numerical quality factor lacks spatial information, preventing network adaptability toward image contents. Recent studies in prompt-learning-based image restoration have showcased the potential of prompts to generalize across varied degradation types and degrees. This motivated us to design a prompt-learning-based compressed image restoration network, dubbed PromptCIR, which can effectively restore images from various compression levels. Specifically, PromptCIR exploits prompts to encode compression information implicitly, where prompts directly interact with soft weights generated from image features, thus providing dynamic content-aware and distortion-aware guidance for the restoration process. The light-weight prompts enable our method to adapt to different compression levels, while introducing minimal parameter overhead. Overall, PromptCIR leverages the powerful transformer-based backbone with the dynamic prompt module to proficiently handle blind CIR tasks, winning first place in the NTIRE 2024 challenge of blind compressed image enhancement track. Extensive experiments have validated the effectiveness of our proposed PromptCIR. The code is available at https://github.com/lbc12345/PromptCIR-NTIRE24.

4/29/2024

UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

Xin Li, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen

Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve the compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional or learning-based codecs in practical applications, e.g., HEVC, VVC, HIFIC, etc. In this work, we propose the first universal CSR framework, dubbed UCIP, with dynamic prompt learning, intending to jointly support the CSR distortions of any compression codecs/modes. Particularly, an efficient dynamic prompt strategy is proposed to mine the content/spatial-aware task-adaptive contextual information for the universal CSR task, using only a small number of prompts with spatial size 1x1. To simplify contextual information mining, we introduce the novel MLP-like framework backbone for our UCIP by adapting the Active Token Mixer (ATM) to CSR tasks for the first time, where the global information modeling is only taken in horizontal and vertical directions with offset prediction. We also build an all-in-one benchmark dataset for the CSR task by collecting the datasets with the popular 6 diverse traditional and learning-based codecs, including JPEG, HEVC, VVC, HIFIC, etc., resulting in 23 common degradations. Extensive experiments have shown the consistent and excellent performance of our UCIP on universal CSR tasks. The project can be found at https://lixinustc.github.io/UCIP.github.io

7/19/2024


SPIRE: Semantic Prompt-Driven Image Restoration

Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art, alongside offering the flexibility of text-based control over the restoration effects.

7/17/2024

Pseudo-triplet Guided Few-shot Composed Image Retrieval

Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Xuemeng Song

Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image based on a multimodal query, i.e., a reference image and its corresponding modification text. While previous supervised or zero-shot learning paradigms all fail to strike a good trade-off between time-consuming annotation cost and retrieval performance, recent researchers introduced the task of few-shot CIR (FS-CIR) and proposed a textual inversion-based network based on pretrained CLIP model to realize it. Despite its promising performance, the approach suffers from two key limitations: insufficient multimodal query composition training and indiscriminative training triplet selection. To address these two limitations, in this work, we propose a novel two-stage pseudo triplet guided few-shot CIR scheme, dubbed PTG-FSCIR. In the first stage, we employ a masked training strategy and advanced image caption generator to construct pseudo triplets from pure image data to enable the model to acquire primary knowledge related to multimodal query composition. In the second stage, based on active learning, we design a pseudo modification text-based query-target distance metric to evaluate the challenging score for each unlabeled sample. Meanwhile, we propose a robust top range-based random sampling strategy according to the 3-$\sigma$ rule in statistics, to sample the challenging samples for fine-tuning the pretrained model. Notably, our scheme is plug-and-play and compatible with any existing supervised CIR models. We tested our scheme across three backbones on three public datasets (i.e., FashionIQ, CIRR, and Birds-to-Words), achieving maximum improvements of 26.4%, 25.5% and 21.6% respectively, demonstrating our scheme's effectiveness.

7/9/2024