Cross-IQA: Unsupervised Learning for Image Quality Assessment

Read original: arXiv:2405.04311 - Published 5/8/2024 by Zhen Zhang


Overview

Proposes a new no-reference image quality assessment (NR-IQA) method called Cross-IQA based on vision transformer (ViT) models
Cross-IQA can learn image quality features from unlabeled data using a pretext task of synthesized image reconstruction
Pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for image quality score prediction
Experimental results show Cross-IQA achieves state-of-the-art performance on assessing low-frequency degradation (e.g., color change, blurring) compared to classical full-reference and NR-IQA methods
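
The last step in the list above, fitting a linear regression model on top of the pretrained encoder, can be sketched as follows. This is a minimal illustration and not the paper's code: it assumes the encoder is frozen and only a linear head is fit, it uses ridge-regularized least squares as the fitting method, and it substitutes random vectors for real encoder features. The names fit_linear_head and predict_scores are hypothetical.

```python
import numpy as np

def fit_linear_head(features, scores, l2=1e-3):
    """Fit scores ~ features @ w + b by ridge-regularized least squares."""
    # Append a column of ones so the last weight acts as the bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias term
    return np.linalg.solve(X.T @ X + reg, X.T @ scores)

def predict_scores(features, weights):
    """Apply the fitted linear head to a batch of feature vectors."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights

# Stand-in data (assumption): random vectors play the role of encoder
# outputs, and the "quality scores" are a noisy linear function of them.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
scores = features @ true_w + 3.0 + 0.01 * rng.normal(size=200)

weights = fit_linear_head(features, scores)
predicted = predict_scores(features, weights)
```

In practice the feature matrix would come from running labeled images through the pretrained encoder, and the targets would be the human quality ratings from an IQA dataset.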

Plain English Explanation

Automatically evaluating the quality of images is a challenging problem that affects billions of people who use the internet and social media daily. To help advance research in this area, the researchers developed a new method called Cross-IQA that learns image quality features from unlabeled images, so human quality ratings are needed only to fit the final score predictor.

The key idea behind Cross-IQA is to use a vision transformer (ViT) model to learn about image quality by trying to reconstruct synthetic images with known quality issues, like color changes or blurriness. This "pretext task" allows the model to extract useful information about image quality, even from unlabeled data. The researchers then use the features learned by the ViT encoder to train a simple linear regression model that can predict quality scores for new images.

Importantly, the experiments show that Cross-IQA outperforms the classical methods it was compared against at assessing low-frequency degradations, such as color changes and blurring. This suggests the approach captures meaningful information about these aspects of image quality.

Technical Explanation

The proposed Cross-IQA method is a no-reference image quality assessment (NR-IQA) technique based on vision transformer (ViT) models. Unlike full-reference IQA methods, which need a pristine reference image to compare against, Cross-IQA scores an image on its own. In addition, it learns its image quality features from unlabeled data.

The key innovation is the use of a pretext task - synthesized image reconstruction - to extract image quality information in an unsupervised way. The ViT encoder is trained to reconstruct images with known quality degradations, like color changes or blurring. This allows the model to learn meaningful representations of image quality that are then used to fine-tune a linear regression model for quality score prediction.
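
To make the pretext task more concrete, the sketch below synthesizes two degradations of the kind the paper names (blurring and color change) and computes a reconstruction loss against the clean image. It is an illustration under stated assumptions, not the paper's pipeline: the summary does not specify the degradation parameters, the reconstruction target, or the loss, so the box blur, the per-channel gains, the mean squared error, and the function names are all hypothetical choices.

```python
import numpy as np

def box_blur(image, radius=2):
    """Blur an H x W x 3 float image by averaging over a (2r+1)^2 window."""
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    h, w, _ = image.shape
    size = 2 * radius + 1
    out = np.zeros_like(image)
    # Sum every shifted copy of the image inside the window, then normalize.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)

def color_shift(image, gains=(1.2, 1.0, 0.8)):
    """Scale the R, G, B channels independently and clip to [0, 1]."""
    return np.clip(image * np.asarray(gains), 0.0, 1.0)

def reconstruction_loss(predicted, target):
    """Mean squared error between a reconstruction and its target."""
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))  # stand-in for an unlabeled image in [0, 1]
blurred = box_blur(clean)
shifted = color_shift(clean)

# A trained model would map a degraded input to a reconstruction. Here the
# degraded image itself stands in for the output of an untrained model.
loss_blur = reconstruction_loss(blurred, clean)
loss_color = reconstruction_loss(shifted, clean)
```

Because the degradations are generated from clean images, the training signal needs no human quality labels, which is what makes the pretext task unsupervised.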

Experiments show that Cross-IQA achieves state-of-the-art performance in assessing low-frequency degradations (e.g., color change, blurring) when compared with classical full-reference IQA and NR-IQA methods on the same datasets. This suggests the approach captures aspects of image quality that matter for real-world applications, like social media and internet image sharing.
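
The summary does not say which metrics the paper reports. IQA benchmarks commonly measure agreement between predicted scores and human mean opinion scores (MOS) with the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC). The sketch below computes both with the standard library only; the score values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: five images with human MOS and model predictions.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.5, 3.4, 4.8]

plcc = pearson(predicted, mos)
srocc = spearman(predicted, mos)  # about 0.9: one adjacent pair is swapped
```

SROCC depends only on the ordering of the predictions, while PLCC also rewards predictions that are linearly related to the human scores.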

Critical Analysis

The paper provides a compelling approach to no-reference image quality assessment that leverages recent advances in vision transformers. By using the reconstruction of synthesized degraded images as a pretext task, the researchers are able to learn quality features in an unsupervised way.

However, the paper does not deeply explore the limitations of this approach. For example, it's unclear how Cross-IQA would handle more complex, high-frequency quality issues, like compression artifacts or sensor noise. There may also be concerns around the generalization of the method to diverse real-world image datasets beyond the specific benchmarks used.

Additionally, the paper does not situate Cross-IQA within the broader landscape of NR-IQA techniques or discuss how it compares to other recent innovations in this area. A more thorough discussion of the relative strengths and weaknesses compared to alternative approaches would help readers better evaluate the novelty and significance of this work.

Overall, the Cross-IQA method represents an interesting and promising direction for unsupervised image quality assessment. Further research is needed to fully understand its capabilities and limitations, as well as how it fits into the broader landscape of blind image quality assessment, including recent multi-modal approaches.

Conclusion

The proposed Cross-IQA method offers a novel approach to no-reference image quality assessment by leveraging vision transformer models and unsupervised pretext tasks. The key innovation is the ability to learn meaningful image quality features from unlabeled data, which enables effective assessment of low-frequency degradations like color changes and blurring.

The strong experimental results demonstrate the potential of this technique to positively impact real-world applications where automatic image quality evaluation is crucial, such as social media and online image sharing platforms. Further research is needed to fully understand the capabilities and limitations of Cross-IQA, but this work represents an important step forward in the challenging field of blind image quality assessment.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Cross-IQA: Unsupervised Learning for Image Quality Assessment

Zhen Zhang

Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct the pretext task of synthesized image reconstruction to extract image quality information in an unsupervised manner based on ViT blocks. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with the classical full-reference IQA and NR-IQA under the same datasets.

5/8/2024

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for human subjective perception. To confront this challenge, we propose Unified vision-language pre-training of Quality and Aesthetics (UniQA), to learn general perceptions of two tasks, thereby benefiting them simultaneously. Addressing the absence of text in the IQA datasets and the presence of textual noise in the IAA datasets, (1) we utilize multimodal large language models (MLLMs) to generate high-quality text descriptions; (2) the generated text for IAA serves as metadata to purify noisy IAA data. To effectively adapt the pre-trained UniQA to downstream tasks, we further propose a lightweight adapter that utilizes versatile cues to fully exploit the extensive knowledge of the pre-trained model. Extensive experiments demonstrate that our approach attains a new state-of-the-art performance on both IQA and IAA tasks, while concurrently showcasing exceptional zero-shot and few-label image assessment capabilities. The source code will be available at https://github.com/zht8506/UniQA.

6/4/2024

Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment

Mohammed Alsaafin, Musab Alsheikh, Saeed Anwar, Muhammad Usman

The no-reference image quality assessment is a challenging domain that addresses estimating image quality without the original reference. We introduce an improved mechanism to extract local and non-local information from images via different transformer encoders and CNNs. The utilization of Transformer encoders aims to mitigate locality bias and generate a non-local representation by sequentially processing CNN features, which inherently capture local visual structures. Establishing a stronger connection between subjective and objective assessments is achieved through sorting within batches of images based on relative distance information. A self-consistency approach to self-supervision is presented, explicitly addressing the degradation of no-reference image quality assessment (NR-IQA) models under equivariant transformations. Our approach ensures model robustness by maintaining consistency between an image and its horizontally flipped equivalent. Through empirical evaluation of five popular image quality assessment datasets, the proposed model outperforms alternative algorithms in the context of no-reference image quality assessment datasets, especially on smaller datasets. Codes are available at https://github.com/mas94/ADTRS

9/12/2024

MSLIQA: Enhancing Learning Representations for Image Quality Assessment through Multi-Scale Learning

Nasim Jamshidi Avanaki, Abhijay Ghildyal, Nabajeet Barman, Saman Zadtootaghaj

No-Reference Image Quality Assessment (NR-IQA) remains a challenging task due to the diversity of distortions and the lack of large annotated datasets. Many studies have attempted to tackle these challenges by developing more accurate NR-IQA models, often employing complex and computationally expensive networks, or by bridging the domain gap between various distortions to enhance performance on test datasets. In our work, we improve the performance of a generic lightweight NR-IQA model by introducing a novel augmentation strategy that boosts its performance by almost 28%. This augmentation strategy enables the network to better discriminate between different distortions in various parts of the image by zooming in and out. Additionally, the inclusion of test-time augmentation further enhances performance, making our lightweight network's results comparable to the current state-of-the-art models, simply through the use of augmentations.

9/9/2024