Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild

Read original: arXiv:2409.05540 - Published 9/10/2024 by Xiongkuo Min, Yixuan Gao, Yuqin Cao, Guangtao Zhai, Wenjun Zhang, Huifang Sun, Chang Wen Chen

Overview

Traditional in-the-wild image quality assessment (IQA) models are trained on mean opinion scores (MOS) but miss rich subjective quality information like standard deviation of opinion scores (SOS) and distribution of opinion scores (DOS).
This paper proposes a novel IQA method called RichIQA that explores this rich subjective rating information to predict image quality.
RichIQA has two key novel designs: a three-stage image quality prediction network and a multi-label training strategy using MOS, SOS, and DOS.

Plain English Explanation

RichIQA is a new way to assess the quality of images found in the real world. Traditional methods only look at the average opinion score, but RichIQA also considers the spread and distribution of opinions.

The three-stage network in RichIQA is designed to mimic how the human brain processes images. It has powerful feature extraction capabilities and can remember both short-term and long-term information.

RichIQA also uses a special multi-label training strategy. Instead of just training on the average opinion score, it also uses the spread of opinions and the full distribution of scores. This helps the network make better predictions and be more reliable.

By considering this rich subjective information, RichIQA can predict the full distribution of image quality, not just a single score. This provides more detailed and useful feedback compared to traditional methods.
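As a rough illustration of how a single score can be recovered from a predicted distribution, the sketch below takes a probability for each rating level and computes its mean and spread. The five-level 1 to 5 scale, the function name, and the example probabilities are assumptions made for illustration; they are not taken from the paper.

```python
from math import sqrt

def summarize_distribution(probs, levels=(1, 2, 3, 4, 5)):
    """Return (mean, standard deviation) of a discrete rating distribution."""
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize in case of rounding
    mean = sum(level * p for level, p in zip(levels, probs))
    var = sum((level - mean) ** 2 * p for level, p in zip(levels, probs))
    return mean, sqrt(var)

# Hypothetical predicted distribution over the ratings 1..5.
predicted = [0.05, 0.10, 0.25, 0.40, 0.20]
mean, spread = summarize_distribution(predicted)
print(f"mean quality = {mean:.2f}, spread = {spread:.2f}")
```

The mean plays the role of a MOS-style score, while the spread indicates how strongly viewers are expected to disagree.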

Technical Explanation

The key innovations in RichIQA are:

Three-stage Quality Prediction Network: This network uses the powerful feature representation of Convolutional Vision Transformers (CvT) and mimics the short-term and long-term memory mechanisms of the human brain to predict image quality.
Multi-label Training Strategy: The network is trained concurrently on MOS, SOS, and DOS subjective quality information, allowing it to fully exploit the rich rating data.
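To make the three label types concrete, here is a minimal sketch of how MOS, SOS, and DOS could be computed from the raw opinion scores of one image. The 1 to 5 rating scale, the function name, and the sample ratings are assumptions for illustration, and individual datasets may define SOS with the sample (n - 1) form rather than the population form used here.

```python
from collections import Counter
from math import sqrt

def rating_labels(ratings, levels=(1, 2, 3, 4, 5)):
    """Return (MOS, SOS, DOS) for one image from its raw opinion scores."""
    n = len(ratings)
    mos = sum(ratings) / n
    # Population standard deviation of the opinion scores.
    sos = sqrt(sum((r - mos) ** 2 for r in ratings) / n)
    # Fraction of raters who chose each rating level.
    counts = Counter(ratings)
    dos = [counts[level] / n for level in levels]
    return mos, sos, dos

# Hypothetical ratings from eight viewers of the same image.
ratings = [3, 4, 4, 5, 2, 4, 3, 4]
mos, sos, dos = rating_labels(ratings)
print(mos, round(sos, 3), dos)
```

MOS and SOS are both summaries of DOS, which is why the distribution is the richest of the three labels.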

The three-stage architecture is designed to capture different aspects of image quality. The first stage extracts low-level visual features, the second stage models short-term dependencies, and the third stage models long-term dependencies. This hierarchical structure allows the network to make sophisticated quality predictions.

The multi-label training strategy ensures the network learns from the full range of subjective quality information, not just the average opinion score. This enhances the model's performance and generalization ability compared to approaches that only use MOS.
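One way to picture multi-label training is a loss that combines one term per label. This summary does not state the loss terms or weights RichIQA uses, so everything below is an assumption: a squared CDF difference (an earth mover's style distance) for DOS, squared errors for MOS and SOS, and equal weights.

```python
from math import sqrt

LEVELS = (1, 2, 3, 4, 5)  # assumed rating scale

def moments(probs):
    """Mean and standard deviation of a discrete rating distribution."""
    mean = sum(level * p for level, p in zip(LEVELS, probs))
    var = sum((level - mean) ** 2 * p for level, p in zip(LEVELS, probs))
    return mean, sqrt(var)

def cdf_distance(pred, target):
    """Mean squared difference between the two cumulative distributions."""
    cdf_pred = cdf_target = 0.0
    total = 0.0
    for p, t in zip(pred, target):
        cdf_pred += p
        cdf_target += t
        total += (cdf_pred - cdf_target) ** 2
    return total / len(pred)

def multi_label_loss(pred, dos, mos, sos, w_dos=1.0, w_mos=1.0, w_sos=1.0):
    """Weighted sum of DOS, MOS, and SOS terms (weights are hypothetical)."""
    pred_mean, pred_std = moments(pred)
    return (w_dos * cdf_distance(pred, dos)
            + w_mos * (pred_mean - mos) ** 2
            + w_sos * (pred_std - sos) ** 2)

# Hypothetical ground-truth labels for one image.
dos = [0.0, 0.125, 0.25, 0.5, 0.125]
mos, sos = moments(dos)

perfect = multi_label_loss(dos, dos, mos, sos)
uniform = multi_label_loss([0.2] * 5, dos, mos, sos)
print(perfect, uniform)
```

A prediction that matches the ground-truth distribution drives all three terms to zero, while a uniform guess is penalized by each of them.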

Critical Analysis

The paper provides a thorough evaluation of RichIQA on multiple large-scale in-the-wild IQA datasets, demonstrating its strong performance compared to state-of-the-art methods.

However, the authors do not discuss any limitations or potential issues with their approach. For example, the complexity of the three-stage network may make it computationally expensive or difficult to deploy in real-world applications.

Additionally, the authors could have explored how RichIQA's ability to predict quality distributions, rather than just single scores, could benefit end users. Further research is needed to understand the practical implications of this rich quality information.
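As a small illustration of what a distribution offers beyond a single score, the sketch below compares two hypothetical images with the same mean quality but very different shares of low ratings. The distributions, the 1 to 5 scale, and the threshold are invented for illustration and are not results from the paper.

```python
LEVELS = (1, 2, 3, 4, 5)  # assumed rating scale

def mean_score(probs):
    """Mean rating implied by a discrete rating distribution."""
    return sum(level * p for level, p in zip(LEVELS, probs))

def prob_at_or_below(probs, threshold):
    """Share of viewers expected to give a rating at or below the threshold."""
    return sum(p for level, p in zip(LEVELS, probs) if level <= threshold)

# Two hypothetical predicted distributions with identical means.
image_a = [0.00, 0.10, 0.80, 0.10, 0.00]  # viewers mostly agree
image_b = [0.30, 0.10, 0.10, 0.30, 0.20]  # viewers are divided

for name, probs in (("A", image_a), ("B", image_b)):
    print(name, round(mean_score(probs), 2), round(prob_at_or_below(probs, 2), 2))
```

Both images would look identical to a model that reports only a mean score, yet the second one is expected to disappoint a much larger share of viewers.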

Conclusion

RichIQA is a novel IQA method that leverages the full range of subjective quality ratings, including the opinion score distribution, to make more nuanced and reliable predictions of image quality. Its three-stage network architecture and multi-label training strategy are the key innovations, and the authors report that it outperforms state-of-the-art methods on multiple large-scale in-the-wild IQA databases.

This research highlights the importance of considering rich subjective information beyond just average opinion scores when assessing image quality, particularly for real-world applications. RichIQA's ability to capture the full distribution of quality perceptions could lead to significant improvements in areas like image optimization, quality control, and user experience design.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild

Xiongkuo Min, Yixuan Gao, Yuqin Cao, Guangtao Zhai, Wenjun Zhang, Huifang Sun, Chang Wen Chen

Traditional in the wild image quality assessment (IQA) models are generally trained with the quality labels of mean opinion score (MOS), while missing the rich subjective quality information contained in the quality ratings, for example, the standard deviation of opinion scores (SOS) or even distribution of opinion scores (DOS). In this paper, we propose a novel IQA method named RichIQA to explore the rich subjective rating information beyond MOS to predict image quality in the wild. RichIQA is characterized by two key novel designs: (1) a three-stage image quality prediction network which exploits the powerful feature representation capability of the Convolutional vision Transformer (CvT) and mimics the short-term and long-term memory mechanisms of human brain; (2) a multi-label training strategy in which rich subjective quality information like MOS, SOS and DOS are concurrently used to train the quality prediction network. Powered by these two novel designs, RichIQA is able to predict the image quality in terms of a distribution, from which the mean image quality can be subsequently obtained. Extensive experimental results verify that the three-stage network is tailored to predict rich quality information, while the multi-label training strategy can fully exploit the potentials within subjective quality rating and enhance the prediction performance and generalizability of the network. RichIQA outperforms state-of-the-art competitors on multiple large-scale in the wild IQA databases with rich subjective rating labels. The code of RichIQA will be made publicly available on GitHub.

9/10/2024

Descriptive Image Quality Assessment in the Wild

Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-world applications. Second, their performance is sub-optimal due to limitations in dataset coverage, scale, and quality. To overcome these challenges, we introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, full-reference and non-reference scenarios. We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework. Consequently, we construct a comprehensive, large-scale, and high-quality dataset, named DQ-495K. We also retain image resolution during training to better handle resolution-related quality issues, and estimate a confidence score that is helpful to filter out low-quality responses. Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. Our advantages are further confirmed by real-world applications including assessing the web-downloaded images and ranking model-processed images. Datasets and codes will be released in https://depictqa.github.io/depictqa-wild/.

6/13/2024

Cross-IQA: Unsupervised Learning for Image Quality Assessment

Zhen Zhang

Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA, based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct the pretext task of synthesized image reconstruction to extract image quality information in an unsupervised manner based on ViT blocks. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with the classical full-reference IQA and NR-IQA under the same datasets.

5/8/2024

S-IQA Image Quality Assessment With Compressive Sampling

Ronghua Liao, Chen Hui, Lang Yuan, Haiqi Zhu, Feng Jiang

No-Reference Image Quality Assessment (NR-IQA) aims at estimating image quality in accordance with subjective human perception. However, most methods focus on exploring increasingly complex networks to improve the final performance, accompanied by limitations on input images. Especially when applied to high-resolution (HR) images, these methods often have to adjust the size of the original image to meet the model input. To further alleviate the aforementioned issue, we propose two networks for NR-IQA with Compressive Sampling (dubbed CL-IQA and CS-IQA). They consist of four components: (1) The Compressed Sampling Module (CSM) to sample the image. (2) The Adaptive Embedding Module (AEM), which embeds the measurements to extract high-level features. (3) The Vision Transformer and Scale Swin Transformer Blocks Module (SSTM) to extract deep features. (4) The Dual Branch (DB) to get the final quality score. Experiments show that our proposed methods outperform other methods on various datasets with less data usage.

9/12/2024