ExIQA: Explainable Image Quality Assessment Using Distortion Attributes

Read original: arXiv:2409.06853 - Published 9/12/2024 by Sepehr Kazemi Ranjbar, Emad Fatemizadeh

Overview

The paper proposes ExIQA, a new approach for explainable image quality assessment (IQA) using distortion attributes.
ExIQA aims to provide a more interpretable and transparent way to assess image quality compared to traditional "black-box" IQA models.
The method prompts a vision-language model (VLM) such as CLIP with the attributes of distortions, infers distortion types and strengths from the responses, and then feeds the attribute probabilities into a regressor to predict the quality score.

Plain English Explanation

The researchers developed a new system called ExIQA that can assess the quality of images in a way that is more understandable to humans. Traditional image quality assessment models often work like "black boxes" - they take an image as input and output a quality score, but it's not clear how they arrived at that score.

In contrast, ExIQA first identifies which distortions are present in an image and how strong they are, and only then estimates a quality score from them. It does this by asking a vision-language model about the visible effects, or attributes, of distortions rather than about distortion names. This extra information helps explain the quality assessment: for example, the attributes might indicate that an image looks blurry or noisy.

By providing these distortion attributes alongside the quality score, ExIQA aims to make the image quality assessment process more interpretable and useful, especially for applications where it's important to understand the reasons behind the quality assessment.

Technical Explanation

The key idea in ExIQA is to approach blind IQA (BIQA), where no reference image is available, as a distortion identification problem. A vision-language model (VLM) such as CLIP is prompted with the attributes, or visible effects, of distortions rather than with distortion names. The resulting attribute probabilities are aggregated to infer which distortions are present and how strong they are, and are then fed into a regressor that predicts the quality score.

Because each attribute describes a concrete visual effect, the predicted attribute probabilities double as an explanation: they indicate which effects the model detected in the image, and therefore why the score is high or low. The method also considers multiple distortions per image, which the authors say makes it more scalable.
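The paper's abstract describes prompting a VLM with distortion attributes and aggregating the responses into distortion strengths. A minimal sketch of that idea follows, under loud assumptions: the embeddings are made-up 3-dimensional vectors rather than real CLIP outputs, the prompt texts and the attribute-to-distortion mapping are hypothetical, and the softmax-then-sum aggregation is one plausible choice, not necessarily the rule the authors use.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attribute_probabilities(image_emb, prompt_embs, scale=100.0):
    """Score each attribute prompt against the image, CLIP-style.

    `scale` is a hypothetical temperature applied to the similarities.
    """
    names = list(prompt_embs)
    logits = [scale * cosine(image_emb, prompt_embs[n]) for n in names]
    return dict(zip(names, softmax(logits)))

def distortion_strengths(attr_probs, attr_to_distortion):
    """Assumed aggregation: sum the probability of each distortion's attributes."""
    strengths = {}
    for attr, p in attr_probs.items():
        d = attr_to_distortion[attr]
        strengths[d] = strengths.get(d, 0.0) + p
    return strengths

# Toy embeddings standing in for VLM text/image encoders (illustrative only).
prompt_embs = {
    "edges look smeared": [1.0, 0.0, 0.1],
    "fine detail is lost": [0.8, 0.2, 0.0],
    "grainy speckles cover the image": [0.0, 1.0, 0.1],
    "the image is sharp and clean": [0.1, 0.1, 1.0],
}
attr_to_distortion = {
    "edges look smeared": "blur",
    "fine detail is lost": "blur",
    "grainy speckles cover the image": "noise",
    "the image is sharp and clean": "pristine",
}
image_emb = [0.9, 0.1, 0.2]  # pretend embedding of a blurry image

probs = attribute_probabilities(image_emb, prompt_embs)
strengths = distortion_strengths(probs, attr_to_distortion)
print(max(strengths, key=strengths.get))
```

In a real pipeline the toy vectors would be replaced by the VLM's image and text embeddings, and the attribute probabilities would be passed on to the quality regressor.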

To train ExIQA efficiently, the researchers generated a dataset of 100,000 images that supports multiple distortions per image. This allows the model to learn the relationship between distortion attributes, distortion strengths, and image quality.
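The abstract mentions a generated training set in which an image can carry more than one distortion. The sketch below illustrates that general idea on a tiny grayscale image stored as nested lists. Everything specific here is an assumption made for illustration: the two distortions (box blur and Gaussian noise), the integer strength levels, and the `make_sample` helper are hypothetical and are not taken from the paper.

```python
import random

def box_blur(img, radius):
    """Average each pixel with its in-bounds neighbours in a square window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out

def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp pixel values to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def make_sample(img, levels, rng):
    """Apply every distortion whose level is above 0; return image and labels.

    `levels` maps a distortion name to an integer strength (0 means absent).
    The level-to-parameter mapping below is an arbitrary illustrative choice.
    """
    out = [row[:] for row in img]
    if levels.get("blur", 0) > 0:
        out = box_blur(out, radius=levels["blur"])
    if levels.get("noise", 0) > 0:
        out = add_noise(out, sigma=0.05 * levels["noise"], rng=rng)
    return out, dict(levels)

rng = random.Random(0)
clean = [[float((x + y) % 2) for x in range(8)] for y in range(8)]  # checkerboard
levels = {"blur": rng.randint(0, 3), "noise": rng.randint(0, 3)}
distorted, labels = make_sample(clean, levels, rng)
print(labels)
```

Keeping the sampled levels as labels is what would let a model be trained on distortion type and strength jointly, since each generated image comes with a record of what was applied to it.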

According to the abstract, ExIQA achieves state-of-the-art performance across multiple datasets in both PLCC (Pearson linear correlation coefficient) and SRCC (Spearman rank correlation coefficient), and its zero-shot results indicate that the approach generalizes to data it was not trained on. Explainability comes as an added benefit on top of this accuracy, which can be valuable in applications where understanding the reasons for the quality assessment is important, such as image processing, compression, and enhancement algorithms.
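PLCC and SRCC, the two metrics named in the paper's abstract, measure how well predicted scores agree with human opinion scores: PLCC captures linear agreement, while SRCC only looks at rank order. The sketch below computes both in plain Python. The score lists are made-up numbers, and the sketch omits the nonlinear mapping that some IQA evaluations fit before computing PLCC.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical predicted scores and human mean opinion scores.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
human = [1.0, 4.0, 9.0, 16.0, 25.0]

print("PLCC:", round(pearson(predicted, human), 3))
print("SRCC:", round(spearman(predicted, human), 3))
```

In this toy example the predictions order the images exactly as the human scores do, so the rank correlation is perfect even though the relationship is not linear and the PLCC falls below 1.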

Critical Analysis

One potential limitation of ExIQA is that a fixed set of distortion attributes may not fully capture all the complex factors that contribute to human perception of image quality. While the model can flag specific issues such as blur or noise, there may be more subtle or holistic aspects of quality that are not as easily decomposed.

Additionally, the model is trained on a generated dataset and queried through hand-written attribute prompts, so its explanations may reflect the particular distortions and attribute descriptions chosen by the authors. It would be interesting to explore ways to learn the distortion attributes in a more unsupervised or data-driven way.

That said, the general approach of ExIQA is a meaningful move towards more explainable and transparent image quality assessment. As AI systems become more widely deployed, there is a growing need for models that can provide clear justifications for their outputs, rather than acting as black boxes. The distortion attributes in ExIQA are a promising step in this direction, and the researchers' emphasis on interpretability is commendable.

Conclusion

The ExIQA model proposed in this paper offers a novel approach to image quality assessment that goes beyond simply providing a quality score. By first identifying distortions through their attributes and then predicting quality from those attribute probabilities, ExIQA aims to make the quality assessment process more transparent and interpretable.

This extra level of explainability could be valuable in a variety of applications where understanding the reasons behind the quality assessment is important, such as image processing, compression, and enhancement algorithms. While the approach has some limitations, it represents an important step towards more explainable and trustworthy AI systems in the domain of visual perception.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ExIQA: Explainable Image Quality Assessment Using Distortion Attributes

Sepehr Kazemi Ranjbar, Emad Fatemizadeh

Blind Image Quality Assessment (BIQA) aims to develop methods that estimate the quality scores of images in the absence of a reference image. In this paper, we approach BIQA from a distortion identification perspective, where our primary goal is to predict distortion types and strengths using Vision-Language Models (VLMs), such as CLIP, due to their extensive knowledge and generalizability. Based on these predicted distortions, we then estimate the quality score of the image. To achieve this, we propose an explainable approach for distortion identification based on attribute learning. Instead of prompting VLMs with the names of distortions, we prompt them with the attributes or effects of distortions and aggregate this information to infer the distortion strength. Additionally, we consider multiple distortions per image, making our method more scalable. To support this, we generate a dataset consisting of 100,000 images for efficient training. Finally, attribute probabilities are retrieved and fed into a regressor to predict the image quality score. The results show that our approach, besides its explainability and transparency, achieves state-of-the-art (SOTA) performance across multiple datasets in both PLCC and SRCC metrics. Moreover, the zero-shot results demonstrate the generalizability of the proposed approach.

9/12/2024

CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment

Daekyu Kwon, Dongyoung Kim, Sehwan Ki, Younghyun Jo, Hyong-Euk Lee, Seon Joo Kim

In no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from VLM and leveraging the scalability of large datasets. Specifically, we carefully select optimal text prompts for five representative image quality attributes and use VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated with large image datasets, allowing our IQA model to learn rich representations about image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability. We will make the code available for access.

6/4/2024

Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment

Fei Zhou, Zhicong Huang, Tianhao Gu, Guoping Qiu

The visual quality of an image is confounded by a number of intertwined factors including its semantic content, distortion characteristics and appearance properties such as brightness, contrast, sharpness, and colourfulness. Distilling high level knowledge about all these quality bearing attributes is crucial for developing objective Image Quality Assessment (IQA). While existing solutions have modeled some of these aspects, a comprehensive solution that involves all these important quality related attributes has not yet been developed. In this paper, we present a new blind IQA (BIQA) model termed Self-supervision and Vision-Language supervision Image QUality Evaluator (SLIQUE) that features a joint vision-language and visual contrastive representation learning framework for acquiring high level knowledge about the image's semantic contents, distortion characteristics and appearance properties for IQA. For training SLIQUE, we have developed a systematic approach to constructing a first of its kind large image database annotated with all three categories of quality relevant texts. The Text Annotated Distortion, Appearance and Content (TADAC) database has over 1.6 million images annotated with textual descriptions of their semantic contents, distortion characteristics and appearance properties. The method for constructing TADAC and the database itself will be particularly useful for exploiting vision-language modeling for advanced IQA applications. Extensive experimental results show that SLIQUE has superior performances over state of the art, demonstrating the soundness of its design principle and the effectiveness of its implementation.

6/24/2024

Descriptive Image Quality Assessment in the Wild

Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-world applications. Second, their performance is sub-optimal due to limitations in dataset coverage, scale, and quality. To overcome these challenges, we introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, full-reference and non-reference scenarios. We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework. Consequently, we construct a comprehensive, large-scale, and high-quality dataset, named DQ-495K. We also retain image resolution during training to better handle resolution-related quality issues, and estimate a confidence score that is helpful to filter out low-quality responses. Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. Our advantages are further confirmed by real-world applications including assessing the web-downloaded images and ranking model-processed images. Datasets and codes will be released in https://depictqa.github.io/depictqa-wild/.

6/13/2024