Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

Read original: arXiv:2312.06158 - Published 5/28/2024 by Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Overview

This paper proposes an adaptive feature selection method for no-reference image quality assessment (NR-IQA) that uses contrastive learning to reduce the model's sensitivity to semantic noise.
The method aims to improve NR-IQA performance by keeping quality-relevant features and suppressing the influence of semantic noise.

Plain English Explanation

The paper presents a new way to assess the quality of an image without a reference image to compare it against. This matters because in many real-world situations no pristine reference image is available.

The key idea is to use a <a href="https://aimodels.fyi/papers/arxiv/you-only-train-once-unified-framework-both">contrastive learning approach</a> to select the most relevant features for assessing image quality. Contrastive learning helps the model focus on the differences that separate high-quality from low-quality images, rather than getting distracted by irrelevant details.

The method also tries to reduce the impact of "semantic noise" - things in the image that are meaningful to humans but don't actually affect the perceived quality. For example, the presence of text or logos might be semantically meaningful but not relevant to assessing image quality.

By adaptively selecting the right features and mitigating semantic noise, the authors show their method can outperform other state-of-the-art NR-IQA models, especially on challenging datasets. This could lead to better automatic image quality assessment in applications like photography, video streaming, and image processing.

Technical Explanation

The paper proposes an <a href="https://aimodels.fyi/papers/arxiv/exploring-vulnerabilities-no-reference-image-quality-assessment">adaptive feature selection</a> method for no-reference image quality assessment (NR-IQA) that leverages a contrastive learning approach to mitigate the impact of semantic noise.

The key elements of the method are:

Vision Transformer-based Feature Extraction: The model uses a Vision Transformer (ViT) to extract multi-scale visual features from the input image. ViT is well-suited for this task as it can capture both local and global visual information.
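Contrastive Feature Selection: A contrastive learning approach is used to select the most relevant features for IQA. The model is trained to maximize the similarity between features of high-quality images and minimize the similarity between features of high- and low-quality images. The paper's abstract describes the underlying mechanism as quality-aware feature matching: image pairs with similar quality scores but different semantic features are matched and treated as adversarial semantic noise.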
Semantic Noise Mitigation: The model includes a semantic noise mitigation module that learns to suppress the influence of semantically meaningful but quality-irrelevant information (e.g., text, logos) on the final quality prediction.
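
The feature selection and noise mitigation items above can be illustrated with a small plain-Python toy. This is a sketch under stated assumptions, not the paper's implementation: the pairwise margin loss, the hand-set channel gate, the `score_gap` and `margin` values, and the two-channel features (one "semantic" channel, one "quality" channel) are all hypothetical choices made for illustration.

```python
import math

def gate(features, weights):
    """Soft feature selection: scale each channel by a weight in [0, 1]."""
    return [f * w for f, w in zip(features, weights)]

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_contrastive_loss(feats, scores, weights, score_gap=0.3, margin=1.0):
    """Generic margin-based contrastive loss over all image pairs (assumed form).

    Pairs with similar quality scores are pulled together (squared distance);
    pairs with different quality scores are pushed at least `margin` apart.
    """
    gated = [gate(f, weights) for f in feats]
    total, count = 0.0, 0
    for i in range(len(gated)):
        for j in range(i + 1, len(gated)):
            d = distance(gated[i], gated[j])
            if abs(scores[i] - scores[j]) < score_gap:
                total += d ** 2                      # similar quality: pull together
            else:
                total += max(0.0, margin - d) ** 2   # different quality: push apart
            count += 1
    return total / count

# Toy data: channel 0 encodes image content (semantic), channel 1 a quality cue.
feats = [[1.0, 0.90], [0.0, 0.85], [1.0, 0.10], [0.0, 0.20]]
scores = [0.90, 0.85, 0.10, 0.20]  # made-up quality labels in [0, 1]

loss_all_channels = pairwise_contrastive_loss(feats, scores, weights=[1.0, 1.0])
loss_semantic_off = pairwise_contrastive_loss(feats, scores, weights=[0.0, 1.0])

# Gating out the semantic channel lowers the loss on this toy data.
print(loss_all_channels, loss_semantic_off)
```

In a trained model the gate would be learned rather than fixed by hand; the toy only shows why down-weighting a quality-irrelevant channel reduces this kind of loss.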

The authors evaluate their method on several challenging NR-IQA datasets and show that it outperforms state-of-the-art approaches, especially on datasets with diverse content and semantic noise. This demonstrates the effectiveness of the adaptive feature selection and semantic noise mitigation components.
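
Benchmark comparisons in NR-IQA are commonly reported as rank and linear correlations between predicted scores and human mean opinion scores (SROCC and PLCC). The summary does not say which metrics this paper reports, so the sketch below only illustrates how such correlations are computed; the two score lists are made-up numbers.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human mean opinion scores and model predictions for five images.
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
pred = [1.0, 2.9, 2.7, 4.1, 4.5]

srocc = spearman(mos, pred)  # approximately 0.9: one adjacent pair is swapped in rank
plcc = pearson(mos, pred)
print(round(srocc, 4), round(plcc, 4))
```

SROCC depends only on the ordering of the predictions, while PLCC is also sensitive to how linearly they track the human scores.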

Critical Analysis

The paper makes a compelling case for the need to address semantic noise in NR-IQA models and proposes a novel solution using contrastive learning. However, there are a few potential limitations and areas for further research:

Generalization to New Domains: While the method shows strong performance on the evaluated datasets, it's unclear how well it would generalize to new domains or types of image distortions. Further testing on a wider range of datasets would help assess the broader applicability of the approach.
Interpretability of Selected Features: The paper doesn't provide much insight into which specific features the model learns to prioritize and how they relate to human perception of image quality. Improving the interpretability of the selected features could lead to better model understanding and potential refinements.
Comparison to Human Judgments: The evaluation relies on existing IQA datasets, whose labels are themselves human opinion scores, so agreement with human perception is measured only through those benchmarks. Validating the model's outputs against fresh ratings from human raters could provide additional insights into its strengths and limitations.
Computational Complexity: The use of a Vision Transformer and contrastive learning approach may incur higher computational costs compared to some simpler NR-IQA models. The trade-offs between performance and efficiency should be carefully considered for real-world applications.

Overall, the paper presents an innovative and promising approach to address a key challenge in NR-IQA. Further research exploring the areas mentioned above could help strengthen the method and unlock its full potential.

Conclusion

This paper introduces an adaptive feature selection method for no-reference image quality assessment that leverages contrastive learning to mitigate the impact of semantic noise. By carefully selecting the most relevant features and suppressing the influence of quality-irrelevant information, the proposed approach demonstrates improved performance on challenging IQA datasets compared to state-of-the-art models.

The key contributions of this work include the use of a Vision Transformer-based feature extractor, a contrastive learning-based feature selection mechanism, and a semantic noise mitigation module. These elements work together to create a robust and reliable NR-IQA model that can be valuable in a wide range of applications, from photography to video streaming and image processing.

While the paper presents promising results, further research is needed to fully understand the method's generalization capabilities, interpretability, and computational efficiency. Exploring these areas could lead to even more advanced and practical NR-IQA solutions that bring us closer to reliable automatic image quality assessment.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature spatial distances can have vastly different quality scores, indicating that the extracted features may contain a significant amount of quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that employs an adversarial perspective to remove harmful semantic noise features from the upstream task. Specifically, QFM-IQM enhances the semantic noise distinguish capabilities by matching image pairs with similar quality scores but varying semantic features as adversarial semantic noise and adaptively adjusting the upstream task's features by reducing sensitivity to adversarial noise perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.

5/28/2024

Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman's rank-order correlation coefficient (SROCC) decline of 0.6381 attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness.

4/29/2024

You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment

Yi Ke Yun, Weisi Lin

Although recent efforts in image quality assessment (IQA) have achieved promising performance, there still exists a considerable gap compared to the human visual system (HVS). One significant disparity lies in humans' seamless transition between full reference (FR) and no reference (NR) tasks, whereas existing models are constrained to either FR or NR tasks. This disparity implies the necessity of designing two distinct systems, thereby greatly diminishing the model's versatility. Therefore, our focus lies in unifying FR and NR IQA under a single framework. Specifically, we first employ an encoder to extract multi-level features from input images. Then a Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs to model the spatial distortion at each encoder stage. Furthermore, considering that different distortions contaminate encoder stages and damage image semantic meaning differently, a Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder. By adopting HA and SDA, the proposed network can effectively perform both FR and NR IQA. When our proposed model is independently trained on NR or FR IQA tasks, it outperforms existing models and achieves state-of-the-art performance. Moreover, when trained jointly on NR and FR IQA tasks, it further enhances the performance of NR IQA while achieving on-par performance in the state-of-the-art FR IQA. You only train once to perform both IQA tasks. Code will be released at: https://github.com/BarCodeReader/YOTO.

4/9/2024

Cross-IQA: Unsupervised Learning for Image Quality Assessment

Zhen Zhang

Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA based on vision transformer(ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct the pretext task of synthesized image reconstruction to unsupervised extract the image quality information based ViT block. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with the classical full-reference IQA and NR-IQA under the same datasets.

5/8/2024