Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Read original: arXiv:2404.15163 - Published 4/24/2024 by Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue


Overview

As text-to-image and image-to-image generative models mature, AI-generated images (AGIs) have shown great potential in various applications.
However, there has been limited research on designing quality assessment models for these AGIs.
This paper proposes a novel blind image quality assessment (IQA) network called AMFF-Net to evaluate the quality of AGIs.

Plain English Explanation

The paper focuses on developing a way to automatically assess the quality of AI-generated images (AGIs). As AI models for creating images have become more advanced, these AI-generated images are being used in many areas like advertising, entertainment, and education. However, there hasn't been much research on how to evaluate the quality of these AI-generated images.

The researchers propose a new network called AMFF-Net that assesses the quality of AGIs from three key perspectives: visual quality, authenticity, and consistency.

The network takes the original AI-generated image along with an upscaled and a downscaled copy of it, and extracts visual features from each version. An "adaptive feature fusion" component then combines these multi-scale features using weights that are learned during training rather than fixed by hand. Additionally, AMFF-Net compares the semantic features of the image with those of the text prompt used to generate it, to check how well the image matches the prompt.

The key ideas are using multi-scale inputs to capture local and global image qualities, adaptively combining these features, and aligning the image to the text prompt. This allows AMFF-Net to provide a more comprehensive assessment of AGI quality compared to previous approaches.

Technical Explanation

The paper proposes the AMFF-Net architecture for blind image quality assessment of AI-generated images (AGIs). AMFF-Net evaluates AGI quality from three perspectives: visual quality, authenticity, and consistency.

To capture both local and global characteristics of visual quality and authenticity, AMFF-Net takes the original-sized AGI together with upscaled and downscaled versions as inputs. These multi-scale images are passed through an image encoder to extract multi-scale features. An Adaptive Feature Fusion (AFF) block then fuses the multi-scale features using learnable weights.
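The multi-scale input and weighted fusion steps can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour resize, the `toy_features` stand-in for the image encoder, the scale factors, and the softmax normalisation of the fusion weights are all assumptions made for illustration, and a real model would use a deep learning framework with trained parameters.

```python
import math

def resize_nearest(img, scale):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)] for r in range(new_h)]

def toy_features(img):
    """Hypothetical stand-in for the image encoder: [mean, std] of pixel values."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var)]

def softmax(xs):
    """Numerically stable softmax, used here to turn logits into fusion weights."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(features, logits):
    """Weighted sum of per-scale feature vectors; weights = softmax(logits)."""
    weights = softmax(logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

image = [[0.0, 1.0], [1.0, 0.0]]   # tiny 2x2 example image
scales = [0.5, 1.0, 2.0]           # assumed down-scale, original, up-scale factors
multi_scale = [resize_nearest(image, s) for s in scales]
features = [toy_features(im) for im in multi_scale]

# In training these logits would be learned; zeros give equal weights.
fusion_logits = [0.0, 0.0, 0.0]
fused = adaptive_fuse(features, fusion_logits)
```

Because the weights come from learnable logits, training can shift emphasis toward whichever scale carries the most useful quality cues, which is the intuition behind fusing adaptively rather than averaging.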

Additionally, AMFF-Net compares the semantic features extracted from the image and the text prompt used to generate it. This allows the model to assess the alignment between the image content and the underlying text description.
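One common way to compare semantic features from a text encoder and an image encoder is cosine similarity between their embedding vectors. The sketch below assumes that measure; the summary does not state which comparison AMFF-Net uses, and the three-dimensional embeddings are made-up values standing in for real encoder outputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; real ones would come from trained encoders.
image_embedding = [0.2, 0.9, 0.1]
matching_prompt = [0.25, 0.8, 0.05]
unrelated_prompt = [0.9, -0.1, 0.4]

aligned = cosine_similarity(image_embedding, matching_prompt)
misaligned = cosine_similarity(image_embedding, unrelated_prompt)
```

Under this assumption, a prompt whose embedding points in nearly the same direction as the image embedding scores close to 1, while an unrelated prompt scores much lower.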

The paper conducts extensive experiments on three AGI quality assessment datasets. The results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. Ablation studies further demonstrate the effectiveness of the multi-scale input strategy and AFF block.
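The summary does not say which metrics the comparison uses. Blind IQA methods are commonly compared by how well their predictions rank images in the same order as human mean opinion scores (MOS), for example with the Spearman rank-order correlation coefficient (SRCC). The sketch below assumes that criterion; the scores are invented for illustration and are not results from the paper.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def srcc(pred, mos):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(pred), average_ranks(mos))

# Invented example values, not data from the paper.
predicted = [0.31, 0.55, 0.47, 0.90, 0.12]
human_mos = [2.1, 3.0, 3.4, 4.6, 1.5]
score = srcc(predicted, human_mos)
```

A value near 1 means the model orders images almost exactly as human raters do, regardless of the scale on which its scores are expressed.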

Critical Analysis

The paper presents a novel and comprehensive approach to assessing the quality of AI-generated images. By considering visual quality, authenticity, and consistency, as well as the alignment between image and text, AMFF-Net provides a more holistic evaluation compared to prior blind IQA methods.

However, the paper does not discuss potential limitations or caveats of the proposed approach. For example, it's unclear how AMFF-Net would perform on highly diverse or unconventional AGI datasets, or how sensitive the model is to different types of AI-based image generation techniques.

Additionally, the paper could have explored potential biases or failure cases of the AMFF-Net model, and discussed avenues for further research to address these issues. AGFSYNC: Leveraging AI-Generated Feedback for Preference Optimization, Multi-Level Aggregation Recursive Alignment Architecture for Efficient, and AIGIQA-20K: A Large Database of AI-Generated Images are relevant papers that could provide additional context.

Overall, the AMFF-Net model presents an interesting and promising approach to AGI quality assessment, but the paper could have delved deeper into the limitations and future research directions.

Conclusion

This paper introduces AMFF-Net, a novel blind image quality assessment network for evaluating the quality of AI-generated images (AGIs). By considering visual quality, authenticity, and consistency, as well as the alignment between image and text, AMFF-Net provides a more comprehensive quality assessment compared to previous methods.

The key innovations of AMFF-Net include using multi-scale image inputs to capture local and global features, adaptively fusing these multi-scale features, and leveraging the relationship between the image and its generating text prompt. In experiments on three AGI quality assessment databases, AMFF-Net outperforms nine state-of-the-art blind IQA methods.

The research presented in this paper represents an important step towards developing robust and reliable quality evaluation tools for the growing field of AI-generated content. As text-to-image and image-to-image models continue to advance, such quality assessment frameworks will be crucial for ensuring the responsible and ethical deployment of these generative AI technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, very few efforts have been paid to design relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., visual quality, authenticity, and consistency. Specifically, inspired by the characteristics of the human visual system and motivated by the observation that visual quality and authenticity are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and original-sized image as the inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and prompt, AMFF-Net compares the semantic features from text encoder and image encoder to evaluate the text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that our AMFF-Net obtains better performance than nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.

4/24/2024

Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai

Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNN) or Transformer to learn the quality-aware feature representation, achieving commendable performance on natural scene images. However, when applied to AI-Generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This situation is largely due to the semantic inaccuracies inherent in certain AGIs caused by uncontrollable nature of the generation process. Thus, the capability to discern semantic content becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, making it challenging to grasp the existence and coherence of semantic content of the entire image. To address the shortfall in semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which utilizes semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments conducted on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k show that MA-AGIQA achieves state-of-the-art performance, and demonstrate its superior generalization capabilities on assessing the quality of AGIs. Code is available at https://github.com/wangpuyi/MA-AGIQA.

4/30/2024

Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models, which are subsequently fitted into a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets show that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS

5/30/2024

Bringing Textual Prompt to AI-Generated Image Quality Assessment

Bowen Qu, Haohui Li, Wei Gao

AI-Generated Images (AGIs) have inherent multimodal nature. Unlike traditional image quality assessment (IQA) on natural scenarios, AGIs quality assessment (AGIQA) takes the correspondence of image and its textual prompt into consideration. This is coupled in the ground truth score, which confuses the unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework for AGIQA via corresponding image and prompt incorporation. Specifically, we propose a novel incremental pretraining task named Image2Prompt for better understanding of AGIs and their corresponding textual prompts. An effective and efficient image-prompt fusion module, along with a novel special [QA] token, are also applied. Both are plug-and-play and beneficial for the cooperation of image and its corresponding prompt. Experiments demonstrate that our IP-IQA achieves the state-of-the-art on AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.

5/22/2024