Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Read original: arXiv:2404.17762 - Published 4/30/2024 by Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai

Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Overview

Presents a novel approach for assessing the quality of AI-generated images using a large multi-modality model
Introduces the AIGIQA-20K dataset, a large-scale database of AI-generated images
Proposes a "Mixture of Experts" model that combines multiple modalities to provide accurate and robust image quality predictions

Plain English Explanation

The paper addresses the challenge of evaluating the quality of images generated by AI systems. As AI technology advances, the ability to generate highly realistic and compelling images has improved dramatically. However, assessing the quality of these AI-generated images is crucial for various applications, such as content creation, digital art, and photo editing.

The researchers developed a novel approach that leverages a large multi-modality model to assess the quality of AI-generated images. This means they combined multiple types of information, such as image features, textual descriptions, and other relevant data, to create a more comprehensive and accurate quality assessment system.

To support this research, the team also introduced the AIGIQA-20K dataset, a large-scale collection of AI-generated images. This dataset provides a diverse and representative sample of AI-generated content, which is essential for training and evaluating the quality assessment model.

The key innovation in this paper is the "Mixture of Experts" model, which combines multiple specialized sub-models (or "experts") to make a final quality prediction. Each expert focuses on a particular aspect of image quality, such as realism, aesthetics, or semantic coherence. By integrating these diverse perspectives, the model can provide more accurate and reliable assessments of AI-generated images.

Technical Explanation

The paper presents a novel approach for assessing the quality of AI-generated images using a large multi-modality model. The core of the proposed system is a "Mixture of Experts" model, which combines multiple specialized sub-models to make a final quality prediction.

To support this research, the authors introduced the AIGIQA-20K dataset, a large-scale database of AI-generated images. This dataset contains a diverse range of AI-generated content, including images from various sources and with varying levels of quality.

The Mixture of Experts model integrates multiple modalities, such as image features, textual descriptions, and other relevant information, to provide a comprehensive assessment of image quality. Each expert sub-model focuses on a specific aspect of quality, such as realism, aesthetics, or semantic coherence. By combining the outputs of these specialized experts, the final model can make more accurate and robust predictions.

The authors also conducted extensive experiments to evaluate the performance of their proposed approach. They compared the Mixture of Experts model to various baseline methods, including AMBA, PKU-AIGIQA-4K, and Multi-Modal Prompt Learning on the AIGIQA-20K dataset. The results demonstrate the superiority of the proposed Mixture of Experts model in terms of accuracy and robustness for assessing the quality of AI-generated images.

Critical Analysis

The paper presents a well-designed and comprehensive approach to assessing the quality of AI-generated images. The introduction of the AIGIQA-20K dataset is a significant contribution, as it provides a large and diverse set of AI-generated images for training and evaluating quality assessment models.

The Mixture of Experts model is a compelling innovation that leverages the strengths of multiple specialized sub-models to provide a more accurate and reliable quality assessment. This approach is particularly relevant as the complexity and diversity of AI-generated content continue to increase, requiring more sophisticated evaluation methods.

However, the paper does not address the potential biases and limitations inherent in the dataset or the model itself. The authors should discuss the potential issues, such as dataset imbalances, representation biases, or the model's sensitivity to different types of AI-generated content. Additionally, the paper could explore the generalization of the proposed approach to other domains or types of AI-generated content beyond images.

Furthermore, the paper could delve deeper into the interpretability and explainability of the Mixture of Experts model. Understanding the underlying reasons for the model's quality assessments could be valuable for users and stakeholders, especially in critical applications where transparency and accountability are paramount.

Conclusion

The paper presents a novel approach for assessing the quality of AI-generated images using a large multi-modality model. The key contributions include the AIGIQA-20K dataset and the Mixture of Experts model, which combines multiple specialized sub-models to provide accurate and robust quality predictions.

The research addresses an important challenge in the field of AI-generated content, as the quality and diversity of such content continue to grow. The proposed approach demonstrates the potential of leveraging large multi-modality models and ensemble techniques to deliver more comprehensive and reliable quality assessments.

While the paper presents promising results, there are opportunities for further research, including addressing potential biases and limitations, exploring the generalization of the approach to other domains, and enhancing the interpretability and explainability of the Mixture of Experts model. Overall, this work represents a significant step forward in the field of AI-generated image quality assessment.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai

Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNN) or Transformer to learn the quality-aware feature representation, achieving commendable performance on natural scene images. However, when applied to AI-Generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This situation is largely due to the semantic inaccuracies inherent in certain AGIs caused by uncontrollable nature of the generation process. Thus, the capability to discern semantic content becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, making it challenging to grasp the existence and coherence of semantic content of the entire image. To address the shortfall in semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which utilizes semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments conducted on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k show that MA-AGIQA achieves state-of-the-art performance, and demonstrate its superior generalization capabilities on assessing the quality of AGIs. Code is available at https://github.com/wangpuyi/MA-AGIQA.

4/30/2024

Bringing Textual Prompt to AI-Generated Image Quality Assessment

Bowen Qu, Haohui Li, Wei Gao

AI-Generated Images (AGIs) have inherent multimodal nature. Unlike traditional image quality assessment (IQA) on natural scenarios, AGIs quality assessment (AGIQA) takes the correspondence of image and its textual prompt into consideration. This is coupled in the ground truth score, which confuses the unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework for AGIQA via corresponding image and prompt incorporation. Specifically, we propose a novel incremental pretraining task named Image2Prompt for better understanding of AGIs and their corresponding textual prompts. An effective and efficient image-prompt fusion module, along with a novel special [QA] token, are also applied. Both are plug-and-play and beneficial for the cooperation of image and its corresponding prompt. Experiments demonstrate that our IP-IQA achieves the state-of-the-art on AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.

5/22/2024

✨

Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, very few efforts have been paid to design relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., visual quality, authenticity, and consistency. Specifically, inspired by the characteristics of the human visual system and motivated by the observation that visual quality and authenticity are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and original-sized image as the inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and prompt, AMFF-Net compares the semantic features from text encoder and image encoder to evaluate the text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that our AMFF-Net obtains better performance than nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.

4/24/2024

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, Zicheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, Haoning Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media. However, due to the significant variance in quality among different AIGIs, there is an urgent need for models that consistently match human subjective ratings. To address this issue, we organized a challenge towards AIGC quality assessment on NTIRE 2024 that extensively considers 15 popular generative models, utilizing dynamic hyper-parameters (including classifier-free guidance, iteration epochs, and output image resolution), and gather subjective scores that consider perceptual quality and text-to-image alignment altogether comprehensively involving 21 subjects. This approach culminates in the creation of the largest fine-grained AIGI subjective quality database to date with 20,000 AIGIs and 420,000 subjective ratings, known as AIGIQA-20K. Furthermore, we conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception. We anticipate that this large-scale quality database will inspire robust quality indicators for AIGIs and propel the evolution of AIGC for vision. The database is released on https://www.modelscope.cn/datasets/lcysyzxdxc/AIGCQA-30K-Image.

4/5/2024