PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition

Read original: arXiv:2404.13299 - Published 4/23/2024 by Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan

Overview

This paper proposes PCQA (Prompt Condition Quality Assessment), a method for evaluating the quality of AI-generated content (AIGC), specifically generated images and videos.
PCQA uses the prompt condition, that is, the text prompt given to the generative model, as a basis for assessing the quality of the generated output.
The authors present this approach as a strong baseline for AIGC quality assessment and validate it on two datasets, AIGIQA-20K and T2VQA-DB.

Plain English Explanation

The paper introduces a new way to judge the quality of content generated by AI models. The key idea is to focus on the prompt, which is the input given to the AI model to generate the output. By analyzing the prompt and comparing it to the generated output, the PCQA method can assess how well the AI did at completing the task.

This differs from approaches that look at the generated output in isolation, without considering the original prompt. The authors believe that the prompt condition is a crucial factor in determining the quality of AI-generated content. Other research on AIGC quality assessment includes "Exploring AIGC Video Quality: Focus on Visual Harmony" and "AIGIQA-20K: A Large Database of AI-Generated Image Quality Assessment".

The PCQA method aims to provide a strong baseline for evaluating AIGC, which could be useful for improving AI models and ensuring the quality of their outputs.

Technical Explanation

The PCQA method works by comparing the input prompt to the generated output to assess its quality. The authors argue that the prompt condition is a crucial factor that has been overlooked in previous AIGC quality assessment approaches.

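According to the abstract, PCQA has two main components. First, a hybrid prompt encoding method built on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder represents the prompt condition. Second, an ensemble-based feature mixer module blends the adapted prompt features with the vision features of the generated image or video. The model takes the prompt and the generated output as inputs and produces a quality score, learning from training data the relationship between the prompt, the generated content, and the quality assessment.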
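The paper's abstract describes blending prompt features and vision features with an ensemble-based feature mixer. The following is a minimal, standard-library-only sketch of that general idea, not the authors' architecture. Every name here (`mix_concat`, `build_heads`, `predict_quality`) is hypothetical, the feature vectors are assumed to come from some text and image encoder such as CLIP, and the randomly initialized linear heads stand in for parameters that would normally be trained.

```python
import math
import random


# Three simple ways to combine a prompt feature vector with a vision feature
# vector. These mixers are illustrative assumptions, not taken from the paper.
def mix_concat(prompt_feat, vision_feat):
    return prompt_feat + vision_feat


def mix_product(prompt_feat, vision_feat):
    return [p * v for p, v in zip(prompt_feat, vision_feat)]


def mix_abs_diff(prompt_feat, vision_feat):
    return [abs(p - v) for p, v in zip(prompt_feat, vision_feat)]


MIXERS = (mix_concat, mix_product, mix_abs_diff)


def build_heads(dim, seed=0):
    """Create one random linear head (weights, bias) per mixer.

    In a real model these weights would be learned; random values are used
    here only so the sketch runs end to end.
    """
    rng = random.Random(seed)
    zeros = [0.0] * dim
    heads = []
    for mixer in MIXERS:
        in_dim = len(mixer(zeros, zeros))
        weights = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
        heads.append((weights, 0.0))
    return heads


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_quality(prompt_feat, vision_feat, heads):
    """Average the per-mixer scores to get one quality score in (0, 1)."""
    scores = []
    for mixer, (weights, bias) in zip(MIXERS, heads):
        mixed = mixer(prompt_feat, vision_feat)
        logit = sum(w * x for w, x in zip(weights, mixed)) + bias
        scores.append(sigmoid(logit))
    return sum(scores) / len(scores)


# Toy 4-dimensional features; real encoder outputs are much larger.
heads = build_heads(dim=4)
score = predict_quality([0.2, -0.1, 0.5, 0.3], [0.1, 0.0, 0.4, 0.2], heads)
print(f"predicted quality: {score:.3f}")
```

Averaging several mixing strategies is one simple way to form an ensemble; the point of the sketch is only that the prediction depends on both the prompt and the generated content rather than on the output alone.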

The authors evaluate PCQA on two datasets: AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase). According to the abstract, the results on both datasets validate the effectiveness of the method, supporting its use as a strong baseline.
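This summary does not say how performance is measured. Quality assessment models are commonly compared by how well their predicted scores correlate with human ratings, using Spearman rank correlation (SRCC) and Pearson linear correlation (PLCC). The sketch below assumes that protocol; the function names and the toy scores are hypothetical and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up model predictions and human ratings for four generated images.
predicted = [0.10, 0.40, 0.35, 0.80]
human = [1.0, 3.0, 2.5, 4.5]
print("SRCC:", spearman(predicted, human))
print("PLCC:", pearson(predicted, human))
```

SRCC rewards getting the ordering of items right, while PLCC also rewards predictions that are linearly related to the human scores, which is why the two are usually reported together.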

Critical Analysis

The PCQA approach is a promising new method for AIGC quality assessment, as it focuses on a key factor that has been overlooked in previous research: the prompt condition. By considering the input prompt along with the generated output, PCQA provides a more holistic and context-aware way to evaluate the quality of AI-generated content.

However, the paper does not discuss the limitations of the PCQA method or potential areas for further research. For example, it would be interesting to explore how PCQA performs on a wider range of AIGC tasks and datasets, and to investigate its sensitivity to different types of prompts and generated outputs.

Additionally, the authors could have provided more details on the neural network architecture and training process used for PCQA, as well as a more thorough comparison to other AIGC quality assessment methods beyond the reported results.

Overall, the PCQA approach represents a valuable contribution to the field of AIGC quality assessment, but there is still room for further exploration and refinement of the method.

Conclusion

The PCQA method proposed in this paper offers a strong baseline for evaluating the quality of AI-generated content by focusing on the prompt condition. By considering the input prompt along with the generated output, PCQA provides a more comprehensive and context-aware approach to quality assessment compared to previous methods.

The results on the AIGIQA-20K image dataset and the T2VQA-DB video dataset support the effectiveness of PCQA, suggesting that this approach could be valuable for improving generative models and ensuring the quality of their outputs. While the paper does not address the limitations of the method, the PCQA framework represents a step forward in AIGC quality assessment and could inspire further work in this area.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition

Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan

The development of Large Language Models (LLM) and Diffusion Models brings the boom of Artificial Intelligence Generated Content (AIGC). It is essential to build an effective quality assessment framework to provide a quantifiable evaluation of different images or videos based on the AIGC technologies. The content generated by AIGC methods is driven by the crafted prompts. Therefore, it is intuitive that the prompts can also serve as the foundation of the AIGC quality assessment. This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder to understand and respond to the prompt conditions. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features. The empirical study practices in two datasets: AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase), which validates the effectiveness of our proposed method: Prompt Condition Quality Assessment (PCQA). Our proposed simple and feasible framework may promote research development in the multimodal generation field.

4/23/2024

Bringing Textual Prompt to AI-Generated Image Quality Assessment

Bowen Qu, Haohui Li, Wei Gao

AI-Generated Images (AGIs) have inherent multimodal nature. Unlike traditional image quality assessment (IQA) on natural scenarios, AGIs quality assessment (AGIQA) takes the correspondence of image and its textual prompt into consideration. This is coupled in the ground truth score, which confuses the unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework for AGIQA via corresponding image and prompt incorporation. Specifically, we propose a novel incremental pretraining task named Image2Prompt for better understanding of AGIs and their corresponding textual prompts. An effective and efficient image-prompt fusion module, along with a novel special [QA] token, are also applied. Both are plug-and-play and beneficial for the cooperation of image and its corresponding prompt. Experiments demonstrate that our IP-IQA achieves the state-of-the-art on AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.

5/22/2024

Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.

8/1/2024

Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

Jun Fu, Wei Zhou, Qiuping Jiang, Hantao Liu, Guangtao Zhai

Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI generated image quality assessment (AGIQA) since AGIs visually differ from natural images. In addition, the consistency between AGIs and user input text prompts, which correlates with the perceptual quality of AGIs, is not investigated to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models. The source code is available at https://github.com/JunFu1995/CLIP-AGIQA.

6/26/2024