Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

Read original: arXiv:2405.07346 - Published 5/14/2024 by Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min


Overview

Researchers have developed a new database and model to better understand and evaluate human preferences for AI-generated images (AIGIs).
The AIGCIQA2023+ database provides human visual preference scores and detailed explanations for AIGIs from three perspectives: quality, authenticity, and correspondence.
The MINT-IQA model uses this database to learn and explain human preferences for AIGIs.

Plain English Explanation

As AI-generated images (AIGIs) become more common, it's important to understand how people feel about them. The researchers created a new database called AIGCIQA2023+ that contains people's ratings and explanations of the quality, authenticity, and correspondence (how well an image matches its text prompt) of different AIGIs.

Using this database, the researchers developed a model called MINT-IQA that can evaluate and explain human preferences for AIGIs. The model looks at AIGIs from multiple angles to understand what people like and don't like about them.

This work is important because it can help improve AI systems that generate images. By understanding what humans prefer, the systems can be fine-tuned to create images that people find more appealing and authentic.

Technical Explanation

The researchers first built the AIGCIQA2023+ database, which provides human preference scores and detailed preference explanations for AIGIs along three dimensions: quality, authenticity, and correspondence (how well the image matches its text prompt). Related AIGI quality databases include AIGCIQA2023, PKU-AIGIQA-4K, and AIGIQA-20K.
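To make the idea of a per-dimension "preference score" concrete, here is a minimal sketch of how raw per-subject ratings might be averaged into a mean opinion score (MOS) for each of the three dimensions. The record layout, image IDs, and score values are hypothetical, and the sketch uses a plain average; many subjective IQA studies first normalize each subject's ratings (for example with z-scores), a step omitted here.

```python
from collections import defaultdict
from statistics import mean

# The three perspectives named in the paper; the rating tuple layout
# (image_id, subject_id, dimension, score) is an illustrative assumption.
DIMENSIONS = ("quality", "authenticity", "correspondence")

def compute_mos(ratings):
    """Average raw subject ratings into one mean opinion score (MOS)
    per image and per dimension. No per-subject normalization is applied."""
    buckets = defaultdict(list)
    for image_id, _subject, dim, score in ratings:
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        buckets[(image_id, dim)].append(score)
    mos = defaultdict(dict)
    for (image_id, dim), scores in buckets.items():
        mos[image_id][dim] = mean(scores)
    return dict(mos)

# Hypothetical ratings from two subjects for one image.
ratings = [
    ("img_001", "s1", "quality", 4.0),
    ("img_001", "s2", "quality", 3.0),
    ("img_001", "s1", "authenticity", 2.0),
    ("img_001", "s2", "authenticity", 3.0),
    ("img_001", "s1", "correspondence", 5.0),
    ("img_001", "s2", "correspondence", 4.0),
]
print(compute_mos(ratings))
# {'img_001': {'quality': 3.5, 'authenticity': 2.5, 'correspondence': 4.5}}
```

Keeping the three dimensions separate, rather than collapsing them into one number, is what lets a model later be evaluated from each perspective on its own.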

Based on the AIGCIQA2023+ database, the researchers developed the MINT-IQA model (evaluating human preferences from Multi-perspectives with INstruction Tuning). MINT-IQA first learns to evaluate human preferences for AIGIs from the three perspectives. It then applies a vision-language instruction tuning strategy to gain the ability to understand and explain these preferences. According to the authors, these explanations can be used as feedback to further improve the model's assessment capabilities.
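The summary does not say how the instruction-tuning data is formatted. As a hedged illustration only, the sketch below shows one generic way a record holding per-dimension scores and explanations could be turned into (image, instruction, response) triples for vision-language instruction tuning. The `AnnotatedImage` class, its fields, and the question templates are all hypothetical, not the authors' format.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedImage:
    # Hypothetical record; field names are illustrative, not a released schema.
    image_path: str
    prompt: str
    scores: dict        # dimension -> MOS
    explanations: dict  # dimension -> free-text explanation

def to_instruction_samples(record):
    """Turn one annotated image into (image, instruction, response) triples,
    one per perspective, in a generic vision-language chat style."""
    samples = []
    for dim in ("quality", "authenticity", "correspondence"):
        if dim == "correspondence":
            # Correspondence is judged against the text prompt, so include it.
            question = (f'How well does this image match the prompt "{record.prompt}"? '
                        "Explain your rating.")
        else:
            question = f"Rate the {dim} of this image and explain your rating."
        answer = f"Score: {record.scores[dim]:.2f}. {record.explanations[dim]}"
        samples.append({"image": record.image_path,
                        "instruction": question,
                        "response": answer})
    return samples

record = AnnotatedImage(
    image_path="images/img_001.png",
    prompt="a red fox in the snow",
    scores={"quality": 3.5, "authenticity": 2.5, "correspondence": 4.5},
    explanations={
        "quality": "Sharp details with mild over-saturation.",
        "authenticity": "The fur texture looks painted rather than photographed.",
        "correspondence": "A fox and snow are both clearly present.",
    },
)
for sample in to_instruction_samples(record):
    print(sample["instruction"], "->", sample["response"])
```

Pairing each score with its written explanation in the response is what would let a tuned model produce both a rating and a justification, rather than a bare number.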

Experiments show that MINT-IQA achieves state-of-the-art performance in understanding and evaluating human preferences for AIGIs, and competitive results against state-of-the-art models on traditional image quality assessment tasks.
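The summary does not name the evaluation metrics. In the IQA literature, "performance" usually means how well predicted scores correlate with human MOS, typically measured by Spearman rank-order correlation (SRCC) and Pearson linear correlation (PLCC); assuming that convention applies here, a minimal stdlib sketch looks like this. Many IQA papers also fit a nonlinear logistic mapping before computing PLCC, which this sketch omits. The score lists are made up.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human MOS and model predictions for five images.
human_mos = [3.1, 4.5, 2.0, 3.8, 1.2]
predicted = [2.9, 4.7, 2.4, 3.5, 1.0]
print(f"SRCC={spearman(human_mos, predicted):.3f}  PLCC={pearson(human_mos, predicted):.3f}")
```

Here the predictions order the images exactly as humans do, so SRCC is 1.000 while PLCC is slightly lower; SRCC rewards correct ranking, PLCC also rewards linear agreement in magnitude.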

Critical Analysis

The researchers acknowledge that the AIGCIQA2023+ database and MINT-IQA model have some limitations. For example, the database may not capture the full diversity of human opinions on AIGIs, and the model's performance could be improved with further refinements.

Additionally, while the ability to explain human preferences is valuable, the model's explanations may not always align perfectly with how people actually perceive and judge AIGIs. More research is needed to better understand the nuances of human visual perception in this domain.

Overall, this work represents an important step forward in understanding and evaluating human reactions to AI-generated content. However, there are still many open questions and avenues for future research in this rapidly evolving field.

Conclusion

The AIGCIQA2023+ database and MINT-IQA model developed in this research provide valuable tools for studying human preferences for AI-generated images. By better understanding what people like and don't like about AIGIs, this work can help improve the development of these technologies and ensure they better meet human needs and expectations. As AI image generation continues to advance, this type of research will become increasingly important for shaping the future of visual media.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min

Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, among which AI-based image generation has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the necessity to understand and evaluate human preferences for AIGIs. To this end, in this paper, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives including quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. Specifically, the MINT-IQA model first learns and evaluates human preferences for AI-generated images from multiple perspectives; then, via the vision-language instruction tuning strategy, MINT-IQA attains powerful understanding and explanation ability for human visual preference on AIGIs, which can be used for feedback to further improve the assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and the proposed model also achieves competitive results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.

5/14/2024

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, Zicheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, Haoning Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media. However, due to the significant variance in quality among different AIGIs, there is an urgent need for models that consistently match human subjective ratings. To address this issue, we organized a challenge towards AIGC quality assessment on NTIRE 2024 that extensively considers 15 popular generative models, utilizing dynamic hyper-parameters (including classifier-free guidance, iteration epochs, and output image resolution), and gathered subjective scores from 21 subjects that jointly consider perceptual quality and text-to-image alignment. This approach culminates in the creation of the largest fine-grained AIGI subjective quality database to date with 20,000 AIGIs and 420,000 subjective ratings, known as AIGIQA-20K. Furthermore, we conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception. We anticipate that this large-scale quality database will inspire robust quality indicators for AIGIs and propel the evolution of AIGC for vision. The database is released on https://www.modelscope.cn/datasets/lcysyzxdxc/AIGCQA-30K-Image.

4/5/2024
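The AIGIQA-20K figures can be cross-checked with simple arithmetic. Assuming (the abstract does not state it explicitly) that every one of the 21 subjects rated every image, the rating count works out exactly:

```latex
\frac{420{,}000\ \text{ratings}}{20{,}000\ \text{images}} = 21\ \text{ratings per image} = \text{number of subjects}
```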


PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images

Jiquan Yuan, Fanyi Yang, Jihe Li, Xinyan Cao, Jinming Che, Jinlong Lin, Xixin Cao

In recent years, image generation technology has rapidly advanced, resulting in the creation of a vast array of AI-generated images (AIGIs). However, the quality of these AIGIs is highly inconsistent, with low-quality AIGIs severely impairing the visual experience of users. Due to the widespread application of AIGIs, the AI-generated image quality assessment (AIGIQA), aimed at evaluating the quality of AIGIs from the perspective of human perception, has garnered increasing interest among scholars. Nonetheless, current research has not yet fully explored this field. We have observed that existing databases are limited to images generated from single scenario settings. Databases such as AGIQA-1K, AGIQA-3K, and AIGCIQA2023, for example, only include images generated by text-to-image generative models. This oversight highlights a critical gap in the current research landscape, underscoring the need for dedicated databases catering to image-to-image scenarios, as well as more comprehensive databases that encompass a broader range of AI-generated image scenarios. Addressing these issues, we have established a large scale perceptual quality assessment database for both text-to-image and image-to-image AIGIs, named PKU-AIGIQA-4K. We then conduct a well-organized subjective experiment to collect quality labels for AIGIs and perform a comprehensive analysis of the PKU-AIGIQA-4K database. Regarding the use of image prompts during the training process, we propose three image quality assessment (IQA) methods based on pre-trained models that include a no-reference method NR-AIGCIQA, a full-reference method FR-AIGCIQA, and a partial-reference method PR-AIGCIQA. Finally, leveraging the PKU-AIGIQA-4K database, we conduct extensive benchmark experiments and compare the performance of the proposed methods and the current IQA methods.

4/30/2024

Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systematically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which are of utmost importance for current video generation techniques. For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.

8/1/2024
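The LGVQ video count matches a full prompt-by-model grid. This is an inference from the numbers, not a statement in the abstract: it assumes each of the 468 prompts was rendered once by each of the 6 generators.

```latex
468\ \text{prompts} \times 6\ \text{models} = 2{,}808\ \text{videos}
```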