Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images

Read original: arXiv:2405.09426 - Published 5/17/2024 by Memoona Aziz, Umair Rehman, Muhammad Umair Danish, Katarina Grolinger

Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images

Overview

• This paper introduces a new metric called the Global-Local Image Perceptual Score (GLIPS) for evaluating the photorealistic quality of AI-generated images.

• GLIPS combines global and local image features to provide a more comprehensive assessment of visual realism compared to existing metrics.

• The authors demonstrate that GLIPS correlates better with human judgments of image quality than other commonly used metrics, such as Similarity Metrics for Mixed-Reality Image-to-Image Translation and A Semantic Approach to Quantifying Consistency in Diffusion Models.

Plain English Explanation

The paper presents a new way to measure how realistic AI-generated images look. Existing methods focus on either the overall image or specific local details, but GLIPS combines both global and local features to provide a more comprehensive assessment.

The key insight is that realistic images need to look good both at a high level (e.g., natural-looking composition, lighting, color) and in the fine details (e.g., textures, edges, small objects). By considering both aspects, GLIPS can more accurately capture human judgments of image quality compared to other metrics.

This is important as the field of AI-generated imagery continues to advance, and being able to reliably assess photorealism becomes crucial for applications like How to Evaluate Semantic Communications of Images with VITScore, LIPSim: A Provably Robust Perceptual Similarity Metric, and Mixture of Low-rank Experts for Transferable AI-Generated content.

Technical Explanation

The paper proposes the Global-Local Image Perceptual Score (GLIPS) as a new metric for evaluating the photorealistic quality of AI-generated images. GLIPS combines global and local image features to provide a more comprehensive assessment of visual realism.

The global component of GLIPS captures high-level attributes like natural composition, lighting, and color. The local component assesses fine-grained details such as textures, edges, and small objects. By considering both global and local aspects, GLIPS can better align with human judgments of image quality compared to existing metrics that focus on only one or the other.

The authors conduct extensive experiments to validate GLIPS. They show that GLIPS correlates more strongly with human perceptual scores than other commonly used metrics like Similarity Metrics for Mixed-Reality Image-to-Image Translation and A Semantic Approach to Quantifying Consistency in Diffusion Models. GLIPS also demonstrates robustness to different image resolutions and types of AI-generated content.

Critical Analysis

The paper provides a thoughtful and well-designed approach to evaluating photorealistic quality of AI-generated images. The authors acknowledge that while GLIPS outperforms existing metrics, it still has limitations. For example, GLIPS may not capture more abstract or subjective aspects of image quality that humans consider.

Additionally, the paper does not explore how GLIPS might perform on emerging AI generation techniques, such as How to Evaluate Semantic Communications of Images with VITScore or LIPSim: A Provably Robust Perceptual Similarity Metric. Further research is needed to assess the generalizability of GLIPS across a wider range of AI-generated content.

Overall, the GLIPS metric represents a valuable contribution to the field of image quality assessment. By considering both global and local features, it provides a more holistic evaluation that can help advance the development of high-quality, Mixture of Low-rank Experts for Transferable AI-Generated imagery.

Conclusion

The Global-Local Image Perceptual Score (GLIPS) introduced in this paper offers a more comprehensive way to evaluate the photorealistic quality of AI-generated images. By combining global and local image features, GLIPS better aligns with human judgments of visual realism compared to existing metrics.

As the field of AI-generated imagery continues to progress, having reliable and versatile evaluation tools like GLIPS will be crucial for assessing the quality and fidelity of the generated content. This, in turn, can help drive further advancements in areas such as How to Evaluate Semantic Communications of Images with VITScore, LIPSim: A Provably Robust Perceptual Similarity Metric, and Mixture of Low-rank Experts for Transferable AI-Generated content.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images

Memoona Aziz, Umair Rehman, Muhammad Umair Danish, Katarina Grolinger

This paper introduces the Global-Local Image Perceptual Score (GLIPS), an image metric designed to assess the photorealistic image quality of AI-generated images with a high degree of alignment to human visual perception. Traditional metrics such as FID and KID scores do not align closely with human evaluations. The proposed metric incorporates advanced transformer-based attention mechanisms to assess local similarity and Maximum Mean Discrepancy (MMD) to evaluate global distributional similarity. To evaluate the performance of GLIPS, we conducted a human study on photorealistic image quality. Comprehensive tests across various generative models demonstrate that GLIPS consistently outperforms existing metrics like FID, SSIM, and MS-SSIM in terms of correlation with human scores. Additionally, we introduce the Interpolative Binning Scale (IBS), a refined scaling method that enhances the interpretability of metric scores by aligning them more closely with human evaluative standards. The proposed metric and scaling approach not only provides more reliable assessments of AI-generated images but also suggest pathways for future enhancements in image generation technologies.

5/17/2024

🌐

Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis

Memoona Aziz, Umair Rehman, Syed Ali Safi, Amir Zaib Abbasi

The rapid advancements in AI technologies have revolutionized the production of graphical content across various sectors, including entertainment, advertising, and e-commerce. These developments have spurred the need for robust evaluation methods to assess the quality and realism of AI-generated images. To address this, we conducted three studies. First, we introduced and validated a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment. Second, we applied this questionnaire to assess images from AI models (DALL-E2, DALL-E3, GLIDE, Stable Diffusion) and camera-generated images, revealing that camera-generated images excelled in photorealism and text-image alignment, while AI models led in image quality. We also analyzed statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness. Third, we evaluated computational metrics' alignment with human judgments, identifying MS-SSIM and CLIP as the most consistent with human assessments. Additionally, we proposed the Neural Feature Similarity Score (NFSS) for assessing image quality. Our findings highlight the need for refining computational metrics to better capture human visual perception, thereby enhancing AI-generated content evaluation.

9/4/2024

🧪

Similarity Metrics for MR Image-To-Image Translation

Melanie Dohmen, Mark Klemens, Ivo Baltruschat, Tuan Truong, Matthias Lenga

Image-to-image translation can create large impact in medical imaging, for instance the possibility to synthetically transform images to other modalities, sequence types, higher resolutions or lower noise levels. In order to assure a high level of patient safety, these methods are mostly validated by human reader studies, which require a considerable amount of time and costs. Quantitative metrics have been used to complement such studies and to provide reproducible and objective assessment of synthetic images. Even though the SSIM and PSNR metrics are extensively used, they do not detect all types of errors in synthetic images as desired. Other metrics could provide additional useful evaluation. In this study, we give an overview and a quantitative analysis of 15 metrics for assessing the quality of synthetically generated images. We include 11 full-reference metrics (SSIM, MS-SSIM, CW-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, DISTS, NMI and PCC), three non-reference metrics (BLUR, MLC, MSLC) and one downstream task segmentation metric (DICE) to detect 11 kinds of typical distortions and artifacts that occur in MR images. In addition, we analyze the influence of four prominent normalization methods (Minmax, cMinmax, Zscore and Quantile) on the different metrics and distortions. Finally, we provide adverse examples to highlight pitfalls in metric assessment and derive recommendations for effective usage of the analyzed similarity metrics for evaluation of image-to-image translation models.

6/19/2024

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and location(IMDL). However, the lack of a large-scale data foundation makes IMDL task unattainable. In this paper, a local manipulation pipeline is designed, incorporating the powerful SAM, ChatGPT and generative models. Upon this basis, We propose the GIM dataset, which has the following advantages: 1) Large scale, including over one million pairs of AI-manipulated images and real images. 2) Rich Image Content, encompassing a broad range of image classes 3) Diverse Generative Manipulation, manipulated images with state-of-the-art generators and various manipulation tasks. The aforementioned advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce two benchmark settings to evaluate the generalization capability and comprehensive performance of baseline methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, Frequency-Spatial Block (FSB), and a Multi-window Anomalous Modelling (MWAM) Module. Extensive experiments on the GIM demonstrate that GIMFormer surpasses previous state-of-the-art works significantly on two different benchmarks.

6/26/2024