ColorVideoVDP: A visual difference predictor for image, video and display distortions

arXiv:2401.11485. Published 7/4/2024 by Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro

Overview

The paper introduces ColorVideoVDP, a full-reference visual difference predictor for evaluating image, video, and display distortions.
It models spatial and temporal aspects of vision for both luminance and color, building on new psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking.
The model is designed to predict how human observers perceive visual differences, with applications such as display specification and design, video streaming, and image/video compression.

Plain English Explanation

The paper describes a new tool called ColorVideoVDP that can analyze images and videos to predict how humans will perceive visual differences or distortions. This is useful for evaluating the quality of displays, compressed video, and other applications where it's important to understand how people will respond to visual changes.

The key innovation in ColorVideoVDP is that it combines a color-aware model of visual perception with a way to account for changes over time in video. This lets it capture how people see and process visual information more accurately than earlier tools, which were limited to still images or ignored color.

By having a better understanding of human visual perception, researchers and developers can design displays, compression algorithms, and other technologies that provide a more natural and appealing visual experience. ColorVideoVDP gives them a way to test and optimize these systems before releasing them to the public.

Technical Explanation

The ColorVideoVDP model builds on previous work on visual difference predictors (VDPs). It extends these by incorporating a color-based visual difference model and adding temporal processing to handle video content.
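
A visual difference predictor typically compares the reference and test signals band by band, weights each contrast difference by how sensitive the eye is at that spatial frequency, and pools the weighted differences into a single score. The Python sketch below illustrates that generic pipeline on per-band contrast values. It is a simplified illustration, not ColorVideoVDP's method: the Mannos and Sakrison achromatic contrast sensitivity function (CSF) stands in for the paper's chromatic spatiotemporal CSF, and the function names, the pooling exponent `beta`, and the band values are hypothetical.

```python
import math
from typing import Sequence


def mannos_sakrison_csf(freq_cpd: float) -> float:
    """Relative achromatic contrast sensitivity (Mannos and Sakrison, 1974).

    Illustrative stand-in only; ColorVideoVDP uses its own chromatic
    spatiotemporal contrast sensitivity model instead.
    """
    f = 0.114 * freq_cpd
    return 2.6 * (0.0192 + f) * math.exp(-(f ** 1.1))


def pooled_difference(
    ref_contrast: Sequence[float],
    test_contrast: Sequence[float],
    band_freqs_cpd: Sequence[float],
    beta: float = 2.0,
) -> float:
    """Minkowski-pool CSF-weighted contrast differences across frequency bands.

    Each entry is the contrast of one band (hypothetical inputs);
    a larger result means a more visible difference.
    """
    if not (len(ref_contrast) == len(test_contrast) == len(band_freqs_cpd)):
        raise ValueError("all inputs must have one value per band")
    total = 0.0
    for c_ref, c_test, freq in zip(ref_contrast, test_contrast, band_freqs_cpd):
        weighted = abs(c_test - c_ref) * mannos_sakrison_csf(freq)
        total += weighted ** beta
    return total ** (1.0 / beta)


# The same contrast error placed at a mid frequency versus a very high frequency.
freqs = [2.0, 8.0, 40.0]
reference = [0.10, 0.10, 0.10]
error_mid = pooled_difference(reference, [0.10, 0.15, 0.10], freqs)
error_high = pooled_difference(reference, [0.10, 0.10, 0.15], freqs)
print(error_mid > error_high)  # True: the 8 cpd error is weighted as more visible
```

The point of the weighting step is that identical physical errors are not equally visible; a perceptual metric has to discount errors at frequencies the eye barely resolves.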

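The color-based component goes beyond luminance-only models by processing achromatic and chromatic signals, using new psychophysical models of chromatic spatiotemporal contrast sensitivity and of cross-channel contrast masking, so that content in one channel can reduce the visibility of a distortion in another. The temporal extension applies spatio-temporal filtering, so changes over time contribute to the prediction rather than each frame being analyzed in isolation. The model also accounts for the viewing conditions and for the geometric and photometric characteristics of the display.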
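
To make the idea of separate color and temporal channels concrete, the sketch below splits linear RGB into one achromatic and two color-opponent signals, then splits a per-frame signal into a slowly varying (sustained) part and a fast-changing (transient) part. Both steps are hypothetical simplifications: the opponent transform uses Rec. 709 luminance weights and simple channel differences, and the temporal split is a first-order exponential low-pass filter with an assumed constant `alpha`. The paper defines its own color space and temporal filters.

```python
from typing import List, Sequence, Tuple


def to_opponent(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Split linear RGB into achromatic, red-green and yellow-violet signals.

    Hypothetical, simplified transform for illustration; not the paper's color space.
    """
    achromatic = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luminance weights
    red_green = r - g
    yellow_violet = 0.5 * (r + g) - b
    return achromatic, red_green, yellow_violet


def split_temporal(signal: Sequence[float],
                   alpha: float = 0.3) -> Tuple[List[float], List[float]]:
    """Split a per-frame signal into sustained (low-pass) and transient (high-pass) parts.

    Uses a first-order exponential low-pass filter; `alpha` is a hypothetical
    smoothing constant, not a value from the paper.
    """
    if not signal:
        return [], []
    sustained: List[float] = []
    state = signal[0]
    for x in signal:
        state += alpha * (x - state)
        sustained.append(state)
    transient = [x - s for x, s in zip(signal, sustained)]
    return sustained, transient


# One pixel over five frames: a gray that suddenly brightens.
frames_rgb = [(0.2, 0.2, 0.2)] * 2 + [(0.6, 0.6, 0.6)] * 3
luma = [to_opponent(*rgb)[0] for rgb in frames_rgb]
sustained, transient = split_temporal(luma)
print([round(t, 3) for t in transient])  # zero before the step, positive after it
```

A neutral gray produces zero in both opponent channels, so only the achromatic channel carries the brightness step, and only the transient part responds strongly at the moment of change.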

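The model was trained to predict common video streaming distortions (such as video compression, rescaling, and transmission errors) as well as eight new distortion types related to AR/VR displays. For the latter, the authors collected a new dataset, XR-DAVID (XR-Display-Artifact-Video quality dataset), comprising 336 distorted videos with artifacts such as light-source and waveguide non-uniformities. Testing on XR-DAVID and on several datasets from the literature shows a significant gain in prediction performance compared to existing metrics.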
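
The summary does not say which statistics were used to compare the model's predictions with human judgments. Pearson (linear) and Spearman (rank) correlation between a metric's scores and subjective quality scores are common choices when evaluating quality metrics, and the sketch below computes both in plain Python. The score lists are made-up placeholders, not data from the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for five test videos (not from the paper).
metric_scores = [9.1, 8.4, 7.9, 6.2, 5.5]
human_scores = [4.6, 4.1, 4.2, 3.0, 2.4]
print(round(pearson(metric_scores, human_scores), 3))
print(round(spearman(metric_scores, human_scores), 3))
```

Spearman only checks whether the metric orders the stimuli the way observers do, while Pearson also rewards a linear relationship between the two scales; reporting both separates ranking accuracy from scale agreement.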

The model has applications in areas like display quality assessment, where it can help evaluate how noticeable various visual distortions will be to end users. It can also be used to guide the development of image and video compression algorithms to maintain perceptual quality.

Critical Analysis

The paper provides a thorough validation of the ColorVideoVDP model, demonstrating its advantages over existing metrics. However, some limitations are worth noting:

The model relies on specific color space transformations and spatio-temporal filtering, which may not capture all aspects of human visual processing.
Its evaluation relies largely on distortions introduced in a controlled way, so its performance on artifacts arising in uncontrolled, real-world pipelines is less well established.
The computational complexity of the model could limit its practical application in some real-time systems.

Additionally, while the experiments cover a range of image and video distortions, there may be other types of visual changes or artifacts not represented in the test data. Further research could explore the model's generalization to a wider variety of visual content and distortions.

Overall, ColorVideoVDP represents an important advancement in visual difference prediction, but as with any model, there is room for continued refinement and validation to ensure its effectiveness across diverse applications.

Conclusion

The ColorVideoVDP model introduced in this paper provides a more sophisticated way to predict how humans will perceive visual differences in images and videos. By incorporating color-based visual processing and temporal factors, it achieves better alignment with human judgments compared to previous approaches.

This advance has significant implications for industries and researchers focused on display quality, image/video compression, and other areas where understanding human visual perception is crucial. ColorVideoVDP offers a valuable tool for evaluating and optimizing these systems to provide a more natural and appealing visual experience.

While the model has some limitations, its thorough validation and significant gains over existing metrics demonstrate its potential to drive further progress in visual quality assessment and related fields. Continued research and real-world application of ColorVideoVDP can lead to improved display technologies, more efficient media compression, and other innovations that enhance the way we consume and interact with visual content.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ColorVideoVDP: A visual difference predictor for image, video and display distortions

Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro

ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision, for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions, geometric, and photometric characteristics of the display. It was trained to predict common video streaming distortions (e.g. video compression, rescaling, and transmission errors), and also 8 new distortion types related to AR/VR displays (e.g. light source and waveguide non-uniformities). To address the latter application, we collected our novel XR-Display-Artifact-Video quality dataset (XR-DAVID), comprised of 336 distorted videos. Extensive testing on XR-DAVID, as well as several datasets from the literature, indicate a significant gain in prediction performance compared to existing metrics. ColorVideoVDP opens the doors to many novel applications which require the joint automated spatiotemporal assessment of luminance and color distortions, including video streaming, display specification and design, visual comparison of results, and perceptually-guided quality optimization.
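
The abstract notes that the metric accounts for viewing conditions and for the display's geometric and photometric characteristics. The sketch below shows two common ways such quantities are derived: angular resolution in pixels per degree from screen size, resolution, and viewing distance, and absolute luminance from a gain-gamma-offset display model plus ambient light reflected by the screen. The function names, default parameter values, and the Lambertian reflection term are illustrative assumptions, not the paper's exact formulation.

```python
import math


def pixels_per_degree(diagonal_in: float, width_px: int, height_px: int,
                      distance_m: float) -> float:
    """Pixels per visual degree at the screen center (flat display assumed)."""
    diagonal_m = diagonal_in * 0.0254
    aspect = width_px / height_px
    width_m = diagonal_m * aspect / math.sqrt(aspect ** 2 + 1.0)
    pixel_pitch_m = width_m / width_px
    # Visual angle subtended by a single pixel, in degrees.
    deg_per_pixel = math.degrees(2.0 * math.atan(pixel_pitch_m / (2.0 * distance_m)))
    return 1.0 / deg_per_pixel


def display_luminance(value: float, peak_cd_m2: float = 200.0,
                      black_cd_m2: float = 0.2, gamma: float = 2.2,
                      ambient_lux: float = 250.0,
                      reflectivity: float = 0.005) -> float:
    """Emitted plus reflected luminance (cd/m^2) for a pixel value in [0, 1].

    Gain-gamma-offset model; reflected ambient light is approximated as a
    Lambertian reflection, reflectivity / pi * illuminance. All defaults are
    hypothetical example values.
    """
    reflected = reflectivity / math.pi * ambient_lux
    return (peak_cd_m2 - black_cd_m2) * value ** gamma + black_cd_m2 + reflected


# Hypothetical setup: 27-inch 3840x2160 monitor viewed from 0.6 m in an office.
print(round(pixels_per_degree(27, 3840, 2160, 0.6), 1))
print(round(display_luminance(0.5), 2))
```

Moving the viewer farther away raises pixels per degree, pushing fine detail toward frequencies where sensitivity falls, while ambient reflections raise the effective black level and reduce displayed contrast; both effects change how visible a given distortion is.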

7/4/2024

On the Content Bias in Fréchet Video Distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang

Fréchet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to conflict with human perception occasionally. In this paper, we aim to explore the extent of FVD's bias toward per-frame quality over temporal realism and identify its sources. We first quantify the FVD's sensitivity to the temporal axis by decoupling the frame and motion quality and find that the FVD increases only slightly with large temporal corruption. We then analyze the generated videos and show that via careful sampling from a large set of generated videos that do not contain motions, one can drastically decrease FVD without improving the temporal quality. Both studies suggest FVD's bias towards the quality of individual frames. We further observe that the bias can be attributed to the features extracted from a supervised video classifier trained on the content-biased dataset. We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality. Finally, we revisit a few real-world examples to validate our hypothesis.
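
FVD is the squared Fréchet distance between two multivariate Gaussians fitted to features of real and generated videos: d^2 = |mu1 - mu2|^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The NumPy sketch below computes that quantity from feature arrays. The video feature extractor is assumed and not shown (the random arrays in the usage example are placeholders for per-video features), and the trace of the matrix square root is computed through the symmetric matrix S1^(1/2) S2 S1^(1/2), which has the same eigenvalues as S1 S2.

```python
import numpy as np


def _sqrtm_psd(matrix: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative rounding errors
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1 = np.atleast_1d(mu1).astype(float)
    mu2 = np.atleast_1d(mu2).astype(float)
    sigma1 = np.atleast_2d(sigma1).astype(float)
    sigma2 = np.atleast_2d(sigma2).astype(float)
    diff = mu1 - mu2
    root1 = _sqrtm_psd(sigma1)
    middle = root1 @ sigma2 @ root1
    middle = 0.5 * (middle + middle.T)  # enforce symmetry before eigvalsh
    # Tr((S1 S2)^(1/2)) = sum of square roots of the eigenvalues of S1^(1/2) S2 S1^(1/2).
    trace_sqrt = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def frechet_from_features(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fit a Gaussian to each (n_videos, n_features) array and compare them."""
    return frechet_distance(
        feats_a.mean(axis=0), np.cov(feats_a, rowvar=False),
        feats_b.mean(axis=0), np.cov(feats_b, rowvar=False),
    )


# Placeholder features standing in for the output of a video network.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
print(frechet_from_features(real, fake))
```

Because the score depends only on the mean and covariance of the features, any bias in what the feature extractor encodes (for example, per-frame content rather than motion) passes directly into the metric, which is the effect the paper investigates.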

4/19/2024

LatentColorization: Latent Diffusion-Based Speaker Video Colorization

Rory Ward, Dan Bigioi, Shubhajit Basak, John G. Breslin, Peter Corcoran

While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Most existing video colorization techniques operate on a frame-by-frame basis, often overlooking the critical aspect of temporal coherence between successive frames. This approach can result in inconsistencies across frames, leading to undesirable effects like flickering or abrupt color transitions between frames. To address these challenges, we harness the generative capabilities of a fine-tuned latent diffusion model designed specifically for video colorization, introducing a novel solution for achieving temporal consistency in video colorization, as well as demonstrating strong improvements on established image quality metrics compared to other existing methods. Furthermore, we perform a subjective study, where users preferred our approach to the existing state of the art. Our dataset encompasses a combination of conventional datasets and videos from television/movies. In short, by leveraging the power of a fine-tuned latent diffusion-based colorization system with a temporal consistency mechanism, we can improve the performance of automatic video colorization by addressing the challenges of temporal inconsistency. A short demonstration of our results can be seen in some example videos available at https://youtu.be/vDbzsZdFuxM.

5/10/2024

Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a "perceptually uniform" color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different smartphones). In this paper, we describe a perceptual CD measure based on the multiscale sliced Wasserstein distance, which facilitates efficient comparisons between non-local patches of similar color and structure. This aligns with the modern understanding of color perception, where color and structure are inextricably interdependent as a unitary process of perceptual organization. Meanwhile, our method is easy to implement and training-free. Experimental results indicate that our CD measure performs favorably in assessing CDs in photographic images, and consistently surpasses competing models in the presence of image misalignment. Additionally, we empirically verify that our measure functions as a metric in the mathematical sense, and show its promise as a loss function for image and video color transfer tasks. The code is available at https://github.com/real-hjq/MS-SWD.
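
The sliced Wasserstein distance in this abstract compares two distributions by projecting them onto random one-dimensional directions, where optimal transport reduces to matching sorted values. The NumPy sketch below applies that idea to the pixel colors of two images at a single scale. It is a simplified stand-in under stated assumptions (equal pixel counts, raw color triplets rather than patches, a fixed number of random projections `n_projections`), not the authors' MS-SWD implementation, which is available at the repository linked above.

```python
import numpy as np


def sliced_wasserstein(a: np.ndarray, b: np.ndarray, n_projections: int = 64,
                       seed: int = 0) -> float:
    """Single-scale sliced 2-Wasserstein distance between equally sized point sets.

    a, b: arrays of shape (n_points, n_dims), e.g. pixel colors.
    Simplified illustration; not the MS-SWD implementation.
    """
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal point counts and dimensions")
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_projections, a.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # unit vectors
    # Project onto each direction; sorting solves 1-D optimal transport exactly.
    proj_a = np.sort(a @ directions.T, axis=0)
    proj_b = np.sort(b @ directions.T, axis=0)
    # Mean over points and directions = average squared 1-D W2 distance.
    return float(np.sqrt(np.mean((proj_a - proj_b) ** 2)))


# Hypothetical example: the same pixels shuffled (misaligned) versus color-shifted.
rng = np.random.default_rng(1)
pixels = rng.uniform(size=(1000, 3))
shuffled = pixels[rng.permutation(len(pixels))]
shifted = np.clip(pixels + np.array([0.1, 0.0, -0.05]), 0.0, 1.0)
print(sliced_wasserstein(pixels, shuffled))  # about 0: same colors, different positions
print(sliced_wasserstein(pixels, shifted))   # > 0: the color distribution changed
```

Because it compares distributions rather than co-located pixels, shuffling pixel positions leaves the distance near zero while a genuine color shift does not, which is the robustness to misalignment the abstract describes.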

7/16/2024