Study of the effect of Sharpness on Blind Video Quality Assessment

2404.05764

Published 4/10/2024 by Anantha Prabhu, David Pratap, Narayana Darapeni, Anwesh P R

Abstract

Introduction: Video Quality Assessment (VQA) is an important area of study in the modern era, where video is a crucial component of communication with applications in every field. Rapid developments in mobile technology have enabled anyone to create videos, resulting in a wide range of video quality scenarios. Objectives: Although VQA has existed for some time with classical metrics like SSIM and PSNR, the advent of machine learning has brought in new VQA techniques built upon Convolutional Neural Networks (CNNs) or Deep Neural Networks (DNNs). Methods: Over the past years, research studies such as BVQA, which performed video quality assessment of nature-based videos using DNNs, have exposed the powerful capabilities of machine learning algorithms. BVQA using DNNs explored human visual system effects such as content dependency and time-related factors, normally known as temporal effects. Results: This study explores the effect of sharpness on models like BVQA. Sharpness is a measure of the clarity and detail of the video image. Measuring sharpness typically involves analyzing the edges and contrast of the image to determine the overall level of detail. Conclusion: This study uses existing video quality databases such as CVD2014. A comparative study of evaluation metrics such as SRCC and PLCC during training and testing is presented along with the conclusion.

Overview

Video Quality Assessment (VQA) is an important field of study as video has become a crucial part of modern communication and technology.
The rise of mobile devices has led to a wide range of video quality scenarios, prompting the need for improved VQA techniques.
Classical VQA metrics like SSIM and PSNR have been joined by machine learning approaches built on Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs).
Studies like BVQA have explored how machine learning can capture human visual system effects in VQA.

Plain English Explanation

Video has become an essential part of how we communicate and share information today, from social media to professional applications. As technology has advanced, more people can easily create and share videos, leading to a wide variety of video quality scenarios.

Classical methods for assessing video quality, such as SSIM and PSNR, have been around for some time. However, the rise of machine learning has enabled new VQA techniques that can better capture the nuances of how humans perceive video quality.
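To make the classical side concrete, here is a minimal sketch of PSNR for a single grayscale frame, using only the Python standard library. The function name `psnr`, the 8-bit peak value of 255, and the tiny sample frames are illustrative assumptions, not taken from the paper. Note that PSNR compares a distorted frame against a pristine reference, which is exactly what blind (no-reference) VQA does not have.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale frames."""
    # Flatten the 2-D frames (lists of rows) into flat pixel sequences.
    ref = [p for row in reference for p in row]
    dist = [p for row in distorted for p in row]
    if len(ref) != len(dist):
        raise ValueError("frames must have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 3x3 frames (assumed 8-bit intensities).
reference = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
distorted = [[54, 55, 60], [59, 75, 61], [76, 65, 59]]
print(f"PSNR: {psnr(reference, distorted):.2f} dB")
```

A video-level score is commonly obtained by averaging the per-frame values, although other pooling choices are possible.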

Techniques like BVQA, which uses deep neural networks, have shown how machine learning can account for factors like content dependency and temporal effects - in other words, how the video's subject matter and changes over time impact our perception of its quality.

This particular study explores how the "sharpness" of a video - the clarity and detail of the image - affects video quality assessment models like BVQA. Sharpness is an important factor in perceived video quality, as it reflects how well the video can capture fine details and edges.
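One common way to put a number on sharpness is the variance of the Laplacian: sharp edges produce strong second-derivative responses, so a sharper frame yields a larger variance. The sketch below is only an illustration of that idea. This summary does not say which sharpness measure the authors used, so the function `laplacian_variance` and the toy frames are assumptions.

```python
def laplacian_variance(frame):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale frame."""
    height, width = len(frame), len(frame[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1]
                   - 4 * frame[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Hypothetical 5x5 frames: a hard vertical edge versus a gradual ramp.
sharp = [[0, 0, 255, 255, 255] for _ in range(5)]
blurry = [[0, 64, 128, 191, 255] for _ in range(5)]
print("sharp: ", laplacian_variance(sharp))
print("blurry:", laplacian_variance(blurry))
```

The hard edge scores far higher than the ramp, which matches the intuition that blur spreads an edge over several pixels and weakens the response.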

Technical Explanation

This study uses existing video quality databases, such as CVD2014, to explore the impact of sharpness on VQA models. The researchers compare evaluation metrics, including the Spearman Rank Correlation Coefficient (SRCC) and the Pearson Linear Correlation Coefficient (PLCC), during both the training and testing phases of the VQA models.
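SRCC and PLCC both compare a model's predicted scores with human opinion scores: PLCC measures how linear the relationship is, while SRCC measures only whether the ranking agrees. The following is a minimal standard-library sketch. The helper names and the sample scores are made up for illustration and are not results from the paper. In practice PLCC is often computed after fitting a nonlinear mapping between predictions and opinion scores, a step this sketch skips.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical mean opinion scores and model predictions for five videos.
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
predicted = [1.0, 2.0, 2.9, 4.5, 4.4]
print(f"SRCC: {spearman(mos, predicted):.3f}")
print(f"PLCC: {pearson(mos, predicted):.3f}")
```

Here the model swaps the order of the two best videos, which lowers SRCC even though the predicted values stay close to the opinion scores.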

The findings from this study build upon previous research in the field, such as Deep Feature Statistics Mapping and Exploring Quantum-Enhanced Machine Learning for Computer Vision, which have explored various approaches to enhancing video quality assessment using advanced machine learning techniques.

Critical Analysis

The paper provides a comprehensive analysis of the impact of sharpness on VQA models, but it does not address potential limitations or areas for further research. For example, the study only examines a single video quality database, and it would be valuable to see how the findings hold up across a broader range of datasets.

Additionally, while the paper demonstrates the effectiveness of machine learning-based VQA approaches, it does not delve into the computational complexity or real-world deployment challenges of these models. Further research could explore ways to optimize the performance and efficiency of such VQA systems, especially for resource-constrained environments like mobile devices.

Conclusion

This study contributes to the ongoing research in video quality assessment by highlighting the importance of sharpness as a key factor in the performance of machine learning-based VQA models. The findings suggest that accurately capturing and accounting for sharpness is crucial for developing robust and reliable video quality assessment systems, which have applications across various industries and domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content

Shankhanil Mitra, Rajiv Soundararajan

Completely blind video quality assessment (VQA) refers to a class of quality assessment methods that do not use any reference videos, human opinion scores or training videos from the target database to learn a quality model. The design of this class of methods is particularly important since it can allow for superior generalization in performance across various datasets. We consider the design of completely blind VQA for user generated content. While several deep feature extraction methods have been considered in supervised and weakly supervised settings, such approaches have not been studied in the context of completely blind VQA. We bridge this gap by presenting a self-supervised multiview contrastive learning framework to learn spatio-temporal quality representations. In particular, we capture the common information between frame differences and frames by treating them as a pair of views and similarly obtain the shared representations between frame differences and optical flow. The resulting features are then compared with a corpus of pristine natural video patches to predict the quality of the distorted video. Detailed experiments on multiple camera captured VQA datasets reveal the superior performance of our method over other features when evaluated without training on human scores.

6/25/2024

eess.IV

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.

4/4/2024

cs.CV cs.MM eess.IV

Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model to handle complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. Through concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets. Moreover, the proposed model won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The code is available at https://github.com/sunwei925/RQ-VQA.git.

5/15/2024

eess.IV cs.CV cs.MM

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, David Bull

With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.

5/16/2024

eess.IV cs.CV