Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

2307.13981

Published 4/4/2024 by Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

Abstract

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.

Overview

This paper examines the state of video quality assessment (VQA) datasets, which are used to evaluate and improve video quality assessment models.
The authors develop minimalistic VQA models to computationally analyze eight existing VQA datasets with realistic distortions.
They find that nearly all of these datasets suffer, to varying degrees, from the "easy dataset problem," where simple models perform well, and that some can even be handled by blind image quality assessment (BIQA) solutions.
The findings cast doubt on the current progress in blind video quality assessment (BVQA) and provide guidance for constructing better VQA datasets and models.

Plain English Explanation

Evaluating the quality of videos is crucial for companies that deliver video content to users, as it allows them to monitor and improve the viewer's experience. This process is known as video quality assessment (VQA). Researchers have developed VQA models to automate this evaluation, and they test these models on human-rated VQA datasets.

However, the authors of this paper argue that we need a better understanding of the existing VQA datasets in order to properly assess the progress in VQA. They develop simple, minimalistic VQA models and use them to analyze eight VQA datasets with realistic distortions, such as blurriness or compression artifacts.

The authors find that nearly all of these datasets have the "easy dataset problem" to some degree, meaning that even basic VQA models can perform well on them. Some datasets are so easy that blind image quality assessment (BIQA) solutions, which look only at individual frames and ignore how the video changes over time, can work well.

These findings suggest that the current progress in blind video quality assessment (BVQA), which aims to assess video quality without relying on reference videos, may not be as significant as it appears. The authors provide guidance on how to construct better VQA datasets and models to drive more meaningful progress in the field.

Technical Explanation

The authors develop minimalistic BVQA models, which consist of a video preprocessor, a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor. These models are "minimalistic" in the sense that they use the simplest possible instantiations of these components.
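The four building blocks can be sketched as a short pipeline. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the hand-crafted features (which stand in for whatever spatial and temporal analyzers the paper actually uses), the frame count, the target size, and the linear regressor are all hypothetical choices made to keep the example self-contained.

```python
import numpy as np

def preprocess(video, num_frames=8, size=32):
    """Video preprocessor: keep a few evenly spaced frames and subsample pixels.

    `video` is assumed to be a uint8 array of shape (T, H, W, C).
    """
    t, h, w, _ = video.shape
    idx = np.linspace(0, t - 1, num=min(num_frames, t)).astype(int)
    step_h, step_w = max(h // size, 1), max(w // size, 1)
    frames = video[idx][:, ::step_h, ::step_w, :][:, :size, :size, :]
    return frames.astype(np.float64) / 255.0

def spatial_features(frames):
    """Spatial quality analyzer (hypothetical stand-in): per-frame statistics,
    averaged over frames. Returns a vector of length 4."""
    gray = frames.mean(axis=-1)  # (n, H, W)
    grad_x = np.abs(np.diff(gray, axis=2)).mean(axis=(1, 2))
    grad_y = np.abs(np.diff(gray, axis=1)).mean(axis=(1, 2))
    per_frame = np.stack(
        [gray.mean(axis=(1, 2)), gray.std(axis=(1, 2)), grad_x, grad_y], axis=1
    )
    return per_frame.mean(axis=0)

def temporal_features(frames):
    """Optional temporal quality analyzer (hypothetical stand-in): mean absolute
    difference between consecutive sampled frames. Returns a vector of length 1."""
    if frames.shape[0] < 2:
        return np.zeros(1)
    return np.array([np.abs(np.diff(frames, axis=0)).mean()])

def predict_quality(video, weights, bias=0.0, use_temporal=True):
    """Quality regressor: a linear map from the pooled features to one score."""
    frames = preprocess(video)
    feats = spatial_features(frames)
    if use_temporal:
        feats = np.concatenate([feats, temporal_features(frames)])
    return float(feats @ weights + bias)

# Usage with synthetic data; the weights here are arbitrary, not trained.
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(16, 64, 64, 3)).astype(np.uint8)
score = predict_quality(video, weights=np.ones(5))
```

Setting `use_temporal=False` (with a weight vector of length 4) gives the variant without a temporal analyzer, which is the kind of comparison between model variants that the analysis relies on.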

By comparing the quality prediction performance of different model variants on eight VQA datasets, the authors find that nearly all of the datasets suffer from the "easy dataset problem" to varying degrees. This means that even these simple BVQA models can achieve good performance on the datasets, suggesting that the datasets may not be challenging enough to drive meaningful progress in BVQA.

Furthermore, the authors show that some of the datasets even allow blind image quality assessment (BIQA) solutions, which only consider individual frames and not the temporal aspects of the video, to perform well. This further highlights the lack of difficulty in these datasets.
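To make the BIQA point concrete, the sketch below scores each frame independently and averages the results. The per-frame scorer is a hypothetical sharpness proxy standing in for a real BIQA model, and mean pooling is an assumed choice. Because the video score depends only on the set of frames, shuffling their order does not change it, so such a model cannot use any temporal information.

```python
import numpy as np

def frame_quality(frame):
    """Hypothetical stand-in for a BIQA model: mean horizontal gradient
    magnitude of the grayscale frame, used as a crude sharpness proxy."""
    gray = frame.astype(np.float64).mean(axis=-1)
    return float(np.abs(np.diff(gray, axis=1)).mean())

def biqa_video_score(video, frame_scorer=frame_quality):
    """Score every frame on its own, then average (order-independent)."""
    return float(np.mean([frame_scorer(frame) for frame in video]))

# Synthetic video of shape (T, H, W, C); shuffling frames leaves the score unchanged.
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(10, 24, 24, 3)).astype(np.uint8)
shuffled = video[rng.permutation(len(video))]
original_score = biqa_video_score(video)
shuffled_score = biqa_video_score(shuffled)
```

If a frame-level model of this kind already predicts human ratings well on a dataset, the dataset gives little evidence about how well a method handles temporal distortions.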

The authors justify their claims by contrasting the generalizability of their models across the VQA datasets and by ablating a wide range of BVQA design choices related to the basic building blocks of their models.

Critical Analysis

The authors' findings raise important concerns about the current state of VQA datasets and the progress in BVQA. By demonstrating that nearly all of the eight datasets studied suffer from the "easy dataset problem," the paper calls into question the reliability of the benchmarks used to evaluate BVQA models.

However, the authors acknowledge that their study is limited to a specific set of minimalistic BVQA models. It is possible that more sophisticated BVQA models could still benefit from and drive progress on these datasets. Additionally, the authors do not provide a comprehensive solution for constructing better VQA datasets, leaving room for further research in this area.

The paper also does not address the potential biases or limitations of the human-rated VQA datasets themselves, which could also contribute to the "easy dataset problem." Exploring the sources of these biases and how to create more diverse and challenging datasets could be an interesting area for future work.

Conclusion

This paper provides a critical examination of the existing VQA datasets and their suitability for evaluating BVQA models. The authors' use of minimalistic BVQA models to analyze these datasets reveals that nearly all of them suffer from the "easy dataset problem," where even simple models can perform well.

These findings cast doubt on the current progress in BVQA and highlight the need for more rigorous and challenging VQA datasets. By guiding the construction of next-generation VQA datasets and models, this research has the potential to drive more meaningful advancements in the field of video quality assessment.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model to handle complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. Through concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets. Moreover, the proposed model won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The code is available at https://github.com/sunwei925/RQ-VQA.git.

5/15/2024

eess.IV cs.CV cs.MM

Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content

Shankhanil Mitra, Rajiv Soundararajan

Completely blind video quality assessment (VQA) refers to a class of quality assessment methods that do not use any reference videos, human opinion scores or training videos from the target database to learn a quality model. The design of this class of methods is particularly important since it can allow for superior generalization in performance across various datasets. We consider the design of completely blind VQA for user generated content. While several deep feature extraction methods have been considered in supervised and weakly supervised settings, such approaches have not been studied in the context of completely blind VQA. We bridge this gap by presenting a self-supervised multiview contrastive learning framework to learn spatio-temporal quality representations. In particular, we capture the common information between frame differences and frames by treating them as a pair of views and similarly obtain the shared representations between frame differences and optical flow. The resulting features are then compared with a corpus of pristine natural video patches to predict the quality of the distorted video. Detailed experiments on multiple camera captured VQA datasets reveal the superior performance of our method over other features when evaluated without training on human scores.

6/25/2024

eess.IV

Study of the effect of Sharpness on Blind Video Quality Assessment

Anantha Prabhu, David Pratap, Narayana Darapeni, Anwesh P R

Introduction: Video Quality Assessment (VQA) is one of the important areas of study in this modern era, where video is a crucial component of communication with applications in every field. Rapid developments in mobile technology have enabled anyone to create videos, resulting in a varied range of video quality scenarios. Objectives: Though VQA has been present for some time with classical metrics like SSIM and PSNR, the advent of machine learning has brought in new VQA techniques which are built upon Convolutional Neural Networks (CNNs) or Deep Neural Networks (DNNs). Methods: Over the past years, various research studies such as the BVQA, which performed video quality assessment of nature-based videos using DNNs, exposed the powerful capabilities of machine learning algorithms. BVQA using DNNs explored human visual system effects such as content dependency and time-related factors normally known as temporal effects. Results: This study explores the sharpness effect on models like BVQA. Sharpness is the measure of the clarity and details of the video image. Sharpness typically involves analyzing the edges and contrast of the image to determine the overall level of detail and sharpness. Conclusion: This study uses the existing video quality databases such as CVD2014. A comparative study of the various machine learning parameters such as SRCC and PLCC during the training and testing is presented along with the conclusion.

4/10/2024

eess.IV cs.CV

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang

Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video, e.g., content attractiveness, distortion type, motion pattern, and level. However, annotating the mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets, and poses a significant obstacle for deep learning-based methods. In this paper, we propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks, enabling benefits for VQA from different aspects. Specifically, we extract features of videos from different pretrained models with frozen weights and integrate them to generate representation. Since these models possess various fields of knowledge and are often trained with labels irrelevant to quality, we propose an Intra-Consistency and Inter-Divisibility (ICID) loss to impose constraints on features extracted by multiple pretrained models. The intra-consistency constraint ensures that features extracted by different pretrained models are in the same unified quality-aware latent space, while the inter-divisibility introduces pseudo clusters based on the annotation of samples and tries to separate features of samples from different clusters. Furthermore, with a constantly growing number of pretrained models, it is crucial to determine which models to use and how to use them. To address this problem, we propose an efficient scheme to select suitable candidates. Models with better clustering performance on VQA datasets are chosen to be our candidates. Extensive experiments demonstrate the effectiveness of the proposed method.

5/29/2024

cs.CV