Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

2405.08745

Published 5/15/2024 by Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

eess.IV cs.CV cs.MM

Abstract

In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model handle the complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. After concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets. Moreover, the proposed model won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The code is available at https://github.com/sunwei925/RQ-VQA.git.

Overview

The paper presents a new approach for enhancing blind video quality assessment (BVQA) by incorporating rich quality-aware features.
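It builds on SimpleVQA, a base model with a trainable Swin Transformer-B and a fixed SlowFast, and adds auxiliary features from Q-Align, LIQE, and FAST-VQA, which a multi-layer perceptron (MLP) regresses into a quality score.
The proposed model achieves the best performance on three public social media VQA datasets and won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge.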

Plain English Explanation

The research paper discusses a way to assess the quality of social media videos without a reference video to compare against. This is called blind video quality assessment (BVQA). The key idea is to reuse "quality-aware" features from models that were already trained to judge image and video quality, so that the new model can better handle the complex distortions and diverse content found in social media videos.

The researchers start from an existing model, SimpleVQA. Its Swin Transformer-B branch extracts spatial features (within each frame) and its SlowFast branch extracts motion features (across frames). They then add features from three other pre-trained models: Q-Align and LIQE, which describe the quality of individual frames (LIQE also captures scene-specific information), and FAST-VQA, which describes quality across space and time. All of these features are joined together and passed to a small neural network that predicts the quality score.

The resulting model achieved the best performance on three public social media video quality datasets and won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. This suggests that features from pre-trained quality models are a valuable addition to a BVQA model.

Technical Explanation

The paper builds on SimpleVQA, a BVQA model that pairs a trainable Swin Transformer-B with a fixed SlowFast network. The Swin Transformer-B extracts spatial features and SlowFast extracts motion features.

The model then adds three kinds of auxiliary features from pre-trained quality models: frame-level quality-aware features from Q-Align, frame-level quality-aware and scene-specific features from LIQE, and spatiotemporal quality-aware features from FAST-VQA. All features are concatenated and passed to a multi-layer perceptron (MLP), which regresses them into a quality score.
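The abstract describes the fusion step as concatenating the base model's features with the auxiliary features from Q-Align, LIQE, and FAST-VQA, then regressing the result with an MLP. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The feature dimensions, the hidden size, the random weights, and the helper names (`init_linear`, `linear`, `relu`, `fuse_and_regress`) are all hypothetical and chosen only for illustration.

```python
import random


def init_linear(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, weights, biases):
    """Affine map y = W x + b on a plain Python list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def fuse_and_regress(feature_groups, layer1, layer2):
    """Concatenate per-model feature vectors, then apply a two-layer MLP."""
    fused = [v for group in feature_groups for v in group]
    hidden = relu(linear(fused, *layer1))
    (score,) = linear(hidden, *layer2)  # the output layer has a single unit
    return fused, score


rng = random.Random(0)

# Toy dimensions for illustration only (assumed, not taken from the paper).
dims = {"base": 8, "q_align": 4, "liqe": 5, "fast_vqa": 3}
features = {name: [rng.random() for _ in range(d)] for name, d in dims.items()}

layer1 = init_linear(sum(dims.values()), 6, rng)
layer2 = init_linear(6, 1, rng)

fused, score = fuse_and_regress(list(features.values()), layer1, layer2)
print(len(fused), score)
```

In a trained model the MLP weights would be learned against human quality ratings; here they are random, so the printed score has no meaning beyond showing the data flow.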

According to the abstract, the proposed model achieves the best performance on three public social media VQA datasets, and it won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The related paper Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models, which shares several authors with this work, examines existing VQA datasets in more detail.
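This summary does not state how performance is measured. Quality assessment work commonly reports how well predicted scores correlate with human mean opinion scores (MOS), using Spearman rank correlation (SRCC) and Pearson linear correlation (PLCC). The sketch below assumes those two metrics; the sample scores are made up for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predicted scores and MOS values for five videos.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
mos = [1.0, 4.0, 9.0, 16.0, 25.0]

srcc = spearman(predicted, mos)
plcc = pearson(predicted, mos)
print(srcc, plcc)
```

SRCC depends only on the ordering of the scores, so a model that ranks videos correctly but on a nonlinear scale can have a perfect SRCC and a lower PLCC, as in this toy example.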

Critical Analysis

The paper provides a compelling approach for enhancing BVQA by incorporating rich quality-aware features. Reusing features from pre-trained BIQA and BVQA models appears to be a valuable contribution, and the first-place finish in the NTIRE 2024 challenge supports the approach.

However, this summary does not cover the limitations of the proposed method. For example, it would be useful to understand how much each auxiliary feature source (Q-Align, LIQE, and FAST-VQA) contributes, and how the model performs on videos outside the social media setting it targets. Additionally, related work such as Multi-Modal Prompt Learning for Blind Image Quality Assessment suggests that incorporating additional modalities could further improve quality assessment.

Overall, the research represents a strong step forward in blind video quality assessment, but there are still opportunities for further exploration and refinement of the techniques.

Conclusion

The paper presents a simple but effective approach for enhancing blind video quality assessment (BVQA): it augments the SimpleVQA base model with rich quality-aware features from Q-Align, LIQE, and FAST-VQA and regresses the concatenated features into a quality score with an MLP. The proposed model achieves the best performance on three public social media VQA datasets and won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge, demonstrating the value of the auxiliary quality-aware features.

This research contributes to the ongoing efforts to improve the reliability and accuracy of BVQA, which has important applications in video streaming, video conferencing, and other multimedia systems. The incorporation of additional modalities and further exploration of complex quality degradations could lead to even more robust and versatile BVQA models in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content

Shankhanil Mitra, Rajiv Soundararajan

Completely blind video quality assessment (VQA) refers to a class of quality assessment methods that do not use any reference videos, human opinion scores or training videos from the target database to learn a quality model. The design of this class of methods is particularly important since it can allow for superior generalization in performance across various datasets. We consider the design of completely blind VQA for user generated content. While several deep feature extraction methods have been considered in supervised and weakly supervised settings, such approaches have not been studied in the context of completely blind VQA. We bridge this gap by presenting a self-supervised multiview contrastive learning framework to learn spatio-temporal quality representations. In particular, we capture the common information between frame differences and frames by treating them as a pair of views and similarly obtain the shared representations between frame differences and optical flow. The resulting features are then compared with a corpus of pristine natural video patches to predict the quality of the distorted video. Detailed experiments on multiple camera captured VQA datasets reveal the superior performance of our method over other features when evaluated without training on human scores.

6/25/2024

eess.IV

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, David Bull

With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.

5/16/2024

eess.IV cs.CV

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.

4/4/2024

cs.CV cs.MM eess.IV

Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models, which are subsequently fitted into a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets shows that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS

5/30/2024

cs.CV cs.MM eess.IV