Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

2405.04167

Published 5/8/2024 by Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

Abstract

The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may not improve or even be harmful to generalizing authentic image quality assessment. To solve this challenge, we propose distortion-guided unsupervised domain adaptation for BIQA (DGQA), a novel framework that leverages adaptive multi-domain selection via prior knowledge from distortion to match the data distribution between the source domains and the target domain, thereby reducing negative transfer from the outlier source domains. Extensive experiments on two cross-domain settings (synthetic distortion to authentic distortion and synthetic distortion to algorithmic distortion) have demonstrated the effectiveness of our proposed DGQA. Besides, DGQA is orthogonal to existing model-based BIQA methods, and can be used in combination with such models to improve performance with less training data.

Overview

This paper presents a novel unsupervised domain adaptation approach for blind image quality assessment (BIQA) models.
The key idea is to bridge the gap between synthetically distorted training data and authentic real-world images by using prior knowledge about distortions to select the synthetic source domains that best match the target domain.
According to the abstract, experiments on two cross-domain settings (synthetic to authentic distortion, and synthetic to algorithmic distortion) demonstrate the method's effectiveness, and it can be combined with existing model-based BIQA methods.

Plain English Explanation

Image quality assessment (IQA) is the task of automatically evaluating the quality of digital images. Blind IQA (BIQA) models are particularly useful as they can assess image quality without requiring reference or original "high-quality" images for comparison.

However, training BIQA models is challenging because it's difficult to obtain high-quality labeled data covering the diverse range of real-world image distortions. To address this, researchers often rely on synthetically generated images with known distortion types and levels. But there is a significant "gap" between these synthetic images and authentic real-world photos, which can limit the performance of BIQA models in practical scenarios.
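To make "synthetically generated images with known distortion types and levels" concrete, here is a minimal sketch in pure Python. The test pattern, the two distortion types (Gaussian noise and box blur), and the three levels per type are illustrative assumptions, not the distortions or datasets used in the paper.

```python
import random

def make_reference(size=16):
    """Build a small grayscale checkerboard (values 0..255) as a list of rows."""
    return [[255.0 if (x // 4 + y // 4) % 2 == 0 else 0.0 for x in range(size)]
            for y in range(size)]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip to 0..255."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def box_blur(img, radius):
    """Blur with a (2*radius+1)^2 box filter, clamping coordinates at the border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    total += img[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out

def build_synthetic_set(reference, seed=0):
    """Return (image, distortion_type, level) triples; type and level are known by construction."""
    rng = random.Random(seed)
    samples = []
    # Hypothetical severity levels chosen only for illustration.
    for level, sigma in enumerate([5.0, 15.0, 30.0], start=1):
        samples.append((add_gaussian_noise(reference, sigma, rng), "gaussian_noise", level))
    for level, radius in enumerate([1, 2, 3], start=1):
        samples.append((box_blur(reference, radius), "blur", level))
    return samples

reference = make_reference()
synthetic_set = build_synthetic_set(reference)
```

Each entry carries its distortion type and level for free, which is exactly the kind of label that authentic photos lack.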

This paper proposes a new unsupervised domain adaptation approach to "bridge the gap" between synthetic and real-world images for BIQA. The key idea is to use the known distortion information from the synthetic training data to guide the model's learning process on unlabeled real-world images. This allows the model to better adapt to the characteristics of authentic photos, while still leveraging the valuable information provided by the synthetic training data.

The authors report that their distortion-guided approach, DGQA, is effective in two cross-domain settings (synthetic to authentic distortion, and synthetic to algorithmic distortion), suggesting it is a useful way to improve model performance across different image domains.

Technical Explanation

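The paper introduces an unsupervised domain adaptation framework for BIQA, called distortion-guided unsupervised domain adaptation for BIQA (DGQA). It starts from the observation that adding more distortion types to the synthetic dataset may not improve, and may even harm, generalization to authentic images. DGQA therefore uses prior knowledge about distortions to adaptively select the source domains whose data distribution matches the target domain, which reduces negative transfer from outlier source domains and bridges the "synthetic-to-authentic gap."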
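The abstract describes selecting source domains so that their data distribution matches the target domain. A minimal sketch of that idea follows, assuming each synthetic distortion type is treated as one source domain and that domains are compared by the Euclidean distance between mean feature vectors. The distance criterion, the `keep_ratio` parameter, the domain names, and the toy features are all hypothetical; the paper's actual selection rule is not given in this summary.

```python
import math

def mean_vector(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_source_domains(source_domains, target_features, keep_ratio=0.5):
    """Rank source domains by distance to the target mean and keep the closest ones.

    source_domains maps a domain name to a list of feature vectors.
    Returns (selected_names, distances).
    """
    target_mean = mean_vector(target_features)
    distances = {name: euclidean(mean_vector(feats), target_mean)
                 for name, feats in source_domains.items()}
    ranked = sorted(distances, key=distances.get)
    n_keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return ranked[:n_keep], distances

# Toy 2-D features; names and values are made up for illustration.
sources = {
    "gaussian_blur": [[0.9, 0.1], [1.1, 0.0]],
    "jpeg_compression": [[1.4, 0.4], [1.2, 0.2]],
    "color_quantization": [[5.0, 5.0], [6.0, 4.0]],
    "impulse_noise": [[-4.0, 3.0], [-5.0, 4.0]],
}
target = [[1.0, 0.2], [1.1, 0.1]]  # unlabeled target-domain features

selected, distances = select_source_domains(sources, target, keep_ratio=0.5)
print(selected)
```

In this toy setup the two distant domains are dropped as outliers, so a model trained only on the selected domains would not be pulled toward distortions that the target never exhibits.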

The DGQA framework is described as having three main components:

Distortion Embedding Module: This module learns a latent representation that encodes the distortion characteristics of synthetic training images.
Distortion-Guided Discriminator: This adversarial discriminator aims to align the feature distributions of synthetic and real-world images by exploiting the distortion information.
Distortion-Guided Reconstruction: The model is trained to reconstruct the distortion characteristics of real-world images, further guiding the adaptation process.
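A domain discriminator of the kind mentioned above is, at its core, a classifier trained to tell source features from target features; the easier that task, the larger the gap between the two distributions. The sketch below is a generic illustration and not the paper's discriminator: it fits a small logistic-regression domain classifier and reports the proxy A-distance, 2(1 - 2·error), a standard measure from the domain adaptation literature. For brevity the error is measured on the training points, whereas a held-out split is the usual practice. The learning rate, epoch count, and toy features are assumptions.

```python
import math

def _logit(w, b, x):
    """Linear score of feature vector x under weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_domain_classifier(source, target, lr=0.1, epochs=200):
    """Fit logistic regression separating source (label 0) from target (label 1)."""
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-_logit(w, b, x)))
            for i in range(dim):
                grad_w[i] += (p - y) * x[i]
            grad_b += p - y
        # Full-batch gradient descent step on the mean log-loss.
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def proxy_a_distance(source, target):
    """Return 2 * (1 - 2 * error) of the domain classifier (training error, for brevity)."""
    w, b = train_domain_classifier(source, target)
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    errors = sum(1 for x, y in data if (_logit(w, b, x) > 0.0) != (y == 1.0))
    error_rate = errors / len(data)
    return 2.0 * (1.0 - 2.0 * error_rate)

# Well-separated toy domains: the classifier makes no mistakes.
far_apart = proxy_a_distance([[-2.0], [-1.0]], [[1.0], [2.0]])
# Identical toy domains: the classifier can do no better than chance.
aligned = proxy_a_distance([[0.0], [1.0]], [[0.0], [1.0]])
```

A value near 2 means the domains are easy to tell apart, while a value near 0 means the features look alike, which is the state an adversarial alignment step tries to reach.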

According to the abstract, the authors evaluate DGQA in two cross-domain settings: synthetic distortion to authentic distortion, and synthetic distortion to algorithmic distortion. The reported results show that DGQA is effective in both settings, and that it can be combined with existing model-based BIQA methods to improve performance with less training data.

Critical Analysis

The proposed DGQA framework is a promising approach to the challenge of limited labeled data for BIQA models. By leveraging distortion information from synthetic training data, the method adapts better to the characteristics of real-world images, which the authors report leads to improved performance.

However, the paper does not discuss the potential limitations of the approach. For example, it's unclear how well the method would perform if the synthetic training data does not adequately cover the types of distortions present in the real-world images. Additionally, the authors do not explore the scalability of the framework to larger and more diverse datasets.

Further research could investigate the robustness of DGQA to different data distributions, and explore ways to integrate multi-modal information or style adaptation techniques to strengthen domain adaptation. Alignment-invariant approaches could also be explored to improve the generalization of BIQA models.

Conclusion

This paper presents an unsupervised domain adaptation framework, distortion-guided unsupervised domain adaptation for BIQA (DGQA), for improving the performance of blind image quality assessment (BIQA) models. By leveraging distortion information from synthetic training data, the proposed method narrows the gap between synthetic and real-world images, and the authors report its effectiveness in two cross-domain settings.

The key innovation of DGQA is that distortion characteristics guide the adaptation process, which helps the model capture the nuances of authentic real-world images. This approach demonstrates the potential of incorporating domain-specific knowledge to enhance the robustness and generalization of machine learning models, and could have broader implications for other computer vision tasks.
