Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

Read original: arXiv:1910.08173 - Published 7/9/2024 by R. Gnana Praveen, Eric Granger, Patrick Cardinal


Overview

Automatic pain assessment can be valuable for populations unable to communicate their pain experiences.
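Facial expressions have been widely studied as a way to estimate pain intensity. However, applying state-of-the-art deep learning models in real-world settings is difficult for three reasons: facial expressions vary subjectively across individuals, capture conditions vary, and representative labeled training videos are scarce.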
This paper proposes a weakly-supervised domain adaptation (WSDA) technique to train 3D convolutional neural networks (CNNs) for spatio-temporal pain intensity estimation using weakly labeled videos.

Plain English Explanation

Automatically assessing pain can be very helpful for people who can't easily describe their pain, like young children or those with communication difficulties. Researchers have studied facial expressions extensively as a way to estimate how much pain someone is experiencing. However, applying the latest deep learning models to this task in real-world situations remains difficult.

The main problems are that people's facial expressions vary a lot, video recording conditions may not be ideal, and collecting enough labeled training data is costly. To address this, the researchers developed a new weakly-supervised domain adaptation (WSDA) technique. It allows them to train 3D convolutional neural networks to estimate pain intensity using videos that are labeled only periodically, with one label per sequence of frames, instead of a label for every frame.

The WSDA approach combines multiple instance learning with adversarial deep domain adaptation to train the 3D CNN model. This helps it accurately estimate pain levels even when the training data and real-world conditions are different. The results show this WSDA method outperforms other state-of-the-art approaches for both sequence-level and frame-level pain localization.

Technical Explanation

The paper proposes a weakly-supervised domain adaptation (WSDA) technique to train 3D convolutional neural networks (CNNs) for spatio-temporal pain intensity estimation. This addresses challenges in using state-of-the-art deep learning models for real-world pain assessment, such as subjective variations in facial expressions, imperfect recording conditions, and lack of fully labeled training data.
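As background on how a 3D CNN handles the time axis: the Inflated 3D-CNN (I3D) architecture, introduced by Carreira and Zisserman, builds 3D filters by "inflating" 2D image filters, repeating each one along time and dividing by the temporal depth. The pure-Python sketch below illustrates that construction and checks the property that motivates the rescaling: on a static video, the inflated filter gives the same response as the original 2D filter. The helper names are hypothetical, and this summary does not say how the paper's I3D weights were initialized.

```python
import math


def inflate_kernel(kernel_2d, depth):
    """Inflate a 2D kernel (list of rows) into a 3D kernel with `depth` slices.

    Each slice is the 2D kernel divided by `depth`, so a static video
    (one frame repeated) yields the same response as the 2D filter.
    """
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv3d_valid(video, kernel):
    """'Valid' 3D cross-correlation over (time, height, width)."""
    kt = len(kernel)
    outputs = []
    for s in range(len(video) - kt + 1):
        # Apply each temporal slice of the kernel to its frame, then sum the maps.
        maps = [conv2d_valid(video[s + d], kernel[d]) for d in range(kt)]
        outputs.append([
            [sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))
        ])
    return outputs


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k2d = [[1, 0], [0, -1]]
video = [frame] * 4                  # static video: 4 identical frames
k3d = inflate_kernel(k2d, depth=3)
out2d = conv2d_valid(frame, k2d)     # [[-4, -4], [-4, -4]]
out3d = conv3d_valid(video, k3d)     # 2 time steps, each matching out2d up to rounding
print(out2d)
print(all(math.isclose(a, b) for t in out3d for r2, r3 in zip(out2d, t) for a, b in zip(r2, r3)))
```

On a real video the frames differ, so the inflated filter starts from a 2D appearance detector and can then learn temporal structure during training.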

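The WSDA approach integrates multiple instance learning into an adversarial deep domain adaptation framework to train an Inflated 3D-CNN (I3D) model. In this multiple instance learning setup, each video sequence is treated as a bag of frames (instances) that shares a single sequence-level label. This allows the model to be trained on weakly labeled videos, where labels are provided only periodically rather than for every frame.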
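The bag/instance idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's code: it assumes the network emits one pain score per frame, groups frames into fixed-length non-overlapping bags, and uses max pooling to turn frame scores into a sequence score (mean pooling is included as an alternative). The pooling choice, the squared-error weak loss, and all function names are assumptions.

```python
from statistics import mean


def make_bags(frame_values, bag_len):
    """Split a per-frame sequence into consecutive, non-overlapping bags.

    A trailing partial bag is dropped (a hypothetical choice for this sketch).
    """
    return [frame_values[i:i + bag_len]
            for i in range(0, len(frame_values) - bag_len + 1, bag_len)]


def bag_score(instance_scores, pooling="max"):
    """Aggregate frame (instance) scores into one sequence (bag) score."""
    if pooling == "max":
        return max(instance_scores)
    if pooling == "mean":
        return mean(instance_scores)
    raise ValueError(f"unknown pooling: {pooling}")


def weak_target_loss(bags, bag_labels, pooling="max"):
    """Mean squared error between pooled bag scores and weak bag labels."""
    errors = [(bag_score(b, pooling) - y) ** 2 for b, y in zip(bags, bag_labels)]
    return sum(errors) / len(errors)


def localize(bags):
    """Index of the highest-scoring frame in each bag (frame-level localization)."""
    return [max(range(len(b)), key=b.__getitem__) for b in bags]


# Hypothetical per-frame pain scores from the network, labeled every 3 frames.
scores = [0.1, 0.3, 2.5, 0.2, 0.4, 0.0, 1.0, 0.1, 0.0]
bags = make_bags(scores, bag_len=3)
labels = [3.0, 0.0, 1.0]                 # one weak label per sequence
print(localize(bags))                    # [2, 1, 0]
print(weak_target_loss(bags, labels))    # (0.25 + 0.16 + 0.0) / 3, about 0.137
```

Only the sequence labels enter the loss, yet the per-frame scores remain available, which is what makes frame-level (instance) localization possible from sequence-level (bag) supervision.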

The training objective combines three terms: a source loss on the fully labeled source videos, a domain loss from the adversarial domain adaptation component, and a weak target loss on the sequence-level target labels. Together they adapt the I3D model to the target operational domain. Experiments used labeled RECOLA videos as the source domain and weakly labeled UNBC-McMaster videos as the target domain. The proposed WSDA approach achieved significantly higher sequence-level (bag) and frame-level (instance) pain localization accuracy than related state-of-the-art methods.
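The three-term objective can be sketched as follows. This assumes a DANN-style setup (the gradient reversal technique of Ganin and Lempitsky), which is one common way to implement adversarial domain adaptation; the summary does not confirm the paper's exact formulation. The weights `lambda_d` and `lambda_w`, the source/target domain encoding, and all function names are hypothetical, and the gradient reversal layer is reduced to its backward rule so the example stays dependency-free.

```python
import math


def mse(preds, targets):
    """Source loss: mean squared error on fully labeled source frames."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def domain_bce(probs, is_source, eps=1e-7):
    """Domain loss: binary cross-entropy of the domain discriminator.

    `probs` are predicted probabilities that a sample comes from the source
    domain; `is_source` holds 1 for source samples and 0 for target samples.
    """
    total = 0.0
    for p, y in zip(probs, is_source):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def total_loss(source_loss, domain_loss, weak_loss, lambda_d=1.0, lambda_w=1.0):
    """Weighted sum of the three terms (lambda_d and lambda_w are hypothetical)."""
    return source_loss + lambda_d * domain_loss + lambda_w * weak_loss


def grad_reverse_backward(upstream_grad, lambda_d=1.0):
    """Backward pass of a gradient reversal layer (identity in the forward pass).

    The discriminator still descends the domain loss, but the feature
    extractor below this layer receives the negated, scaled gradient, so it
    is pushed toward features that make source and target hard to tell apart.
    """
    return -lambda_d * upstream_grad


src = mse([1.0, 2.0], [1.5, 2.0])                # 0.125
dom = domain_bce([0.5, 0.5], [1, 0])             # log(2): discriminator at chance
print(total_loss(src, dom, weak_loss=0.2, lambda_d=0.5))
print(grad_reverse_backward(0.8, lambda_d=0.5))  # -0.4
```

In a deep learning framework the reversal would be a custom autograd operation placed between the I3D features and the domain discriminator, so a single backward pass of the summed loss trains all three parts at once.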

Critical Analysis

The paper presents a novel and promising approach to address the challenges of using deep learning for real-world pain assessment. The WSDA technique's ability to leverage weakly labeled data is a key strength, as acquiring fully annotated video data can be very costly and time-consuming.

However, the paper does not explore the performance limits or potential failure modes of the WSDA approach. For example, it is unclear how the method would scale to more diverse or noisier target domain data, or how robust it would be to variations in the quality or frequency of the weak labels.

Additionally, while the abstract reports improvements over other state-of-the-art methods, this summary does not state absolute performance levels. Without consulting the full paper, it is therefore difficult to assess the practical viability of the approach for real-world deployment.

Further research could investigate the trade-offs between the degree of weak supervision and model performance, as well as strategies for source-free domain adaptation to reduce reliance on labeled source data. Evaluating the WSDA technique on a wider range of real-world pain assessment datasets would also help validate its broader applicability.

Conclusion

This paper presents a novel weakly-supervised domain adaptation (WSDA) technique for training 3D convolutional neural networks to estimate pain intensity from facial expressions. By integrating multiple instance learning and adversarial deep domain adaptation, the WSDA approach can effectively leverage weakly labeled training data to achieve strong performance on pain localization tasks.

The ability to work with partially annotated videos is a significant advantage, as it reduces the cost and effort required to obtain training data. If the WSDA technique can be further refined and validated across diverse real-world scenarios, it could lead to more accessible and practical automatic pain assessment systems, with important applications for populations unable to easily communicate their pain experiences.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

R. Gnana Praveen, Eric Granger, Patrick Cardinal

Automatic pain assessment has an important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominating nonverbal channels for eliciting pain expression events, facial expressions have been widely investigated for estimating the pain intensity of individuals. However, using state-of-the-art deep learning (DL) models in real-world pain estimation applications poses several challenges related to the subjective variations of facial expressions, operational capture conditions, and lack of representative training videos with labels. Given the cost of annotating intensity levels for every video frame, we propose a weakly-supervised domain adaptation (WSDA) technique that allows for training 3D CNNs for spatio-temporal pain intensity estimation using weakly labeled videos, where labels are provided on a periodic basis. In particular, WSDA integrates multiple instance learning into an adversarial deep domain adaptation framework to train an Inflated 3D-CNN (I3D) model such that it can accurately estimate pain intensities in the target operational domain. The training process relies on weak target loss, along with domain loss and source loss for domain adaptation of the I3D model. Experimental results obtained using labeled source domain RECOLA videos and weakly-labeled target domain UNBC-McMaster videos indicate that the proposed deep WSDA approach can achieve a significantly higher level of sequence (bag)-level and frame (instance)-level pain localization accuracy than related state-of-the-art approaches.

7/9/2024


Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos

R. Gnana Praveen, Eric Granger, Patrick Cardinal

Estimation of pain intensity from facial expressions captured in videos has an immense potential for health care applications. Given the challenges related to subjective variations of facial expressions, and operational capture conditions, the accuracy of state-of-the-art DL models for recognizing facial expressions may decline. Domain adaptation has been widely explored to alleviate the problem of domain shifts that typically occur between video data captured across various source and target domains. Moreover, given the laborious task of collecting and annotating videos, and subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning is gaining attention in such applications. State-of-the-art WSL models are typically formulated as regression problems, and do not leverage the ordinal relationship among pain intensity levels, nor temporal coherence of multiple consecutive frames. This paper introduces a new DL model for weakly-supervised DA with ordinal regression that can be adapted using target domain videos with coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among intensity levels assigned to target sequences, and associates multiple relevant frames to sequence-level labels. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from target domain. The proposed approach was validated using RECOLA video dataset as fully-labeled source domain data, and UNBC-McMaster shoulder pain video dataset as weakly-labeled target domain data. We have also validated WSDA-OR on BIOVID and Fatigue datasets for sequence level estimation.

7/9/2024


Subject-Based Domain Adaptation for Facial Expression Recognition

Muhammad Osama Zeeshan, Muhammad Haseeb Aslam, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger

Adapting a deep learning model to a specific target individual is a challenging facial expression recognition (FER) task that may be achieved using unsupervised domain adaptation (UDA) methods. Although several UDA methods have been proposed to adapt deep FER models across source and target data sets, multiple subject-specific source domains are needed to accurately represent the intra- and inter-person variability in subject-based adaption. This paper considers the setting where domains correspond to individuals, not entire datasets. Unlike UDA, multi-source domain adaptation (MSDA) methods can leverage multiple source datasets to improve the accuracy and robustness of the target model. However, previous methods for MSDA adapt image classification models across datasets and do not scale well to a larger number of source domains. This paper introduces a new MSDA method for subject-based domain adaptation in FER. It efficiently leverages information from multiple source subjects (labeled source domain data) to adapt a deep FER model to a single target individual (unlabeled target domain data). During adaptation, our subject-based MSDA first computes a between-source discrepancy loss to mitigate the domain shift among data from several source subjects. Then, a new strategy is employed to generate augmented confident pseudo-labels for the target subject, allowing a reduction in the domain shift between source and target subjects. Experiments performed on the challenging BioVid heat and pain dataset with 87 subjects and the UNBC-McMaster shoulder pain dataset with 25 subjects show that our subject-based MSDA can outperform state-of-the-art methods yet scale well to multiple subject-based source domains.

4/30/2024


Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey

Yuecong Xu, Haozhi Cao, Zhenghua Chen, Xiaoli Li, Lihua Xie, Jianfei Yang

Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare, thanks to the introduction of large-scale datasets and deep learning-based representations. However, video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications due to domain shifts between the training public video datasets (source video domains) and real-world videos (target video domains). Further, with the high cost of video annotation, it is more practical to use unlabeled videos for training. To tackle performance degradation and address concerns in high video annotation cost uniformly, the video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain by alleviating video domain shift, improving the generalizability and portability of video models. This paper surveys recent progress in VUDA with deep learning. We begin with the motivation of VUDA, followed by its definition, and recent progress of methods for both closed-set VUDA and VUDA under different scenarios, and current benchmark datasets for VUDA research. Eventually, future directions are provided to promote further VUDA research. The repository of this survey is provided at https://github.com/xuyu0010/awesome-video-domain-adaptation.

7/30/2024