Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation

2404.09378

Published 5/2/2024 by Sam Cantrill, David Ahmedt-Aristizabal, Lars Petersson, Hanna Suominen, Mohammad Ali Armin

Abstract

Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation which improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our designed video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion through modeling the 3D facial surface for motion robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential strengths of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://samcantrill.github.io/orientation-uv-rppg/.

Overview

The paper presents a method for improving video-based facial remote photoplethysmography (rPPG) estimation using orientation-conditioned facial texture mapping.
The approach targets a core difficulty: dynamic, unconstrained subject motion changes how the face appears in video, which makes the rPPG signal hard to extract reliably.
The method uses the 3D facial surface to build an orientation-conditioned facial texture video representation, which makes existing video-based rPPG models such as PhysNet more robust to motion.

Plain English Explanation

Measuring Heart Rate from Videos

The paper discusses a technique for measuring a person's heart rate using only a video of their face. This is known as remote photoplethysmography (rPPG), and it could be useful for applications like health monitoring or remote physiological sensing.

One of the key challenges with rPPG is that the quality of the heart rate signal degrades when the person moves. As the head turns, the appearance of the face in the video changes, and these changes can obscure the pulse signal.

To address this, the researchers use a 3D model of the facial surface to build a texture-based video representation that accounts for the orientation of the face. Feeding this representation to an existing rPPG model allows the heart rate to be measured more reliably, even when the person moves freely.
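As a rough illustration of the last step in a typical rPPG pipeline, the sketch below turns an already-recovered pulse signal into a heart rate by finding the dominant frequency inside a plausible heart-rate band. It is not the paper's method: the function name `estimate_pulse_rate_bpm`, the 0.7 to 3.0 Hz band, the 30 fps frame rate, and the synthetic sine-wave signal are all assumptions chosen for illustration.

```python
import math


def estimate_pulse_rate_bpm(signal, fps, low_hz=0.7, high_hz=3.0, step_hz=0.01):
    """Return the in-band frequency with the most spectral power, in beats per minute.

    The band limits and frequency step are illustrative assumptions.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset

    best_freq, best_power = low_hz, -1.0
    steps = int(round((high_hz - low_hz) / step_hz))
    for k in range(steps + 1):
        freq = low_hz + k * step_hz
        # Direct Fourier projection of the signal onto this candidate frequency.
        re = sum(x * math.cos(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return 60.0 * best_freq


# Synthetic example: a 1.2 Hz "pulse" sampled at 30 frames per second for 10 seconds.
fps = 30.0
signal = [math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
bpm = estimate_pulse_rate_bpm(signal, fps)
print(f"Estimated pulse rate: {bpm:.1f} bpm")
```

A real facial video yields a much noisier signal than this sine wave, which is why motion robustness matters: anything that adds in-band energy unrelated to the pulse can move the spectral peak.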

Technical Explanation

The key elements of the paper's technical approach are:

Facial Texture Mapping: The method uses the 3D facial surface to map the face in each video frame to a facial texture representation. Modeling the surface in this way helps disentangle subject motion from the facial appearance that carries the rPPG signal.
Orientation-Conditioning: The texture representation is explicitly conditioned on the orientation of the facial surface, further improving robustness to changes in head pose.
rPPG Signal Extraction: The resulting texture video is passed to an existing video-based rPPG estimation model (PhysNet in the reported experiments), which estimates the pulse signal.
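One plausible way to picture the orientation-conditioning idea in the list above is to keep only those texture pixels whose underlying surface faces the camera. The sketch below does this with per-texel surface normals and an angle threshold. This is an assumption-laden illustration, not the authors' implementation: the helper names, the fixed camera direction `(0, 0, 1)`, and the 60 degree threshold are all hypothetical.

```python
import math


def angle_to_camera(normal, view_dir=(0.0, 0.0, 1.0)):
    """Angle in degrees between a surface normal and the direction toward the camera."""
    dot = sum(n * v for n, v in zip(normal, view_dir))
    norm = math.sqrt(sum(n * n for n in normal)) * math.sqrt(sum(v * v for v in view_dir))
    if norm == 0.0:
        return 180.0  # treat a degenerate normal as facing away
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))


def orientation_mask(normals, max_angle_deg=60.0):
    """True for texels whose surface faces the camera within max_angle_deg (hypothetical threshold)."""
    return [[angle_to_camera(n) <= max_angle_deg for n in row] for row in normals]


def apply_mask(texture, mask, fill=0.0):
    """Replace texels that fail the orientation test with a fill value."""
    return [
        [pixel if keep else fill for pixel, keep in zip(tex_row, mask_row)]
        for tex_row, mask_row in zip(texture, mask)
    ]


# One row of three texels: facing the camera, turned 45 degrees, turned 90 degrees.
normals = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
texture = [[0.8, 0.6, 0.4]]
masked = apply_mask(texture, orientation_mask(normals))
print(masked)
```

In this toy example the third texel is turned fully away from the camera, so it is replaced by the fill value while the other two are kept.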

The researchers evaluated their approach in cross-dataset testing, training the PhysNet model on PURE and testing it on MMPD. The proposed representation improved performance by 18.2% over the baseline, with improvements of up to 29.6% across all tested motion scenarios. An ablation study examines the design decisions and the impact of the individual video processing steps.
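The percentage improvements reported in the abstract can be read as relative changes against a baseline. The sketch below shows that arithmetic under the assumption that the percentage is a relative reduction in an error metric such as mean absolute error, which this summary does not state. The function name and the two error values are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction in error relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical error values chosen only to illustrate the formula.
baseline_mae = 10.0
proposed_mae = 8.18
improvement = relative_improvement(baseline_mae, proposed_mae)
print(f"Relative improvement: {improvement:.1f}%")
```

Because the figure is relative, the same percentage can correspond to very different absolute changes depending on how large the baseline error was.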

Critical Analysis

The paper presents a well-designed evaluation of the proposed method, including cross-dataset testing against a baseline and an ablation study. However, some potential limitations and areas for future research are:

The results highlighted in the abstract come from a single cross-dataset setting (PhysNet trained on PURE and tested on MMPD), which may not capture the full diversity of real-world conditions that could be encountered in practical applications. Further testing in more diverse and challenging scenarios would be valuable.
The paper does not deeply explore the potential biases or fairness implications of the method, such as how it may perform for people with different skin tones or facial features.
While the orientation-conditioning approach is a key innovation, the paper does not provide a detailed analysis of the specific head pose and lighting conditions that most significantly impact rPPG performance. A deeper understanding of these factors could inform future improvements.

Conclusion

The paper presents a novel approach for improving the reliability of video-based heart rate estimation using orientation-conditioned facial texture mapping. By addressing the challenges of head pose variation and subject motion, the method represents an important step forward in making remote physiological sensing more robust and practical for a wide range of applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Camera-Based Remote Physiology Sensing for Hundreds of Subjects Across Skin Tones

Jiankai Tang, Xinyi Li, Jiacheng Liu, Xiyuxing Zhang, Zeyu Wang, Yuntao Wang

Remote photoplethysmography (rPPG) emerges as a promising method for non-invasive, convenient measurement of vital signs, utilizing the widespread presence of cameras. Despite advancements, existing datasets fall short in terms of size and diversity, limiting comprehensive evaluation under diverse conditions. This paper presents an in-depth analysis of the VitalVideo dataset, the largest real-world rPPG dataset to date, encompassing 893 subjects and 6 Fitzpatrick skin tones. Our experimentation with six unsupervised methods and three supervised models demonstrates that datasets comprising a few hundred subjects (i.e., 300 for UBFC-rPPG, 500 for PURE, and 700 for MMPD-Simple) are sufficient for effective rPPG model training. Our findings highlight the importance of diversity and consistency in skin tones for precise performance evaluation across different datasets.

4/9/2024

cs.CV cs.AI

RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy and understanding the periodic patterns of rPPG among long contexts. This represents a trade-off between computational complexity and the ability to capture long-range dependencies, posing a challenge for rPPG that is suitable for deployment on mobile devices. Based on the in-depth exploration of Mamba's comprehension of spatial and temporal information, this paper introduces RhythmMamba, an end-to-end Mamba-based method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends, coupled with frequency domain feed-forward to enable Mamba to robustly understand the quasi-periodic patterns of rPPG. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with reduced parameters and lower computational complexity. The proposed RhythmMamba can be applied to video segments of any length without performance degradation. The codes are available at https://github.com/zizheng-guo/RhythmMamba.

4/10/2024

cs.CV

SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals

Jeremy Speth, Nathan Vance, Patrick Flynn, Adam Czajka

Subtle periodic signals, such as blood volume pulse and respiration, can be extracted from RGB video, enabling noncontact health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach discovers the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases, we successfully applied the same approach to camera-based respiration by changing the bandlimits of the target signal. This shows that the approach is general enough for unsupervised learning of bandlimited quasi-periodic signals from different domains. Furthermore, we show that the framework is effective for finetuning models on unlabelled video from a single subject, allowing for personalized and adaptive signal regressors.

4/23/2024

cs.CV cs.AI cs.LG

Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features

Alexander Vedernikov, Zhaodong Sun, Virpi-Liisa Kykyri, Mikko Pohjola, Miriam Nokia, Xiaobai Li

Engagement measurement finds application in healthcare, education, services. The use of physiological and behavioral features is viable, but the impracticality of traditional physiological measurement arises due to the need for contact sensors. We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative for contact sensors in deriving heart rate variability (HRV) features, then fusing these with behavioral features to measure engagement in online group meetings. Firstly, a unique Engagement Dataset of online interactions among social workers is collected with granular engagement labels, offering insight into virtual meeting dynamics. Secondly, a pre-trained rPPG model is customized to reconstruct rPPG signals from video meetings in an unsupervised manner, enabling the calculation of HRV features. Thirdly, the feasibility of estimating engagement from HRV features using short observation windows, with a notable enhancement when using longer observation windows of two to four minutes, is demonstrated. Fourthly, the effectiveness of behavioral cues is evaluated when fused with physiological data, which further enhances engagement estimation performance. An accuracy of 94% is achieved when only HRV features are used, eliminating the need for contact sensors or ground truth signals; use of behavioral cues raises the accuracy to 96%. Facial analysis offers precise engagement measurement, beneficial for future applications.

5/15/2024

cs.CV