AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses

2405.13289

Published 5/24/2024 by Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen

Abstract

Recent advancements in augmented reality (AR) have enabled the use of various sensors on smart glasses for applications like facial reconstruction, which is vital to improve AR experiences for virtual social activities. However, the size and power constraints of smart glasses demand a miniature and low-power sensing solution. AUGlasses achieves unobtrusive low-power facial reconstruction by placing inertial measurement units (IMU) against the temporal area on the face to capture the skin deformations, which are caused by facial muscle movements. These IMU signals, along with historical data on facial action units (AUs), are processed by a transformer-based deep learning model to estimate AU intensities in real-time, which are then used for facial reconstruction. Our results show that AUGlasses accurately predicts the strength (0-5 scale) of 14 key AUs with a cross-user mean absolute error (MAE) of 0.187 (STD = 0.025) and achieves facial reconstruction with a cross-user MAE of 1.93 mm (STD = 0.353). We also integrated various preprocessing and training techniques to ensure robust performance for continuous sensing. Micro-benchmark tests indicate that our system consistently performs accurate continuous facial reconstruction with a fine-tuned cross-user model, achieving an AU MAE of 0.35.

Overview

  • This paper introduces AUGlasses, a system that uses inertial measurement units (IMUs) on smart glasses to enable unobtrusive, low-power facial reconstruction for virtual social activities.
  • AUGlasses captures skin deformations caused by facial muscle movements using IMUs placed on the temporal area, and a transformer-based deep learning model estimates the intensity of 14 key facial action units (AUs) in real-time.
  • The estimated AU intensities then drive facial reconstruction, which can enhance augmented reality (AR) experiences for virtual social interactions.

Plain English Explanation

AUGlasses: Unobtrusive Low-Power Facial Reconstruction using Smart Glasses

Imagine you're in a virtual video chat with friends, and you want your avatar to mimic your facial expressions. Traditional methods for facial tracking often rely on bulky sensors or cameras, which can be uncomfortable and distracting.

AUGlasses solves this problem by using small inertial measurement units (IMUs) placed on the temples of smart glasses. These IMUs can detect the subtle movements of your facial muscles as you make different expressions. A machine learning model then translates these muscle movements into the specific facial actions (like raising your eyebrows or smiling) that can be used to animate your virtual avatar.

The key advantage of this approach is that it is unobtrusive and low-power, so you can take part in virtual social activities without bulky sensors or cameras. Related glasses-based systems such as MeciFace (mechanomyography and inertial fusion) and ActSonic (active acoustic sensing) likewise show that compact, low-power sensors on eyewear can recognize facial and everyday activities.

By accurately reconstructing your facial expressions in real-time, AUGlasses can create a more immersive and natural experience for virtual social interactions and avatar animation. This could be particularly useful for remote work, gaming, or other applications where you want your digital self to authentically reflect your emotions and nonverbal cues.

Technical Explanation

AUGlasses uses a transformer-based deep learning model to estimate the intensity of 14 key facial action units (AUs) in real time from IMU data collected on the smart glasses. The model takes the IMU signals together with historical AU data as input and predicts AU strengths on a 0-5 scale, with a cross-user mean absolute error (MAE) of 0.187 (STD = 0.025).
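The idea of feeding IMU signals together with historical AU data into a model can be sketched as a small streaming loop. The following is only an illustrative sketch, not the authors' implementation: the function names, the window sizes, the six-channel IMU frame, and the `toy_predict` stand-in (which replaces the transformer with a trivial formula) are all assumptions made for the example. Only the 14 AUs and the 0-5 scale come from the paper.

```python
from collections import deque

NUM_AUS = 14               # number of AUs reported in the paper
AU_MIN, AU_MAX = 0.0, 5.0  # AU intensity scale reported in the paper


def clamp_au(values):
    """Keep every AU intensity inside the 0-5 range."""
    return [min(AU_MAX, max(AU_MIN, v)) for v in values]


def estimate_stream(imu_frames, predict, imu_window=4, au_history=2):
    """Run a predictor over a stream of IMU frames.

    `predict(imu_window, au_history)` is a hypothetical callable that
    returns 14 AU intensities. Window sizes are assumed values.
    """
    imu_buf = deque(maxlen=imu_window)
    # Start with an all-zero (neutral face) AU history; an assumption.
    au_buf = deque([[0.0] * NUM_AUS for _ in range(au_history)],
                   maxlen=au_history)
    outputs = []
    for frame in imu_frames:
        imu_buf.append(list(frame))
        if len(imu_buf) < imu_window:
            continue  # wait until the first full IMU window is available
        raw = predict(list(imu_buf), list(au_buf))
        if len(raw) != NUM_AUS:
            raise ValueError("predictor must return one value per AU")
        estimate = clamp_au(raw)
        au_buf.append(estimate)  # the new estimate becomes history
        outputs.append(estimate)
    return outputs


def toy_predict(imu_window, au_history):
    """Stand-in for the transformer: NOT the model from the paper."""
    samples = [abs(x) for frame in imu_window for x in frame]
    energy = sum(samples) / len(samples)
    return [0.5 * a + energy for a in au_history[-1]]


frames = [[1.0] * 6 for _ in range(6)]  # six synthetic 6-channel frames
estimates = estimate_stream(frames, toy_predict)
print(len(estimates), estimates[-1][0])  # prints: 3 1.75
```

Each new estimate is pushed back into the history buffer, so the next prediction sees both the latest IMU window and the most recent AU estimates. A real system would replace `toy_predict` with the trained model.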

The estimated AU intensities are then used for facial reconstruction, which achieves a cross-user MAE of 1.93 mm (STD = 0.353). The researchers also integrated various preprocessing and training techniques to keep performance robust during continuous sensing. In micro-benchmark tests, a fine-tuned cross-user model achieved an AU MAE of 0.35.
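The MAE figures quoted above are averages of absolute errors, summarized across users. A minimal sketch of that arithmetic follows. It assumes one MAE per held-out user and a sample standard deviation across users; the summary does not say which STD convention the paper uses, and the user names and numbers here are made up for illustration.

```python
from statistics import mean, stdev


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def cross_user_summary(per_user):
    """Return (mean MAE, STD of MAE, per-user MAE) over users.

    `per_user` maps a user id to a (predicted, actual) pair.
    Sample STD is an assumption; the paper may use another convention.
    """
    scores = {user: mae(p, a) for user, (p, a) in per_user.items()}
    values = list(scores.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread, scores


# Synthetic AU intensities on the 0-5 scale (illustrative only).
data = {
    "user_a": ([1.0, 2.0], [1.0, 3.0]),  # MAE 0.5
    "user_b": ([0.0, 0.0], [1.0, 1.0]),  # MAE 1.0
}
avg, spread, scores = cross_user_summary(data)
print(round(avg, 3), round(spread, 3))  # prints: 0.75 0.354
```

The same function applies to the reconstruction error in millimetres if the inputs are per-vertex distances instead of AU intensities.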

Critical Analysis

The AUGlasses system presents an innovative approach to facial reconstruction using the unobtrusive sensors available on smart glasses. By focusing on skin deformations rather than facial landmarks or images, the system can operate in a low-power, continuous manner suitable for extended virtual social interactions.

However, the paper acknowledges several limitations. A model that maps IMU signals to facial AUs may not generalize equally well to all users, which is consistent with the use of a fine-tuned cross-user model in the micro-benchmark tests. Additionally, the system has only been evaluated in controlled settings and may face challenges in real-world environments with more varied facial movements and occlusions.

Further research could explore techniques to personalize the model for individual users, perhaps by incorporating active learning or adaptation mechanisms. Addressing the limitations of the current approach could help AUGlasses achieve even more robust and accurate facial reconstruction for a wide range of virtual social applications.

Conclusion

The AUGlasses system demonstrates the potential of using smart glasses and inertial sensors to enable unobtrusive, low-power facial reconstruction for virtual social activities. By translating subtle facial muscle movements into precise estimates of facial action unit intensities, AUGlasses can create more immersive and natural avatar animations, enhancing the experience of remote collaboration, gaming, and other virtual interactions.

While the current system has some limitations, the core principles and techniques presented in this research could be further refined and expanded to bring us closer to seamless augmented reality experiences that authentically reflect our nonverbal communication and emotional expression.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

IMUSE: IMU-based Facial Expression Capture

Youjia Wang, Yiwen Wu, Hengan Zhou, Hongyang Lin, Xingyue Peng, Yingwenqi Jiang, Yingsheng Zhu, Guanpeng Long, Yatu Zhang, Jingya Wang, Lan Xu, Jingyi Yu

For facial motion capture and analysis, the dominated solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) serve as potential rescues yet are mainly adopted for full-body motion capture. In this paper, we propose IMUSE to fill the gap, a novel path for facial expression capture using purely IMU signals, significantly distant from previous visual solutions. The key design in our IMUSE is a trilogy. We first design micro-IMUs to suit facial capture, companion with an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. Such unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. The IMUSE framework empowers us to perform accurate facial capture in scenarios where visual methods falter and simultaneously safeguard user privacy. We conduct extensive experiments about both the IMU configuration and technical components to validate the effectiveness of our IMUSE approach. Notably, IMUSE enables various potential and novel applications, i.e., facial capture against occlusions or in a moving performance. We will release our dataset and implementations to enrich more possibilities of facial capture and analysis in our community.

6/13/2024

Causal Intervention for Subject-Deconfounded Facial Action Unit Recognition

Yingjie Chen, Diqi Chen, Tao Wang, Yizhou Wang, Yun Liang

Subject-invariant facial action unit (AU) recognition remains challenging for the reason that the data distribution varies among subjects. In this paper, we propose a causal inference framework for subject-invariant facial action unit recognition. To illustrate the causal effect existing in AU recognition task, we formulate the causalities among facial images, subjects, latent AU semantic relations, and estimated AU occurrence probabilities via a structural causal model. By constructing such a causal diagram, we clarify the causal effect among variables and propose a plug-in causal intervention module, CIS, to deconfound the confounder Subject in the causal diagram. Extensive experiments conducted on two commonly used AU benchmark datasets, BP4D and DISFA, show the effectiveness of our CIS, and the model with CIS inserted, CISNet, has achieved state-of-the-art performance.

4/4/2024

MeciFace: Mechanomyography and Inertial Fusion-based Glasses for Edge Real-Time Recognition of Facial and Eating Activities

Hymalai Bello, Sungho Suh, Bo Zhou, Paul Lukowicz

The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly accurate tool for promoting healthy eating behaviors and stress management. We employ lightweight convolutional neural networks as backbone models for facial expression and eating monitoring scenarios. The MeciFace system ensures efficient data processing with a tiny memory footprint, ranging from 11 KB to 19 KB. During RTE evaluation, the system achieves an F1-score of < 86% for facial expression recognition and 94% for eating/drinking monitoring, for the RTE of unseen users (user-independent case).

4/4/2024

ActSonic: Everyday Activity Recognition on Smart Glasses using Active Acoustic Sensing

Saif Mahmud, Vineet Parikh, Qikang Liang, Ke Li, Ruidong Zhang, Ashwin Ajit, Vipin Gunda, Devansh Agarwal, François Guimbretière, Cheng Zhang

We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body with a time resolution of one second. It only needs a pair of miniature speakers and microphones mounted on each hinge of eyeglasses to emit ultrasonic waves to create an acoustic aura around the body. Based on the position and motion of various body parts, the acoustic signals are reflected with unique patterns captured by the microphone and analyzed by a customized self-supervised deep learning framework to infer the performed activities. ActSonic was deployed in a user study with 19 participants across 19 households to evaluate its efficacy. Without requiring any training data from a new user (leave-one-participant-out evaluation), ActSonic was able to detect 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.

5/9/2024