MeciFace: Mechanomyography and Inertial Fusion-based Glasses for Edge Real-Time Recognition of Facial and Eating Activities

2306.13674

Published 4/4/2024 by Hymalai Bello, Sungho Suh, Bo Zhou, Paul Lukowicz

Abstract

The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly accurate tool for promoting healthy eating behaviors and stress management. We employ lightweight convolutional neural networks as backbone models for facial expression and eating monitoring scenarios. The MeciFace system ensures efficient data processing with a tiny memory footprint, ranging from 11 KB to 19 KB. During RTE evaluation, the system achieves an F1-score of < 86% for facial expression recognition and 94% for eating/drinking monitoring, for the RTE of unseen users (user-independent case).

Overview

The paper presents MeciFace, a system that uses mechanomyography (MMG) and inertial sensors in glasses to recognize facial expressions and eating activities in real-time.
The system is designed to run on edge devices, allowing for low-latency and privacy-preserving recognition of these activities.
Key applications include human-computer interaction, health monitoring, and activity tracking.

Plain English Explanation

MeciFace is a new technology that can detect facial expressions and eating behaviors in real-time using special glasses. The glasses have sensors that measure subtle muscle movements and motion, allowing them to recognize things like smiling, frowning, and chewing.

This is useful for a few reasons. First, it could improve how we interact with computers and other devices. Imagine being able to control a computer or smart home just by moving your face. Second, it could help monitor health and track eating habits, which is important for managing conditions like obesity or swallowing disorders.

The key innovation of MeciFace is that it can perform this recognition directly on the glasses, without needing to send data to a central server. This keeps the information private and allows for fast, responsive performance, even in situations where an internet connection is not available.

Technical Explanation

The MeciFace system consists of mechanomyography (MMG) sensors and inertial measurement unit (IMU) sensors embedded in a pair of eyeglasses. The MMG sensors measure subtle muscle movements on the face, while the IMU sensors track head and facial motions.
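One common way to prepare two sensor streams for a fused model is to cut them into overlapping time windows and stack their channels side by side. The sketch below illustrates that idea only; the channel counts, window length, hop size, and the `fuse_windows` helper are hypothetical and are not taken from the paper, which may fuse the signals differently.

```python
from typing import List, Sequence


def fuse_windows(mmg: Sequence[Sequence[float]],
                 imu: Sequence[Sequence[float]],
                 window: int,
                 hop: int) -> List[List[List[float]]]:
    """Slice two time-aligned streams into overlapping windows and stack channels.

    mmg[t] and imu[t] hold the channel readings at sample t. Each returned
    window has shape (window, mmg_channels + imu_channels).
    """
    if len(mmg) != len(imu):
        raise ValueError("streams must be time-aligned and equally long")
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    fused = []
    for start in range(0, len(mmg) - window + 1, hop):
        # Concatenate the MMG and IMU channels for every sample in the window.
        fused.append([list(mmg[t]) + list(imu[t])
                      for t in range(start, start + window)])
    return fused


# Hypothetical toy streams: 2 MMG channels and 6 IMU channels, 10 samples each.
mmg_stream = [[0.1 * t, 0.2 * t] for t in range(10)]
imu_stream = [[float(t)] * 6 for t in range(10)]

windows = fuse_windows(mmg_stream, imu_stream, window=4, hop=2)
print(len(windows), len(windows[0]), len(windows[0][0]))  # prints: 4 4 8
```

Each window is then a small two-dimensional array (time by channel) that a convolutional model can consume directly.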

A custom deep learning model is trained to fuse the MMG and IMU data and recognize a range of facial expressions and eating activities in real-time. The model is optimized to run efficiently on edge devices, allowing the recognition to happen directly on the glasses without relying on cloud processing.
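To give a feel for why a lightweight convolutional network can fit in a footprint of a few kilobytes, the following sketch counts the weights of a small 1D CNN and converts the total into storage size. The layer sizes here are invented for illustration and are not the architecture used in the paper; the estimate also covers weights only, not activations or runtime overhead.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of a standard 1D convolution layer."""
    return in_channels * out_channels * kernel + out_channels


def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def footprint_kib(num_params: int, bytes_per_weight: int) -> float:
    """Storage needed for the weights alone, in KiB."""
    return num_params * bytes_per_weight / 1024


# Hypothetical architecture: 8 fused input channels, three conv layers,
# global average pooling, then a 6-class output layer.
layer_params = [
    conv1d_params(8, 16, 5),
    conv1d_params(16, 32, 5),
    conv1d_params(32, 32, 3),
    dense_params(32, 6),
]
total_params = sum(layer_params)

print("parameters:", total_params)
print("int8 weights (KiB):", round(footprint_kib(total_params, 1), 1))
print("float32 weights (KiB):", round(footprint_kib(total_params, 4), 1))
```

Under these assumptions, storing weights as 8-bit integers instead of 32-bit floats cuts the size by a factor of four, which is one reason quantization is a typical step when targeting microcontroller-class hardware.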

The researchers evaluated MeciFace in a user study, demonstrating its ability to accurately detect activities like smiling, frowning, chewing, drinking, and swallowing. They also showed that the system can run with low latency and low power consumption on a mobile system-on-chip.
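The abstract reports F1-scores for unseen users (the user-independent case). A minimal sketch of how such an evaluation is often set up is shown below: each user is held out in turn, and predictions are scored with a class-averaged F1. The class labels, user IDs, and the choice of macro averaging are assumptions made for illustration; the paper's exact protocol may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_user_out(
    user_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_user, train_indices, test_indices) for each user."""
    for user in sorted(set(user_ids)):
        train = [i for i, u in enumerate(user_ids) if u != user]
        test = [i for i, u in enumerate(user_ids) if u == user]
        yield user, train, test


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three users with two labelled windows each.
users = ["u1", "u1", "u2", "u2", "u3", "u3"]
y_true = ["chew", "chew", "drink", "null", "null", "drink"]
y_pred = ["chew", "drink", "drink", "null", "chew", "drink"]

for held_out, train_idx, test_idx in leave_one_user_out(users):
    print(held_out, "train:", train_idx, "test:", test_idx)

print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

Because no data from the held-out user is seen during training, this kind of split estimates how a model would behave for a new wearer without per-user calibration.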

Critical Analysis

The paper provides a thorough technical description of the MeciFace system and its capabilities. However, it does not address some potential limitations or areas for future work.

For example, the system is currently limited to a pre-defined set of facial expressions and eating activities. It's unclear how well it would generalize to other, more nuanced facial movements or unconventional eating behaviors. Additionally, the paper does not discuss the robustness of the system to factors like variations in lighting, skin tone, or eyewear.

Further research could explore expanding the recognition capabilities, improving robustness, and investigating privacy implications of this type of sensing technology. Ethical considerations around the use of such systems, especially in health and wellness applications, should also be carefully examined.

Conclusion

MeciFace presents a promising approach for real-time, edge-based recognition of facial expressions and eating activities using a combination of mechanomyography and inertial sensing. This technology could enable a wide range of applications, from improved human-computer interaction to enhanced health monitoring and activity tracking. While the current system demonstrates strong performance, further research is needed to address potential limitations and explore the broader implications of this type of sensing technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos

Vineet Parikh, Saif Mahmud, Devansh Agarwal, Ke Li, François Guimbretière, Cheng Zhang

Self-recording eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when eating occurs without capturing details of the eating activities (e.g., what is being eaten). In this paper, we present EchoGuide, an application and system pipeline that leverages low-power active acoustic sensing to guide head-mounted cameras to capture egocentric videos, enabling efficient and detailed analysis of eating activities. By combining active acoustic sensing for eating detection with video captioning models and large-scale language models for retrieval augmentation, EchoGuide intelligently clips and analyzes videos to create concise, relevant activity records on eating. We evaluated EchoGuide with 9 participants in naturalistic settings involving eating activities, demonstrating high-quality summarization and significant reductions in video data needed, paving the way for practical, scalable eating activity tracking.

6/18/2024

cs.HC

MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses, designed to track fine-grained dietary actions like hand-to-mouth movements for food intake, chewing, and drinking. MunchSonic emits inaudible ultrasonic waves from a commodity eyeglass frame. The reflected signals contain rich information about the position and movements of various body parts, including the mouth, jaw, arms, and hands, all of which are involved in eating activities. These signals are then processed by a custom deep-learning pipeline to classify six actions: food intake, chewing, drinking, talking, face-hand touching, and other activities (null). In an unconstrained user study with 12 participants, MunchSonic achieves a 93.5% macro F1-score in a user-independent evaluation with a 2-second time resolution, demonstrating its effectiveness. Additionally, MunchSonic accurately tracks eating episodes and the frequency of food intake within those episodes.

6/3/2024

cs.HC cs.ET

AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses

Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen

Recent advancements in augmented reality (AR) have enabled the use of various sensors on smart glasses for applications like facial reconstruction, which is vital to improve AR experiences for virtual social activities. However, the size and power constraints of smart glasses demand a miniature and low-power sensing solution. AUGlasses achieves unobtrusive low-power facial reconstruction by placing inertial measurement units (IMU) against the temporal area on the face to capture the skin deformations, which are caused by facial muscle movements. These IMU signals, along with historical data on facial action units (AUs), are processed by a transformer-based deep learning model to estimate AU intensities in real-time, which are then used for facial reconstruction. Our results show that AUGlasses accurately predicts the strength (0-5 scale) of 14 key AUs with a cross-user mean absolute error (MAE) of 0.187 (STD = 0.025) and achieves facial reconstruction with a cross-user MAE of 1.93 mm (STD = 0.353). We also integrated various preprocessing and training techniques to ensure robust performance for continuous sensing. Micro-benchmark tests indicate that our system consistently performs accurate continuous facial reconstruction with a fine-tuned cross-user model, achieving an AU MAE of 0.35.

5/24/2024

cs.HC cs.CV

Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

5/14/2024

cs.MM cs.AI cs.CV