Meta-Auxiliary Learning for Micro-Expression Recognition

Read original: arXiv:2404.12024 - Published 4/19/2024 by Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

Overview

The paper proposes LightmanNet, a dual-branch meta-auxiliary learning method for micro-expression recognition (MER), a challenging task in emotion recognition.
The method uses few-shot, meta-learning techniques so that micro-expressions can be recognized from limited training data.
It adds an auxiliary task alongside the primary recognition task to help the model learn more robust feature representations and improve recognition performance.

Plain English Explanation

Micro-expressions are very brief, subtle facial movements that can reveal a person's true emotions, even if they're trying to hide them. Recognizing micro-expressions is an important task in emotion recognition, but it's quite challenging because there's usually only a small amount of training data available.

The researchers in this paper developed a new approach called "Meta-Auxiliary Learning" to address this challenge. The key idea is to have the model learn an additional "auxiliary" task alongside the main micro-expression recognition task. This auxiliary task helps the model learn more general and robust features that can be effectively applied to the micro-expression recognition problem, even with limited training data.

Training is organized so that the model can quickly adapt to new micro-expression recognition tasks, using techniques from few-shot learning. This meta-learning approach lets the model perform well even when only a few labeled examples are available for a particular task.

Technical Explanation

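The paper proposes a dual-branch meta-auxiliary learning framework, called LightmanNet, for micro-expression recognition. The main components are:

Auxiliary Task: In addition to the primary micro-expression recognition task, the model is trained on an auxiliary task. According to the paper's abstract, this task is image alignment between micro-expressions and macro-expressions, which resemble each other in spatial and temporal behavioral patterns. The auxiliary branch guides the model toward discriminative features rather than noise or superficial connections between micro-expressions and emotions.
Meta-Learning: The model is trained with a bi-level meta-learning procedure. The first level learns task-specific knowledge through both branches, and the second level refines that knowledge to improve generalization and efficiency. Training simulates few-shot learning scenarios so that the model learns to adapt quickly to new micro-expression recognition tasks with limited data, as described in MEEL and MESEN.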
Multimodal Fusion: The model integrates both visual and temporal information from micro-expression videos, as demonstrated in JMTE.
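
The bi-level idea behind the components above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single linear softmax classifier in place of a deep network, a first-order approximation of the meta-gradient, and a hypothetical stand-in auxiliary loss (squared distance between projected micro-expression features and paired macro-expression features). The names `meta_step`, `make_toy_task`, `alpha`, `beta`, and `lam` are all illustrative, and the data is synthetic.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def primary_loss_grad(W, X, y):
    """Cross-entropy of a linear softmax classifier and its gradient w.r.t. W."""
    n, k = X.shape[0], W.shape[1]
    logp = log_softmax(X @ W)
    loss = -logp[np.arange(n), y].mean()
    grad = X.T @ (np.exp(logp) - np.eye(k)[y]) / n
    return loss, grad

def aux_loss_grad(W, X_micro, X_macro):
    """Stand-in alignment loss (assumption): mean squared distance between
    projected micro-expression and paired macro-expression features."""
    D = X_micro - X_macro
    Z = D @ W
    return (Z ** 2).sum(axis=1).mean(), 2.0 * D.T @ Z / D.shape[0]

def meta_step(W, tasks, alpha=0.1, beta=0.05, lam=0.5):
    """One bi-level update using a first-order approximation."""
    meta_grad = np.zeros_like(W)
    for t in tasks:
        # Level 1: adapt on the support set with primary + auxiliary losses.
        _, g_primary = primary_loss_grad(W, t["Xs"], t["ys"])
        _, g_aux = aux_loss_grad(W, t["Xs"], t["Ms"])
        W_task = W - alpha * (g_primary + lam * g_aux)
        # Level 2: evaluate the adapted weights on the query set.
        _, g_query = primary_loss_grad(W_task, t["Xq"], t["yq"])
        meta_grad += g_query
    return W - beta * meta_grad / len(tasks)

def make_toy_task(rng, d=8, k=3, n=12):
    """Synthetic task; Ms plays the role of paired macro-expression features."""
    Xs, Xq = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    return {"Xs": Xs, "ys": rng.integers(0, k, size=n),
            "Ms": Xs + 0.1 * rng.normal(size=(n, d)),
            "Xq": Xq, "yq": rng.integers(0, k, size=n)}

rng = np.random.default_rng(0)
W = np.zeros((8, 3))
tasks = [make_toy_task(rng) for _ in range(4)]
for _ in range(5):
    W = meta_step(W, tasks)
print(W.shape)
```

The weight `lam` controls how strongly the auxiliary branch constrains the task-specific update. A full second-order version would differentiate through the inner update, which is easier with an autodiff library than with the hand-written gradients used here.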

The researchers evaluate their approach on several micro-expression recognition benchmarks and show that it outperforms state-of-the-art methods, especially in few-shot learning scenarios.
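
Few-shot scenarios like those mentioned above are commonly set up as N-way K-shot episodes, each with a small labeled support set and a held-out query set. The following is a minimal sketch of such an episode sampler, assuming samples are identified by index and labeled by emotion class. The function name `sample_episode` and its parameters are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, q_queries, rng):
    """Pick n_way classes, then k_shot support and q_queries query indices per class."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # Only classes with enough samples can fill both the support and query sets.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + q_queries]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient samples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + q_queries)
        support += picked[:k_shot]   # used for adaptation
        query += picked[k_shot:]     # used to evaluate the adapted model
    return classes, support, query

# Toy labels: three well-populated classes and one rare class.
labels = ["happy"] * 6 + ["surprise"] * 6 + ["disgust"] * 6 + ["fear"] * 2
classes, support, query = sample_episode(labels, n_way=3, k_shot=2, q_queries=3,
                                         rng=random.Random(0))
print(len(support), len(query))
```

Rare classes that cannot fill an episode are skipped in this sketch. With imbalanced micro-expression data, that is a design choice worth revisiting, for example by lowering `q_queries` for small classes.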

Critical Analysis

The paper presents a well-designed and thorough approach to addressing the challenge of micro-expression recognition with limited training data. The use of an auxiliary task and meta-learning techniques is a clever strategy that helps the model learn more robust and generalizable features.

However, the paper does not discuss potential limitations or edge cases of the proposed method. For example, it's unclear how the approach would perform on micro-expressions with very subtle or ambiguous emotional cues, or how sensitive the method is to factors like video quality, lighting conditions, or individual variations in facial expressions.

Additionally, the paper could have provided more insights into the specific nature of the auxiliary task and how it contributes to the improved micro-expression recognition performance. A deeper analysis of the learned features and their transferability to other micro-expression recognition tasks would also have been valuable.

Overall, the research is a significant contribution to the field of micro-expression recognition, but further investigation into the method's limitations and potential refinements could strengthen the work.

Conclusion

The "Meta-Auxiliary Learning" approach presented in this paper offers a promising solution to the challenge of micro-expression recognition, particularly in scenarios with limited training data. By leveraging an auxiliary task and meta-learning techniques, the model is able to learn more robust and generalizable feature representations that lead to improved micro-expression recognition performance.

This research has important implications for emotion recognition systems, which could benefit from the ability to accurately detect subtle facial cues like micro-expressions. The method could also be adapted to other few-shot learning problems in computer vision and related domains.

Future research could explore the application of this approach to a wider range of micro-expression recognition tasks, as well as investigate the transferability of the learned features to other emotion-related problems. Advancements in this area have the potential to significantly enhance our understanding and interpretation of human emotions, with applications in fields such as healthcare, human-computer interaction, and social psychology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Meta-Auxiliary Learning for Micro-Expression Recognition

Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

Micro-expressions (MEs) are involuntary movements revealing people's hidden feelings, which have attracted considerable interest for their objectivity in emotion detection. However, despite its wide applications in various scenarios, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (i) data level: lack of data and imbalanced classes, (ii) feature level: subtle, rapidly changing, and complex features of MEs, and (iii) decision-making level: impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process: (i) In the first level, it obtains task-specific MER knowledge by learning in two branches, where the first branch learns MER features via primary MER tasks, while the other branch guides the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, given their resemblance in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that may damage its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.

4/19/2024

Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Ruolin Li, Lu Wang, Tingting Yang, Lisheng Xu, Bingyang Ma, Yongchun Li, Hongchao Wei

Micro-expressions (MEs) are spontaneous, unconscious facial expressions that have promising applications in various fields such as psychotherapy and national security. Thus, micro-expression recognition (MER) has attracted more and more attention from researchers. Although various MER methods have emerged, especially with the development of deep learning techniques, the task still faces several challenges, e.g. subtle motion and limited training data. To address these problems, we propose a novel motion extraction strategy (MoExt) for the MER task and use additional macro-expression data in the pre-training process. We primarily pretrain the feature separator and motion extractor using the contrastive loss, thus enabling them to extract representative motion features. In MoExt, shape features and texture features are first extracted separately from onset and apex frames, and then motion features related to MEs are extracted based on the shape features of both frames. To enable the model to more effectively separate features, we utilize the extracted motion features and the texture features from the onset frame to reconstruct the apex frame. Through pre-training, the module is enabled to extract inter-frame motion features of facial expressions while excluding irrelevant information. The feature separator and motion extractor are ultimately integrated into the MER network, which is then fine-tuned using the target ME data. The effectiveness of the proposed method is validated on the commonly used CASME II, SMIC, SAMM, and CAS(ME)3 datasets. The results show that our method performs favorably against state-of-the-art methods.

7/11/2024

From Macro to Micro: Boosting micro-expression recognition via pre-training on macro-expression videos

Hanting Li, Hongjing Niu, Feng Zhao

Micro-expression recognition (MER) has drawn increasing attention in recent years due to its potential applications in intelligent medical and lie detection. However, the shortage of annotated data has been the major obstacle to further improving deep-learning based MER methods. Intuitively, utilizing sufficient macro-expression data to promote MER performance seems to be a feasible solution. However, the facial patterns of macro-expressions and micro-expressions are significantly different, which makes naive transfer learning methods difficult to deploy directly. To tackle this issue, we propose a generalized transfer learning paradigm, called MAcro-expression TO MIcro-expression (MA2MI). Under our paradigm, networks can learn the ability to represent subtle facial movement by reconstructing future frames. In addition, we also propose a two-branch micro-action network (MIACNet) to decouple facial position features and facial action features, which can help the network more accurately locate facial action locations. Extensive experiments on three popular MER benchmarks demonstrate the superiority of our method.

6/5/2024

Micro-expression recognition based on depth map to point cloud

Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang

Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, which has brought widespread attention to the micro-expression recognition task. However, the task is challenging due to the subtle facial motion and brevity in duration. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but these approaches are restricted by facial texture information and are susceptible to environmental factors, such as lighting. Conversely, depth information can effectively represent motion information related to facial structure changes and is not affected by lighting. Motion information derived from facial structures can describe motion features that pixel textures cannot delineate. We propose a network for micro-expression recognition based on facial depth information, and our experiments demonstrate the crucial role of depth maps in the micro-expression recognition task. Initially, we transform the depth map into a point cloud and obtain the motion information for each point by aligning the initiating frame with the apex frame and performing a differential operation. Subsequently, we adjust all point cloud motion feature input dimensions and use them as inputs for multiple point cloud networks to assess the efficacy of this representation. PointNet++ was chosen as the final network for micro-expression recognition due to its superior performance. Our experiments show that our proposed method significantly outperforms the existing deep learning methods, including the baseline, on the CAS(ME)3 dataset, which includes depth information.

6/13/2024