Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

arXiv:2406.17538. Published 7/30/2024 by Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng and Jing Zhang

Overview

Proposes SKD-TSTSAN, a three-stream temporal-shift attention network for micro-expression recognition
Combines learning-based motion magnification, efficient channel attention, temporal shift modules, and self-knowledge distillation
Reports state-of-the-art results on four public datasets: CASME II, SAMM, MMEW, and CAS(ME)³

Plain English Explanation

The research paper introduces a new deep learning model called SKD-TSTSAN (Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation) for the task of micro-expression recognition. Micro-expressions are subtle, fleeting facial expressions that can reveal a person's true emotions.

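The SKD-TSTSAN model processes each micro-expression through three parallel streams, including a local-spatial stream that focuses on the facial regions most relevant to micro-expressions and a dynamic-temporal stream that models how the face moves over time. Because the muscle movements behind a micro-expression are very faint, the model first amplifies them with learned motion magnification. It also uses self-knowledge distillation: small auxiliary classifiers attached to earlier parts of the network learn not only from the labels but also from the predictions of the network's deepest section, which pushes every part of the network to extract useful features.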

The authors report that SKD-TSTSAN outperforms existing micro-expression recognition methods and achieves new state-of-the-art results on four public datasets.

Technical Explanation

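SKD-TSTSAN is motivated by two difficulties of micro-expression recognition: the low intensity of the facial muscle movements and the small size of public datasets. The model combines three ingredients: learning-based motion magnification modules that enhance the intensity of muscle movements, a three-stream network, and a self-knowledge distillation mechanism.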
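According to the paper's abstract, SKD-TSTSAN uses learning-based motion magnification modules to enhance the intensity of facial muscle movements; their architecture is not described in this summary. As rough intuition only, the sketch below swaps in a much simpler technique, linear Eulerian-style magnification: for small motions, amplifying the intensity change between an onset frame and an apex frame by a factor `alpha` approximately amplifies the motion itself. The function name `linear_magnify`, the value of `alpha`, and the toy 1-D intensity profiles are hypothetical and are not taken from the paper.

```python
import numpy as np

def linear_magnify(onset: np.ndarray, apex: np.ndarray, alpha: float) -> np.ndarray:
    """Amplify the onset-to-apex intensity change by a factor alpha.

    Hypothetical, simplified stand-in for a learning-based magnification
    module: for small motions, scaling the intensity difference between two
    frames approximately scales the underlying motion.
    """
    onset = onset.astype(np.float64)
    apex = apex.astype(np.float64)
    magnified = onset + alpha * (apex - onset)
    # Keep the result inside the valid 8-bit intensity range.
    return np.clip(magnified, 0.0, 255.0)

# Toy 1-D intensity profiles: a faint bump that shifts slightly between frames.
onset = np.array([100.0, 102.0, 104.0, 102.0, 100.0])
apex = np.array([100.0, 101.0, 103.0, 104.0, 101.0])
out = linear_magnify(onset, apex, alpha=3.0)
print(out)
```

With `alpha=1.0` the function returns the apex frame unchanged; larger values exaggerate the difference, which is the effect the paper's learned modules aim for in a more robust way.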

Within the three-stream network, the local-spatial stream uses efficient channel attention (ECA) modules so that the network focuses on facial regions that are highly relevant to micro-expressions. The dynamic-temporal stream uses temporal shift modules, which enable temporal modeling with no additional parameters by mixing motion information from two different temporal domains. Together, the streams are designed to learn complementary spatial and temporal representations of the same micro-expression.
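The abstract specifies efficient channel attention (ECA) modules in the local-spatial stream and temporal shift modules in the dynamic-temporal stream. The NumPy sketch below shows the generic forms of both operations as published in ECA-Net and in the Temporal Shift Module (TSM) work; it is not the authors' code, and the exact variants used in SKD-TSTSAN are assumptions. The `kernel` argument stands in for learned 1-D convolution weights, `fold_div=8` is the setting commonly used in the original TSM work, and all function names are hypothetical.

```python
import math
import numpy as np

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive ECA-Net kernel size: nearest odd value to (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention on a (C, H, W) feature map.

    `kernel` is an odd-length stand-in for the learned 1-D conv weights.
    """
    y = x.mean(axis=(1, 2))                      # global average pooling -> (C,)
    k = len(kernel)
    padded = np.pad(y, k // 2)                   # zero-pad so the output keeps length C
    # 1-D convolution (cross-correlation) across neighbouring channels,
    # without the dimensionality reduction used in SE blocks.
    z = np.array([padded[i:i + k] @ kernel for i in range(y.shape[0])])
    weights = 1.0 / (1.0 + np.exp(-z))           # sigmoid -> per-channel weights in (0, 1)
    return x * weights[:, None, None]            # rescale each channel

def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Parameter-free temporal shift on a (T, C, H, W) clip.

    One 1/fold_div slice of channels takes values from the next time step,
    another slice from the previous one; vacated positions are zero-filled
    and the remaining channels are left unchanged.
    """
    c = x.shape[1]
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # frame t sees frame t+1
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # frame t sees frame t-1
    out[:, 2 * fold:] = x[:, 2 * fold:]              # unshifted channels
    return out

feat = np.random.default_rng(0).standard_normal((64, 7, 7))
k = eca_kernel_size(64)
attended = eca(feat, kernel=np.full(k, 1.0 / k))
clip = np.random.default_rng(1).standard_normal((2, 16, 7, 7))  # two temporal segments
shifted = temporal_shift(clip)
print(attended.shape, shifted.shape)
```

Both operations preserve the input shape, and the temporal shift has no learnable parameters, which is the property the abstract highlights.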

The self-knowledge distillation mechanism attaches auxiliary classifiers to shallower parts of the network and uses the deepest section of the network to supervise them. This encourages all blocks, not only the final one, to fully explore the features of the training set.
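The abstract describes self-knowledge distillation in which auxiliary classifiers are supervised by the deepest section of the network, but the exact loss is not given in this summary. The sketch below assumes a common self-distillation formulation: each auxiliary classifier is trained on a weighted sum of cross-entropy with the ground-truth labels and a temperature-scaled KL divergence toward the deepest classifier's softened predictions. The weighting `alpha`, the `temperature`, and all function names are hypothetical, and the feature-level hint losses used by some self-distillation methods are omitted.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)        # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def kl_div(p_teacher: np.ndarray, p_student: np.ndarray) -> float:
    """Mean KL(teacher || student) over the batch."""
    log_ratio = np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)
    return float(np.mean(np.sum(p_teacher * log_ratio, axis=-1)))

def self_distillation_loss(aux_logits, deepest_logits, labels,
                           alpha: float = 0.5, temperature: float = 3.0) -> float:
    """Hypothetical self-distillation objective.

    aux_logits: list of (N, K) logits from auxiliary classifiers on shallower blocks.
    deepest_logits: (N, K) logits from the deepest section, acting as the teacher.
    """
    # Soft targets from the deepest classifier; an autodiff framework would
    # detach these so no gradient flows back into the teacher through this term.
    teacher = softmax(deepest_logits, temperature)
    loss = cross_entropy(deepest_logits, labels)  # deepest classifier learns from labels only
    for logits in aux_logits:
        student = softmax(logits, temperature)
        loss += (1 - alpha) * cross_entropy(logits, labels)
        loss += alpha * temperature ** 2 * kl_div(teacher, student)
    return loss

rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 3])
deepest = rng.standard_normal((4, 5))
aux = [rng.standard_normal((4, 5)) for _ in range(2)]
print(self_distillation_loss(aux, deepest, labels))
```

Scaling the KL term by `temperature ** 2` follows the usual knowledge-distillation convention, which keeps that term's gradient magnitude comparable across temperatures.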

The authors evaluated SKD-TSTSAN on four public datasets (CASME II, SAMM, MMEW, and CAS(ME)³) and report that it outperforms existing methods, achieving new state-of-the-art performance.

Critical Analysis

The paper provides a thorough technical explanation of the SKD-TSTSAN model and its components. However, the authors do not address certain limitations or potential issues with their approach.

For example, the model's reliance on temporal information may make it sensitive to variations in micro-expression duration or speed, which could affect its robustness in real-world scenarios. Additionally, the auxiliary classifiers used for self-knowledge distillation add computation during training, and balancing the distillation and classification objectives may require careful hyperparameter tuning.

Further research could explore ways to make the SKD-TSTSAN model more adaptable to different micro-expression characteristics or investigate alternative knowledge distillation techniques that are more efficient and effective.

Conclusion

The SKD-TSTSAN model proposed in this paper combines motion magnification, a three-stream network with channel attention and temporal shift modules, and self-knowledge distillation. With this design, the authors report state-of-the-art performance on four public micro-expression datasets.

While the paper provides a strong technical foundation, there are opportunities for further research to address potential limitations and enhance the model's robustness and efficiency. As the field of micro-expression recognition continues to evolve, the insights and techniques presented in this paper can serve as a valuable reference for future work in this area.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2406.17538) to verify details.

Related Papers

Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng, Jing Zhang

Micro-expressions are subtle facial movements that occur spontaneously when people try to conceal real emotions. Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy. However, micro-expression recognition is challenging since micro-expressions have low intensity and public datasets are small in size. To this end, a three-stream temporal-shift attention network based on self-knowledge distillation called SKD-TSTSAN is proposed in this paper. Firstly, to address the low intensity of muscle movements, we utilize learning-based motion magnification modules to enhance the intensity of muscle movements. Secondly, we employ efficient channel attention modules in the local-spatial stream to make the network focus on facial regions that are highly relevant to micro-expressions. In addition, temporal shift modules are used in the dynamic-temporal stream, which enables temporal modeling with no additional parameters by mixing motion information from two different temporal domains. Furthermore, we introduce self-knowledge distillation into the micro-expression recognition task by introducing auxiliary classifiers and using the deepest section of the network for supervision, encouraging all blocks to fully explore the features of the training set. Finally, extensive experiments are conducted on four public datasets: CASME II, SAMM, MMEW, and CAS(ME)³. The experimental results demonstrate that our SKD-TSTSAN outperforms other existing methods and achieves new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/SKD-TSTSAN.

7/30/2024

Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition

Bochao Zou, Zizheng Guo, Wenfeng Qin, Xin Li, Kangsheng Wang, Huimin Ma

Micro-expressions are involuntary facial movements that cannot be consciously controlled, conveying subtle cues with substantial real-world applications. The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals. Previous deep learning methods have primarily relied on classification networks utilizing sliding windows. However, fixed window sizes and window-level hard classification introduce numerous constraints. Additionally, these methods have not fully exploited the potential of complementary pathways for spotting and recognition. In this paper, we present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression. Furthermore, by leveraging the inherent connections between spotting and recognition tasks, we propose a synergistic strategy that enhances overall analysis performance. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The codes and pre-trained models are available at https://github.com/zizheng-guo/ME-TST.

9/17/2024

Hierarchical Space-Time Attention for Micro-Expression Recognition

Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within these relationships. To solve this issue, we propose the Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and special frames or data in parallel by our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve a higher-quality fusion for crossmodal data. Finally, we hierarchically integrate USTA and CSTA to grasp the deeper facial cues. Our model emphasizes temporal modeling without neglecting the processing of special data, and it fuses the contents in different modalities while maintaining their respective uniqueness. Extensive experiments on the four benchmarks show the effectiveness of our proposed HSTA. Specifically, compared with the latest method on the CASME3 dataset, it achieves about 3% score improvement in seven-category classification.

5/7/2024

Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing works either neglect temporal dependencies or suffer from redundancy issues in clip-level recognition. In this work, we propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN). Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level. Specifically, the integration of Adaptive Temporal Motion layers empowers our method to aggregate global and local motion features inherent in micro-expressions. Experimental results demonstrate that ATM-GCN not only surpasses existing state-of-the-art methods, particularly on the Composite dataset, but also achieves superior performance on the latest micro-expression dataset CAS(ME)³.

6/14/2024