HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Read original: arXiv:2407.13184 - Published 7/19/2024 by Andrey V. Savchenko

Overview

Researchers from the HSEmotion team participated in the 7th Affective Behavior Analysis in-the-Wild (ABAW) Challenge.
They explored multi-task learning and compound facial expression recognition approaches.
Their work focused on improving performance in tasks like valence-arousal prediction, action unit detection, and compound expression recognition.

Plain English Explanation

The researchers from the HSEmotion team took part in a competition called the 7th ABAW Challenge, which was focused on analyzing people's emotions and behaviors using facial expressions. They tried out different machine learning techniques, including a method called "multi-task learning" and an approach for recognizing complex "compound" facial expressions.

In multi-task learning, the model is trained to perform multiple related tasks at the same time, like predicting how positive and how intense an emotion is (valence and arousal) and identifying specific facial movements (action units). This can help the model learn features that are useful across different tasks.

The researchers also worked on recognizing compound facial expressions, which are complex emotional expressions that involve a combination of basic emotions like happiness, sadness, anger, etc. This is a challenging problem because people's facial expressions in real-life situations are often nuanced and don't fit neatly into a small set of categories.

By pursuing these innovative approaches, the HSEmotion team aimed to advance the state-of-the-art in facial expression recognition and emotion analysis, which has important applications in areas like human-computer interaction, mental health monitoring, and customer experience analysis.

Technical Explanation

The HSEmotion team participated in two tasks of the 7th ABAW Challenge, both concerned with affective behavior analysis in unconstrained settings. The first is multi-task learning, where a model simultaneously predicts facial expression, valence, and arousal and detects action units. The second is compound expression recognition, which is a separate sub-challenge.
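A minimal sketch of the multi-task idea, assuming one shared feature vector that feeds three small output heads. The class counts (8 expression classes, 12 Action Units) follow the challenge description; the embedding size, the random weights, the tanh squashing of valence and arousal, and every function name are hypothetical and are not taken from the paper.

```python
import math
import random

NUM_EXPRESSIONS = 8    # 7 basic expressions + "other" (challenge description)
NUM_ACTION_UNITS = 12  # Action Units to detect (challenge description)
FEATURE_DIM = 16       # hypothetical embedding size, chosen for illustration


def init_head(n_out, n_in, rng):
    """Create small random weights and zero biases for one linear head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(features, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multitask_predict(features, heads):
    """Map one shared feature vector to the three task outputs."""
    return {
        # mutually exclusive classes -> probabilities that sum to 1
        "expression": softmax(linear(features, *heads["expression"])),
        # two continuous values, squashed to [-1, 1] (an assumption)
        "valence_arousal": [math.tanh(v)
                            for v in linear(features, *heads["valence_arousal"])],
        # independent per-unit probabilities (multi-label detection)
        "action_units": [sigmoid(z)
                         for z in linear(features, *heads["action_units"])],
    }


rng = random.Random(0)
heads = {
    "expression": init_head(NUM_EXPRESSIONS, FEATURE_DIM, rng),
    "valence_arousal": init_head(2, FEATURE_DIM, rng),
    "action_units": init_head(NUM_ACTION_UNITS, FEATURE_DIM, rng),
}
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
outputs = multitask_predict(features, heads)
```

The three heads differ only in their output activation, which reflects the three kinds of target: one class out of several, two continuous values, and a set of independent yes/no detections.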

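For the multi-task learning track, the team built an efficient pipeline on frame-level facial feature extractors pre-trained in multi-task settings to estimate valence-arousal and basic facial expressions from a facial photo. They used lightweight neural network architectures, some with attention mechanisms, such as MT-EmotiDDAMFN, MT-EmotiEffNet, and MT-EmotiMobileFaceNet, which can run on a mobile device without sending facial video to a remote server. They also smoothed the frame-wise output scores with Gaussian or box filters. The abstract reports that smoothing the blended scores of the two best visual models improves the F1-score of facial expression recognition by up to 7% and raises the mean Concordance Correlation Coefficient (CCC) of valence and arousal by up to 1.25 times compared with each model's frame-level predictions.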
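Smoothing frame-wise scores over time can be sketched as a one-dimensional convolution with a box or Gaussian kernel. This is a minimal illustration, not the paper's code: the kernel sizes, the sigma value, the edge handling (repeating the first and last frame), and all function names are assumptions.

```python
import math


def box_kernel(size):
    """Uniform weights: every frame in the window counts equally."""
    return [1.0 / size] * size


def gaussian_kernel(size, sigma):
    """Bell-shaped weights centered on the current frame, normalized to sum to 1."""
    half = size // 2
    weights = [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(scores, kernel):
    """Smooth a per-frame score sequence with an odd-length kernel.

    Frames beyond either end are replaced by the nearest valid frame,
    so the output has the same length as the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    last = len(scores) - 1
    smoothed = []
    for t in range(len(scores)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(t + k - half, 0), last)  # clamp to valid frames
            acc += weight * scores[idx]
        smoothed.append(acc)
    return smoothed


# A single-frame spike in an otherwise flat score track.
frame_scores = [0.1, 0.1, 0.9, 0.1, 0.1]
box_smoothed = smooth(frame_scores, box_kernel(3))
gauss_smoothed = smooth(frame_scores, gaussian_kernel(5, sigma=1.0))
```

Both filters spread the isolated spike across neighboring frames, which is the intended effect when a single frame's prediction disagrees with its surroundings. For multi-class scores, the same function would be applied to each class's score track separately.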

In the compound expression recognition task, the abstract points to the same pipeline of frame-level facial feature extractors rather than a task-specific architecture. The only ensembling it reports is a simple blend of the two best visual models, with smoothing applied to the blended scores. The aim of combining models is to capture the complementary strengths of different architectures when recognizing complex emotional expressions.
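Combining two models can be as simple as a weighted average of their class scores. The sketch below illustrates that idea under stated assumptions: the equal default weight, the three-class example scores, and the function names are hypothetical and do not come from the paper.

```python
def blend_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted average of two models' class scores for one frame."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


def predict_class(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two models disagree on the top class for the same frame.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.7, 0.1]

blended = blend_scores(model_a, model_b)                     # equal weights
blended_a_heavy = blend_scores(model_a, model_b, weight_a=0.9)
```

With equal weights the blend follows the more confident model (class 1 here); shifting the weight toward the first model flips the decision to class 0. In practice the weight would be tuned on a validation set.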

The team's work demonstrated the potential benefits of multi-task learning and advanced techniques for compound facial expression recognition in the context of the ABAW challenge. Their findings contribute to the ongoing research efforts to develop more accurate and robust affective computing systems.
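The paper's abstract reports valence and arousal quality as a mean Concordance Correlation Coefficient (CCC). As a reference for reading those numbers, here is a small sketch of the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name and example values are illustrative assumptions.

```python
def concordance_cc(predictions, targets):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Uses population (divide-by-n) variance and covariance. The result is
    undefined (division by zero) when both sequences are constant and equal.
    """
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    cov = sum((p - mean_p) * (t - mean_t)
              for p, t in zip(predictions, targets)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


perfect = concordance_cc([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
inverted = concordance_cc([-1.0, 0.0, 1.0], [1.0, 0.0, -1.0])
shifted = concordance_cc([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Unlike plain correlation, CCC penalizes a constant offset: the shifted example is perfectly correlated with its target yet scores below 1 because the means differ.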

Critical Analysis

The paper provides a comprehensive overview of the HSEmotion team's approach and results in the 7th ABAW Challenge. The researchers made several interesting methodological choices, such as the use of multi-task learning and ensemble modeling, which are generally well-justified and aligned with the state-of-the-art in the field.

However, the paper does not delve deeply into the limitations of its approaches or potential areas for improvement. For example, the gain from multi-task pre-training itself is not quantified here, and the contribution of components such as the attention mechanism is not isolated; only the effect of score smoothing is reported separately in the abstract. Additionally, this summary offers no comparison of the ensemble approach for compound expression recognition against simpler baselines, making it difficult to assess the value added by combining models.

Further research could explore more rigorous ablation studies and additional benchmark comparisons to better understand the strengths and weaknesses of the proposed techniques. Exploring the generalizability of the methods to other datasets or real-world applications would also be valuable.

Overall, the paper presents a solid contribution to the field of affective computing, but there is room for deeper analysis and more comprehensive evaluation to fully understand the merits and limitations of the HSEmotion team's approaches.

Conclusion

The HSEmotion team's participation in the 7th ABAW Challenge showcased their innovative work in multi-task learning and compound facial expression recognition. By leveraging techniques like attention mechanisms, temporal smoothing, and model ensembling, the team demonstrated the potential for improving the performance and robustness of affective computing systems.

Their research efforts contribute to the ongoing advancements in facial expression analysis and emotion recognition, which have important implications for a wide range of applications, from human-computer interaction and mental health monitoring to customer experience analysis and beyond. As the field of affective computing continues to evolve, the insights and methodologies explored by the HSEmotion team can inspire further research and development in this important area of study.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Andrey V. Savchenko

In this paper, we describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition, namely, multi-task learning for simultaneous prediction of facial expression, valence, arousal, and detection of action units, and compound expression recognition. We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings to estimate valence-arousal and basic facial expressions given a facial photo. We ensure the privacy-awareness of our techniques by using the lightweight architectures of neural networks, such as MT-EmotiDDAMFN, MT-EmotiEffNet, and MT-EmotiMobileFaceNet, that can run even on a mobile device without the need to send facial video to a remote server. It was demonstrated that a significant step in improving the overall accuracy is the smoothing of neural network output scores using Gaussian or box filters. It was experimentally demonstrated that such a simple post-processing of predictions from simple blending of two top visual models improves the F1-score of facial expression recognition up to 7%. At the same time, the mean Concordance Correlation Coefficient (CCC) of valence and arousal is increased by up to 1.25 times compared to each model's frame-level predictions. As a result, our final performance score on the validation set from the multi-task learning challenge is 4.5 times higher than the baseline (1.494 vs 0.32).

7/19/2024

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises two sub-challenges: i) Multi-Task Learning (the goal is to learn at the same time, in a multi-task learning setting, to estimate two continuous affect dimensions, valence and arousal, to recognise between the mutually exclusive classes of the 7 basic expressions and 'other', and to detect 12 Action Units); and ii) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes). s-Aff-Wild2, which is a static version of the A/V Aff-Wild2 database and contains annotations for valence-arousal, expressions and Action Units, is utilized for the purposes of the Multi-Task Learning Challenge; a part of C-EXPR-DB, which is an A/V in-the-wild database with compound expression annotations, is utilized for the purposes of the Compound Expression Recognition Challenge. In this paper, we introduce the two challenges, detailing their datasets and the protocols followed for each. We also outline the evaluation metrics, and highlight the baseline systems and their results. Additional information about the competition can be found at https://affective-behavior-analysis-in-the-wild.github.io/7th.

7/9/2024

Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks: Application to 7th ABAW Challenge

Josep Cabacas-Maso, Elena Ortega-Beltrán, Ismael Benito-Altamirano, Carles Ventura

We present our contribution to the 7th ABAW challenge at ECCV 2024. By utilizing a Dual-Direction Attention Mixed Feature Network (DDAMFN) for multitask facial expression recognition, we achieve results far beyond the proposed baseline for the Multi-Task ABAW challenge. Our proposal uses the well-known DDAMFN architecture as a base to effectively predict valence-arousal, emotion recognition, and facial action units. We demonstrate the architecture's ability to handle these tasks simultaneously, providing insights into its architecture and the rationale behind its design. Additionally, we compare our results for a multitask solution with independent single-task performance.

9/6/2024

Affective Behaviour Analysis via Progressive Learning

Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu

Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition establishes two tracks: i.e., the Multi-task Learning (MTL) Challenge and the Compound Expression (CE) challenge based on Aff-Wild2 and C-EXPR-DB datasets. In this paper, we present our methods and experimental results for the two competition tracks. Specifically, it can be summarized in the following four aspects: 1) To attain high-quality facial features, we train a Masked-Auto Encoder in a self-supervised manner. 2) We devise a temporal convergence module to capture the temporal information between video frames and explore the impact of window size and sequence length on each sub-task. 3) To facilitate the joint optimization of various sub-tasks, we explore the impact of sub-task joint training and feature fusion from individual tasks on each task performance improvement. 4) We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions, thereby improving the accuracy of compound expression recognition. Extensive experiments demonstrate the superiority of our designs.

7/29/2024