7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Read original: arXiv:2407.03835 - Published 7/9/2024 by Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

Overview

This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, held as part of the corresponding workshop at ECCV 2024.
The competition comprised two sub-challenges: Multi-Task Learning (joint valence-arousal estimation, expression recognition, and action unit detection) and Compound Expression Recognition.
The paper introduces both challenges, covering their datasets and protocols, the evaluation metrics, and the baseline systems with their results.

Plain English Explanation

The 7th ABAW Competition was an event that challenged researchers and developers to create AI systems that could analyze human emotions and expressions in real-world scenarios. This is an important area of study because being able to accurately detect and understand emotions can have many applications, such as in mental health support, customer service, and human-robot interaction.

The competition had two challenges:

Multi-Task Learning: Training one model to do three things at once: estimate a person's emotional state in terms of positivity/negativity (valence) and excitement/calmness (arousal), recognize which of the 7 basic expressions (such as happiness, sadness, or anger) or an 'other' category a face shows, and detect 12 action units, the specific facial muscle movements that make up expressions.
Compound Expression Recognition: Identifying which of 7 "compound" expressions, each combining multiple emotions, a person is showing.

The competition used two datasets of real-world ("in-the-wild") facial behavior: s-Aff-Wild2, a static version of the audiovisual Aff-Wild2 database, for the Multi-Task Learning Challenge, and part of C-EXPR-DB for the Compound Expression Recognition Challenge. The organizers provided baseline systems, and participating teams used deep learning and multi-task learning to improve on them.

Overall, the 7th ABAW Competition pushed the boundaries of affective computing and showed the potential for AI systems to understand and interpret human emotions in natural settings.

Technical Explanation

The 7th ABAW Competition comprised two sub-challenges:

Multi-Task Learning (MTL) Challenge: Participants developed a single model that, in a multi-task setting, estimates two continuous affect dimensions (valence and arousal), classifies the input into one of 8 mutually exclusive classes (the 7 basic expressions plus 'other'), and detects 12 action units (specific facial muscle movements). This challenge used s-Aff-Wild2, a static version of the audiovisual Aff-Wild2 database annotated for valence-arousal, expressions, and action units.
Compound Expression Recognition Challenge: Participants built models to classify 7 mutually exclusive compound expression classes, each combining more than one emotion. This challenge used part of C-EXPR-DB, an audiovisual in-the-wild database with compound expression annotations.
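
The evaluation metrics mentioned in the Overview can be made concrete with a small example. The sketch below assumes the convention used across the ABAW series, in which the multi-task score is the sum of three terms: the mean Concordance Correlation Coefficient (CCC) of valence and arousal, the macro-averaged F1 over the 8 expression classes, and the mean F1 over the 12 action units. That formula is not stated in this summary and should be checked against the original paper; the function names (`ccc`, `f1_binary`, `macro_f1`, `mtl_score`) are hypothetical.

```python
from statistics import fmean


def ccc(pred, target):
    """Concordance Correlation Coefficient of two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    # Assumption: return 0.0 when both sequences are constant and equal.
    return 2 * cov / denom if denom else 0.0


def f1_binary(pred, target):
    """F1 score for 0/1 predictions against 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(pred, target, num_classes):
    """Unweighted mean of one-vs-rest F1 over class ids 0..num_classes-1."""
    return fmean([
        f1_binary([int(p == c) for p in pred], [int(t == c) for t in target])
        for c in range(num_classes)
    ])


def mtl_score(val_pred, val_true, aro_pred, aro_true,
              expr_pred, expr_true, au_pred, au_true):
    """Assumed multi-task score: mean CCC + macro F1 (8 classes) + mean AU F1.

    au_pred and au_true are lists with one 0/1 sequence per action unit.
    """
    p_va = (ccc(val_pred, val_true) + ccc(aro_pred, aro_true)) / 2
    p_expr = macro_f1(expr_pred, expr_true, num_classes=8)
    p_au = fmean([f1_binary(p, t) for p, t in zip(au_pred, au_true)])
    return p_va + p_expr + p_au


# CCC penalizes a constant offset even when correlation is perfect:
print(ccc([1, 2, 3], [2, 3, 4]))  # 4/7, about 0.571
```

Under this assumed definition the score ranges up to 3.0, which is consistent with the validation scores between 0.32 and 1.494 quoted in the related entries further down.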

The organizers' paper highlights baseline systems and their results rather than surveying the submitted entries. The team reports listed under Related Papers illustrate the range of approaches: self-supervised or pre-trained visual feature extractors (a masked autoencoder, Dinov2, and lightweight networks that can run on a mobile device), modules that capture temporal information between video frames, task-adaptive cross-attention and graph convolution over action units, and simple post-processing such as smoothing output scores with Gaussian or box filters.
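
The HSEmotion entry under Related Papers reports that smoothing frame-level network output scores with Gaussian or box filters was a significant step in improving accuracy. The following is a minimal sketch of the box-filter variant only, assuming per-frame class scores are already available as a list of lists. The function names, the window size, and the edge handling are illustrative assumptions, not the team's actual implementation.

```python
def box_smooth(scores, window=5):
    """Centered moving average over a 1-D score track.

    The window is truncated at the sequence edges (an assumed choice).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def smooth_and_classify(frame_scores, window=5):
    """Smooth each class's score track over time, then take the argmax per frame.

    frame_scores is a list of frames, each a list of per-class scores.
    """
    if not frame_scores:
        return []
    num_classes = len(frame_scores[0])
    # One smoothed track per class, each as long as the video.
    tracks = [
        box_smooth([frame[c] for frame in frame_scores], window)
        for c in range(num_classes)
    ]
    return [
        max(range(num_classes), key=lambda c: tracks[c][i])
        for i in range(len(frame_scores))
    ]


# A single-frame flicker to class 1 is removed by a 3-frame window.
frames = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
print(smooth_and_classify(frames, window=3))  # [0, 0, 0]
```

A wider window suppresses more frame-to-frame noise but can also blur genuine short expressions, so the window size would normally be tuned on the validation set.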

The competition provided a valuable benchmark for evaluating the state-of-the-art in affective computing and demonstrated the potential for AI systems to interpret human emotions in unconstrained, "in-the-wild" scenarios.

Critical Analysis

The 7th ABAW Competition made important strides in advancing the field of affective computing, but there are still several limitations and areas for further research:

The datasets used, while large and diverse, may not fully capture the complexity and nuance of human emotional expression in real-world situations. More work is needed to create even more realistic and representative datasets.
The Multi-Task Learning Challenge relies on s-Aff-Wild2, a static version of the audiovisual Aff-Wild2 database, so that challenge centers on visual cues (facial expressions). Emotional states are also conveyed through other modalities, such as tone of voice, body language, and contextual information, so multimodal approaches could lead to more holistic and accurate emotion recognition.
The tasks in the competition were relatively narrow in scope, focusing on specific aspects of emotion analysis (valence-arousal, basic expressions, action units, and compound expressions). Expanding the competition to include more diverse and integrated tasks could provide a more comprehensive evaluation of affective computing capabilities.
While the top-performing methods demonstrated impressive results, there is still room for improvement, especially in terms of robustness, generalizability, and interpretability. Further research is needed to understand the limitations of current approaches and develop more reliable and explainable affective AI systems.

Overall, the 7th ABAW Competition represents an important milestone in the field of affective computing, but continued advancements in data collection, model design, and task integration will be crucial for realizing the full potential of AI-based emotion analysis in real-world applications.

Conclusion

The 7th ABAW Competition addressed multi-task learning and compound expression recognition for affective behavior analysis. Its Multi-Task Learning Challenge (valence-arousal estimation, basic expression recognition, and action unit detection in one model) and its Compound Expression Recognition Challenge asked researchers to develop AI systems that interpret human emotions in unconstrained, real-world scenarios.

The top-performing methods demonstrated the power of combining deep learning, multi-task learning, and other machine learning techniques to achieve high accuracy across the various tasks. However, the research also highlighted the need for continued advancements in dataset creation, multimodal approaches, and the development of more robust and explainable affective computing systems.

As the field of affective computing continues to evolve, the lessons learned from the 7th ABAW Competition are likely to inform the development of AI systems that better understand and respond to human emotions, with applications in mental health, human-computer interaction, and beyond.

Related Papers

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises two sub-challenges: i) Multi-Task Learning (the goal is to learn at the same time, in a multi-task learning setting, to estimate two continuous affect dimensions, valence and arousal, to recognise between the mutually exclusive classes of the 7 basic expressions and 'other', and to detect 12 Action Units); and ii) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes). s-Aff-Wild2, which is a static version of the A/V Aff-Wild2 database and contains annotations for valence-arousal, expressions and Action Units, is utilized for the purposes of the Multi-Task Learning Challenge; a part of C-EXPR-DB, which is an A/V in-the-wild database with compound expression annotations, is utilized for the purposes of the Compound Expression Recognition Challenge. In this paper, we introduce the two challenges, detailing their datasets and the protocols followed for each. We also outline the evaluation metrics, and highlight the baseline systems and their results. Additional information about the competition can be found at https://affective-behavior-analysis-in-the-wild.github.io/7th.

7/9/2024

HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Andrey V. Savchenko

In this paper, we describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition, namely, multi-task learning for simultaneous prediction of facial expression, valence, arousal, and detection of action units, and compound expression recognition. We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings to estimate valence-arousal and basic facial expressions given a facial photo. We ensure the privacy-awareness of our techniques by using the lightweight architectures of neural networks, such as MT-EmotiDDAMFN, MT-EmotiEffNet, and MT-EmotiMobileFaceNet, that can run even on a mobile device without the need to send facial video to a remote server. It was demonstrated that a significant step in improving the overall accuracy is the smoothing of neural network output scores using Gaussian or box filters. It was experimentally demonstrated that such a simple post-processing of predictions from simple blending of two top visual models improves the F1-score of facial expression recognition up to 7%. At the same time, the mean Concordance Correlation Coefficient (CCC) of valence and arousal is increased by up to 1.25 times compared to each model's frame-level predictions. As a result, our final performance score on the validation set from the multi-task learning challenge is 4.5 times higher than the baseline (1.494 vs 0.32).

7/19/2024

Affective Behaviour Analysis via Progressive Learning

Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu

Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition establishes two tracks, the Multi-task Learning (MTL) Challenge and the Compound Expression (CE) Challenge, based on the Aff-Wild2 and C-EXPR-DB datasets. In this paper, we present our methods and experimental results for the two competition tracks. Specifically, our work can be summarized in the following four aspects: 1) To attain high-quality facial features, we train a Masked-Auto Encoder in a self-supervised manner. 2) We devise a temporal convergence module to capture the temporal information between video frames and explore the impact of window size and sequence length on each sub-task. 3) To facilitate the joint optimization of various sub-tasks, we explore the impact of sub-task joint training and feature fusion from individual tasks on each task performance improvement. 4) We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions, thereby improving the accuracy of compound expression recognition. Extensive experiments demonstrate the superiority of our designs.

7/29/2024

Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network

Xiaodong Li, Wenchao Du, Hongyu Yang

In this paper, we present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition. This challenge consists of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation. We address the research problems of this challenge from three aspects: 1) For learning robust visual feature representations, we introduce the pre-trained large model Dinov2. 2) To adaptively extract the required features of each task, we design a task-adaptive block that performs cross-attention between a set of learnable query vectors and pre-extracted features. 3) By proposing the AU-assisted Graph Convolutional Network (AU-GCN), we make full use of the correlation information between AUs to assist in solving the EXPR and VA tasks. Finally, we achieve the evaluation measure of 1.2542 on the validation set provided by the organizers.

7/17/2024