Unveiling the Human-like Similarities of Automatic Facial Expression Recognition: An Empirical Exploration through Explainable AI

Read original: arXiv:2401.11835 - Published 9/4/2024 by F. Xavier Gaya-Morey, Silvia Ramis-Guarinos, Cristina Manresa-Yee, Jose M. Buades-Rubio

Overview

Facial expression recognition is crucial for understanding human behavior.
Deep learning has enabled models that can outperform humans in this task.
However, it's unclear how closely these deep neural networks mimic human perception.
This study aims to explore the similarity between deep neural networks and human perception of facial expressions.

Plain English Explanation

Facial expressions are an important way that humans communicate and understand each other's emotions. Deep learning models have become very good at recognizing facial expressions, even better than humans in some cases. But it's not clear how these deep learning models actually "see" and process facial expressions compared to how humans do it.

This study looked at 12 different deep learning models, both general object classifiers and models specifically designed for recognizing facial expressions. The researchers used a special technique to generate heatmaps that showed which parts of the face were most important for each model when recognizing different facial expressions.

They then compared these heatmaps to the facial regions that are known to be important for human perception of facial expressions, based on previous research. The goal was to see how similar the deep learning models' processing was to human perception.

Technical Explanation

The researchers employed a global explainable AI method to generate heatmaps revealing the crucial facial regions used by 12 different deep neural networks when classifying 6 basic facial expressions, yielding 72 heatmaps in total (one per architecture and expression). This allowed them to assess how closely the models' processing aligned with human perception of facial expressions.

They quantitatively compared the heatmaps to ground truth masks based on Friesen and Ekman's well-established description of facial expression cues. Metrics used included Intersection over Union (IoU) and normalized correlation coefficients. Qualitatively, they observed that models with pre-trained weights showed more similarity in heatmaps compared to those without pre-training.
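
To make the two metrics concrete, here is a minimal sketch in plain Python. It is not the authors' code: the 0.5 binarization threshold, the zero-mean form of the normalized correlation, and the toy 3x3 arrays are all assumptions chosen for illustration, and the paper may define these details differently.

```python
from math import sqrt


def binarize(heatmap, threshold=0.5):
    """Turn a heatmap with values in [0, 1] into a 0/1 mask.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]


def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return intersection / union if union else 0.0


def normalized_correlation(map_a, map_b):
    """Zero-mean normalized correlation of two equally sized heatmaps, in [-1, 1]."""
    xs = [value for row in map_a for value in row]
    ys = [value for row in map_b for value in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant heatmap has zero variance; report 0.0 in that case.
    return numerator / denominator if denominator else 0.0


# Toy example (invented values): a heatmap concentrated on the bottom row,
# compared with a hypothetical ground truth mask.
heatmap = [
    [0.1, 0.2, 0.1],
    [0.3, 0.4, 0.2],
    [0.7, 0.9, 0.8],
]
ground_truth = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
]

score = iou(binarize(heatmap), ground_truth)
print(score)  # 2 overlapping cells out of 4 in the union -> 0.5
```

In this sketch IoU compares a model's heatmap against a mask, while the normalized correlation compares two heatmaps directly, which is why the latter suits model-to-model comparisons.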

Across all expressions and architectures, the average IoU was low at 0.2702, indicating limited alignment between the models and human perception. The best-performing architecture averaged 0.3269, while the worst-performing one averaged 0.2066. Dendrograms built from the normalized correlation coefficient revealed two main clusters for most expressions: models with pre-training and models without.
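
The clustering step behind such a dendrogram can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: it converts correlations to distances as 1 minus the correlation, uses average linkage, and stops at two clusters instead of recording the full merge tree. The model names and correlation values are invented toy data, not results from the study.

```python
def average_linkage_clusters(names, corr, n_clusters=2):
    """Agglomerative clustering with average linkage on a correlation matrix.

    Distance between two items is taken as 1 - correlation (an assumption).
    Returns the clusters as sorted lists of names.
    """
    n = len(names)
    dist = [[1.0 - corr[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per item

    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]

    return [sorted(names[i] for i in cluster) for cluster in clusters]


# Hypothetical models and invented pairwise heatmap correlations.
names = ["net_A_pretrained", "net_B_pretrained", "net_C_scratch", "net_D_scratch"]
corr = [
    [1.00, 0.90, 0.30, 0.20],
    [0.90, 1.00, 0.25, 0.35],
    [0.30, 0.25, 1.00, 0.80],
    [0.20, 0.35, 0.80, 1.00],
]

groups = average_linkage_clusters(names, corr)
print(groups)
```

With these toy values the two pre-trained models merge first and the two models trained from scratch merge second, giving the kind of two-cluster split described above. A library routine such as SciPy's `scipy.cluster.hierarchy.linkage` together with `dendrogram` would produce the full tree rather than a single cut.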

Critical Analysis

The findings suggest that current deep learning models for facial expression recognition, while highly capable, do not closely mimic human perceptual processing. The researchers note that network architecture seems to influence the level of similarity, as models with similar architectures prioritized similar facial regions.

This work highlights the importance of not solely relying on model performance metrics, but also examining the internal workings and decision-making processes of these systems. Without this deeper understanding, it's difficult to have confidence that they are truly replicating human-level intelligence and decision-making.

Further research is needed to better understand the factors that drive the divergence between machine and human facial expression processing. Exploring model architectures, training data, and other design choices could yield insights to bring these systems closer to human-level perception and reasoning.

Conclusion

This study provides valuable insights into the limitations of current deep learning models for facial expression recognition. While these models can outperform humans on the task, they do not closely mirror human perceptual processing of facial cues.

Understanding these differences is crucial as we develop AI systems intended to interact with and understand humans. Bridging the gap between machine and human facial expression processing could lead to more natural, intuitive, and trustworthy human-AI interactions in the future.

Related Papers

Unveiling the Human-like Similarities of Automatic Facial Expression Recognition: An Empirical Exploration through Explainable AI

F. Xavier Gaya-Morey, Silvia Ramis-Guarinos, Cristina Manresa-Yee, Jose M. Buades-Rubio

Facial expression recognition is vital for human behavior analysis, and deep learning has enabled models that can outperform humans. However, it is unclear how closely they mimic human processing. This study aims to explore the similarity between deep neural networks and human perception by comparing twelve different networks, including both general object classifiers and FER-specific models. We employ an innovative global explainable AI method to generate heatmaps, revealing crucial facial regions for the twelve networks trained on six facial expressions. We assess these results both quantitatively and qualitatively, comparing them to ground truth masks based on Friesen and Ekman's description and among them. We use Intersection over Union (IoU) and normalized correlation coefficients for comparisons. We generate 72 heatmaps to highlight critical regions for each expression and architecture. Qualitatively, models with pre-trained weights show more similarity in heatmaps compared to those without pre-training. Specifically, eye and nose areas influence certain facial expressions, while the mouth is consistently important across all models and expressions. Quantitatively, we find low average IoU values (avg. 0.2702) across all expressions and architectures. The best-performing architecture averages 0.3269, while the worst-performing one averages 0.2066. Dendrograms, built with the normalized correlation coefficient, reveal two main clusters for most expressions: models with pre-training and models without pre-training. Findings suggest limited alignment between human and AI facial expression recognition, with network architectures influencing the similarity, as similar architectures prioritize similar facial regions.

9/4/2024

Post-hoc and manifold explanations analysis of facial expression data based on deep learning

Yang Xiao

The complex information processing system of humans generates many objective and subjective evaluations, making the exploration of human cognitive products of great theoretical value. In recent years, deep learning technologies, which are inspired by biological brain mechanisms, have made significant strides in psychological and cognitive science research, particularly in the memorization and recognition of facial data. This paper investigates through experimental research how neural networks process and store facial expression data and associate these data with a range of psychological attributes produced by humans. The researchers used the deep learning model VGG16, demonstrating that neural networks can learn and reproduce key features of facial data, thereby storing image memories. Moreover, the experimental results reveal the potential of deep learning models in understanding human emotions and cognitive processes and establish a manifold visualization interpretation of cognitive products or psychological attributes from a non-Euclidean space perspective, offering new insights into enhancing the explainability of AI. This study not only advances the application of AI technology in the field of psychology but also provides a new psychological theoretical understanding of the information processing of AI. The code is available here: https://github.com/NKUShaw/Psychoinformatics.

4/30/2024

Explainable Facial Expression Recognition for People with Intellectual Disabilities

Silvia Ramis Guarinos, Cristina Manresa Yee, Jose Maria Buades Rubio, Francesc Xavier Gaya-Morey

Facial expression recognition plays an important role in human behaviour, communication, and interaction. Recent neural networks have been shown to perform well at recognizing facial expressions automatically, with different explainability techniques available to make them more transparent. In this work, we propose a facial expression recognition study for people with intellectual disabilities that would be integrated into a social robot. We train two well-known neural networks with five databases of facial expressions and test them with two databases containing people with and without intellectual disabilities. Finally, we study which regions the models focus on to perceive a particular expression using two different explainability techniques, LIME and RISE, assessing the differences when used on images containing disabled and non-disabled people.

5/21/2024

Towards A Comprehensive Visual Saliency Explanation Framework for AI-based Face Recognition Systems

Yuhang Lu, Zewei Xu, Touradj Ebrahimi

Over recent years, deep convolutional neural networks have significantly advanced the field of face recognition techniques for both verification and identification purposes. Despite the impressive accuracy, these neural networks are often criticized for lacking explainability. There is a growing demand for understanding the decision-making process of AI-based face recognition systems. Some studies have investigated the use of visual saliency maps as explanations, but they have predominantly focused on the specific face verification case. The discussion on more general face recognition scenarios and the corresponding evaluation methodology for these explanations have long been absent in current research. Therefore, this manuscript conceives a comprehensive explanation framework for face recognition tasks. Firstly, an exhaustive definition of visual saliency map-based explanations for AI-based face recognition systems is provided, taking into account the two most common recognition situations individually, i.e., face verification and identification. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions between any given face images. Subsequently, the explanation framework conceives a new evaluation methodology that offers quantitative measurement and comparison of the performance of general visual saliency explanation methods in face recognition. Consequently, extensive experiments are carried out on multiple verification and identification scenarios. The results showcase that CorrRISE generates insightful saliency maps and demonstrates superior performance, particularly in similarity maps in comparison with the state-of-the-art explanation approaches.

7/9/2024