Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News

arXiv: 2404.13493

Published 4/23/2024 by Qixuan Zhang, Zhifeng Wang, Yang Liu, Zhenyue Qin, Kaihao Zhang, Sabrina Caldwell, Tom Gedeon

Abstract

In this paper, we present a novel benchmark for Emotion Recognition using facial landmarks extracted from realistic news videos. Traditional methods relying on RGB images are resource-intensive, whereas our approach with Facial Landmark Emotion Recognition (FLER) offers a simplified yet effective alternative. By leveraging Graph Neural Networks (GNNs) to analyze the geometric and spatial relationships of facial landmarks, our method enhances the understanding and accuracy of emotion recognition. We discuss the advancements and challenges in deep learning techniques for emotion recognition, particularly focusing on Graph Neural Networks (GNNs) and Transformers. Our experimental results demonstrate the viability and potential of our dataset as a benchmark, setting a new direction for future research in emotion recognition technologies. The codes and models are at: https://github.com/wangzhifengharrison/benchmark_real_news

Overview

This paper introduces a benchmark for recognizing the emotions of people who appear in real news footage, based on their facial landmarks.
The researchers used graph neural networks to analyze the geometric and spatial relationships between facial landmarks extracted from news videos, and then benchmarked the model's performance against human raters.
The goal was a more accurate and less resource-intensive system for recognizing authentic emotional expressions in real-world footage, with potential applications in areas like media analysis, psychology, and affective computing.

Plain English Explanation

The researchers in this study were interested in using computer vision and machine learning to recognize the emotions that people genuinely display in real-world news footage. They developed an artificial intelligence (AI) model that analyzes the facial expressions of people appearing in news videos and infers their underlying emotions.

Typically, emotion recognition systems are trained on acted or posed facial expressions, which may not accurately reflect how people genuinely feel in real-life situations. To address this, the researchers used news videos as their dataset, which capture people's natural, unscripted reactions. They then had human raters watch the same videos and provide their own judgments of the emotions displayed, which served as a benchmark to evaluate the performance of their AI model.

The key innovation was the use of a graph neural network architecture, which allowed the model to better understand the relationships between different facial features and how they contribute to emotional expressions. This made the emotion recognition more accurate and nuanced compared to traditional approaches.

The researchers found that their model was able to detect emotions in the news footage with a high degree of accuracy, often matching or even exceeding the judgments of the human raters. This suggests that facial expression analysis could be a powerful tool for gaining insights into how people genuinely feel about important real-world events, with potential applications in fields like psychology, media analysis, and affective computing.

Technical Explanation

The researchers developed a graph neural network (GNN) model to perform facial landmark detection and emotion recognition on real news footage. The GNN architecture allowed the model to learn the relationships between different facial features and how they contribute to emotional expressions, rather than treating them independently.
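To make the idea of learning relationships between facial features concrete, here is a minimal sketch of one graph-convolution step over a small landmark graph. It is not the authors' implementation: the four toy landmarks, the edge list, the weight values, and the function names (`normalized_adjacency`, `graph_conv`) are all assumptions chosen for illustration, and the layer shown is a generic GCN-style update, which may differ from the architecture used in the paper.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Build D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop so each landmark keeps its own features
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0  # undirected edge between two landmarks
    deg = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(adj_norm, features, weights):
    """One GCN-style layer: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: four mouth landmarks given as (x, y) coordinates.
landmarks = [[0.0, 0.0], [1.0, 0.5], [2.0, 0.0], [1.0, -0.5]]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Arbitrary illustrative weights mapping 2 input features to 3 hidden features.
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

adj_norm = normalized_adjacency(len(landmarks), edges)
hidden = graph_conv(adj_norm, landmarks, weights)
```

Each row of `hidden` mixes a landmark's own coordinates with those of its neighbours, which is how a graph layer lets the shape of a region (here, the mouth) influence the representation instead of treating each point independently.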

The dataset consisted of news videos from various sources, with the researchers manually annotating the facial landmarks and emotions displayed by the subjects. They then had human raters watch the same videos and provide their own judgments of the emotions, which served as a benchmark for evaluating the model's performance.

The GNN model was trained to predict the locations of 68 facial landmarks, as well as the intensity of 7 basic emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral) for each frame of the video. The model's outputs were then compared to the human raters' assessments to measure its accuracy.
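The per-frame emotion output described above can be sketched as a small readout step: pool the per-landmark features into one vector, apply a linear layer, and normalize into a score per emotion. This is only an illustration under stated assumptions. The mean pooling, the softmax normalization, the placeholder features, and the weight values are hypothetical, and the paper may compute emotion intensities differently.

```python
import math

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
NUM_LANDMARKS = 68


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(node_features):
    """Average per-landmark features into a single face-level vector."""
    count = len(node_features)
    return [sum(col) / count for col in zip(*node_features)]


def emotion_scores(node_features, weights, bias):
    """Linear readout followed by softmax; returns one score per emotion."""
    pooled = mean_pool(node_features)
    logits = [
        sum(p * w for p, w in zip(pooled, column)) + b
        for column, b in zip(zip(*weights), bias)
    ]
    return dict(zip(EMOTIONS, softmax(logits)))


# Placeholder inputs (not real data): 68 landmarks with 2 features each,
# and an arbitrary 2 x 7 weight matrix with zero bias.
node_features = [[i / 67.0, (i % 5) / 4.0] for i in range(NUM_LANDMARKS)]
weights = [[0.1 * k for k in range(7)], [-0.1 * k for k in range(7)]]
bias = [0.0] * 7

scores = emotion_scores(node_features, weights, bias)
predicted = max(scores, key=scores.get)
```

In a trained model the weights would be learned and the node features would come from the graph layers; here they are fixed only so the example runs on its own.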

The results showed that the GNN model was able to achieve state-of-the-art performance on the facial landmark detection task, and closely matched or even outperformed the human raters on the emotion recognition task. This suggests that the model was able to capture the nuanced, authentic emotional responses of the subjects in a way that traditional approaches may have struggled with.
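As a rough illustration of how model predictions could be scored against human raters, the sketch below takes a majority vote over each clip's rater labels and computes the model's accuracy against that reference. The vote aggregation, the example labels, and the function names are assumptions for illustration only. The summary does not specify how rater judgments were combined, and no figures from the paper are reproduced here.

```python
from collections import Counter


def majority_label(rater_labels):
    """Return the most common label among the raters for one clip."""
    return Counter(rater_labels).most_common(1)[0][0]


def accuracy(predictions, references):
    """Fraction of clips where the prediction matches the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up labels for four clips, three hypothetical raters each.
rater_votes = [
    ["happiness", "happiness", "neutral"],
    ["anger", "anger", "anger"],
    ["sadness", "neutral", "sadness"],
    ["surprise", "surprise", "fear"],
]
model_predictions = ["happiness", "anger", "neutral", "surprise"]

reference = [majority_label(votes) for votes in rater_votes]
model_accuracy = accuracy(model_predictions, reference)  # 3 of 4 clips match: 0.75
```

A fuller comparison would also measure how well individual raters agree with the majority label, since that gives the human baseline the model is being compared to.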

Critical Analysis

One potential limitation of the study is the relatively small size and diversity of the news video dataset. While the researchers made efforts to include a range of news sources and topics, the dataset may not be representative of the full spectrum of real-world emotional expressions. Expanding the dataset, both in terms of video content and demographic diversity of the subjects, could help to further validate the model's performance.

Additionally, the study focused solely on facial expressions as a proxy for emotional state, which may not capture the full range of emotional cues that humans use to interpret others' feelings. Incorporating other modalities, such as tone of voice, body language, and contextual information, could lead to a more comprehensive and robust emotion recognition system.

Despite these limitations, the researchers' use of a graph neural network architecture represents a promising advancement in the field of facial expression analysis. By modeling the interdependencies between facial features, the model was able to achieve impressive results in detecting authentic emotional responses, with potential applications in areas like media analysis, psychology, and affective computing.

Conclusion

This study demonstrates the potential of using advanced AI models, specifically graph neural networks, to map the authentic emotional responses of people to real-world news events. By benchmarking the model's performance against human raters, the researchers have shown that facial expression analysis can provide valuable insights into how people genuinely feel about important current affairs.

The findings of this research could have implications for a wide range of fields, from media analysis and psychology to affective computing and human-centered design. As the use of AI in these domains continues to grow, tools like the one developed in this study could become increasingly valuable for understanding and interpreting human emotions and behaviors in real-world contexts.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
