FindingEmo: An Image Dataset for Emotion Recognition in the Wild

Read original: arXiv:2402.01355 - Published 6/6/2024 by Laurent Mertens, Elahe' Yargholi, Hans Op de Beeck, Jan Van den Stock, Joost Vennekens

Overview

Introduces a new image dataset called FindingEmo, designed for emotion recognition research
Contains annotations for 25,000 images depicting complex social scenes with multiple people
Annotations include Valence, Arousal, and Emotion label, gathered using the Prolific platform
Provides the list of image URLs and associated source code alongside the annotations

Plain English Explanation

The researchers have created a new dataset called FindingEmo, which provides emotion annotations for 25,000 images. Unlike previous datasets that focused on individual faces or single people, it contains complex social scenes with multiple people, and each image is annotated as a whole. The annotations cover three dimensions: Valence (how positive or negative the emotion is), Arousal (how intense the emotion is), and an Emotion label (the specific emotion being expressed). The researchers gathered these annotations from participants recruited through the Prolific platform. Along with the annotations, they release the list of image URLs and the associated source code.

This dataset is significant because it allows researchers to study emotion recognition in more realistic and complex scenarios, rather than just looking at isolated faces or individuals. By capturing how emotions are expressed in social settings, it could lead to emotion recognition systems that are better equipped to handle real-world interactions. The dataset could also be used to train and evaluate models for fine-grained emotion detection in images, and to study how emotions are expressed in naturalistic settings.

Technical Explanation

The FindingEmo dataset contains 25,000 images that were carefully selected and annotated to capture complex social scenes with multiple people. This is in contrast to many existing emotion recognition datasets that focus on individual faces or single-person images. The researchers argue that real-world emotional expressions often occur in the context of social interactions, and thus, a dataset that reflects this complexity is necessary to advance the field of emotion recognition.

Each image in the FindingEmo dataset has been annotated for three dimensions of emotion: Valence (the positivity or negativity of the emotion), Arousal (the intensity of the emotion), and an Emotion label (the specific emotion being expressed). These annotations were gathered using Prolific, an online platform for recruiting research participants.
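To make the three annotated dimensions concrete, here is a minimal sketch of how one annotation record could be represented and parsed. The column names (`image_url`, `valence`, `arousal`, `emotion`), the numeric types, and the sample rows are assumptions made for illustration; the summary does not describe the actual file layout or rating scales.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One judgment of one whole image (hypothetical schema, not the official one)."""
    image_url: str
    valence: float   # positivity/negativity; the scale is not stated in the summary
    arousal: float   # intensity; the scale is not stated in the summary
    emotion: str     # categorical emotion label


def parse_annotations(csv_text: str) -> list[Annotation]:
    """Parse CSV text with the assumed columns into Annotation records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Annotation(
            image_url=row["image_url"],
            valence=float(row["valence"]),
            arousal=float(row["arousal"]),
            emotion=row["emotion"].strip(),
        )
        for row in reader
    ]


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """image_url,valence,arousal,emotion
https://example.com/a.jpg,2,4,Joy
https://example.com/b.jpg,-1,3,Sadness
"""

records = parse_annotations(SAMPLE_CSV)
print(records[0])
```

Keeping the record immutable (`frozen=True`) is a small design choice that prevents annotations from being modified by accident during later analysis.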

Together with the annotations, the researchers release the list of URLs pointing to the original images, as well as all associated source code. This allows other researchers to retrieve the images, reproduce the dataset, and build upon the work, potentially leading to further advances in emotion recognition, especially in complex social settings.
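Because the images are distributed as a list of URLs, anyone rebuilding the dataset has to download them and cope with links that no longer resolve. The following is a rough sketch of that step using only the Python standard library. It is not the authors' released code: the function names, the hashed file naming scheme, and the example URLs are all hypothetical.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Derive a stable file name from a URL so reruns can skip finished downloads."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    suffix = Path(urlparse(url).path).suffix or ".img"
    return digest + suffix


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher: a plain HTTP GET via the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, out_dir: Path, fetch: Callable[[str], bytes] = fetch_bytes):
    """Download each URL into out_dir and return (saved, failed) URL lists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        target = out_dir / local_name(url)
        if target.exists():          # already downloaded in an earlier run
            saved.append(url)
            continue
        try:
            target.write_bytes(fetch(url))
            saved.append(url)
        except (OSError, ValueError):  # network errors, dead links, malformed URLs
            failed.append(url)
    return saved, failed


# Demonstration with a stand-in fetcher so no network access is needed.
def fake_fetch(url: str) -> bytes:
    if "gone" in url:
        raise OSError("simulated dead link")
    return b"fake image bytes"


with tempfile.TemporaryDirectory() as tmp:
    saved, failed = download_all(
        ["https://example.com/a.jpg", "https://example.com/gone.jpg"],
        Path(tmp),
        fetch=fake_fetch,
    )
print(saved, failed)
```

Recording failed URLs rather than aborting matters for URL-based datasets, since the set of retrievable images can shrink over time and results should state how many images were actually obtained.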

Critical Analysis

The FindingEmo dataset represents a significant step forward in the field of emotion recognition, as it moves beyond the traditional focus on individual faces or single-person images. By capturing the complexity of social interactions, the dataset has the potential to support the development of more robust and contextual emotion recognition systems.

However, one potential limitation of the dataset is the reliance on Prolific for the annotation process. While Prolific is a reputable platform, there may be cultural or demographic biases in the pool of participants that could influence the annotations. Additionally, the subjective nature of emotion labeling means that there may be some degree of inconsistency or ambiguity in the annotations, which could impact the reliability of the dataset.
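One way to quantify this kind of inconsistency is to compare ratings wherever several annotators judged the same image. The sketch below assumes such overlapping ratings exist, which this summary does not confirm, and uses made-up image IDs and labels. It reports, per image, the spread of Valence ratings and the share of annotators who chose the most common Emotion label.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def agreement_by_image(ratings):
    """ratings: iterable of (image_id, valence, emotion) tuples.

    Returns {image_id: (valence_spread, majority_label_share)} for images
    rated by at least two annotators.
    """
    grouped = defaultdict(list)
    for image_id, valence, emotion in ratings:
        grouped[image_id].append((valence, emotion))

    result = {}
    for image_id, items in grouped.items():
        if len(items) < 2:
            continue  # agreement is undefined for a single rating
        valences = [v for v, _ in items]
        labels = Counter(label for _, label in items)
        top_count = labels.most_common(1)[0][1]
        result[image_id] = (pstdev(valences), top_count / len(items))
    return result


# Made-up ratings for illustration only.
ratings = [
    ("img1", 2, "Joy"), ("img1", 2, "Joy"), ("img1", 0, "Serenity"),
    ("img2", -1, "Sadness"),
]
scores = agreement_by_image(ratings)
print(scores)
```

A low Valence spread and a high majority share indicate consistent annotations; images with the opposite pattern are candidates for closer inspection or exclusion.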

Furthermore, this summary does not cover the distribution of emotions or the diversity of the social scenes depicted in the images. Readers should consult the full paper and the released annotations to assess how representative the dataset is and whether it covers a wide range of emotional expressions and social contexts.
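Representativeness can be checked directly once the annotations are loaded. As a minimal sketch, the helper below computes the share of each Emotion label so that heavily over- or under-represented classes stand out. The label values shown are placeholders, not the dataset's actual label set.

```python
from collections import Counter


def label_shares(labels):
    """Return {label: share of all annotations}, largest share first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


# Placeholder labels for illustration only.
labels = ["Joy", "Joy", "Sadness", "Anger", "Joy"]
shares = label_shares(labels)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A strongly skewed distribution would suggest using class-balanced sampling or per-class metrics when training and evaluating models on the data.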

Despite these potential limitations, the FindingEmo dataset represents an important contribution to the field of emotion recognition. By encouraging researchers to consider the complexities of real-world emotional expression, this dataset could lead to the development of more accurate and nuanced emotion recognition models, which could have far-reaching implications for a variety of applications, from mental health support to human-computer interaction.

Conclusion

The FindingEmo dataset is a significant advancement in the field of emotion recognition research. By providing a dataset of 25,000 images depicting complex social scenes with multiple people, the researchers have created a valuable resource for studying emotion in more realistic and naturalistic settings. The annotations for Valence, Arousal, and Emotion label, gathered using the Prolific platform, add depth and nuance to the dataset.

This dataset has the potential to drive the development of emotion recognition systems that better handle the complexities of real-world emotional expression, with applications ranging from mental health support to human-computer interaction. By making the annotations, image URLs, and associated source code publicly available, the researchers have opened the door for further research and collaboration in this area.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FindingEmo: An Image Dataset for Emotion Recognition in the Wild

Laurent Mertens, Elahe' Yargholi, Hans Op de Beeck, Jan Van den Stock, Joost Vennekens

We introduce FindingEmo, a new image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.

6/6/2024

EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause

Mia Huong Nguyen, Yasith Samaradivakara, Prasanth Sasikumar, Chitralekha Gupta, Suranga Nanayakkara

Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived from 9.8 million cleaned tweets over 15 years. We describe our curation process, which includes a comprehensive pipeline for data gathering, cleaning, labeling, and validation, ensuring the dataset's reliability and richness. We extract emotion labels and provide abstractive summarization of the events causing emotions. The final dataset comprises over 700,000 tweets with corresponding emotion-cause pairs spanning 48 emotion classes, validated by human evaluators. The novelty of our dataset stems from its broad spectrum of emotion classes and the abstractive emotion cause that facilitates the development of an emotion-cause knowledge graph for nuanced reasoning. Our dataset will enable the design of emotion-aware systems that account for the diverse emotional responses of different people for the same event.

6/19/2024

EmoFake: An Initial Dataset for Emotion Fake Audio Detection

Yan Zhao, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xiaohui Zhang, Yongfeng Dong

Many datasets have been designed to further the development of fake audio detection, such as datasets of the ASVspoof and ADD challenges. However, these datasets do not consider a situation that the emotion of the audio has been changed from one to another, while other information (e.g. speaker identity and content) remains the same. Changing the emotion of an audio can lead to semantic changes. Speech with tampered semantics may pose threats to people's lives. Therefore, this paper reports our progress in developing such an emotion fake audio detection dataset involving changing emotion state of the origin audio named EmoFake. The fake audio in EmoFake is generated by open source emotion voice conversion models. Furthermore, we proposed a method named Graph Attention networks using Deep Emotion embedding (GADE) for the detection of emotion fake audio. Some benchmark experiments are conducted on this dataset. The results show that our designed dataset poses a challenge to the fake audio detection model trained with the LA dataset of ASVspoof 2019. The proposed GADE shows good performance in the face of emotion fake audio.

7/25/2024

EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition

Youssef Doulfoukar, Laurent Mertens, Joost Vennekens

Convolutional Neural Networks are particularly suited for image analysis tasks, such as Image Classification, Object Recognition or Image Segmentation. Like all Artificial Neural Networks, however, they are black box models, and suffer from poor explainability. This work is concerned with the specific downstream task of Emotion Recognition from images, and proposes a framework that combines CAM-based techniques with Object Detection on a corpus level to better understand on which image cues a particular model, in our case EmoNet, relies to assign a specific emotion to an image. We demonstrate that the model mostly focuses on human characteristics, but also explore the pronounced effect of specific image modifications.

7/22/2024