Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

2404.01805

Published 4/3/2024 by Michael Mitsios, Georgios Vamvoukakis, Georgia Maniati, Nikolaos Ellinas, Georgios Dimitriou, Konstantinos Markopoulos, Panos Kakoulidis, Alexandra Vioni, Myrsini Christidou, Junkwang Oh and 6 others

cs.LG

Abstract

Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and differentiates between the diversified similarities and distinctions of various emotions. Initially, we establish a baseline by training a transformer-based model for standard emotion classification, achieving state-of-the-art performance. We argue that not all misclassifications are of the same importance, as there are perceptual similarities among emotional classes. We thus redefine the emotion labeling problem by shifting it from a traditional classification model to an ordinal classification one, where discrete emotions are arranged in a sequential order according to their valence levels. Finally, we propose a method that performs ordinal classification in the two-dimensional emotion space, considering both valence and arousal scales. The results show that our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.

Overview

This paper presents a novel approach for predicting the emotions expressed in text using a combined classification of valence (positivity/negativity) and arousal (intensity).
The researchers report that this combined model preserves high accuracy in emotion prediction while significantly reducing the magnitude of errors when it does misclassify.
The work has potential applications in areas like sentiment analysis, content moderation, and personalized recommendations.

Plain English Explanation

The paper focuses on the problem of automatically detecting the emotions expressed in written text. Emotions can be described along two main dimensions: valence, which refers to whether the emotion is positive or negative, and arousal, which refers to the intensity of the emotion.

Traditionally, emotion detection systems have tried to classify text into a single emotion category, such as "happy," "sad," or "angry." However, the researchers argue that this oversimplifies the complexity of human emotion, which often involves a mixture of positive and negative feelings with varying degrees of intensity.

To address this, the researchers developed a model that jointly predicts both the valence (positivity/negativity) and the arousal (intensity) of the emotions expressed in a given piece of text. They report that this combined approach keeps prediction accuracy high while significantly reducing the magnitude of errors when the model does misclassify, so a wrong prediction tends to land on a perceptually similar emotion.
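To make the two dimensions concrete, the sketch below places a handful of emotions on a small valence/arousal grid and measures how far apart two labels are. The grid coordinates, the emotion names, and the helper functions `error_magnitude` and `nearest_emotion` are illustrative assumptions, not values or code from the paper.

```python
# Hypothetical placement of a few emotions on a 5-level valence/arousal grid.
# Levels run from 0 (most negative / calmest) to 4 (most positive / most intense).
# These coordinates are illustrative assumptions, not values from the paper.
EMOTION_GRID = {
    "sad": (0, 1),
    "angry": (0, 4),
    "neutral": (2, 2),
    "calm": (3, 0),
    "happy": (4, 3),
}


def error_magnitude(true_label, predicted_label):
    """City-block distance between two emotions on the grid."""
    true_v, true_a = EMOTION_GRID[true_label]
    pred_v, pred_a = EMOTION_GRID[predicted_label]
    return abs(true_v - pred_v) + abs(true_a - pred_a)


def nearest_emotion(valence, arousal):
    """Decode a predicted (valence, arousal) pair to the closest emotion."""
    def distance(label):
        v, a = EMOTION_GRID[label]
        return abs(v - valence) + abs(a - arousal)

    return min(EMOTION_GRID, key=distance)


# Confusing "sad" with "angry" is a smaller error than confusing it with "happy".
print(error_magnitude("sad", "angry"))
print(error_magnitude("sad", "happy"))
print(nearest_emotion(4, 4))
```

Under this toy layout, a plain classifier would count both confusions as equally wrong, whereas a distance on the grid distinguishes a near miss from a far one.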

The potential applications of this work include improved sentiment analysis for business intelligence, more effective content moderation for online platforms, and personalized recommendations that better match users' emotional preferences. By accounting for the multidimensional nature of human emotion, this research represents an important step forward in the field of affective computing.

Technical Explanation

The paper proposes a novel deep learning architecture for the task of emotion prediction in text. The model consists of a shared encoder followed by two separate output heads, one for predicting valence (a 5-class ordinal classification task) and one for predicting arousal (also a 5-class ordinal classification task).

The shared encoder is based on a pretrained BERT language model, which is fine-tuned on the emotion prediction task. The valence and arousal heads each consist of a fully connected layer followed by a softmax activation to produce probability distributions over the ordinal emotion classes.
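The two-head layout described above can be sketched in plain Python. This is a minimal illustration only: the two-dimensional `features` vector stands in for the encoder output, and the weights, the names `linear` and `two_head_forward`, and the head parameters are all hypothetical rather than taken from the paper.

```python
import math

NUM_CLASSES = 5  # five ordinal levels per head, as described in the summary


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def two_head_forward(features, valence_head, arousal_head):
    """Apply both output heads to the same shared feature vector."""
    valence_probs = softmax(linear(features, *valence_head))
    arousal_probs = softmax(linear(features, *arousal_head))
    return valence_probs, arousal_probs


# Toy stand-in for the shared encoder output (a real encoder output is far larger).
features = [1.0, -0.5]

# Hypothetical head parameters: (weight matrix, bias vector).
valence_head = ([[0.1 * k, 0.0] for k in range(NUM_CLASSES)], [0.0] * NUM_CLASSES)
arousal_head = ([[0.0, 0.2 * k] for k in range(NUM_CLASSES)], [0.0] * NUM_CLASSES)

valence_probs, arousal_probs = two_head_forward(features, valence_head, arousal_head)
print(valence_probs)
print(arousal_probs)
```

Both heads read the same features, so any representation the encoder learns is shared between the valence and arousal predictions.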

To train the model, the researchers used a dataset of text samples annotated with valence and arousal ratings. They experimented with different loss functions, including mean squared error (MSE) and cross-entropy, and found that a combined loss function incorporating both MSE and cross-entropy performed best.
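The summary does not say how the two loss terms are combined, so the following is only one plausible sketch. It assumes a weighted sum of cross-entropy and a squared error between the expected class level and the true level; the weight `alpha` and all function names are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])


def expected_level(probs):
    """Probability-weighted average of the ordinal class indices."""
    return sum(level * p for level, p in enumerate(probs))


def squared_error(probs, target):
    """Squared distance between the expected level and the true level."""
    return (expected_level(probs) - target) ** 2


def combined_loss(probs, target, alpha=0.5):
    """Weighted sum of both terms; alpha is an assumed hyperparameter."""
    return alpha * cross_entropy(probs, target) + (1 - alpha) * squared_error(probs, target)


target = 4
# Both predictions give the true class probability 0.40, so cross-entropy is identical.
near_miss = [0.02, 0.02, 0.02, 0.54, 0.40]  # most mass one level below the target
far_miss = [0.54, 0.02, 0.02, 0.02, 0.40]   # most mass four levels below the target

print(combined_loss(near_miss, target))
print(combined_loss(far_miss, target))
```

Cross-entropy alone cannot tell these two predictions apart. The squared-error term can, which is one reason a combination of the two might suit ordinal labels.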

Extensive experiments on benchmark emotion datasets showed that the proposed combined valence-arousal model outperformed traditional single-label emotion classification approaches, as well as other state-of-the-art emotion prediction methods. The researchers attribute this improvement to the model's ability to capture the multidimensional nature of emotion more accurately.
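Because the abstract emphasizes the magnitude of errors as well as accuracy, it helps to see how the two measures can diverge. The sketch below uses made-up labels and predictions on a 5-level scale. The metric name `mean_absolute_level_error` is an assumption for illustration and may differ from the measure used in the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true level exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_level_error(y_true, y_pred):
    """Average number of ordinal levels between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data: true levels and two hypothetical models' predictions.
y_true = [0, 1, 2, 3, 4, 4]
model_a = [0, 1, 2, 3, 0, 0]  # two misses, each four levels away
model_b = [0, 1, 2, 3, 3, 3]  # two misses, each one level away

print(accuracy(y_true, model_a), mean_absolute_level_error(y_true, model_a))
print(accuracy(y_true, model_b), mean_absolute_level_error(y_true, model_b))
```

The two hypothetical models have the same accuracy, but the second makes much smaller mistakes, which is the kind of difference an ordinal evaluation is meant to expose.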

Critical Analysis

The paper provides a compelling case for the benefits of jointly modeling valence and arousal in text emotion prediction. The results demonstrate clear performance improvements over standard approaches, validating the researchers' core hypothesis.

One potential limitation is the reliance on a relatively small dataset, which may raise concerns about the generalizability of the findings. The authors acknowledge this and suggest that future work should explore the model's performance on larger, more diverse datasets.

Additionally, while the paper discusses potential real-world applications, it does not delve into the ethical considerations of emotion detection systems, such as concerns around privacy, bias, and the potential for misuse. As affective computing becomes more prevalent, it will be important for researchers to carefully address these issues.

Overall, the paper presents a well-designed and meaningful contribution to the field of emotion analysis in text. The combined valence-arousal approach offers a more nuanced and accurate representation of human emotion, with promising implications for a variety of applications.

Conclusion

This paper introduces a novel deep learning architecture for predicting the emotional content of text, which jointly models the valence (positivity/negativity) and arousal (intensity) dimensions of emotion. The results demonstrate that this combined approach outperforms traditional single-label emotion classification, providing a more comprehensive understanding of the emotional expression in written language.

The potential applications of this research are wide-ranging, from improved sentiment analysis and content moderation to personalized recommendations that better align with users' emotional preferences. As the field of affective computing continues to evolve, this work represents an important step forward in the accurate and meaningful interpretation of human emotion from text.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of 0.8221 for valence and 0.7125 for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.

6/5/2024

cs.CL cs.AI

CAGE: Circumplex Affect Guided Expression Inference

Niklas Wagner, Felix Mätzler, Samed R. Vossberg, Helen Schneider, Svetlana Pavlitska, J. Marius Zöllner

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to the common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scaled MaxViT-based model architecture, we evaluate the impact of discrete expression category labels in training with the continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels helps to significantly improve expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: https://github.com/wagner-niklas/CAGE_expression_inference.

4/24/2024

cs.CV

VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos

Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual information or rely on intricate training pipelines. In this work, we leverage the groundbreaking capabilities of Vision-and-Large-Language Models (VLLMs) to enhance in-context emotion classification without introducing complexity to the training process in a two-stage approach. In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context. In the second stage, the descriptions are used as contextual information and, along with the image input, are used to train a transformer-based architecture that fuses text and visual features before the final classification task. Our experimental results show that the text and image features have complementary information, and our fused architecture significantly outperforms the individual modalities without any complex training methods. We evaluate our approach on three different datasets, namely, EMOTIC, CAER-S, and BoLD, and achieve state-of-the-art or comparable accuracy across all datasets and metrics compared to much more complex approaches. The code will be made publicly available on github: https://github.com/NickyFot/EmoCommonSense.git

4/11/2024

cs.CV cs.HC

Large Language Models on Fine-grained Emotion Detection Dataset with Data Augmentation and Transfer Learning

Kaipeng Wang, Zhi Jing, Yongye Su, Yikun Han

This paper delves into enhancing the classification performance on the GoEmotions dataset, a large, manually annotated dataset for emotion detection in text. The primary goal of this paper is to address the challenges of detecting subtle emotions in text, a complex issue in Natural Language Processing (NLP) with significant practical applications. The findings offer valuable insights into addressing the challenges of emotion detection in text and suggest directions for future research, including the potential for a survey paper that synthesizes methods and performances across various datasets in this domain.

4/10/2024

cs.CL cs.AI