Deep Emotion Recognition in Textual Conversations: A Survey

2211.09172

Published 5/24/2024 by Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho

Abstract

While Emotion Recognition in Conversations (ERC) has seen a tremendous advancement in the last few years, new applications and implementation scenarios present novel challenges and opportunities. These range from leveraging the conversational context, speaker and emotion dynamics modelling, to interpreting common sense expressions, informal language and sarcasm, addressing challenges of real time ERC, recognizing emotion causes, different taxonomies across datasets, multilingual ERC to interpretability. This survey starts by introducing ERC, elaborating on the challenges and opportunities pertaining to this task. It proceeds with a description of the emotion taxonomies and a variety of ERC benchmark datasets employing such taxonomies. This is followed by descriptions of the most prominent works in ERC with explanations of the Deep Learning architectures employed. Then, it provides advisable ERC practices towards better frameworks, elaborating on methods to deal with subjectivity in annotations and modelling and methods to deal with the typically unbalanced ERC datasets. Finally, it presents systematic review tables comparing several works regarding the methods used and their performance. The survey highlights the advantage of leveraging techniques to address unbalanced data, the exploration of mixed emotions and the benefits of incorporating annotation subjectivity in the learning phase.

Overview

The paper discusses the advancements and new challenges in the field of Emotion Recognition in Conversations (ERC).
It covers a range of topics, including leveraging conversational context, modeling speaker and emotion dynamics, interpreting informal language and sarcasm, real-time ERC, recognizing emotion causes, addressing differences in emotion taxonomies across datasets, and achieving interpretability in ERC systems.
The paper introduces ERC, elaborates on the challenges and opportunities, and provides a comprehensive review of emotion taxonomies, benchmark datasets, and prominent deep learning architectures used in ERC.
It also discusses best practices for developing better ERC frameworks, dealing with subjectivity in annotations and data imbalance, and the benefits of incorporating annotation subjectivity in the learning phase.

Plain English Explanation

Emotion Recognition in Conversations (ERC) is a field that has seen significant advancements in recent years. ERC systems aim to understand and analyze the emotions expressed by people during conversations. However, as new applications and implementation scenarios emerge, the field is facing novel challenges and opportunities.

One of the key challenges is leveraging the full context of a conversation, rather than just analyzing individual utterances. This includes understanding the dynamics between speakers and how their emotions evolve over time. Another challenge is interpreting more complex and informal language, such as sarcasm or common-sense expressions, which can be difficult for ERC systems to understand.
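As a rough illustration of what "leveraging the full context" can mean in practice, the sketch below prepends a few preceding turns, tagged with their speakers, to each utterance before it would be passed to a classifier. The function name `build_context_inputs`, the `window` parameter, and the separator string are assumptions made for this example, not something taken from the survey. Only past turns are used, which is also the constraint a real-time system would face.

```python
from typing import List, Tuple


def build_context_inputs(
    conversation: List[Tuple[str, str]],
    window: int = 2,
    sep: str = " </s> ",
) -> List[str]:
    """Build one input string per turn: up to `window` previous turns plus the current one."""
    inputs = []
    for i in range(len(conversation)):
        start = max(0, i - window)  # never look before the first turn
        turns = [f"{speaker}: {text}" for speaker, text in conversation[start : i + 1]]
        inputs.append(sep.join(turns))
    return inputs


# Toy conversation (invented for illustration).
conversation = [
    ("A", "I got the job!"),
    ("B", "That's amazing."),
    ("A", "Yeah, great. Now I have to move."),
]
inputs = build_context_inputs(conversation, window=1)
print(inputs[2])
# B: That's amazing. </s> A: Yeah, great. Now I have to move.
```

With the previous turn attached, a model has a chance to read the last utterance against B's enthusiasm rather than in isolation.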

Real-time ERC, where the system needs to recognize emotions in conversations as they happen, also presents unique challenges. Researchers are also exploring ways to identify the underlying causes of emotions, which can provide deeper insights. Additionally, the field is grappling with differences in emotion taxonomies (the way emotions are categorized) across various datasets, and the need for multilingual ERC capabilities.

To address these challenges, the paper reviews the most prominent deep learning architectures and techniques used in ERC, as well as best practices for developing more robust and effective ERC frameworks. This includes methods to deal with the subjectivity inherent in emotion annotations and the typically unbalanced nature of ERC datasets.
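One common way to counter unbalanced emotion labels is to weight each class inversely to its frequency when computing the training loss. The sketch below shows that generic technique only; it is not presented as the specific method the survey recommends, and the helper name `inverse_frequency_weights` and the toy label counts are assumptions.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """Return weight_c = N / (K * n_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)        # N: number of labelled utterances
    num_classes = len(counts)  # K: number of distinct emotions seen
    return {label: total / (num_classes * count) for label, count in counts.items()}


# Toy label distribution (invented): "neutral" dominates, "anger" is rare.
labels = ["neutral"] * 6 + ["joy"] * 3 + ["anger"] * 1
weights = inverse_frequency_weights(labels)
for label in sorted(weights):
    print(f"{label}: {weights[label]:.3f}")
# anger: 3.333
# joy: 1.111
# neutral: 0.556
```

These weights could then be passed to a weighted cross-entropy loss, so that a misclassified "anger" utterance costs more than a misclassified "neutral" one.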

The paper highlights the advantages of leveraging techniques to address data imbalance, the exploration of mixed emotions (where multiple emotions are expressed simultaneously), and the benefits of incorporating annotation subjectivity into the learning process. By addressing these challenges, researchers aim to create ERC systems that are more accurate, reliable, and insightful.
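To make "incorporating annotation subjectivity into the learning process" concrete, one simple option is to train against the distribution of annotator votes instead of a single majority label. The sketch below is a minimal version of that idea under stated assumptions: the emotion set, the vote data, and the helper names `soft_label` and `soft_cross_entropy` are all hypothetical, and the survey may describe other approaches.

```python
import math
from collections import Counter
from typing import List

EMOTIONS = ["anger", "joy", "neutral", "sadness"]  # assumed label set


def soft_label(votes: List[str], classes: List[str] = EMOTIONS) -> List[float]:
    """Turn individual annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def soft_cross_entropy(target: List[float], predicted: List[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Three annotators disagree: two say "joy", one says "neutral".
target = soft_label(["joy", "joy", "neutral"])

overconfident = [0.01, 0.97, 0.01, 0.01]  # almost all mass on "joy"
calibrated = [0.02, 0.64, 0.32, 0.02]     # close to the vote split

loss_overconfident = soft_cross_entropy(target, overconfident)
loss_calibrated = soft_cross_entropy(target, calibrated)
print(loss_calibrated < loss_overconfident)
# True
```

Under this loss, a prediction that mirrors the annotators' disagreement is penalised less than one that commits fully to the majority label, which is also one way to represent mixed emotions.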

Technical Explanation

The paper begins by introducing the task of Emotion Recognition in Conversations (ERC) and the various challenges and opportunities present in this field. It delves into the importance of leveraging the full conversational context, including speaker and emotion dynamics, to improve ERC performance.

The paper then provides a detailed overview of the different emotion taxonomies and benchmark datasets used in ERC research. This includes discussing the nuances and differences in how emotions are categorized across various datasets, which can present challenges for developing robust and generalizable ERC systems.

Next, the paper examines the most prominent deep learning architectures and techniques employed in ERC, such as those described in the papers "Transformer-Based Neural Networks for Emotion Recognition in Conversations" and "Emotion-Anchored Contrastive Learning Framework for Emotion Recognition". These architectures leverage advanced neural network models, such as transformers, to capture the complex patterns and dependencies within conversational data.

The paper then provides a systematic review of various ERC methods, comparing their performance and the techniques used to address challenges like data imbalance and subjectivity in emotion annotations. This includes discussing approaches like those presented in the papers "SemEval-2024 Task 3: Multimodal Emotion Cause", "IITK at SemEval-2024 Task 10: Who", and "Samsung Research China, Beijing at SemEval-2024".
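When comparing ERC systems on unbalanced data, a frequently reported metric is the weighted F1 score, which one of the related papers listed at the end of this page also cites. The sketch below computes it from scratch: per-class F1, averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy predictions are assumptions for illustration.

```python
from collections import Counter
from typing import List


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy example (invented): three "neutral" utterances and one "joy".
y_true = ["neutral", "neutral", "neutral", "joy"]
y_pred = ["neutral", "neutral", "joy", "joy"]
print(round(weighted_f1(y_true, y_pred), 4))
# 0.7667
```

Because the weights follow class frequency, a model can score well on weighted F1 while still doing poorly on rare emotions, so it is often read alongside a macro-averaged score.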

Critical Analysis

The paper provides a comprehensive overview of the current state of Emotion Recognition in Conversations (ERC) research, highlighting both the advancements and the remaining challenges in the field. One potential limitation identified is the reliance on subjective emotion annotations, which can introduce biases and inconsistencies in the training data.

While the paper discusses methods to address data imbalance and incorporate annotation subjectivity, further research may be needed to develop more robust and generalizable techniques. Additionally, the paper does not delve deeply into the interpretability and explainability of the deep learning architectures used in ERC, which could be an important consideration for real-world applications.

Another area for further exploration is the integration of multimodal information (e.g., tone of voice, facial expressions, body language) to enhance the accuracy and reliability of ERC systems, as mentioned in the SemEval-2024 Task 3: Multimodal Emotion Cause paper.

Overall, the paper provides a valuable and comprehensive review of the ERC field, highlighting the key challenges and opportunities, as well as the prominent deep learning techniques and best practices. Researchers and practitioners in this domain may find the insights and recommendations presented in the paper useful for advancing the state-of-the-art in Emotion Recognition in Conversations.

Conclusion

The paper presents a thorough survey of the field of Emotion Recognition in Conversations (ERC), covering the various challenges and opportunities that have emerged as the field has advanced. It delves into the complexities of leveraging conversational context, interpreting informal language and sarcasm, and dealing with the subjective nature of emotion annotations and dataset imbalances.

The paper's comprehensive review of emotion taxonomies, benchmark datasets, and deep learning architectures provides a valuable resource for researchers and practitioners working in this domain. The insights and best practices discussed can help guide the development of more robust and effective ERC systems, which have important applications in areas like customer service, mental health monitoring, and human-computer interaction.

By addressing the key issues identified in the paper, such as the need for techniques to handle data imbalance and annotation subjectivity, the ERC field can continue to evolve and deliver increasingly accurate and insightful emotional recognition capabilities. This, in turn, can lead to more meaningful and empathetic interactions between humans and technology, with far-reaching implications for various industries and applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang

The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information. At the same time, the shared information between modalities was not processed to generate emotions, leading to an information redundancy problem. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network mainly includes two stages: the multi-modal feature fusion stage based on connection vectors and the emotion classification stage based on fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, demonstrating excellent performance on the IEMOCAP and MELD datasets.

5/29/2024

cs.CL

Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation

Fangxu Yu, Junjie Guo, Zhen Wu, Xinyu Dai

Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation. Effectively generating representations for utterances remains a significant challenge in this task. Recent works propose various models to address this issue, but they still struggle with differentiating similar emotions such as excitement and happiness. To alleviate this problem, we propose an Emotion-Anchored Contrastive Learning (EACL) framework that can generate more distinguishable utterance representations for similar emotions. To achieve this, we utilize label encodings as anchors to guide the learning of utterance representations and design an auxiliary loss to ensure the effective separation of anchors for similar emotions. Moreover, an additional adaptation process is proposed to adapt anchors to serve as effective classifiers to improve classification performance. Across extensive experiments, our proposed EACL achieves state-of-the-art emotion recognition performance and exhibits superior performance on similar emotions. Our code is available at https://github.com/Yu-Fangxu/EACL.

4/1/2024

cs.CL cs.SD eess.AS

ITEACH-Net: Inverted Teacher-studEnt seArCH Network for Emotion Recognition in Conversation

Haiyang Sun, Zheng Lian, Chenglong Wang, Kang Chen, Licai Sun, Bin Liu, Jianhua Tao

There remain two critical challenges that hinder the development of ERC. Firstly, there is a lack of exploration into mining deeper insights from the data itself for conversational emotion tasks. Secondly, the systems exhibit vulnerability to random modality feature missing, which is a common occurrence in realistic settings. Focusing on these two key challenges, we propose a novel framework for incomplete multimodal learning in ERC, called Inverted Teacher-studEnt seArCH Network (ITEACH-Net). ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Specifically, leveraging the tendency for emotional states to exhibit local stability within conversational contexts, ECCE captures these patterns and further perceives their evolution over time. Recognizing the varying challenges of handling incomplete versus complete data, ITS employs a teacher-student framework to decouple the respective computations. Subsequently, through Neural Architecture Search, the student model develops enhanced computational capabilities for handling incomplete data compared to the teacher model. During testing, we design a novel evaluation method, testing the model's performance under different missing rate conditions without altering the model weights. We conduct experiments on three benchmark ERC datasets, and the results demonstrate that our ITEACH-Net outperforms existing methods in incomplete multimodal ERC. We believe ITEACH-Net can inspire relevant research on the intrinsic nature of emotions within conversation scenarios and pave a more robust route for incomplete learning techniques. Codes will be made available.

6/4/2024

cs.MM

Transformer based neural networks for emotion recognition in conversations

Claudiu Creanga, Liviu P. Dinu

This paper outlines the approach of the ISDS-NLP team in the SemEval 2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF). For Subtask 1 we obtained a weighted F1 score of 0.43 and placed 12th on the leaderboard. We investigate two distinct approaches: Masked Language Modeling (MLM) and Causal Language Modeling (CLM). For MLM, we employ pre-trained BERT-like models in a multilingual setting, fine-tuning them with a classifier to predict emotions. Experiments with varying input lengths, classifier architectures, and fine-tuning strategies demonstrate the effectiveness of this approach. Additionally, we utilize Mistral 7B Instruct V0.2, a state-of-the-art model, applying zero-shot and few-shot prompting techniques. Our findings indicate that while Mistral shows promise, MLMs currently outperform it in sentence-level emotion classification.

5/21/2024

cs.CL