The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability

Read original: arXiv:2408.11956 - Published 8/23/2024 by James Tavernor, Yara El-Tawil, Emily Mower Provost

Overview

This paper explores how to model the individual differences of human annotators to better capture emotional variability in data.
The researchers propose a framework that models each annotator's unique biases and perspectives, rather than relying on aggregated annotations.
Their approach could lead to more nuanced and representative emotion datasets, with applications in areas like speech emotion recognition and human-AI interaction.

Plain English Explanation

When people annotate data, such as labeling the emotion expressed in a speech recording, they each bring their own perspectives and biases. Averaging their labels discards that variety, which is the sense in which the whole is bigger than the sum of its parts.

The researchers in this paper wanted to find a way to capture that individual variability, rather than just using the average of everyone's annotations. They developed a framework that models each annotator as their own entity, with their own tendencies and viewpoints.
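
To see what averaging throws away, here is a minimal Python sketch. The clip names and ratings are made up for illustration (they are not data from the paper), and a 1-to-5 rating scale is assumed.

```python
from statistics import mean, pstdev

# Hypothetical ratings on an assumed 1-5 scale; not data from the paper.
ratings = {
    "clip_a": [3, 3, 3, 3],  # annotators agree
    "clip_b": [1, 5, 1, 5],  # annotators strongly disagree
}

def summarize(labels):
    """Return the averaged label and the spread that averaging discards."""
    return {"mean": mean(labels), "spread": pstdev(labels)}

summaries = {clip: summarize(labels) for clip, labels in ratings.items()}

for clip, s in summaries.items():
    print(f"{clip}: mean={s['mean']:.1f}, spread={s['spread']:.1f}")
```

Both clips receive the same averaged label even though the annotators agree on one and split sharply on the other, so a model trained only on the average cannot tell them apart.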

This approach could lead to emotion datasets that are more nuanced and representative of the true diversity of human experience. It has applications in fields like speech emotion recognition and human-AI interaction, where understanding individual differences is crucial.

Technical Explanation

The researchers propose a novel framework called Individual Annotator Modeling (IAM) that models each annotator as a unique entity. Rather than aggregating annotations, IAM captures the individual biases, perspectives, and tendencies of each annotator.

The key components of IAM include:

An encoder that maps annotations into a latent space
A decoder that generates the final prediction by combining the latent representations of all annotators
A user-specific projection layer that models each annotator's individual characteristics

By incorporating this annotator-level modeling, IAM can better account for the inherent subjectivity and variability in emotional annotations. The researchers report that the combined approach yields emotion distributions that are more accurate than those in prior work, in both within-corpus and cross-corpus settings.
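
As a rough illustration of annotator-level modeling, the sketch below passes one shared feature vector through a separate linear head per annotator and then summarizes the per-annotator predictions as a distribution. This is not the authors' implementation: the feature values, annotator names, weights, and the `predict_annotators` helper are all hypothetical, and a real system would learn the encoder and the heads from data.

```python
from statistics import mean, pstdev

# Hypothetical output of a shared encoder for one utterance.
shared_features = [0.2, -0.5, 1.0]

# Hypothetical per-annotator linear heads: (weights, bias).
annotator_heads = {
    "annotator_1": ([1.0, 0.0, 0.5], 0.0),
    "annotator_2": ([0.5, 1.0, 0.0], 0.3),
    "annotator_3": ([0.0, -1.0, 1.0], -0.2),
}

def predict_annotators(features, heads):
    """Predict one rating per annotator from the shared features."""
    return {
        name: sum(w * x for w, x in zip(weights, features)) + bias
        for name, (weights, bias) in heads.items()
    }

per_annotator = predict_annotators(shared_features, annotator_heads)

# Summarize the individual predictions instead of collapsing them up front.
values = list(per_annotator.values())
distribution = {"mean": mean(values), "std": pstdev(values)}

print(per_annotator)
print(distribution)
```

Keeping one output per annotator means the spread of predictions is available alongside the mean, rather than being averaged away before training.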

Critical Analysis

The IAM framework is a promising step towards more representative and nuanced emotion datasets. By explicitly modeling the individual annotators, it can capture the true diversity of human perspectives on emotional experiences.

However, the paper does not extensively explore the potential limitations or biases that could arise from this approach. For example, if the annotator pool is not sufficiently diverse, the resulting models may still fail to generalize to the full range of human emotional expression.

Additionally, the computational complexity of the IAM framework raises questions about its scalability, especially for large-scale annotation efforts. Further research is needed to assess the trade-offs between model complexity, performance, and practical applicability.
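
One way to reason about the scalability concern is to count parameters. The sketch below assumes, purely for illustration, a shared encoder plus one linear head per annotator; the `total_parameters` helper and all sizes are hypothetical and do not come from the paper.

```python
def total_parameters(encoder_params, feature_dim, num_annotators, outputs_per_head=1):
    """Parameter count for a shared encoder plus one linear head per annotator."""
    head_params = (feature_dim + 1) * outputs_per_head  # weights plus bias
    return encoder_params + num_annotators * head_params

# Hypothetical sizes, chosen only to show the trend.
ENCODER_PARAMS = 1_000_000
FEATURE_DIM = 256

counts = {n: total_parameters(ENCODER_PARAMS, FEATURE_DIM, n) for n in (10, 100, 1000)}

for n, count in counts.items():
    print(f"{n} annotators: {count} parameters")
```

Under these assumptions the annotator-specific parameters grow linearly with the number of annotators while the shared encoder stays fixed. Whether the actual framework scales this way depends on architectural details the summary does not give.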

Conclusion

This paper presents a framework for modeling individual annotators to better capture emotional variability in data. By recognizing that the "whole is bigger than the sum of its parts," the IAM approach could lead to more representative and nuanced emotion datasets, with applications in speech emotion recognition and human-AI interaction.

While the framework shows promise, further research is needed to address potential limitations and ensure its scalability. Nonetheless, this work represents an important step towards more inclusive and representative data in the field of emotion analysis.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability

James Tavernor, Yara El-Tawil, Emily Mower Provost

Emotion expression and perception are nuanced, complex, and highly subjective processes. When multiple annotators label emotional data, the resulting labels contain high variability. Most speech emotion recognition tasks address this by averaging annotator labels as ground truth. However, this process omits the nuance of emotion and inter-annotator variability, which are important signals to capture. Previous work has attempted to learn distributions to capture emotion variability, but these methods also lose information about the individual annotators. We address these limitations by learning to predict individual annotators and by introducing a novel method to create distributions from continuous model outputs that permit the learning of emotion distributions during model training. We show that this combined approach can result in emotion distributions that are more accurate than those seen in prior work, in both within- and cross-corpus settings.

8/23/2024

You are an expert annotator: Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling

Christopher Bagdon, Prathamesh Karmalkar, Harsha Gurulingappa, Roman Klinger

Labeling corpora constitutes a bottleneck to create models for new tasks or domains. Large language models mitigate the issue with automatic corpus labeling methods, particularly for categorical annotations. Some NLP tasks such as emotion intensity prediction, however, require text regression, but there is no work on automating annotations for continuous label assignments. Regression is considered more challenging than classification: The fact that humans perform worse when tasked to choose values from a rating scale led to comparative annotation methods, including best-worst scaling. This raises the question of whether large language model-based annotation methods show similar patterns, namely that they perform worse on rating scale annotation tasks than on comparative annotation tasks. To study this, we automate emotion intensity predictions and compare direct rating scale predictions, pairwise comparisons and best-worst scaling. We find that the latter shows the highest reliability. A transformer regressor fine-tuned on these data performs nearly on par with a model trained on the original manual annotations.

4/23/2024

Enrolment-based personalisation for improving individual-level fairness in speech emotion recognition

Andreas Triantafyllopoulos, Bjorn Schuller

The expression of emotion is highly individualistic. However, contemporary speech emotion recognition (SER) systems typically rely on population-level models that adopt a 'one-size-fits-all' approach for predicting emotion. Moreover, standard evaluation practices measure performance also on the population level, thus failing to characterise how models work across different speakers. In the present contribution, we present a new method for capitalising on individual differences to adapt an SER model to each new speaker using a minimal set of enrolment utterances. In addition, we present novel evaluation schemes for measuring fairness across different speakers. Our findings show that aggregated evaluation metrics may obfuscate fairness issues on the individual level, which are uncovered by our evaluation, and that our proposed method can improve performance both in aggregated and disaggregated terms.

6/12/2024

Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Bjorn W. Schuller

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .8221 for valence and .7125 for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.

6/5/2024