GPT-4 Emulates Average-Human Emotional Cognition from a Third-Person Perspective

Read original: arXiv:2408.13718 - Published 8/27/2024 by Ala N. Tak, Jonathan Gratch

GPT-4 Emulates Average-Human Emotional Cognition from a Third-Person Perspective

Overview

Large language models (LLMs) like GPT-4 are being explored for their ability to recognize and simulate human emotions.
This paper examines how well GPT-4 can emulate average-human emotional cognition from a third-person perspective.
The researchers conducted two studies to assess GPT-4's emotional understanding and expression.

Plain English Explanation

The researchers wanted to see how well the GPT-4 language model could recognize and express human emotions. They were curious if GPT-4 could understand emotions from the perspective of an outside observer, rather than just expressing its own feelings.

To test this, they had GPT-4 read about emotional situations and then describe how a typical person might feel and react. They also had GPT-4 try to write its own stories with emotional content.

The researchers found that GPT-4 was generally able to identify common human emotional responses and describe them from an outside perspective. However, its emotional expressions in its own stories were somewhat limited compared to how a human would convey emotions.

Overall, the results suggest that large language models like GPT-4 are making progress in understanding and simulating human emotional cognition, but they still have room for improvement to fully capture the nuances of human emotional experience.

Technical Explanation

The researchers conducted two studies to assess GPT-4's ability to emulate average-human emotional cognition from a third-person perspective.

In Study 1, participants were presented with emotional scenarios and asked to describe how a typical person would feel and react. These responses were then used to create a benchmark dataset of average-human emotional cognition. GPT-4 was then prompted to read the same scenarios and provide its own descriptions of the typical emotional responses.

The researchers found that GPT-4 was generally able to identify common human emotional reactions and describe them from an outside perspective. Its responses aligned reasonably well with the human-provided benchmark data, suggesting GPT-4 has developed some understanding of average-human emotional cognition.

In Study 2, the researchers had GPT-4 generate its own short stories with emotional content. They then analyzed the model's ability to convey emotions through its narrative and descriptive writing.

The results indicated that while GPT-4 could express basic emotions such as happiness, sadness, and anger, its emotional expression was somewhat limited compared to how a human would convey emotions in their own storytelling.

Overall, the findings suggest that large language models like GPT-4 are making progress in understanding and simulating human emotional cognition from a third-person perspective. However, they still have room for improvement to fully capture the nuances and complexities of human emotional experience.

Critical Analysis

The researchers acknowledge several limitations and caveats to their work. First, the benchmark dataset of average-human emotional cognition used in Study 1 was relatively small and may not fully capture the diversity of human emotional responses. Additionally, the study design relied on prompting GPT-4 to describe emotions, which may not fully reflect the model's innate emotional understanding.

In Study 2, the researchers note that GPT-4's emotional expression in its own stories, while present, was somewhat limited compared to human-written narratives. This suggests that current large language models still struggle to fully emulate the depth and complexity of human emotional experience when tasked with generating emotional content themselves.

Future research could explore alternative approaches to assessing LLMs' emotional cognition, such as evaluating their responses to more diverse and naturalistic emotional scenarios. Additionally, exploring ways to further enhance LLMs' emotional understanding and expression could lead to more advanced emotional AI systems.

Conclusion

This study provides insights into how well GPT-4 can emulate average-human emotional cognition from a third-person perspective. The results suggest that while GPT-4 has developed a basic understanding of common human emotional responses, its ability to convey emotions through its own storytelling is still limited compared to human-generated emotional expression.

This research highlights the ongoing challenges in developing AI systems that can fully capture the nuances and complexities of human emotional experience. Continued advancements in emotional AI could have significant implications for fields such as affective computing, human-computer interaction, and mental health support.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

GPT-4 Emulates Average-Human Emotional Cognition from a Third-Person Perspective

Ala N. Tak, Jonathan Gratch

This paper extends recent investigations on the emotional reasoning abilities of Large Language Models (LLMs). Current research on LLMs has not directly evaluated the distinction between how LLMs predict the self-attribution of emotions and the perception of others' emotions. We first look at carefully crafted emotion-evoking stimuli, originally designed to find patterns of brain neural activity representing fine-grained inferred emotional attributions of others. We show that GPT-4 is especially accurate in reasoning about such stimuli. This suggests LLMs agree with humans' attributions of others' emotions in stereotypical scenarios remarkably more than self-attributions of emotions in idiosyncratic situations. To further explore this, our second study utilizes a dataset containing annotations from both the author and a third-person perspective. We find that GPT-4's interpretations align more closely with human judgments about the emotions of others than with self-assessments. Notably, conventional computational models of emotion primarily rely on self-reported ground truth as the gold standard. However, an average observer's standpoint, which LLMs appear to have adopted, might be more relevant for many downstream applications, at least in the absence of individual information and adequate safety considerations.

8/27/2024

From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs

Minxue Niu (University of Michigan), Mimansa Jaiswal (Independent Researcher), Emily Mower Provost (University of Michigan)

Training emotion recognition models has relied heavily on human annotated data, which present diversity, quality, and cost challenges. In this paper, we explore the potential of Large Language Models (LLMs), specifically GPT4, in automating or assisting emotion annotation. We compare GPT4 with supervised models and or humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training. We find that common metrics that use aggregated human annotations as ground truth can underestimate the performance, of GPT-4 and our human evaluation experiment reveals a consistent preference for GPT-4 annotations over humans across multiple datasets and evaluators. Further, we investigate the impact of using GPT-4 as an annotation filtering process to improve model training. Together, our findings highlight the great potential of LLMs in emotion annotation tasks and underscore the need for refined evaluation methodologies.

9/2/2024

The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games

Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu, Maria Glushanina, Mikhail Baklashkin, Andrey V. Savchenko, Ilya Makarov

Behavior study experiments are an important part of society modeling and understanding human interactions. In practice, many behavioral experiments encounter challenges related to internal and external validity, reproducibility, and social bias due to the complexity of social interactions and cooperation in human user studies. Recent advances in Large Language Models (LLMs) have provided researchers with a new promising tool for the simulation of human behavior. However, existing LLM-based simulations operate under the unproven hypothesis that LLM agents behave similarly to humans as well as ignore a crucial factor in human decision-making: emotions. In this paper, we introduce a novel methodology and the framework to study both, the decision-making of LLMs and their alignment with human behavior under emotional states. Experiments with GPT-3.5 and GPT-4 on four games from two different classes of behavioral game theory showed that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies. While there is a strong alignment between the behavioral responses of GPT-3.5 and human participants, particularly evident in bargaining games, GPT-4 exhibits consistent behavior, ignoring induced emotions for rationality decisions. Surprisingly, emotional prompting, particularly with `anger' emotion, can disrupt the superhuman alignment of GPT-4, resembling human emotional responses.

6/6/2024

GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

Hao Lu, Xuesong Niu, Jiyao Wang, Yin Wang, Qingyong Hu, Jiaqi Tang, Yuting Zhang, Kaishen Yuan, Bin Huang, Zitong Yu, Dengbo He, Shuiguang Deng, Hao Chen, Yingcong Chen, Shiguang Shan

Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite its success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing, spanning from visual affective tasks and reasoning tasks. The results show that gpt has high accuracy in facial action unit recognition and micro-expression detection while its general facial expression recognition performance is not accurate. We also highlight the challenges of achieving fine-grained micro-expression recognition and the potential for further study and demonstrate the versatility and potential of gpt for handling advanced tasks in emotion recognition and related fields by integrating with task-related agents for more complex tasks, such as heart rate estimation through signal processing. In conclusion, this paper provides valuable insights into the potential applications and challenges of MLLMs in human-centric computing. Our interesting examples are at https://github.com/EnVision-Research/GPT4Affectivity.

4/11/2024