Synthetic vs Human Emotional Faces: What Changes in Humans' Decoding Accuracy

2404.10435

Published 4/17/2024 by Terry Amorese, Marialucia Cuciniello, Alessandro Vinciarelli, Gennaro Cordasco, Anna Esposito

🎯

Abstract

Considered the increasing use of assistive technologies in the shape of virtual agents, it is necessary to investigate those factors which characterize and affect the interaction between the user and the agent, among these emerges the way in which people interpret and decode synthetic emotions, i.e., emotional expressions conveyed by virtual agents. For these reasons, an article is proposed, which involved 278 participants split in differently aged groups (young, middle-aged, and elders). Within each age group, some participants were administered a naturalistic decoding task, a recognition task of human emotional faces, while others were administered a synthetic decoding task, namely emotional expressions conveyed by virtual agents. Participants were required to label pictures of female and male humans or virtual agents of different ages (young, middle-aged, and old) displaying static expressions of disgust, anger, sadness, fear, happiness, surprise, and neutrality. Results showed that young participants showed better recognition performances (compared to older groups) of anger, sadness, and neutrality, while female participants showed better recognition performances(compared to males) ofsadness, fear, and neutrality; sadness and fear were better recognized when conveyed by real human faces, while happiness, surprise, and neutrality were better recognized when represented by virtual agents. Young faces were better decoded when expressing anger and surprise, middle-aged faces were better decoded when expressing sadness, fear, and happiness , while old faces were better decoded in the case of disgust; on average, female faces where better decoded compared to male ones.

Create account to get full access

Overview

This study investigates how people interpret and recognize emotional expressions conveyed by virtual agents, also known as synthetic emotions.
The study involved 278 participants of different age groups (young, middle-aged, and elderly) who completed tasks to decode emotional expressions in both human and virtual agent faces.
The results showed differences in how the various age and gender groups performed on the emotion recognition tasks, as well as differences in how certain emotions were better recognized in human versus virtual agent faces.

Plain English Explanation

The paper examines how people interpret and understand the emotional expressions displayed by virtual assistants, such as AI chatbots or animated characters. As virtual agents become more common, it's important to understand how humans perceive and respond to the digital emotions they express.

The researchers recruited 278 participants of different ages - young, middle-aged, and elderly - and had them complete two tasks. In one task, they had to identify the emotions shown on the faces of real human models. In the other task, they had to identify the emotions shown on the faces of virtual agents.

The results showed some interesting patterns. Younger participants were better at recognizing certain emotions like anger, sadness, and neutrality compared to older participants. Women were better at recognizing emotions like sadness, fear, and neutrality compared to men. Overall, people were better at recognizing sad and fearful expressions on real human faces, but happy, surprised, and neutral expressions on virtual agent faces.

The researchers also found that people decoded emotions differently based on the age and gender of the face they were looking at. For example, young faces showing anger or surprise were better recognized, while middle-aged faces showing sadness, fear, and happiness were better recognized. Female faces were generally better recognized than male faces.

These findings suggest that the way people interpret emotional expressions can be influenced by both the nature of the face (real or virtual) as well as the demographics of the person displaying the emotion. This has important implications for the design of emotional AI systems and how they interact with users of different backgrounds.

Technical Explanation

The study involved 278 participants divided into young, middle-aged, and elderly age groups. Within each age group, some participants completed a naturalistic decoding task where they identified emotions in human faces, while others completed a synthetic decoding task where they identified emotions in virtual agent faces.

In both tasks, participants were shown static images of female and male faces of different ages (young, middle-aged, old) displaying expressions of disgust, anger, sadness, fear, happiness, surprise, and neutrality. Participants had to label each facial expression.

The results showed that younger participants had better recognition of anger, sadness, and neutrality compared to older groups. Female participants had better recognition of sadness, fear, and neutrality compared to males. Sadness and fear were better recognized in human faces, while happiness, surprise, and neutrality were better recognized in virtual agent faces.

The researchers also found differences in how well emotions were decoded based on the age and gender of the face. Young faces were better decoded for anger and surprise, middle-aged faces were better decoded for sadness, fear, and happiness, and old faces were better decoded for disgust. On average, female faces were better decoded than male faces.

These findings suggest that the interpretation of emotional expressions, whether from humans or virtual agents, can be influenced by the perceiver's own demographics as well as the demographics of the face displaying the emotion. This has implications for the design of emotional AI systems that need to account for these types of biases.

Critical Analysis

The paper provides valuable insights into how people perceive and interpret synthetic emotional expressions, but it also has some limitations. The study only used static facial expressions, which may not fully capture the dynamic nature of emotional displays. Additionally, the sample size, while respectable, may not be large or diverse enough to generalize the findings to broader populations.

Another potential issue is the use of simplified categorical emotions (e.g., disgust, anger) rather than a more nuanced, dimensional approach to emotion. Emotions can be complex and multifaceted, and a binary classification system may oversimplify the phenomenon.

It would also be interesting to see if the findings hold true for different types of virtual agents, such as those with more anthropomorphic or cartoon-like features. The study focused on relatively realistic virtual agents, but the perception of emotions may vary depending on the agent's visual design.

Despite these limitations, the research makes an important contribution to our understanding of how humans interact with and perceive synthetic emotional expressions. The findings have implications for the design of more intuitive and effective AI-powered virtual assistants that can better adapt to the needs and preferences of diverse user groups.

Conclusion

This study provides valuable insights into how people interpret and recognize emotional expressions conveyed by virtual agents. The researchers found that age, gender, and the nature of the emotional display (human vs. virtual agent) all influenced people's ability to decode the expressed emotions.

These findings have important implications for the design of virtual agents and the development of emotional AI systems that can effectively communicate with and respond to users' emotional needs. By understanding the factors that shape our perception of synthetic emotions, we can create more natural and engaging virtual interactions that better meet the needs of diverse users.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🔎

As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day to day lives. We conducted a perceptual study with 1276 participants to assess how accurate people were at distinguishing synthetic images, audio only, video only, and audiovisual stimuli from authentic. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, while all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimuli contains synthetic content as compared to authentic content, images featuring human faces as compared to non face objects, a single modality as compared to multimodal stimuli, mixed authenticity as compared to being fully synthetic for audiovisual stimuli, and features foreign languages as compared to languages the observer is fluent in. Finally, we also find that prior knowledge of synthetic media does not meaningfully impact their detection performance. Collectively, these results indicate that people are highly susceptible to being tricked by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective counterdefense.

4/5/2024

cs.HC cs.AI cs.SD eess.AS

🤖

Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3

James Derek Lomas, Willem van der Maden, Sohhom Bandyopadhyay, Giovanni Lion, Nirmal Patel, Gyanesh Jain, Yanna Litowsky, Haian Xue, Pieter Desmet

Generative AI systems are increasingly capable of expressing emotions via text and imagery. Effective emotional expression will likely play a major role in the efficacy of AI systems -- particularly those designed to support human mental health and wellbeing. This motivates our present research to better understand the alignment of AI expressed emotions with the human perception of emotions. When AI tries to express a particular emotion, how might we assess whether they are successful? To answer this question, we designed a survey to measure the alignment between emotions expressed by generative AI and human perceptions. Three generative image models (DALL-E 2, DALL-E 3 and Stable Diffusion v1) were used to generate 240 examples of images, each of which was based on a prompt designed to express five positive and five negative emotions across both humans and robots. 24 participants recruited from the Prolific website rated the alignment of AI-generated emotional expressions with a text prompt used to generate the emotion (i.e., A robot expressing the emotion amusement). The results of our evaluation suggest that generative AI models are indeed capable of producing emotional expressions that are well-aligned with a range of human emotions; however, we show that the alignment significantly depends upon the AI model used and the emotion itself. We analyze variations in the performance of these systems to identify gaps for future improvement. We conclude with a discussion of the implications for future AI systems designed to support mental health and wellbeing.

5/30/2024

cs.AI

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-ofthe-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a webbased platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.

5/8/2024

cs.CV cs.AI cs.CY cs.LG cs.MM

🏋️

Narrative Review of Support for Emotional Expressions in Virtual Reality: Psychophysiology of speech-to-text interfaces

Sunday David Ubur, Denis Gracanin

This narrative review on emotional expression in Speech-to-Text (STT) interfaces with Virtual Reality (VR) aims to identify advancements, limitations, and research gaps in incorporating emotional expression into transcribed text generated by STT systems. Using a rigorous search strategy, relevant articles published between 2020 and 2024 are extracted and categorized into themes such as communication enhancement technologies, innovations in captioning, emotion recognition in AR and VR, and empathic machines. The findings reveal the evolution of tools and techniques to meet the needs of individuals with hearing impairments, showcasing innovations in live transcription, closed captioning, AR, VR, and emotion recognition technologies. Despite improvements in accessibility, the absence of emotional nuance in transcribed text remains a significant communication challenge. The study underscores the urgency for innovations in STT technology to capture emotional expressions. The research discusses integrating emotional expression into text through strategies like animated text captions, emojilization tools, and models associating emotions with animation properties. Extending these efforts into AR and VR environments opens new possibilities for immersive and emotionally resonant experiences, especially in educational contexts. The study also explores empathic applications in healthcare, education, and human-robot interactions, highlighting the potential for personalized and effective interactions. The multidisciplinary nature of the literature underscores the potential for collaborative and interdisciplinary research.

5/24/2024

cs.HC