Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Read original: arXiv:2209.14272 - Published 7/9/2024 by Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Muller, Andreas Konig, Bjorn W. Schuller

🔮

Overview

This paper explores the role of humor in human-AI interaction and introduces a novel dataset to study spontaneous humor in real-world settings.
Current humor detection methods have been limited to staged data, which may not accurately reflect how humor is used in natural conversations.
The researchers create the Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, which contains 11 hours of recordings annotated for humor and its dimensions.
They experiment with various machine learning models, including Transformers, convolutional neural networks, and expert-designed features, to analyze the performance of different modalities (text, audio, video) for spontaneous humor recognition.
The researchers also investigate multimodal approaches to humor recognition, including decision-level fusion and a novel multimodal architecture.

Plain English Explanation

Humor is an important part of how humans interact with each other and how we experience the world. Being able to understand humor automatically could help make interactions between humans and AI systems more natural and comfortable. However, most current methods for detecting humor have only been tested on staged, or pre-planned, data, which may not accurately represent how humor is used in real-life situations.

To address this, the researchers in this paper created a new dataset called the Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset. This dataset contains about 11 hours of recordings of spontaneous humor, where the humor was not planned or staged. The researchers annotated the dataset to indicate when humor was present and what kind of humor it was (e.g., positive or negative sentiment, directed at oneself or others).

Using this dataset, the researchers experimented with different machine learning models to see which ones were best at automatically detecting and analyzing spontaneous humor. They tried models that looked at the text, audio, and video of the recordings, as well as models that combined information from multiple sources (called "multimodal" models).

The researchers found that facial expressions were the most promising for automatically detecting the sentiment (positive or negative) of humor, while the text-based features were best for detecting the direction of the humor (whether it was directed at oneself or others). They also developed a new multimodal model that performed the best overall at recognizing spontaneous humor.

Technical Explanation

The researchers conducted a series of experiments to analyze the performance of different modalities (text, audio, video) and multimodal approaches for spontaneous humor recognition using the Passau-SFCH dataset.

For the unimodal experiments, they employed pretrained Transformers, convolutional neural networks, and expert-designed features. The text-based models leveraged linguistic cues, the audio models utilized acoustic features, and the video models analyzed facial expressions and body language.

In the multimodal experiments, the researchers explored different fusion strategies, including decision-level fusion and a novel multimodal Transformer (MulT) approach. The decision-level fusion combined the predictions from the unimodal models, while the MulT architecture learned cross-modal relationships to make the final predictions.

The results suggest that for automatic analysis of humor sentiment, facial expressions are the most promising modality, while humor direction can be best modeled using text-based features. The researchers found that multimodal approaches generally outperformed unimodal models, with their novel multimodal architecture yielding the best overall performance.

Critical Analysis

The researchers make a valuable contribution by introducing the Passau-SFCH dataset, which provides a more naturalistic setting for studying spontaneous humor compared to previous datasets that focused on staged or planned humor. This is a significant step forward, as the ability to recognize humor in real-world interactions is crucial for developing more natural and engaging human-AI interfaces.

However, the dataset is relatively small, with only 11 hours of recordings, and may not capture the full diversity of spontaneous humor that can occur in different contexts. Additionally, the annotations for humor and its dimensions (sentiment and direction) were performed by human raters, which could introduce subjective biases.

The researchers acknowledge these limitations and suggest that future work should focus on expanding the dataset size and exploring more advanced multimodal fusion techniques. It would also be interesting to investigate how the models perform on different types of spontaneous humor, such as irony, sarcasm, or puns, and how the models might generalize to other real-world settings beyond the specific football coaching context.

Conclusion

This paper presents an important step towards enabling AI systems to better understand and respond to spontaneous human humor in natural interactions. The introduction of the Passau-SFCH dataset and the experiments with various machine learning models provide valuable insights into the strengths and limitations of different modalities and fusion strategies for humor recognition.

The findings suggest that a multimodal approach, which combines text, audio, and video features, has the most promise for accurately detecting and analyzing spontaneous humor in real-world settings. This research lays the foundation for further advancements in humor-aware AI systems that can engage in more natural, enjoyable, and meaningful interactions with humans.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔮

Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Muller, Andreas Konig, Bjorn W. Schuller

Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for real-world applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.

7/9/2024

👁️

The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

Shahin Amiriparian, Lukas Christ, Alexander Kathan, Maurice Gerczuk, Niklas Muller, Steffen Klug, Lukas Stappen, Andreas Konig, Erik Cambria, Bjorn Schuller, Simone Eulitz

The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, the MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper provides details on each sub-challenge and its corresponding dataset, extracted features from each data modality, and discusses challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient ($rho$) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.

6/13/2024

Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

Zachary Horvitz, Jingru Chen, Rahul Aditya, Harshvardhan Srivastava, Robert West, Zhou Yu, Kathleen McKeown

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs), can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.

6/24/2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Lee Hyun, Kim Sung-Bin, Seungju Han, Youngjae Yu, Tae-Hyun Oh

Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/postech-ami/SMILE-Dataset.

5/27/2024