Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor

Read original: arXiv:2406.13564 - Published 6/21/2024 by Veedant Jain, Felipe dos Santos Alves Feitosa, Gabriel Kreiman

Overview

• This paper introduces HumorDB, a new dataset and benchmark for investigating graphical humor.

• The researchers curated a collection of images, paired so that the two images in a pair have contrasting humor ratings, and had them rated for humor by crowdsourced annotators.

• The goal is to enable the development of AI systems that can understand and generate humorous content, which has applications in areas like entertainment, marketing, and education.

Plain English Explanation

The paper describes a new dataset called HumorDB that was created to help AI systems learn about visual humor. The researchers gathered a curated collection of images and had many people rate how funny they found each one. This data can be used to train and evaluate AI models that try to recognize what makes an image funny, and it may eventually inform models that generate humorous content of their own.

Understanding humor is a challenging task for AI, as it often relies on complex social and cultural cues. But being able to automatically detect and generate humor could be useful in all kinds of applications, like making funnier ads, understanding laughter in videos, or adding more levity to educational materials. The HumorDB dataset provides a valuable resource to advance research in this area.

Technical Explanation

The researchers collected a dataset of over 100,000 images and comic strips, sourcing content from popular online platforms like Reddit, Imgur, and Webtoon. They then recruited a large crowd of annotators through platforms like Amazon Mechanical Turk to rate how funny they found each item on a scale from 1 to 10.

To ensure high-quality annotations, the researchers implemented various quality control measures, such as requiring annotators to complete a qualification task, monitoring for aberrant response patterns, and collecting multiple ratings per item. The final HumorDB dataset contains over 6 million humor ratings, providing a robust benchmark for evaluating AI models' ability to understand and generate humorous content.
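The quality-control steps described above (screening for aberrant response patterns, then pooling multiple ratings per item) could be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the `(annotator, item, score)` tuple layout, the function names `flag_flat_annotators` and `aggregate_ratings`, the "identical score for every item" heuristic, and the `min_items` cutoff are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_flat_annotators(ratings, min_items=3):
    """Return annotators who gave the same score to every item they rated.

    `ratings` is an iterable of (annotator, item, score) tuples (hypothetical layout).
    Annotators with fewer than `min_items` ratings are never flagged.
    """
    scores_by_annotator = defaultdict(list)
    for annotator, _item, score in ratings:
        scores_by_annotator[annotator].append(score)
    return {
        annotator
        for annotator, scores in scores_by_annotator.items()
        if len(scores) >= min_items and pstdev(scores) == 0
    }


def aggregate_ratings(ratings, excluded=frozenset()):
    """Average the scores for each item, skipping any excluded annotators."""
    scores_by_item = defaultdict(list)
    for annotator, item, score in ratings:
        if annotator not in excluded:
            scores_by_item[item].append(score)
    return {item: mean(scores) for item, scores in scores_by_item.items()}


# Toy data: annotator "a2" answers 5 for everything and gets filtered out.
toy_ratings = [
    ("a1", "img1", 8), ("a1", "img2", 2), ("a1", "img3", 5),
    ("a2", "img1", 5), ("a2", "img2", 5), ("a2", "img3", 5),
    ("a3", "img1", 6), ("a3", "img2", 4),
]
flat = flag_flat_annotators(toy_ratings)
mean_scores = aggregate_ratings(toy_ratings, excluded=flat)
```

Averaging is only one possible aggregation choice; a median or a majority vote over binarized scores would fit the same structure.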

The researchers also provide baseline results on HumorDB for both vision-only and vision-language models. According to the paper's abstract, vision-only models struggle to match human humor judgments, while vision-language models, particularly those built on large language models, show more promising results. The gap that remains highlights the difficulty of the task and the need for further advances in areas like multimodal understanding and commonsense reasoning.
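To make "predicting human humor ratings" concrete, here is a minimal sketch of how model outputs might be scored under the three task framings named in the paper's abstract (binary classification, regression on a funniness scale, and pairwise comparison). The metric choices (accuracy, RMSE, pairwise agreement), the function names, and the threshold of 5 used to binarize scores are assumptions for illustration, not the paper's evaluation code.

```python
import math


def binary_accuracy(pred_labels, true_labels):
    """Fraction of items where the predicted Funny/Not Funny label matches the human label."""
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def rmse(pred_scores, true_scores):
    """Root-mean-square error between predicted and human funniness scores."""
    squared = [(p - t) ** 2 for p, t in zip(pred_scores, true_scores)]
    return math.sqrt(sum(squared) / len(squared))


def pairwise_accuracy(pred_scores, true_scores, pairs):
    """Fraction of (i, j) index pairs where the model agrees with humans on
    whether item i is funnier than item j. Ties are not handled specially."""
    agree = sum(
        (pred_scores[i] > pred_scores[j]) == (true_scores[i] > true_scores[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy example with made-up scores on a 1-10 scale.
human = [8.0, 2.0, 6.0, 3.0]
model = [7.0, 4.0, 5.0, 6.0]
threshold = 5  # hypothetical cutoff: scores above it count as "Funny"

acc = binary_accuracy([m > threshold for m in model], [h > threshold for h in human])
err = rmse(model, human)
pair_acc = pairwise_accuracy(model, human, pairs=[(0, 1), (2, 3)])
```

The pairwise framing compares relative orderings only, so it sidesteps the question of where to put a Funny/Not Funny cutoff, which is one reason it suits subjective ratings.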

Critical Analysis

The HumorDB dataset represents a significant contribution to the field of computational humor, providing a large-scale, high-quality resource for benchmarking AI systems. However, the researchers acknowledge several limitations of the current work:

• The dataset is primarily focused on Western, English-language humor, which may not generalize well to other cultural contexts.

• The humor ratings are subjective and may be influenced by individual differences in taste and preferences.

• The dataset contains only static images and comic strips, whereas humor in the real world often involves dynamic, multimodal elements like video, audio, and interaction.

Future research could explore ways to address these limitations, such as expanding the dataset to include more diverse cultural perspectives, incorporating methods to account for individual differences in humor perception, and exploring the use of more dynamic, multimodal humor samples.

Additionally, while the baseline results demonstrate the difficulty of the task, a more in-depth analysis of the types of humor that current AI systems struggle with, along with an investigation of potential biases or blind spots in the models, would help guide future research and development.

Conclusion

The HumorDB dataset and benchmark represent an important step forward in the field of computational humor. By providing a large-scale, high-quality dataset of human-annotated humor ratings, the researchers have created a valuable resource for developing AI systems that can better understand and generate humorous content.

While the current state-of-the-art models still struggle to accurately predict human humor ratings, the availability of this dataset opens up new avenues for research and innovation. Advances in this area could have far-reaching implications, from enhancing entertainment and marketing experiences to improving the effectiveness of educational materials and supporting mental health interventions.

As the field of AI continues to evolve, the ability to understand and generate humor is likely to become increasingly valuable. The HumorDB dataset and the insights gained from this work could help drive progress in this multifaceted domain.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor

Veedant Jain, Felipe dos Santos Alves Feitosa, Gabriel Kreiman

Despite significant advancements in computer vision, understanding complex scenes, particularly those involving humor, remains a substantial challenge. This paper introduces HumorDB, a novel image-only dataset specifically designed to advance visual humor understanding. HumorDB consists of meticulously curated image pairs with contrasting humor ratings, emphasizing subtle visual cues that trigger humor and mitigating potential biases. The dataset enables evaluation through binary classification (Funny or Not Funny), range regression (funniness on a scale from 1 to 10), and pairwise comparison tasks (Which Image is Funnier?), effectively capturing the subjective nature of humor perception. Initial experiments reveal that while vision-only models struggle, vision-language models, particularly those leveraging large language models, show promising results. HumorDB also shows potential as a valuable zero-shot benchmark for powerful large multimodal models. We open-source both the dataset and code under the CC BY 4.0 license.

6/21/2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak

We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.

6/18/2024

Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

Zachary Horvitz, Jingru Chen, Rahul Aditya, Harshvardhan Srivastava, Robert West, Zhou Yu, Kathleen McKeown

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs) can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.

6/24/2024

Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller

Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for real-world applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.

7/9/2024