Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

2405.04097

Published 5/8/2024 by Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Abstract

The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.

Overview

This paper investigates how humans perceive and detect audiovisual deepfakes, which are synthetic media that combine manipulated audio and video.
The researchers ran a subjective study in which 110 participants (55 native and 55 non-native English speakers) judged the authenticity of 40 videos (20 real and 20 fake) on a gamified, web-based platform.
They also compared the human observers against five state-of-the-art audiovisual deepfake detection models evaluated on the same 40 videos.

Plain English Explanation

The paper explores how well people can identify fake videos and audio that have been manipulated using advanced AI techniques, known as deepfakes. Deepfakes can be used to create convincing but false audio and video content, which raises concerns about the spread of misinformation.

The researchers ran an online study in which 110 people watched 40 videos, half real and half deepfakes, and tried to identify which ones were fake. The task was presented on a web platform with game-like elements, and each participant went through the same 40 videos twice, in a different random order each time. The same videos were also given to five automated detection models so that human and machine performance could be compared directly.

The goal was to understand the current state of human perception and detection of deepfakes, which could inform the development of better tools and techniques to combat the spread of this type of synthetic media.

Technical Explanation

The researchers conducted a subjective study to investigate human perception of audiovisual deepfakes. They recruited 110 participants, who used a web-based platform to watch a series of 40 videos (20 real and 20 fake) and judge the authenticity of each one. Each participant performed the experiment twice with the same 40 videos, presented in a different random order each time.
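As the abstract describes it, each participant saw the same 40 videos twice, in different random orders. The following is a minimal sketch of how such presentation orders could be generated. The function name `make_sessions`, the video IDs, and the seed are all illustrative assumptions, not details taken from the paper or its platform.

```python
import math
import random


def make_sessions(video_ids, n_sessions=2, seed=None):
    """Return n_sessions orderings of video_ids, no two of them identical.

    Hypothetical helper: the paper's actual randomization procedure
    is not described in the abstract.
    """
    ids = list(video_ids)
    if len(set(ids)) != len(ids):
        raise ValueError("video IDs must be unique")
    if n_sessions > math.factorial(len(ids)):
        raise ValueError("not enough distinct orderings for that many sessions")

    rng = random.Random(seed)  # a seed makes the orders reproducible
    sessions = []
    for _ in range(n_sessions):
        order = ids[:]
        rng.shuffle(order)
        # Re-shuffle in the unlikely case this order repeats an earlier session.
        while order in sessions:
            rng.shuffle(order)
        sessions.append(order)
    return sessions


# Invented IDs standing in for 20 real and 20 fake clips.
videos = [f"real_{i:02d}" for i in range(20)] + [f"fake_{i:02d}" for i in range(20)]
first, second = make_sessions(videos, seed=7)
```

Both sessions contain exactly the same clips, so any difference between a participant's first and second pass cannot come from the content shown, only from the participant and the presentation order.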

The study design was balanced on two axes: participants were split evenly between native and non-native English speakers (55 each), and the video set was split evenly between real and fake clips (20 each).

The videos were manually selected from FakeAVCeleb, an audiovisual deepfake dataset. The same 40 videos were also used to evaluate five state-of-the-art audiovisual deepfake detection models, so that humans and machines were judged on identical material.

The web-based platform applied gamification concepts, that is, game-like design elements, to the task of judging each video as real or fake.

The results showed that all five AI models performed better than the human participants when evaluated on the same 40 videos. The study also reports that, although deception is not impossible, humans tend to overestimate their own detection capabilities.
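Comparing humans with detection models on a shared video set, as the abstract describes, comes down to scoring both against the same ground-truth labels. The sketch below illustrates that comparison, along with one possible way to quantify overestimation: the gap between the accuracy a participant believes they achieved and the accuracy they actually achieved. All labels, responses, and the self-estimate are invented toy values, and the gap measure is an assumption, since the abstract does not say how overestimation was measured.

```python
def accuracy(predictions, labels):
    """Fraction of items where the prediction matches the ground truth."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def overestimation_gap(self_estimate, predictions, labels):
    """Self-reported accuracy minus actual accuracy (positive = overestimate).

    Illustrative measure only; not necessarily the one used in the paper.
    """
    return self_estimate - accuracy(predictions, labels)


# Toy data, invented for illustration: 1 = fake, 0 = real.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
human = [1, 0, 0, 0, 1, 1, 0, 0]  # hypothetical participant responses
model = [1, 0, 1, 0, 1, 0, 0, 0]  # hypothetical detector outputs

human_acc = accuracy(human, labels)  # 5 of 8 correct
model_acc = accuracy(model, labels)  # 7 of 8 correct
gap = overestimation_gap(0.8, human, labels)  # participant guessed 80%
```

Because both scores are computed on the same items, the difference between `human_acc` and `model_acc` reflects the observers rather than the difficulty of the videos.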

The paper provides valuable insights into the current limitations of human perception when it comes to detecting deepfakes, and highlights the need for continued research and development of automated deepfake detection systems.

Critical Analysis

The paper provides a careful investigation of human perception of audiovisual deepfakes. The balanced design (equal numbers of native and non-native English speakers, equal numbers of real and fake videos) and the direct comparison with five detection models on the same videos allow the researchers to draw meaningful conclusions about human versus machine detection capabilities.

One potential limitation is that the study used 40 videos manually selected from a single dataset, FakeAVCeleb. It would be interesting to see how the results might vary with different types of deepfakes or in more realistic, real-world scenarios.

Additionally, the paper does not delve deeply into the underlying cognitive and perceptual processes that contribute to people's ability (or inability) to detect deepfakes. Further research exploring the psychological factors at play could provide valuable insights.

Despite these minor caveats, the paper makes a significant contribution to the understanding of deepfake detection and the challenges we face in combating the spread of synthetic media. The findings underscore the need for continued advancements in automated deepfake detection techniques, as well as the importance of educating the public on how to critically evaluate the media they consume.

Conclusion

This paper investigates human perception of audiovisual deepfakes, highlighting the current limitations in people's ability to detect synthetic media. In a subjective study with 110 participants and 40 videos, all five AI detection models outperformed the human observers on the same material.

The researchers also found that humans tend to overestimate their own detection capabilities. These findings have important implications for the development of tools and techniques to combat the spread of misinformation, as well as the need for public education on how to critically evaluate the media they encounter.

Overall, the paper makes a valuable contribution to the understanding of deepfake detection and the ongoing challenges in this rapidly evolving field. As deepfake technologies continue to advance, it will be crucial to develop robust and reliable detection methods, while also empowering the public to be discerning consumers of digital media.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day to day lives. We conducted a perceptual study with 1276 participants to assess how accurate people were at distinguishing synthetic images, audio only, video only, and audiovisual stimuli from authentic. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, while all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimuli contains synthetic content as compared to authentic content, images featuring human faces as compared to non face objects, a single modality as compared to multimodal stimuli, mixed authenticity as compared to being fully synthetic for audiovisual stimuli, and features foreign languages as compared to languages the observer is fluent in. Finally, we also find that prior knowledge of synthetic media does not meaningfully impact their detection performance. Collectively, these results indicate that people are highly susceptible to being tricked by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective counterdefense.

4/5/2024

cs.HC cs.AI cs.SD eess.AS

Detecting music deepfakes is easy but actually hard

Darius Afchar, Gabriel Meseguer-Brocal, Romain Hennequin

In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a deployed detector: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part acts as a position for future research steps in the field and a caveat to a flourishing market of fake content checkers.

5/24/2024

cs.SD cs.LG eess.AS

Media Forensics and Deepfake Systematic Survey

Nadeem Jabbar CH, Aqib Saghir, Ayaz Ahmad Meer, Salman Ahmad Sahi, Bilal Hassan, Siddiqui Muhammad Yasir

Deepfake is a generative deep learning algorithm that creates or changes facial features in a very realistic way, making it hard to differentiate the real from the fake features. It can be used to make movies look better as well as to spread false information by imitating famous people. In this paper, many different ways to make a Deepfake are explained, analyzed, and separated categorically. Using Deepfake datasets, models are trained and tested for reliability through experiments. Deepfakes are a type of facial manipulation that allow people to change their entire faces, identities, attributes, and expressions. The trends in the available Deepfake datasets are also discussed, with a focus on how they have changed. Using Deep learning, a general Deepfake detection model is made. Moreover, the problems in making and detecting Deepfakes are also mentioned. As a result of this survey, it is expected that the development of new Deepfake based imaging tools will speed up in the future. This survey gives in-depth review of methods for manipulating images of face and various techniques to spot altered face images. Four types of facial manipulation are specifically discussed, which are attribute manipulation, expression swap, entire face synthesis, and identity swap. Across every manipulation category, we yield information on manipulation techniques, significant benchmarks for technical evaluation of counterfeit detection techniques, available public databases, and a summary of the outcomes of all such analyses. From all of the topics in the survey, we focus on the most recent development of Deepfake, showing its advances and obstacles in detecting fake images.

6/21/2024

cs.CV cs.AI cs.MM

A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

Kyungbok Lee, You Zhang, Zhiyao Duan

This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for the model to interpret which cues from the video indicate it is fake. Motivated by these considerations, we then propose a multi-stream fusion approach with one-class learning as a representation-level regularization technique. We study the generalization problem of audio-visual deepfake detection by creating a new benchmark by extending and re-splitting the existing FakeAVCeleb dataset. The benchmark contains four categories of fake video (Real Audio-Fake Visual, Fake Audio-Fake Visual, Fake Audio-Real Visual, and unsynchronized video). The experimental results show that our approach improves the model's detection of unseen attacks by an average of 7.31% across four test sets, compared to the baseline model. Additionally, our proposed framework offers interpretability, indicating which modality the model identifies as fake.

6/21/2024

cs.SD cs.AI cs.MM eess.AS