Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

Read original: arXiv:2406.09601 - Published 6/17/2024 by Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang

Overview

This paper proposes a framework for detecting AI-generated videos, in particular diffusion-generated videos from tools such as Sora, Runway Gen-2, and Pika, which are becoming more prevalent and harder to distinguish from real footage.
The researchers collect a benchmark dataset for evaluating AI video detection models and present a CNN + LSTM detector that reaches 93.7% accuracy on in-domain videos and improves out-of-domain accuracy by up to 16 points.
The work aims to address the growing challenge of identifying manipulated media, known as "deepfakes," which can be used to spread misinformation or infringe on intellectual property rights.

Plain English Explanation

The paper focuses on the problem of detecting AI-generated videos, which are becoming increasingly realistic and hard to distinguish from real videos. These AI-generated videos, often called "deepfakes," can be used to spread false information or violate intellectual property rights.

To address this challenge, the researchers create a new dataset that can be used to test the performance of AI video detection models. They then present a new model that outperforms existing approaches in identifying AI-generated videos.

The paper aims to help develop more robust methods for detecting manipulated media, which is an important issue as the quality of AI-generated content continues to improve. By making it easier to identify fake videos, the researchers hope to limit the spread of misinformation and protect against the unauthorized use of copyrighted material.

Technical Explanation

The researchers collect a new benchmark dataset of diffusion-generated videos, produced with state-of-the-art video creation tools, to evaluate the performance of AI video detection models. The dataset is described as containing a diverse set of real and AI-generated videos, covering a range of subjects, camera angles, and video qualities.

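The researchers then present a detection framework built around the diffusion model itself. According to the abstract, they extract representations from the diffusion model for each video frame and train a detector with a CNN + LSTM architecture, so that it can track the temporal features and frame-to-frame variation that detectors for diffusion-generated images tend to miss. The reported results are 93.7% detection accuracy on in-domain videos and an improvement of up to 16 points on out-of-domain videos.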
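The paper's abstract describes a detector that pairs per-frame features with an LSTM over the frame sequence. The sketch below illustrates only that general idea, in plain Python: a stand-in feature function plays the role of the CNN, a small LSTM forward pass carries information across frames, and a logistic head turns the final hidden state into a score. Every name here (`frame_features`, `TinyLSTM`, `score_video`), the random untrained weights, and the toy pixel data are assumptions made for illustration; this is not the authors' implementation, and it does not use representations from a diffusion model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_features(frame):
    """Stand-in for a per-frame CNN: mean and variance of pixel values."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((p - mean) ** 2 for p in frame) / n
    return [mean, var]


class TinyLSTM:
    """Minimal LSTM forward pass with random (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _gate(self, name, z, activation):
        return [
            activation(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(self.W[name], self.b[name])
        ]

    def forward(self, sequence):
        """Run the sequence of feature vectors; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = x + h  # concatenate current input with previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def score_video(frames, lstm, head_weights, head_bias=0.0):
    """Return a score in (0, 1); higher would mean 'more likely generated'."""
    features = [frame_features(frame) for frame in frames]
    h = lstm.forward(features)
    return sigmoid(sum(w * v for w, v in zip(head_weights, h)) + head_bias)


# Toy "video": 4 frames, each a flat list of pixel intensities in [0, 1].
video = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.4, 0.5],
    [0.9, 0.1, 0.8, 0.2],
    [0.5, 0.5, 0.5, 0.5],
]
lstm = TinyLSTM(input_size=2, hidden_size=3)
probability = score_video(video, lstm, head_weights=[0.3, -0.2, 0.5])
print(f"P(generated) = {probability:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the point is the data flow. Each frame is reduced to a feature vector, and the recurrent state lets later frames be interpreted in light of earlier ones, which a per-image detector cannot do.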

Critical Analysis

The paper provides a valuable contribution to the field of deepfake detection, but it also acknowledges some limitations and areas for further research. For example, the dataset may not fully capture the diversity of AI-generated videos that could emerge in the future, and the model's performance may be influenced by the specific training data and architectures used.
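One way to make the dependence on training data visible is to report accuracy separately for videos from generators seen during training (in-domain) and from unseen generators (out-of-domain), which is the split the paper's abstract uses when reporting results. The following is a minimal sketch of that bookkeeping. The function name `accuracy_by_domain`, the record layout, and the toy records are all hypothetical and are not data or results from the paper.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """records: iterable of (domain, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, truth, prediction in records:
        total[domain] += 1
        correct[domain] += int(truth == prediction)
    return {domain: correct[domain] / total[domain] for domain in total}


# Toy predictions, invented for illustration only (1 = generated, 0 = real).
toy_records = [
    ("in_domain", 1, 1),
    ("in_domain", 0, 0),
    ("in_domain", 1, 1),
    ("in_domain", 0, 1),
    ("out_of_domain", 1, 0),
    ("out_of_domain", 0, 0),
]
per_domain = accuracy_by_domain(toy_records)
print(per_domain)
```

A large gap between the two numbers would suggest that the detector has learned generator-specific artifacts rather than cues that transfer to new video generators.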

Additionally, the paper does not address broader societal implications or ethical considerations around the use of AI-generated media, which is an important area for further discussion and analysis.

Conclusion

Overall, this paper presents a promising approach for detecting AI-generated videos, an increasingly important challenge as the quality of generated media continues to improve. By collecting a benchmark dataset and building a new detection model, the researchers contribute to ongoing efforts to combat the spread of misinformation and protect against the unauthorized use of copyrighted material.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang

The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples. However, the robustness of these detectors on diffusion-generated videos generated from video creation tools (e.g., SORA by OpenAI, Runway Gen-2, and Pika, etc.) is still unexplored. In this paper, we propose a novel framework for detecting videos synthesized from multiple state-of-the-art (SOTA) generative models, such as Stable Video Diffusion. We find that the SOTA methods for detecting diffusion-generated images lack robustness in identifying diffusion-generated videos. Our analysis reveals that the effectiveness of these detectors diminishes when applied to out-of-domain videos, primarily because they struggle to track the temporal features and dynamic variations between frames. To address the above-mentioned challenge, we collect a new benchmark video dataset for diffusion-generated videos using SOTA video creation tools. We extract representation within explicit knowledge from the diffusion model for video frames and train our detector with a CNN + LSTM architecture. The evaluation shows that our framework can well capture the temporal features between frames, achieves 93.7% detection accuracy for in-domain videos, and improves the accuracy of out-domain videos by up to 16 points.

6/17/2024

What Matters in Detecting AI-Generated Videos like Sora?

Chirui Chang, Zhengzhe Liu, Xiaoyang Lyu, Xiaojuan Qi

Recent advancements in diffusion-based video generation have showcased remarkable results, yet the gap between synthetic and real-world videos remains under-explored. In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion. To achieve this, we train three classifiers using 3D convolutional networks, each targeting distinct aspects: vision foundation model features for appearance, optical flow for motion, and monocular depth for geometry. Each classifier exhibits strong performance in fake video detection, both qualitatively and quantitatively. This indicates that AI-generated videos are still easily detectable, and a significant gap between real and fake videos persists. Furthermore, utilizing the Grad-CAM, we pinpoint systematic failures of AI-generated videos in appearance, motion, and geometry. Finally, we propose an Ensemble-of-Experts model that integrates appearance, optical flow, and depth information for fake video detection, resulting in enhanced robustness and generalization ability. Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training. This suggests that the gap between real and fake videos can be generalized across various video generative models. Project page: https://justin-crchang.github.io/3DCNNDetection.github.io/

7/1/2024

Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

The generative model has made significant advancements in the creation of realistic videos, which causes security issues. However, this emerging risk has not been adequately addressed due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. Besides, typical video lossy operations over network transmission are adopted to generate degraded samples. Then, by analyzing local and global temporal defects of current AI-generated videos, a novel detection framework by adaptively learning local motion information and global appearance variation is constructed to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial and temporal domain detection methods, where the results can serve as the baseline and demonstrate the research challenge for future studies.

5/8/2024

Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the dataset and neural classifier for training. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66 M instances of both real and generated videos, varying in categories, frames per second, resolutions, and lengths. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, achieving the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D can distinguish between real and generated video content with 96.77% accuracy, and strong generalization capability even for unseen types.

5/27/2024