UniForensics: Face Forgery Detection via General Facial Representation

Read original: arXiv:2407.19079 - Published 7/30/2024 by Ziyuan Fang, Hanqing Zhao, Tianyi Wei, Wenbo Zhou, Ming Wan, Zhanyi Wang, Weiming Zhang, Nenghai Yu

Overview

Deepfake detection is an important task to combat the spread of misinformation and protect individual privacy.
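This paper proposes UniForensics, a deepfake detection framework that uses a transformer-based video classification network initialized with a meta-functional face encoder, trained in two stages: self-supervised contrastive learning followed by fine-tuning.
UniForensics relies on high-level semantic facial features rather than low-level textures, with the aim of detecting a wide range of face forgeries, including those not seen during training.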

Plain English Explanation

The paper introduces a new method called UniForensics for detecting deepfakes, which are manipulated or artificial images or videos made to look like real people. Deepfake detection is crucial to prevent the spread of misinformation and protect people's privacy.

UniForensics uses self-supervised contrastive learning to build a general facial representation that can help detect a wide variety of face forgeries, including ones the model has not seen during training. This is an important capability, as new deepfake techniques are constantly emerging.

The key idea is to rely on high-level semantic features of faces, and on how consistent they stay over time in a video, rather than on the low-level textural artifacts left by specific known forgery methods. Such features are less susceptible to perturbations and are not tied to one forgery technique, which is what allows UniForensics to generalize better to unseen forgeries.

Technical Explanation

UniForensics builds on a transformer-based video classification network that is initialized with a meta-functional face encoder, so the spatio-temporal model starts from high-level semantic facial features. To make use of easily accessible real face data, the authors design Dynamic Video Self-Blending (DVSB), which generates training samples with diverse spatio-temporal forgery traces from real facial videos.

Training proceeds in two stages. The first stage is self-supervised contrastive learning: videos generated by the same forgery process are pushed toward similar representations, which encourages the network to focus on forgery traces. The second stage fine-tunes the learned representation on a face forgery detection dataset to build the deepfake detector.
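
The abstract describes the first-stage objective only in words: videos generated by the same forgery process should end up with similar representations. One common way to express an objective of that kind is a supervised-contrastive (SupCon-style) loss, sketched below in NumPy. This is an illustration under stated assumptions, not the authors' loss: the function name, the `temperature` value, and the idea of tagging each clip with an integer `process_ids` label are all hypothetical.

```python
import numpy as np

def same_process_contrastive_loss(embeddings, process_ids, temperature=0.1):
    """SupCon-style loss: clips that share a process id are treated as positives.

    Hypothetical sketch; `process_ids` and `temperature` are illustration-only
    assumptions, not values taken from the paper. Needs at least two clips.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    ids = np.asarray(process_ids)
    n = len(ids)

    # L2-normalise so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each clip's similarity with itself from the softmax.
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, sim, -np.inf)

    # Numerically stable log-softmax over the other clips in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positives: other clips produced by the same (assumed) forgery process.
    positives = (ids[:, None] == ids[None, :]) & not_self
    has_pos = positives.any(axis=1)
    if not has_pos.any():
        raise ValueError("batch contains no pair of clips with the same process id")

    # Mean log-probability of the positives, for anchors that have any.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


# Toy batch: two clips per process, embeddings already clustered by process.
clustered = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = same_process_contrastive_loss(clustered, [0, 0, 1, 1])
loss_mismatched = same_process_contrastive_loss(clustered, [0, 1, 0, 1])
```

The loss is lower when clips with the same process id already sit close together (`loss_matching`) than when the ids cut across the clusters (`loss_mismatched`), which is the pressure that would pull same-process videos toward similar representations during training.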

With this representation, UniForensics is intended to detect a wide range of face forgeries, including those not seen during training. According to the abstract, it reaches 95.3% and 77.2% cross-dataset AUC on Celeb-DFv2 and DFDC respectively. The authors contrast this with previous methods, which mostly depend on low-level textural features that are vulnerable to perturbations and fall short on unseen forgery methods.
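
Cross-dataset AUC, the metric quoted in the paper's abstract, is the area under the ROC curve measured on a dataset the detector was not trained on. AUC equals the probability that a randomly chosen fake receives a higher score than a randomly chosen real sample, with ties counted as half. A minimal sketch of that pairwise definition follows; the function name, labels and scores are made-up illustration values, not results from the paper.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as P(score of a random fake > score of a random real), ties count half.

    `labels` uses 1 for fake and 0 for real; `scores` are detector outputs where
    a higher value means "more likely fake". Both conventions are assumptions.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=np.float64)
    fake_scores = scores[labels]
    real_scores = scores[~labels]
    if len(fake_scores) == 0 or len(real_scores) == 0:
        raise ValueError("need at least one fake and one real sample")

    # Compare every fake score with every real score (O(n_fake * n_real) memory).
    diff = fake_scores[:, None] - real_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(fake_scores) * len(real_scores)))


# Toy example: 3 of the 4 fake/real pairs are ranked correctly, so AUC is 0.75.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is easy to read but quadratic in memory; for full benchmark runs a sorted or library implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.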

Critical Analysis

The authors acknowledge that while UniForensics demonstrates strong performance on a variety of deepfake detection benchmarks, there are still some limitations and areas for further research. For example, the model may struggle with detecting forgeries that closely mimic the facial characteristics of real individuals, or those that introduce subtle manipulations not captured by the learned facial representation.

Additionally, the generalization capabilities of UniForensics could be further explored, particularly in terms of its robustness to different data distributions, forgery techniques, and real-world deployment scenarios.

Overall, the UniForensics approach represents an important step forward in deepfake detection, but continued research and evaluation will be necessary to address the evolving challenges in this rapidly advancing field.

Conclusion

The UniForensics paper presents a deepfake detection framework that combines a general facial representation with self-supervised contrastive learning, aiming to detect a wide range of deepfakes. By relying on high-level semantic facial features and spatio-temporal forgery traces, rather than on low-level artifacts of specific forgery methods, UniForensics reports strong cross-dataset generalization.

While the method shows promising results, the authors acknowledge the need for further research to address remaining limitations and ensure the robustness of deepfake detection systems in the face of increasingly sophisticated forgery techniques. Continued advancements in this area will be crucial to combat the spread of misinformation and protect individual privacy.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

UniForensics: Face Forgery Detection via General Facial Representation

Ziyuan Fang, Hanqing Zhao, Tianyi Wei, Wenbo Zhou, Ming Wan, Zhanyi Wang, Weiming Zhang, Nenghai Yu

Previous deepfake detection methods mostly depend on low-level textural features vulnerable to perturbations and fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. Motivated by this, we propose a detection method that utilizes high-level semantic features of faces to identify inconsistencies in the temporal domain. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video classification network, initialized with a meta-functional face encoder for enriched facial representation. In this way, we can take advantage of both the powerful spatio-temporal model and the high-level semantic information of faces. Furthermore, to leverage easily accessible real face data and guide the model in focusing on spatio-temporal features, we design a Dynamic Video Self-Blending (DVSB) method to efficiently generate training samples with diverse spatio-temporal forgery traces using real facial videos. Based on this, we advance our framework with a two-stage training approach: The first stage employs novel self-supervised contrastive learning, where we encourage the network to focus on forgery traces by impelling videos generated by the same forgery process to have similar representations. On the basis of the representation learned in the first stage, the second stage involves fine-tuning on a face forgery detection dataset to build a deepfake detector. Extensive experiments validate that UniForensics outperforms existing face forgery methods in generalization ability and robustness. In particular, our method achieves 95.3% and 77.2% cross-dataset AUC on the challenging Celeb-DFv2 and DFDC respectively.

7/30/2024

Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, Jun-Cheng Chen

With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake detection approach by adapting the Foundation Models with rich information encoded inside, specifically using the image encoder from CLIP which has demonstrated strong zero-shot capability for downstream tasks. Inspired by the recent advances of parameter efficient fine-tuning, we propose a novel side-network-based decoder to extract spatial and temporal cues from the given video clip, with the promotion of the Facial Component Guidance (FCG) to encourage the spatial feature to include features of key facial parts for more robust and general Deepfake detection. Through extensive cross-dataset evaluations, our approach exhibits superior effectiveness in identifying unseen Deepfake samples, achieving notable performance improvement even with limited training samples and manipulation types. Our model secures an average performance enhancement of 0.9% AUROC in cross-dataset assessments compared with state-of-the-art methods, especially a significant lead of achieving 4.4% improvement on the challenging DFDC dataset.

6/6/2024

A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important factors such as content diversity, fairness across ethnicities, and availability of comprehensive labels, in order to ensure the versatility and convenience of DeepFaceGen. Subsequently, DeepFaceGen is employed in this study to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen.

6/17/2024

GM-DF: Generalized Multi-Scenario Deepfake Detection

Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of detection accuracy when models are directly trained on combined datasets due to the discrepancy across collection scenarios and generation methods. To address the above issue, a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) is proposed to serve multiple real-world scenarios by a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. Besides, as for the commonality representation, we use CLIP to extract the common features for better aligning visual and textual features across domains. Meanwhile, we introduce a masked image reconstruction mechanism to force models to capture rich forged details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance their generalization capacities. Specifically, we design a novel domain alignment loss to strongly align the distributions of the meta-test domains and meta-train domains. Thus, the updated models are able to represent both specific and common real/forgery features across multiple datasets. In consideration of the lack of study of multi-dataset training, we establish a new benchmark leveraging multi-source data to fairly evaluate the models' generalization capacity on unseen scenarios. Both qualitative and quantitative experiments on five datasets conducted on traditional protocols as well as the proposed benchmark demonstrate the effectiveness of our approach.

7/1/2024