Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations

Read original: arXiv:2110.03309 - Published 4/29/2024 by Wanying Ge, Jose Patino, Massimiliano Todisco, Nicholas Evans

Overview

Significant progress has been made in detecting deepfakes and spoofed media, but explaining how detection models work remains a challenge.
This paper explores using a tool called SHapley Additive exPlanations (SHAP) to gain insights into spoofing detection models.
The paper demonstrates how SHAP can reveal unexpected classifier behaviors, identify the most influential features, and compare different spoofing detection models.
The tool is efficient and flexible, and the results are reproducible using open-source software.

Plain English Explanation

Deepfakes and other types of spoofed media, where content is manipulated to appear real, have become a growing concern. Researchers have made substantial progress in developing algorithms that can detect these manipulated media. However, a key challenge remains: explaining how these detection models actually work and make their decisions.

The researchers in this paper explored using a tool called SHapley Additive exPlanations (SHAP) to gain more insight into spoofing detection models. SHAP is a way to understand what factors or "features" a model is using to make its predictions.

By applying SHAP to spoofing detection models, the researchers were able to uncover some surprising insights. They could see which specific artifacts or characteristics of the media were most influential in the model's decision-making process. They could also compare how different spoofing detection models behaved and identify key differences between them.

Importantly, the SHAP tool proved to be both efficient and flexible: it is readily applicable to a variety of model architectures and to related applications beyond spoofing detection. The results are also fully reproducible using open-source software, making the research more transparent and accessible.

Technical Explanation

The paper describes the researchers' use of SHapley Additive exPlanations (SHAP) to gain insights into the behavior of spoofing detection models. SHAP is a powerful technique for explaining the output of machine learning models by quantifying the contribution of each input feature to the final prediction.
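To make the idea of per-feature contributions concrete, here is a minimal sketch of exact Shapley values for a toy three-input function. It is not the paper's pipeline and does not use the SHAP library: `toy_model`, the all-zero `baseline`, and the helper `shapley_values` are hypothetical names chosen for illustration, and "removing" a feature is assumed to mean replacing it with its baseline value. Exact enumeration needs on the order of 2^n model evaluations for n inputs, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical scoring function: linear terms plus one interaction.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[1] * x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` present.
        masked = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at `baseline`, which is the "additive" property in the method's name. The interaction term is split evenly between the two inputs that take part in it.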

The researchers applied SHAP to competing spoofing detection models. By analyzing the SHAP values, they were able to reveal unexpected classifier behaviour, identify the artefacts that contributed most to the classifier outputs, and compare the behaviour of different spoofing detection models.

For example, SHAP values can expose cases where a detector relies on cues that a human would not expect to be informative, which is the kind of unexpected classifier behaviour the paper reports. The same attributions show which artefacts in the input contribute most to a model's output, so two detectors with similar accuracy can be told apart by what they attend to.
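The kind of model comparison described above can be sketched with a Monte Carlo estimate of Shapley values, obtained by averaging marginal gains over random feature orderings. Everything here is an assumption made for illustration: `model_a` and `model_b` are hypothetical stand-ins rather than the paper's architectures, the four inputs do not correspond to any real feature representation, and the all-zero baseline is an arbitrary choice.

```python
import random

def model_a(x):
    # Hypothetical detector that leans on the first two inputs.
    return 0.9 * x[0] + 0.8 * x[1] + 0.1 * x[2] + 0.1 * x[3]

def model_b(x):
    # Hypothetical detector that leans on the last input, with one interaction.
    return 0.1 * x[0] + 0.2 * x[1] * x[2] + 1.2 * x[3]

def sampled_shapley(model, x, baseline, n_permutations=2000, seed=0):
    """Monte Carlo Shapley estimate: average marginal gains over random orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]           # switch input i from baseline to its real value
            score = model(current)
            phi[i] += score - previous  # marginal gain of adding input i
            previous = score
    return [p / n_permutations for p in phi]

x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
for name, model in [("model_a", model_a), ("model_b", model_b)]:
    phi = sampled_shapley(model, x, baseline)
    top = max(range(len(phi)), key=lambda i: abs(phi[i]))
    print(name, [round(p, 3) for p in phi], "most influential input:", top)
```

Both toy models receive the same input, yet their attributions point at different inputs. That is the sense in which attribution values can separate the behaviour of competing detectors.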

Importantly, the SHAP-based analysis is not tied to a single model. The authors describe the tool as readily applicable to a host of different architectures, as well as to related but different applications. All results reported in the paper are reproducible using open-source software.

Critical Analysis

The paper makes a valuable contribution by showcasing the use of SHAP to gain insights into the inner workings of spoofing detection models. This type of model explanation is crucial for building trust in these systems and ensuring their robustness.

However, the paper does not address some important limitations of the SHAP approach. As noted in the T-Explainer paper, SHAP can be sensitive to feature correlations and may not capture higher-order interactions effectively. Additionally, the paper does not discuss the computational overhead of applying SHAP to large-scale models or datasets.
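The sensitivity to correlated features can be seen in a toy case. The sketch below assumes two inputs that are always equal in the data and two hypothetical models, `uses_first_only` and `averages_both`, that therefore make identical predictions on that data. With a baseline-replacement value function (one common choice, assumed here), the two models still receive different attributions, because the explanation evaluates them on input combinations that never occur in the data.

```python
def shapley_two_features(model, x, baseline):
    """Exact Shapley values for a two-input model with baseline replacement."""
    f00 = model([baseline[0], baseline[1]])
    f10 = model([x[0], baseline[1]])  # off-distribution when the inputs are always equal
    f01 = model([baseline[0], x[1]])  # off-distribution when the inputs are always equal
    f11 = model([x[0], x[1]])
    phi0 = 0.5 * (f10 - f00) + 0.5 * (f11 - f01)
    phi1 = 0.5 * (f01 - f00) + 0.5 * (f11 - f10)
    return [phi0, phi1]

def uses_first_only(x):
    # Hypothetical model that reads only the first input.
    return x[0]

def averages_both(x):
    # Hypothetical model that averages the two inputs.
    return 0.5 * x[0] + 0.5 * x[1]

# Assumed dataset: the two inputs are perfectly correlated (always equal),
# so both models give identical predictions on every data point.
data = [[0.2, 0.2], [0.7, 0.7], [1.0, 1.0]]
assert all(uses_first_only(row) == averages_both(row) for row in data)

x = [1.0, 1.0]
baseline = [0.0, 0.0]
print(shapley_two_features(uses_first_only, x, baseline))
print(shapley_two_features(averages_both, x, baseline))
```

The first model assigns all credit to the first input, while the second splits it evenly. Neither answer is wrong, but a reader who sees only the attributions could draw different conclusions about two models that behave identically on the data.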

Further research is needed to explore complementary explanation techniques, such as contrastive or counterfactual explanations, to provide a more comprehensive understanding of spoofing detection models. Combining multiple explanation approaches could lead to more robust and trustworthy AI systems in this domain.

Conclusion

This paper demonstrates the valuable insights that can be gained by applying SHAP to spoofing detection models. By revealing unexpected classifier behaviors, identifying the most influential features, and comparing different model architectures, the SHAP-based analysis provides a path towards more transparent and explainable AI in the context of deepfake and media manipulation detection.

The flexibility and efficiency of the SHAP tool, as well as the reproducibility of the results, suggest that this approach could have broader applicability beyond just spoofing detection. As the research community continues to push towards trustworthy AI, tools like SHAP will play a crucial role in building confidence and understanding in complex machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations

Wanying Ge, Jose Patino, Massimiliano Todisco, Nicholas Evans

Substantial progress in spoofing and deepfake detection has been made in recent years. Nonetheless, the community has yet to make notable inroads in providing an explanation for how a classifier produces its output. The dominance of black box spoofing detection solutions is at further odds with the drive toward trustworthy, explainable artificial intelligence. This paper describes our use of SHapley Additive exPlanations (SHAP) to gain new insights in spoofing detection. We demonstrate use of the tool in revealing unexpected classifier behaviour, the artefacts that contribute most to classifier outputs and differences in the behaviour of competing spoofing detection models. The tool is both efficient and flexible, being readily applicable to a host of different architecture models in addition to related, different applications. All results reported in the paper are reproducible using open-source software.

4/29/2024

Unified Explanations in Machine Learning Models: A Perturbation Approach

Jacob Dineen, Don Kridel, Daniel Dolk, David Castillo

A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished in many tasks of intelligence, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.

5/31/2024

Fooling SHAP with Output Shuffling Attacks

Jun Yuan, Aritra Dasgupta

Explainable AI (XAI) methods such as SHAP can help discover feature attributions in black-box models. If the method reveals a significant attribution from a "protected feature" (e.g., gender, race) on the model output, the model is considered unfair. However, adversarial attacks can subvert the detection of XAI methods. Previous approaches to constructing such an adversarial model require access to underlying data distribution, which may not be possible in many practical scenarios. We relax this constraint and propose a novel family of attacks, called shuffling attacks, that are data-agnostic. The proposed attack strategies can adapt any trained machine learning model to fool Shapley value-based explanations. We prove that Shapley values cannot detect shuffling attacks. However, algorithms that estimate Shapley values, such as linear SHAP and SHAP, can detect these attacks with varying degrees of effectiveness. We demonstrate the efficacy of the attack strategies by comparing the performance of linear SHAP and SHAP using real-world datasets.

8/14/2024

Shaping Up SHAP: Enhancing Stability through Layer-Wise Neighbor Selection

Gwladys Kelodjou, Laurence Rozé, Véronique Masson, Luis Galárraga, Romaric Gaudel, Maurice Tchuente, Alexandre Termier

Machine learning techniques, such as deep learning and ensemble methods, are widely used in various domains due to their ability to handle complex real-world tasks. However, their black-box nature has raised multiple concerns about the fairness, trustworthiness, and transparency of computer-assisted decision-making. This has led to the emergence of local post-hoc explainability methods, which offer explanations for individual decisions made by black-box algorithms. Among these methods, Kernel SHAP is widely used due to its model-agnostic nature and its well-founded theoretical framework. Despite these strengths, Kernel SHAP suffers from high instability: different executions of the method with the same inputs can lead to significantly different explanations, which diminishes the relevance of the explanations. The contribution of this paper is two-fold. On the one hand, we show that Kernel SHAP's instability is caused by its stochastic neighbor selection procedure, which we adapt to achieve full stability without compromising explanation fidelity. On the other hand, we show that by restricting the neighbors generation to perturbations of size 1 -- which we call the coalitions of Layer 1 -- we obtain a novel feature-attribution method that is fully stable, computationally efficient, and still meaningful.

6/18/2024