On the Faithfulness of Vision Transformer Explanations

2404.01415

Published 4/3/2024 by Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

Abstract

To interpret Vision Transformers, post-hoc explanations assign salience scores to input pixels, providing human-understandable heatmaps. However, whether these interpretations reflect true rationales behind the model's output is still underexplored. To address this gap, we study the faithfulness criterion of explanations: the assigned salience scores should represent the influence of the corresponding input pixels on the model's predictions. To evaluate faithfulness, we introduce Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric leveraging essential information of salience distribution. Specifically, we conduct pair-wise comparisons among distinct pixel groups and then aggregate the differences in their salience scores, resulting in a coefficient that indicates the explanation's degree of faithfulness. Our explorations reveal that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, thereby failing to capture the faithfulness property. In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations. Furthermore, our SaCo demonstrates that the use of gradient and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanation, shedding light on potential paths for advancing Vision Transformer explainability.

Overview

The paper examines the faithfulness of post-hoc explanations for Vision Transformers, a type of deep learning model used for image classification tasks.
The researchers investigate whether the salience heatmaps commonly used to explain these models' decisions accurately reflect the influence of input pixels on the model's predictions.
They find that existing evaluation metrics struggle to differentiate between advanced explanation methods and Random Attribution, and so fail to capture faithfulness.
The paper proposes a new evaluation metric, the Salience-guided Faithfulness Coefficient (SaCo), and uses it to show that gradients and multi-layer aggregation markedly improve the faithfulness of attention-based explanations.

Plain English Explanation

Deep learning models like Vision Transformers have become powerful tools for image classification, but it can be difficult to understand how they arrive at their predictions. Saliency maps and attention visualizations are commonly used to explain these models, highlighting the image regions or features the model focuses on.

However, it is still unclear whether these explanations reflect the true factors driving the model's decisions. The new research also finds that the metrics currently used to check this are unreliable: they struggle to tell advanced explanation methods apart from Random Attribution.

The researchers propose a new metric called the Salience-guided Faithfulness Coefficient (SaCo) that measures how faithful an explanation is. It compares groups of pixels in pairs, checks whether the salience scores assigned to them match their influence on the model's prediction, and combines the results into a single coefficient.

This is an important finding, as faithful explanations are crucial for building trust in AI systems and understanding their limitations. If the standard visualization techniques don't accurately reflect how a model is making its judgments, it becomes difficult to validate the model's reasoning or identify potential biases or errors.

Technical Explanation

The paper studies the faithfulness of post-hoc explanations for Vision Transformers. These explanations assign salience scores to input pixels and present them as human-understandable heatmaps. An explanation is faithful if its salience scores represent the influence of the corresponding pixels on the model's predictions.

To assess faithfulness, the researchers introduce the Salience-guided Faithfulness Coefficient (SaCo). The metric conducts pair-wise comparisons among distinct pixel groups and aggregates the differences in their salience scores into a coefficient that indicates the explanation's degree of faithfulness.
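The pair-wise comparison idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `saco`, the `num_groups` and `fill_value` parameters, the equal-size grouping by salience rank, the constant-value perturbation, and the normalization by the total absolute salience gap are all assumptions, since the summary does not spell out these details.

```python
import numpy as np

def saco(salience, predict, image, num_groups=10, fill_value=None):
    """Sketch of a salience-guided faithfulness coefficient in [-1, 1].

    salience:   array of shape (H, W) with one salience score per pixel
    predict:    callable mapping an image to the model's confidence
                for the class being explained (hypothetical interface)
    image:      float array of shape (H, W) or (H, W, C)
    fill_value: value used to overwrite a pixel group (assumption);
                defaults to the mean of the image
    """
    h, w = salience.shape
    flat = salience.ravel()
    # Rank pixels from most to least salient, then split into groups.
    order = np.argsort(flat)[::-1]
    groups = np.array_split(order, num_groups)

    if fill_value is None:
        fill_value = image.mean()
    base = predict(image)

    group_salience, group_impact = [], []
    for idx in groups:
        rows, cols = np.unravel_index(idx, (h, w))
        perturbed = image.copy()
        perturbed[rows, cols] = fill_value
        group_salience.append(flat[idx].sum())
        # Influence of the group: drop in confidence once it is overwritten.
        group_impact.append(base - predict(perturbed))

    score, total = 0.0, 0.0
    for i in range(num_groups - 1):
        for j in range(i + 1, num_groups):
            # Group i is ranked above group j by salience.
            gap = group_salience[i] - group_salience[j]
            # Reward the pair if the higher-ranked group also has at least
            # as much influence; penalize it otherwise.
            score += gap if group_impact[i] >= group_impact[j] else -gap
            total += abs(gap)
    return score / total if total > 0 else 0.0
```

Under this sketch, a coefficient near 1 means the salience ranking agrees with the measured influence of the pixel groups, and a coefficient near -1 means the ranking is inverted. Weighting each pair by its salience gap means that confidently wrong orderings cost more than near-ties.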

Their experiments show that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, and therefore fail to capture faithfulness. In contrast, SaCo offers a reliable faithfulness measurement. It also indicates that using gradients and multi-layer aggregation markedly enhances the faithfulness of attention-based explanations.
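The abstract credits multi-layer aggregation with improving attention-based explanations. One well-known form of such aggregation is attention rollout, which multiplies per-layer attention matrices together. The sketch below uses that technique purely as an illustration; it is an assumption that it resembles the variants the paper evaluates. The function name and the input layout (one `(heads, tokens, tokens)` array per layer) are also hypothetical.

```python
import numpy as np

def attention_rollout(attentions):
    """Aggregate attention across layers (attention rollout sketch).

    attentions: list with one array per layer, each of shape
                (heads, tokens, tokens) with rows summing to 1
    returns:    array of shape (tokens, tokens); row 0 gives the
                aggregated attention of the first token (often the
                class token) over all tokens
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        # Average the heads into a single attention matrix.
        fused = layer_attn.mean(axis=0)
        # Mix in the identity to account for the residual connection.
        fused = 0.5 * fused + 0.5 * identity
        # Re-normalize so each row is still a distribution.
        fused = fused / fused.sum(axis=-1, keepdims=True)
        # Compose with the aggregation from earlier layers.
        rollout = fused @ rollout
    return rollout
```

For a ViT with a class token at index 0, `attention_rollout(attentions)[0, 1:]` could be reshaped to the patch grid and used as a salience map.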

The paper also demonstrates that the faithfulness of explanations can vary depending on factors like the image content, model architecture, and training dataset. This highlights the need for rigorous evaluation of explanation methods to ensure they are truly capturing the model's reasoning.

Critical Analysis

The paper provides a thorough and rigorous analysis of explanation methods for Vision Transformers, making an important contribution to the growing field of interpretable AI. By developing the SaCo metric, the researchers are able to assess the faithfulness of salience-based explanations in a principled way.

One possible limitation is computational cost: measuring the influence of many pixel groups on the model's prediction may be expensive, especially for larger models. Approximations or parallelization could make the evaluation more efficient.

Additionally, while the paper focuses on Vision Transformers, the findings likely have broader implications for the interpretability of other deep learning models as well. Further research could investigate the faithfulness of explanations for other architectures and application domains.

It would also be valuable to explore the practical implications of these results, such as how faithful explanations could be used to improve model development, debugging, or deployment in real-world scenarios. Engaging with end-users and domain experts could provide useful insights into the specific explanation needs and challenges they face.

Overall, this paper makes a compelling case for the importance of evaluating explanation methods and developing more faithful techniques to build trust and understanding in AI systems.

Conclusion

This research highlights a critical issue in the field of interpretable AI: the need to ensure that the explanations provided for models like Vision Transformers are truly faithful representations of how they make decisions. The findings demonstrate that current evaluation metrics can fail to capture whether an explanation is faithful.

By proposing SaCo as a more reliable faithfulness metric, the paper offers a path forward for building greater transparency and trust in advanced deep learning systems. Faithful explanations are crucial for validating model behavior, identifying biases or errors, and ultimately ensuring these technologies are developed and deployed responsibly.

As AI continues to be applied in high-stakes domains, this work underscores the importance of not just generating explanations, but ensuring they authentically reflect the inner workings of these complex models. The insights from this research can help pave the way for more trustworthy and interpretable AI systems that can be better understood and relied upon.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Hengyi Wang, Shiwei Tan, Hao Wang

Vision transformers (ViTs) have emerged as a significant area of focus, particularly for their capacity to be jointly trained with large language models and to serve as robust vision foundation models. Yet, the development of trustworthy explanation methods for ViTs has lagged, particularly in the context of post-hoc interpretations of ViT predictions. Existing sub-image selection approaches, such as feature-attribution and conceptual models, fall short in this regard. This paper proposes five desiderata for explaining ViTs -- faithfulness, stability, sparsity, multi-level structure, and parsimony -- and demonstrates the inadequacy of current methods in meeting these criteria comprehensively. We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. Our qualitative analysis reveals the distributions of patch-level concepts, elucidating the effectiveness of ViTs by modeling the joint distribution of patch embeddings and ViT's predictions. Moreover, these patch-level explanations bridge the gap between image-level and dataset-level explanations, thus completing the multi-level structure of PACE. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that PACE surpasses state-of-the-art methods in terms of the defined desiderata.

6/21/2024

cs.LG cs.AI cs.CV stat.ML

Improving Interpretation Faithfulness for Vision Transformers

Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang

Vision Transformers (ViTs) have achieved state-of-the-art performance for various vision tasks. One reason behind the success lies in their ability to provide plausible innate explanations for the behavior of neural architectures. However, ViTs suffer from issues with explanation faithfulness, as their focal points are fragile to adversarial attacks and can be easily changed with even slight perturbations on the input image. In this paper, we propose a rigorous approach to mitigate these issues by introducing Faithful ViTs (FViTs). Briefly speaking, an FViT should have the following two properties: (1) The top-$k$ indices of its self-attention vector should remain mostly unchanged under input perturbation, indicating stable explanations; (2) The prediction distribution should be robust to perturbations. To achieve this, we propose a new method called Denoised Diffusion Smoothing (DDS), which adopts randomized smoothing and diffusion-based denoising. We theoretically prove that processing ViTs directly with DDS can turn them into FViTs. We also show that Gaussian noise is nearly optimal for both $\ell_2$ and $\ell_\infty$-norm cases. Finally, we demonstrate the effectiveness of our approach through comprehensive experiments and evaluations. Results show that FViTs are more robust against adversarial attacks while maintaining the explainability of attention, indicating higher faithfulness.

5/6/2024

cs.CV cs.AI cs.LG

Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading

Evan Crothers, Herna Viktor, Nathalie Japkowicz

A common approach to quantifying neural text classifier interpretability is to calculate faithfulness metrics based on iteratively masking salient input tokens and measuring changes in the model prediction. We propose that this property is better described as sensitivity to iterative masking, and highlight pitfalls in using this measure for comparing text classifier interpretability. We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers. We then demonstrate that iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour. We further explore task-specific considerations that undermine principled comparison of interpretability using iterative masking, such as an underlying similarity to salience-based adversarial attacks. Our findings give insight into how these behaviours affect neural text classifiers, and provide guidance on how sensitivity to iterative masking should be interpreted.

6/4/2024

cs.CL cs.LG

Evaluating Readability and Faithfulness of Concept-based Explanations

Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic and non-deterministic, e.g. case study or human evaluation, hindering the development of the field. To bridge the gap, we approach concept-based explanation evaluation via faithfulness and readability. We first introduce a formal definition of concept generalizable to diverse concept-based explanations. Based on this, we quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept. This measure serves as a cost-effective and reliable substitute for human evaluation. Finally, based on measurement theory, we describe a meta-evaluation method for evaluating the above measures via reliability and validity, which can be generalized to other tasks as well. Extensive experimental analysis has been conducted to validate and inform the selection of concept evaluation measures.

5/1/2024

cs.AI cs.HC