Faithfulness Measurable Masked Language Models

Read original: arXiv:2310.07819 - Published 8/29/2024 by Andreas Madsen, Siva Reddy, Sarath Chandar

Overview

Researchers propose a novel approach to measuring the faithfulness of explanations for natural language processing (NLP) models.
Existing methods for explaining NLP model predictions, such as importance measures, are often misleading despite being persuasive.
The proposed method addresses key challenges with existing faithfulness metrics, including out-of-distribution issues and computational expense.
The approach involves a novel fine-tuning technique that makes masked tokens "in-distribution," enabling more faithful importance measures.
The method is applied to 16 different datasets and validated using statistical in-distribution tests. Faithfulness is then measured with 9 different importance measures, and those that themselves use masking become consistently more faithful.

Plain English Explanation

Artificial intelligence (AI) models, especially those used for natural language processing (NLP), can be complex and difficult to understand. Researchers often try to explain how these models make decisions by identifying the most important words or "tokens" that contribute to a particular prediction.

Unfortunately, these explanations can be misleading, even if they seem convincing. To address this issue, the researchers propose a new way to measure how faithful or accurate these explanations really are. The key idea is that if a token is truly important, then masking or hiding that token should make the model's performance worse.

However, simply masking tokens can create other problems, as the model may not be familiar with this "out-of-distribution" data. The researchers solve this by using a novel fine-tuning technique that incorporates masking into the training process, so the model becomes accustomed to seeing masked tokens.

With this approach, the researchers apply their model to 16 different datasets and use statistical tests to confirm that masked inputs are in-distribution. Because faithfulness becomes cheap to measure, explanations can be optimized for maximum faithfulness, which makes the model "indirectly inherently explainable."

Technical Explanation

The researchers address the challenge of measuring the faithfulness of NLP model explanations, such as importance measures that highlight the most influential tokens for a prediction. They note that these explanations are often misleading, even if they appear convincing.

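To address this, the researchers propose a novel fine-tuning method that incorporates token masking into the training process, so that masked tokens are "in-distribution" for the model by design. Existing solutions to the out-of-distribution problem are instead model-agnostic, which the authors argue makes them computationally expensive, dependent on proxy models, and inapplicable in practice.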
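To make the idea of masking during fine-tuning concrete, here is a minimal sketch of how a training mini-batch could be prepared. The details are assumptions for illustration and are not confirmed by this summary: the `[MASK]` token string, the choice to leave half of each batch unmasked, and the uniformly sampled masking rate are all hypothetical, as are the function names.

```python
import random

MASK = "[MASK]"  # assumed mask token; the real one depends on the tokenizer


def randomly_mask(tokens, rate, rng):
    """Replace each token with MASK independently with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def build_masked_batch(batch, rng):
    """Keep the first half of the batch intact and randomly mask the second half.

    Assumption: each masked example draws its own masking rate uniformly
    from [0, 1], so the model sees lightly and heavily masked inputs alike.
    """
    half = len(batch) // 2
    unmasked = [list(tokens) for tokens in batch[:half]]
    masked = [
        randomly_mask(tokens, rng.uniform(0.0, 1.0), rng)
        for tokens in batch[half:]
    ]
    return unmasked + masked


rng = random.Random(0)
batch = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"], ["j"]]
masked_batch = build_masked_batch(batch, rng)
```

Keeping part of the batch unmasked is one way to preserve performance on ordinary inputs while still exposing the model to masked ones. The labels are unchanged, so the usual classification loss applies to both halves.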

This in-distribution masking allows the researchers to rely on a direct faithfulness metric: if tokens are truly important, then masking them should reduce model performance. The authors describe other existing metrics as very limited in scope. Faithfulness is measured with 9 different importance measures, and because masking is in-distribution, the measures that themselves use masking become consistently more faithful.
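The masking-based check can be illustrated with a small sketch. Everything here is hypothetical: `toy_model` is a stand-in for a fine-tuned classifier, the importance scores are made up, and `confidence_drop` is a simplified single-example score, not the metric used in the paper.

```python
MASK = "[MASK]"  # assumed mask token


def toy_model(tokens):
    """Hypothetical stand-in for a classifier: confidence in the 'positive' class."""
    positive = {"great", "love"}
    hits = sum(tok in positive for tok in tokens)
    return hits / (hits + 1)


def mask_top_k(tokens, importance, k):
    """Mask the k tokens with the highest importance scores."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    to_mask = set(ranked[:k])
    return [MASK if i in to_mask else tok for i, tok in enumerate(tokens)]


def confidence_drop(model, tokens, importance, k):
    """Drop in model confidence after masking the k most important tokens."""
    return model(tokens) - model(mask_top_k(tokens, importance, k))


tokens = ["i", "love", "this", "great", "film"]
good_explanation = [0.0, 0.9, 0.1, 0.8, 0.2]  # ranks "love" and "great" highest
bad_explanation = [0.9, 0.0, 0.8, 0.1, 0.2]   # ranks "i" and "this" highest

good_drop = confidence_drop(toy_model, tokens, good_explanation, k=2)
bad_drop = confidence_drop(toy_model, tokens, bad_explanation, k=2)
```

A faithful explanation should produce a larger drop than an unfaithful one. The comparison is only meaningful if the model behaves sensibly on masked inputs, which is the property the fine-tuning is meant to provide.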

The researchers demonstrate the generality of their approach by applying it to 16 different datasets and validate it using statistical in-distribution tests. Because the model makes faithfulness cheap to measure, explanations can be optimized directly for maximal faithfulness, which makes the model "indirectly inherently explainable."
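One way to picture optimizing an explanation for faithfulness is a greedy search over masking order. This is a sketch under stated assumptions: the summary does not say which optimization procedure the authors use, so the greedy strategy, the `toy_model` classifier, and the function names are all hypothetical.

```python
MASK = "[MASK]"  # assumed mask token


def toy_model(tokens):
    """Hypothetical classifier: confidence grows with weighted positive words."""
    weights = {"love": 2.0, "great": 1.0}
    score = sum(weights.get(tok, 0.0) for tok in tokens)
    return score / (score + 1.0)


def greedy_importance_order(model, tokens):
    """Return token positions ordered from most to least important.

    At each step, mask the remaining token whose removal lowers the
    model's confidence the most, then continue from that masked input.
    """
    current = list(tokens)
    remaining = list(range(len(tokens)))
    order = []
    while remaining:
        def confidence_if_masked(i):
            trial = list(current)
            trial[i] = MASK
            return model(trial)

        best = min(remaining, key=confidence_if_masked)
        current[best] = MASK
        remaining.remove(best)
        order.append(best)
    return order


tokens = ["i", "love", "this", "great", "film"]
order = greedy_importance_order(toy_model, tokens)
```

This search calls the model once per remaining token at every step, so the number of calls grows quadratically with input length. A search of this kind is only practical when each evaluation on a masked input is cheap and reliable.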

Critical Analysis

The researchers' approach to measuring the faithfulness of NLP model explanations is a significant advance in the field. By addressing the key challenges of out-of-distribution issues and computational expense associated with existing methods, the researchers have developed a more reliable and accessible way to evaluate the accuracy of these explanations.

One potential limitation of the research is that it focuses primarily on the technical aspects of faithfulness measurement, without delving deeply into the broader implications or societal consequences of misleading model explanations. For example, the researchers do not discuss how their method could be used to improve the calibration of confidence estimates in NLP models, or how it could help address fairness and representation issues in these models.

Additionally, while the researchers demonstrate the generality of their approach across 16 datasets, it would be valuable to see how the method performs on a wider range of NLP tasks and model architectures, including more complex, state-of-the-art models. This could help validate the robustness and versatility of the proposed technique.

Overall, the researchers have made a significant contribution to the field of NLP model interpretability and explainability. Their work provides a valuable tool for researchers and practitioners to more accurately evaluate the faithfulness of model explanations, paving the way for more trustworthy and transparent AI systems.

Conclusion

The researchers have developed a novel approach to measuring the faithfulness of explanations for natural language processing (NLP) models. By addressing key challenges with existing faithfulness metrics, such as out-of-distribution issues and computational expense, the proposed method provides a more reliable and accessible way to evaluate the accuracy of these explanations.

The researchers demonstrate the generality of their approach across 16 different datasets and validate it using statistical in-distribution tests. Their work shows that by making masked tokens "in-distribution" through a novel fine-tuning technique, importance measures that themselves use masking become consistently more faithful.

This advance in faithfulness measurement has important implications for the development of more transparent and trustworthy AI systems, as it enables researchers and practitioners to optimize model explanations for maximum faithfulness. The researchers' work represents a significant step forward in the field of NLP model interpretability and explainability.

Related Papers

Faithfulness Measurable Masked Language Models

Andreas Madsen, Siva Reddy, Sarath Chandar

A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric is if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues, and existing solutions that address this are computationally expensive and employ proxy models. Furthermore, other metrics are very limited in scope. This work proposes an inherently faithfulness measurable model that addresses these challenges. This is achieved using a novel fine-tuning method that incorporates masking, such that masking tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to 16 different datasets and validate it using statistical in-distribution tests. The faithfulness is then measured with 9 different importance measures. Because masking is in-distribution, importance measures that themselves use masking become consistently more faithful. Additionally, because the model makes faithfulness cheap to measure, we can optimize explanations towards maximal faithfulness; thus, our model becomes indirectly inherently explainable.

8/29/2024

Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading

Evan Crothers, Herna Viktor, Nathalie Japkowicz

A common approach to quantifying neural text classifier interpretability is to calculate faithfulness metrics based on iteratively masking salient input tokens and measuring changes in the model prediction. We propose that this property is better described as sensitivity to iterative masking, and highlight pitfalls in using this measure for comparing text classifier interpretability. We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers. We then demonstrate that iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour. We further explore task-specific considerations that undermine principled comparison of interpretability using iterative masking, such as an underlying similarity to salience-based adversarial attacks. Our findings give insight into how these behaviours affect neural text classifiers, and provide guidance on how sensitivity to iterative masking should be interpreted.

6/4/2024

Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations

Supriya Manna, Niladri Sett

Faithfulness is arguably the most critical metric to assess the reliability of explainable AI. In NLP, current methods for faithfulness evaluation are fraught with discrepancies and biases, often failing to capture the true reasoning of models. We introduce Adversarial Sensitivity as a novel approach to faithfulness evaluation, focusing on the explainer's response when the model is under adversarial attack. Our method accounts for the faithfulness of explainers by capturing sensitivity to adversarial input changes. This work addresses significant limitations in existing evaluation techniques, and furthermore, quantifies faithfulness from a crucial yet underexplored paradigm.

9/27/2024

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models

Sepehr Kamahi, Yadollah Yaghoobzadeh

Despite the widespread adoption of autoregressive language models, explainability evaluation research has predominantly focused on span infilling and masked language models (MLMs). Evaluating the faithfulness of an explanation method -- how accurately the method explains the inner workings and decision-making of the model -- is very challenging because it is very hard to separate the model from its explanation. Most faithfulness evaluation techniques corrupt or remove some input tokens considered important according to a particular attribution (feature importance) method and observe the change in the model's output. This approach creates out-of-distribution inputs for causal language models (CLMs) due to their training objective of next token prediction. In this study, we propose a technique that leverages counterfactual generation to evaluate the faithfulness of attribution methods for autoregressive language modeling scenarios. Our technique creates fluent and in-distribution counterfactuals that makes evaluation protocol more reliable. Code is available at https://github.com/Sepehr-Kamahi/faith

8/22/2024