On Correcting SHAP Scores

Read original: arXiv:2405.00076 - Published 5/2/2024 by Olivier Letoffe, Xuanxiang Huang, Joao Marques-Silva

Overview

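Examines why SHAP scores, a popular method for explaining the predictions of machine learning models, can yield misleading feature attributions.
Argues that these failings stem from the characteristic functions used in earlier work rather than from Shapley values themselves, and proposes new characteristic functions that avoid them.
Studies how the new characteristic functions affect the complexity of computing SHAP scores, and proposes modifications to the SHAP tool so that it uses one of them.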

Plain English Explanation

SHAP (Shapley Additive Explanations) is a technique for understanding how the input features of a machine learning model contribute to a given prediction. However, recent work uncovered examples of classifiers for which SHAP scores yield misleading feature attributions.

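This paper argues that those examples do not show Shapley values to be unsuitable for explainability. Instead, it traces the problem to the characteristic function: the part of the computation that assigns a value to each subset of features. The key idea is to keep the Shapley value machinery but replace the characteristic functions used in earlier work with new ones designed to respect a set of desirable properties.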

For example, imagine a model that predicts whether an email is spam. When it flags a particular email, we would like the SHAP scores to credit the features that prediction actually depends on, such as certain keywords, and to give no credit to features that play no role in it. The examples from earlier work show that, with the characteristic functions previously used, SHAP scores can violate expectations of this kind. Some of the characteristic functions proposed in this paper are guaranteed not to exhibit any of those previously uncovered shortcomings.

The authors also propose modifying the SHAP tool to use one of the new characteristic functions, thereby eliminating some of the limitations reported for SHAP scores.

Technical Explanation

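SHAP scores are an instance of Shapley values from cooperative game theory. The features play the role of players, a characteristic function assigns a value to every subset of features, and the SHAP score of a feature is its weighted average marginal contribution to that value over all subsets of the other features. The paper argues that the misleading attributions reported in earlier work are caused by the characteristic functions those works used, not by the Shapley value itself.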
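For reference, the standard Shapley value of feature i, for a feature set F and a characteristic function v, is given below, together with the expected-value characteristic function that is a common choice in earlier work on SHAP scores. Here κ denotes the classifier and x the instance being explained; the notation is ours and may differ from the paper's.

```latex
\phi_i(v) = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\,
  \bigl( v(S \cup \{i\}) - v(S) \bigr),
\qquad
v_e(S) = \mathbf{E}\bigl[\, \kappa(\mathbf{z}) \mid \mathbf{z}_S = \mathbf{x}_S \,\bigr]
```

Every φ_i depends on v, so swapping the characteristic function changes the SHAP scores while the weighting formula stays the same.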

To address this, the paper proposes several novel characteristic functions. Computing SHAP scores with one of these in place of the earlier characteristic functions yields corrected scores, while the Shapley value formula itself is left unchanged.
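The sketch below makes the role of the characteristic function concrete. It computes exact SHAP scores by enumeration for a toy Boolean classifier, with the characteristic function passed in as a parameter. It assumes Boolean features and a uniform input distribution. `similarity_cf` is a hypothetical stand-in (it scores a subset 1 if fixing those features to their values in x already forces the prediction κ(x), else 0); it is not claimed to be one of the paper's characteristic functions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

Point = Tuple[int, ...]
Classifier = Callable[[Point], int]
CharFn = Callable[[FrozenSet[int]], Fraction]


def completions(x: Sequence[int], fixed: FrozenSet[int]) -> Iterator[Point]:
    """Yield every Boolean point that agrees with x on the features in `fixed`."""
    free = [i for i in range(len(x)) if i not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        yield tuple(z)


def expected_value_cf(kappa: Classifier, x: Point) -> CharFn:
    """v(S) = E[kappa(z) | z_S = x_S] under a uniform distribution on {0,1}^n."""
    def v(S: FrozenSet[int]) -> Fraction:
        points = list(completions(x, S))
        return Fraction(sum(kappa(z) for z in points), len(points))
    return v


def similarity_cf(kappa: Classifier, x: Point) -> CharFn:
    """Hypothetical stand-in: 1 if fixing S to x_S forces the prediction kappa(x), else 0."""
    target = kappa(x)
    def v(S: FrozenSet[int]) -> Fraction:
        return Fraction(int(all(kappa(z) == target for z in completions(x, S))))
    return v


def shap_scores(v: CharFn, n: int) -> List[Fraction]:
    """Exact Shapley values by enumerating all subsets (exponential in n)."""
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(n):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi += weight * (v(s | {i}) - v(s))
        scores.append(phi)
    return scores


def kappa(z: Point) -> int:
    """Toy classifier: predicts 1 iff features 0 and 1 are both 1; feature 2 is never used."""
    return int(z[0] == 1 and z[1] == 1)


x = (1, 1, 0)
print(shap_scores(expected_value_cf(kappa, x), 3))  # [Fraction(3, 8), Fraction(3, 8), Fraction(0, 1)]
print(shap_scores(similarity_cf(kappa, x), 3))      # [Fraction(1, 2), Fraction(1, 2), Fraction(0, 1)]
```

Both characteristic functions give the unused feature a score of zero in this tiny example, so it does not reproduce the failure modes from earlier work; it only shows that the scores are determined by the choice of v. Exact fractions are used so that properties such as efficiency (the scores sum to v(F) - v(∅)) can be checked without rounding.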

The paper also identifies a number of properties that characteristic functions ought to respect. Each proposed characteristic function exhibits one or more of these properties, and some of them provably avoid every shortcoming reported in earlier work.

Finally, the paper investigates how the new characteristic functions affect the complexity of computing SHAP scores, and proposes modifications to the SHAP tool so that it uses one of the novel characteristic functions instead.
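A rough count shows why complexity is a real concern. Evaluated directly from the definition, each SHAP score sums over every subset of the other n - 1 features, and each term needs two values of the characteristic function. This counts only the naive evaluation of the Shapley formula; it is not one of the paper's complexity results.

```latex
\text{marginal contributions} = n \cdot 2^{\,n-1},
\qquad
n = 20:\; 20 \cdot 2^{19} = 20 \cdot 524\,288 = 10\,485\,760
```

With the expected-value characteristic function, each value v(S) can itself be an expectation over exponentially many inputs.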

Critical Analysis

The paper offers a thoughtful response to the reported shortcomings of SHAP scores. By locating the problem in the characteristic function rather than in the Shapley value, it keeps the axiomatic foundation of SHAP scores while targeting the specific failure modes uncovered by earlier work.

One potential limitation is computational cost. Exact SHAP scores are expensive to compute in general, and changing the characteristic function changes the complexity of that computation, which the paper investigates. How well the corrected scores scale to large models in practice is therefore an important question.

Additionally, the paper is framed around classifiers, and it would be interesting to see how the approach carries over to regression models, where the prediction is a number rather than a class.

Overall, this work contributes to ongoing efforts to improve the interpretability and reliability of machine learning explanations. The proposed characteristic functions offer a promising direction for practitioners and researchers seeking to better understand how their models reach decisions.

Conclusion

This paper presents an approach to correcting SHAP scores, a popular technique for explaining the predictions of machine learning models. Rather than abandoning Shapley values, it replaces the characteristic functions used in earlier work with novel ones designed around a set of desirable properties; some of these functions are guaranteed not to exhibit the shortcomings uncovered by earlier work.

By addressing known shortcomings of the original SHAP scores, the work represents an important step towards improving the interpretability of complex machine learning models. It has the potential to benefit a wide range of applications where model interpretability is crucial, such as healthcare, finance, and policy decision-making.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

On Correcting SHAP Scores

Olivier Letoffe, Xuanxiang Huang, Joao Marques-Silva

Recent work uncovered examples of classifiers for which SHAP scores yield misleading feature attributions. While such examples might be perceived as suggesting the inadequacy of Shapley values for explainability, this paper shows that the source of the identified shortcomings of SHAP scores resides elsewhere. Concretely, the paper makes the case that the failings of SHAP scores result from the characteristic functions used in earlier works. Furthermore, the paper identifies a number of properties that characteristic functions ought to respect, and proposes several novel characteristic functions, each exhibiting one or more of the desired properties. More importantly, some of the characteristic functions proposed in this paper are guaranteed not to exhibit any of the shortcomings uncovered by earlier work. The paper also investigates the impact of the new characteristic functions on the complexity of computing SHAP scores. Finally, the paper proposes modifications to the tool SHAP to use instead one of our novel characteristic functions, thereby eliminating some of the limitations reported for SHAP scores.

5/2/2024

The Distributional Uncertainty of the SHAP score in Explainable Machine Learning

Santiago Cifuentes, Leopoldo Bertossi, Nina Pardal, Sergio Abriola, Maria Vanina Martinez, Miguel Romero

Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning on SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to a more robust feature scoring.

8/14/2024
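The idea that a SHAP score becomes a function over a region of candidate distributions can be illustrated with a small sketch. It assumes Boolean features that are independently 1 with unknown probabilities p_i, each known only to lie in an interval, uses the expected-value characteristic function, and reports the minimum and maximum of one feature's SHAP score over that box. This is a brute-force illustration under those assumptions, not the authors' framework or algorithm; the abstract above reports that the corresponding problems are NP-complete in their setting.

```python
from itertools import combinations, product
from math import factorial, prod
from typing import Callable, FrozenSet, Sequence, Tuple

Point = Tuple[int, ...]
Classifier = Callable[[Point], int]


def expected_value(kappa: Classifier, x: Point, fixed: FrozenSet[int], p: Sequence[float]) -> float:
    """E[kappa(z) | z_S = x_S] when feature i is independently 1 with probability p[i]."""
    free = [i for i in range(len(x)) if i not in fixed]
    total = 0.0
    for bits in product((0, 1), repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        weight = prod(p[i] if b else 1 - p[i] for i, b in zip(free, bits))
        total += weight * kappa(tuple(z))
    return total


def shap_score(kappa: Classifier, x: Point, feature: int, p: Sequence[float]) -> float:
    """Exact SHAP score of `feature` under the product distribution p (enumerates all subsets)."""
    n = len(x)
    others = [j for j in range(n) if j != feature]
    phi = 0.0
    for k in range(n):
        w = factorial(k) * factorial(n - k - 1) / factorial(n)
        for subset in combinations(others, k):
            s = frozenset(subset)
            phi += w * (expected_value(kappa, x, s | {feature}, p) - expected_value(kappa, x, s, p))
    return phi


def shap_range(kappa: Classifier, x: Point, feature: int,
               box: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Min and max SHAP score over a box of (p_lo, p_hi) intervals, checked at its corners.

    Under the independence assumption the score is multilinear in p, so its extremes
    over a box are attained at corners; this enumeration is exponential in len(x).
    """
    values = [shap_score(kappa, x, feature, corner) for corner in product(*box)]
    return min(values), max(values)


def kappa(z: Point) -> int:
    """Toy classifier: predicts 1 iff both features are 1."""
    return int(z[0] == 1 and z[1] == 1)


lo, hi = shap_range(kappa, (1, 1), feature=0, box=[(0.2, 0.8), (0.2, 0.8)])
print(lo, hi)  # about 0.12 and 0.72 (up to floating-point rounding)
```

For this toy classifier, feature 0's score works out to (1 - p_0)(1 + p_1)/2, so it ranges from 0.12 to 0.72 as the distribution varies, even though the model and the instance are fixed.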

From SHAP Scores to Feature Importance Scores

Olivier Letoffe, Xuanxiang Huang, Nicholas Asher, Joao Marques-Silva

A central goal of eXplainable Artificial Intelligence (XAI) is to assign relative importance to the features of a Machine Learning (ML) model given some prediction. The importance of this task of explainability by feature attribution is illustrated by the ubiquitous recent use of tools such as SHAP and LIME. Unfortunately, the exact computation of feature attributions, using the game-theoretical foundation underlying SHAP and LIME, can yield manifestly unsatisfactory results, that are tantamount to reporting misleading relative feature importance. Recent work targeted rigorous feature attribution, by studying axiomatic aggregations of features based on logic-based definitions of explanations by feature selection. This paper shows that there is an essential relationship between feature attribution and a priori voting power, and that those recently proposed axiomatic aggregations represent a few instantiations of the range of power indices studied in the past. Furthermore, it remains unclear how some of the most widely used power indices might be exploited as feature importance scores (FISs), i.e. the use of power indices in XAI, and which of these indices would be the best suited for the purposes of XAI by feature attribution, namely in terms of not producing results that could be deemed as unsatisfactory. This paper proposes novel desirable properties that FISs should exhibit. In addition, the paper also proposes novel FISs exhibiting the proposed properties. Finally, the paper conducts a rigorous analysis of the best-known power indices in terms of the proposed properties.

5/21/2024
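The link between feature attribution and voting power drawn in the abstract above can be made concrete with two of the best-known power indices. The sketch below computes the Shapley-Shubik index and the normalized Banzhaf index for a small weighted voting game, where a coalition wins if its total weight reaches the quota. Reading voters as features and winning coalitions as feature sets that suffice for a prediction is one way to view the analogy. The game [51; 50, 49, 1] is an arbitrary example, and neither index is presented here as one of the paper's proposed feature importance scores.

```python
from fractions import Fraction
from itertools import combinations, permutations
from typing import List, Sequence


def wins(coalition: Sequence[int], weights: List[int], quota: int) -> bool:
    """A coalition wins if the total weight of its members reaches the quota."""
    return sum(weights[i] for i in coalition) >= quota


def shapley_shubik(weights: List[int], quota: int) -> List[Fraction]:
    """Fraction of voter orderings in which each voter is pivotal."""
    n = len(weights)
    pivots = [0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:  # this voter turns a losing prefix into a winning one
                pivots[voter] += 1
                break
    return [Fraction(c, len(orders)) for c in pivots]


def banzhaf(weights: List[int], quota: int) -> List[Fraction]:
    """Normalized Banzhaf index: each voter's share of all swings."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # A swing: adding voter i turns a losing coalition into a winning one.
                if wins(subset + (i,), weights, quota) and not wins(subset, weights, quota):
                    swings[i] += 1
    total = sum(swings)
    return [Fraction(s, total) for s in swings]


weights, quota = [50, 49, 1], 51
print(shapley_shubik(weights, quota))  # [Fraction(2, 3), Fraction(1, 6), Fraction(1, 6)]
print(banzhaf(weights, quota))         # [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]
```

Both indices rank voter 0 first, but they assign it different values (2/3 versus 3/5), which illustrates why the choice of index matters when power indices are used as feature importance scores.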

RankSHAP: a Gold Standard Feature Attribution Method for the Ranking Task

Tanya Chowdhury, Yair Zick, James Allan

Several works propose various post-hoc, model-agnostic explanations for the task of ranking, i.e. the task of ordering a set of documents, via feature attribution methods. However, these attributions are seen to weakly correlate and sometimes contradict each other. In classification/regression, several works focus on axiomatic characterization of feature attribution methods, showing that a certain method uniquely satisfies a set of desirable properties. However, no such efforts have been taken in the space of feature attributions for the task of ranking. We take an axiomatic game-theoretic approach, popular in the feature attribution community, to identify candidate attribution methods for ranking tasks. We first define desirable axioms: Rank-Efficiency, Rank-Missingness, Rank-Symmetry and Rank-Monotonicity, all variants of the classical Shapley axioms. Next, we introduce Rank-SHAP, a feature attribution algorithm for the general ranking task, which is an extension to classical Shapley values. We identify a polynomial-time algorithm for computing approximate Rank-SHAP values and evaluate the computational efficiency and accuracy of our algorithm under various scenarios. We also evaluate its alignment with human intuition with a user study. Lastly, we theoretically examine popular rank attribution algorithms, EXS and Rank-LIME, and evaluate their capacity to satisfy the classical Shapley axioms.

5/6/2024