Explaining Probabilistic Models with Distributional Values

Read original: arXiv:2402.09947 - Published 6/17/2024 by Luca Franceschi, Michele Donini, Cédric Archambeau, Matthias Seeger

Overview

This paper addresses a gap between what current explanation methods such as SHAP explain (e.g., the scalar probability of a class) and what users wish to understand (e.g., the output of a classifier).
The authors generalize cooperative game theory and value operators to introduce "distributional values" - random variables that track changes in model output (e.g., flipping of the predicted class).
They derive analytical expressions for distributional values for models with Gaussian, Bernoulli, and Categorical payoffs, and show that their framework provides fine-grained and insightful explanations.

Plain English Explanation

Machine learning models can be complex and opaque, making it difficult to understand how they arrive at their predictions. A large branch of explainable AI aims to address this issue using game theory concepts. However, the authors argue that game-theoretic explanations may sometimes be misleading or hard to interpret.

The key problem is that current explanation methods, such as SHAP, tend to explain the scalar probability of a class, rather than the actual output of the classifier that the user cares about. To bridge this gap, the authors introduce the concept of "distributional values" - random variables that track changes in the model's output, like the flipping of the predicted class.
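
To make the mismatch concrete, the following is a minimal sketch and not the paper's method. It computes exact Shapley values for a hypothetical three-feature toy model twice: once for the scalar class probability and once for the thresholded predicted class. The function names, the probabilities, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values of a set function over players 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Classical Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def prob_positive(coalition):
    """Hypothetical toy model: P(class = 1) given the features that are present."""
    p = 0.30
    if 0 in coalition:
        p += 0.15
    if 1 in coalition:
        p += 0.10
    if 2 in coalition:
        p += 0.25
    return p


def predicted_class(coalition):
    """Hard label obtained by thresholding the probability (assumed threshold 0.5)."""
    return int(prob_positive(coalition) >= 0.5)


# Attributions for the scalar probability versus the predicted class.
phi_prob = shapley_values(prob_positive, 3)
phi_class = shapley_values(predicted_class, 3)
print("probability attributions:", phi_prob)
print("hard-label attributions: ", phi_class)
```

In this toy setup the probability attributions simply mirror each feature's additive bump, while the hard-label attributions concentrate on feature 2, the only feature that can push the prediction across the threshold on its own. The two views can therefore weight features quite differently.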

The authors derive analytical expressions for distributional values for different types of models, such as those with Gaussian, Bernoulli, or Categorical outputs. They show that this approach provides more detailed and insightful explanations of complex machine learning models, illustrated with case studies on vision and language models.

Technical Explanation

The paper proposes a framework for explaining the outputs of probabilistic machine learning models using a generalization of cooperative game theory. The authors introduce the concept of "distributional values" - random variables that track changes in the model's output, such as the flipping of the predicted class.

The key innovation is the derivation of analytical expressions for distributional values in the context of games with Gaussian, Bernoulli, and Categorical payoffs. This allows the framework to provide fine-grained explanations that go beyond the scalar probabilities typically explained by methods like SHAP.
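
As a rough illustration of the Bernoulli case, the sketch below treats the model output for each feature coalition as a Bernoulli random variable and looks at the distribution of the change caused by adding one feature, which takes values in {-1, 0, +1} (class flips down, no change, class flips up). It assumes the two Bernoulli variables are independent and averages with the classical Shapley weights. Both choices, along with the function names and the toy probabilities, are assumptions made here and may differ from the paper's exact construction.

```python
from itertools import combinations
from math import factorial


def bernoulli_difference_pmf(p_before, p_after):
    """PMF of X_after - X_before for two independent Bernoulli variables.

    Independence is an assumption of this sketch, not a claim about the paper.
    """
    return {
        -1: p_before * (1.0 - p_after),
        0: p_before * p_after + (1.0 - p_before) * (1.0 - p_after),
        1: (1.0 - p_before) * p_after,
    }


def flip_distribution(prob_fn, n, i):
    """Shapley-weighted mixture of difference PMFs for player i."""
    others = [j for j in range(n) if j != i]
    mixture = {-1: 0.0, 0: 0.0, 1: 0.0}
    for k in range(n):
        weight = factorial(k) * factorial(n - k - 1) / factorial(n)
        for subset in combinations(others, k):
            s = frozenset(subset)
            pmf = bernoulli_difference_pmf(prob_fn(s), prob_fn(s | {i}))
            for outcome, mass in pmf.items():
                mixture[outcome] += weight * mass
    return mixture


# Hypothetical two-feature game: P(class = 1) for every coalition.
toy_probs = {
    frozenset(): 0.2,
    frozenset({0}): 0.7,
    frozenset({1}): 0.3,
    frozenset({0, 1}): 0.9,
}

dist_feature_0 = flip_distribution(toy_probs.__getitem__, 2, 0)
mean_feature_0 = sum(outcome * mass for outcome, mass in dist_feature_0.items())
print("distribution over {-1, 0, +1}:", dist_feature_0)
print("mean of the distribution:", mean_feature_0)
```

Whatever dependence is assumed between the two variables, the mean of this mixture equals the ordinary Shapley value of the scalar probability (0.55 for feature 0 in this toy game). The distribution adds detail that the scalar attribution averages away, namely how much probability mass sits on upward and downward flips.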

The authors establish several characterizing properties of their distributional values and demonstrate the utility of their approach through case studies on vision and language models. The framework is shown to offer insightful explanations that can help users better understand the behavior of complex machine learning systems.

Critical Analysis

The paper makes a compelling case for the limitations of current game-theoretic explanation methods and the need for a more nuanced approach that can capture changes in model outputs. The authors' proposed framework for distributional values appears to be a promising step in this direction.

However, the paper does not address potential drawbacks or challenges in implementing this framework. For example, as with other Shapley-style values, evaluating the analytical expressions exactly involves averaging over feature coalitions, whose number grows exponentially with the number of features, which may limit practical applicability for large, complex models. Additionally, the interpretability and usefulness of the distributional value explanations for end-users may vary depending on the specific application and user needs.

Further research is needed to explore the trade-offs between the level of detail provided by the distributional values and the cognitive burden on users. There may also be opportunities to combine this approach with other explanation techniques to provide a more comprehensive understanding of model behavior.

Conclusion

This paper presents a novel framework for explaining the outputs of probabilistic machine learning models using a generalization of cooperative game theory. By introducing the concept of "distributional values" - random variables that track changes in model outputs - the authors aim to bridge the gap between what current explanation methods provide and what users actually wish to understand.

The analytical expressions derived for distributional values in games with Gaussian, Bernoulli, and Categorical payoffs demonstrate the potential of this approach to offer fine-grained and insightful explanations. While further research is needed to address practical challenges and user needs, this work represents an important step forward in the field of explainable AI.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Explaining Probabilistic Models with Distributional Values

Luca Franceschi, Michele Donini, Cédric Archambeau, Matthias Seeger

A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This paper addresses such gap for probabilistic models by generalising cooperative games and value operators. We introduce the distributional values, random variables that track changes in the model output (e.g. flipping of the predicted class) and derive their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs. We further establish several characterising properties, and show that our framework provides fine-grained and insightful explanations with case studies on vision and language models.

6/17/2024

The Distributional Uncertainty of the SHAP score in Explainable Machine Learning

Santiago Cifuentes, Leopoldo Bertossi, Nina Pardal, Sergio Abriola, Maria Vanina Martinez, Miguel Romero

Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning on SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to a more robust feature scoring.

8/14/2024

Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation

Guy Amir, Shahaf Bassan, Guy Katz

The ability to interpret Machine Learning (ML) models is becoming increasingly essential. However, despite significant progress in the field, there remains a lack of rigorous characterization regarding the innate interpretability of different models. In an attempt to bridge this gap, recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models. In this setting, if explanations for a particular model can be obtained efficiently, the model is considered interpretable (since it can be explained "easily"). However, if generating explanations over an ML model is computationally intractable, it is considered uninterpretable. Prior research identified two key factors that influence the complexity of interpreting an ML model: (i) the type of the model (e.g., neural networks, decision trees, etc.); and (ii) the form of explanation (e.g., contrastive explanations, Shapley values, etc.). In this work, we claim that a third, important factor must also be considered for this analysis -- the underlying distribution over which the explanation is obtained. Considering the underlying distribution is key in avoiding explanations that are socially misaligned, i.e., convey information that is biased and unhelpful to users. We demonstrate the significant influence of the underlying distribution on the resulting overall interpretation complexity, in two settings: (i) prediction models paired with an external out-of-distribution (OOD) detector; and (ii) prediction models designed to inherently generate socially aligned explanations. Our findings prove that the expressiveness of the distribution can significantly influence the overall complexity of interpretation, and identify essential prerequisites that a model must possess to generate socially aligned explanations.

8/9/2024

Explaining a probabilistic prediction on the simplex with Shapley compositions

Paul-Gauthier Noé, Miquel Perelló-Nieto, Jean-François Bonastre, Peter Flach

Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.

8/6/2024