The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Read original: arXiv:2202.01602 - Published 7/9/2024 by Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

Overview

This paper explores the problem of disagreement between different methods used to explain complex machine learning models.
As the use of post-hoc explanation methods (methods that explain a model's predictions after the fact) becomes more common in high-stakes settings, it's critical to understand if and when these explanations disagree with each other, and how practitioners resolve such disagreements.
However, there has been little research into these issues, which this paper aims to address.

Plain English Explanation

Machine learning models are often very complex, making it difficult to understand how they arrive at their predictions. To address this, researchers have developed various "post-hoc explanation methods" that try to explain the reasoning behind a model's predictions after the fact.

As these explanation methods are increasingly used in high-stakes settings, like medical diagnosis or loan approvals, it's important to understand whether the explanations they provide disagree with each other, and how practitioners (the people using these models) handle such disagreements. For example, if one explanation method attributes a prediction mainly to age while another attributes the same prediction mainly to gender, the two explanations disagree, and each could lead to a very different interpretation and decision.

However, there hasn't been much research into this "disagreement problem" in explainable machine learning. This paper aims to change that by:

Formalizing the concept of disagreement between explanations
Measuring how often such disagreements occur in practice
Understanding how practitioners resolve these disagreements

The researchers interviewed data scientists, developed a framework to quantify disagreement, and then analyzed real-world datasets, models, and explanation methods to see how often explanations disagree. They also ran an online user study to understand how practitioners deal with these disagreements.

Technical Explanation

The researchers first conducted interviews with data scientists to understand what they consider to be disagreement between explanations generated by different methods. Based on this, they developed a novel quantitative framework to formalize the notion of disagreement.
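To make the idea of quantifying disagreement concrete, here is a minimal Python sketch of two overlap-style measures on feature attributions: the share of top-k features two explanations have in common, and the share that also agree in sign. The function names, the tie-breaking rule, and the toy attribution values are assumptions made for illustration; this summary does not spell out the paper's exact metric definitions, so treat this as one plausible formalization rather than the authors' implementation.

```python
from typing import Sequence, Set


def top_k_features(attributions: Sequence[float], k: int) -> Set[int]:
    """Indices of the k features with the largest absolute attribution.

    Ties are broken by feature index (an assumption of this sketch).
    """
    order = sorted(
        range(len(attributions)),
        key=lambda i: abs(attributions[i]),
        reverse=True,
    )
    return set(order[:k])


def feature_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the top-k features that both explanations share."""
    return len(top_k_features(a, k) & top_k_features(b, k)) / k


def sign_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the top-k features that are shared AND have the same sign.

    Zero attributions are treated as non-positive here.
    """
    shared = top_k_features(a, k) & top_k_features(b, k)
    same_sign = [i for i in shared if (a[i] > 0) == (b[i] > 0)]
    return len(same_sign) / k


# Hypothetical attributions for features [age, gender, income, zip_code],
# produced by two different explanation methods for the same prediction.
method_a = [0.50, 0.10, -0.30, 0.05]
method_b = [0.05, 0.60, -0.25, 0.10]

print(feature_agreement(method_a, method_b, k=2))  # 0.5: only income is shared
print(sign_agreement(method_a, method_b, k=2))     # 0.5: income is negative in both
```

In this toy case one method ranks age first and the other ranks gender first, so only half of the top-2 features overlap. A score of 1.0 would mean the two explanations fully agree on the top-k set.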

They then leveraged this framework to carry out a rigorous empirical analysis. They used four real-world datasets, six state-of-the-art post-hoc explanation methods (like LIME and SHAP), and six different predictive models. This allowed them to measure the extent of disagreement between the explanations generated by these popular explanation techniques.
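An analysis of this kind boils down to comparing every pair of explanation methods on the same prediction. The sketch below shows one way that could look, using Spearman rank correlation over feature rankings as the agreement score. The method names, the attribution values, and the choice of Spearman correlation with index-based tie-breaking are all assumptions for illustration, not the paper's actual experimental code.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[int]:
    """Rank features by absolute attribution (0 = most important).

    Ties are broken by feature index (an assumption of this sketch).
    """
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation using the no-ties formula.

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); requires at least 2 features.
    """
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pairwise_rank_correlation(
    explanations: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Agreement score for every unordered pair of explanation methods."""
    return {
        (m1, m2): spearman(explanations[m1], explanations[m2])
        for m1, m2 in combinations(sorted(explanations), 2)
    }


# Hypothetical attributions from three methods for the same model prediction.
explanations = {
    "method_a": [0.50, 0.10, -0.30, 0.05],
    "method_b": [0.05, 0.60, -0.25, 0.10],
    "method_c": [0.40, 0.20, -0.35, 0.01],
}

scores = pairwise_rank_correlation(explanations)
for pair, rho in scores.items():
    print(pair, round(rho, 2))
```

Here method_a and method_c order the features identically (correlation of 1), while method_b ranks them quite differently from both. Averaging such scores over many predictions, datasets, and models would give a rough picture of how often methods disagree.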

Additionally, the researchers conducted an online user study with data scientists to understand how they resolve disagreements between explanations in practice. This revealed that practitioners often rely on ad-hoc heuristics when faced with such disagreements.

Critical Analysis

The paper makes an important contribution by highlighting the critical issue of disagreement between explanations in high-stakes machine learning applications. The rigorous empirical analysis provides clear evidence that state-of-the-art explanation methods often disagree, and the user study reveals concerning practices in how practitioners handle these disagreements.

However, the paper does not delve into the potential reasons behind the observed disagreements. It would be helpful to understand if certain explanation methods are more prone to disagreement than others, or if the type of model or dataset plays a role. Additionally, the paper does not propose any principled frameworks or guidelines for practitioners to effectively evaluate and compare explanations.

Furthermore, the paper acknowledges that the user study was conducted online with a limited number of participants. It would be valuable to supplement these findings with more in-depth, qualitative interviews to better understand the decision-making processes and challenges faced by practitioners in real-world settings.

Conclusion

This paper sheds light on a critical, yet overlooked, issue in the field of explainable machine learning. The results suggest that practitioners may be relying on misleading explanations when making important decisions, as the various explanation methods they use often disagree with each other. This underscores the need for more rigorous evaluation and comparison of explanation techniques, as well as the development of principled frameworks to help practitioners navigate and resolve such disagreements. Addressing these challenges will be crucial as the use of complex machine learning models continues to expand into high-stakes domains.

Related Papers

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.

7/9/2024

Understanding Prediction Discrepancies in Machine Learning Classifiers

Xavier Renard, Thibault Laugel, Marcin Detyniecki

A multitude of classifiers can be trained on the same data to achieve similar performances during test time, while having learned significantly different classification patterns. This phenomenon, which we call prediction discrepancies, is often associated with the blind selection of one model instead of another with similar performances. When making a choice, the machine learning practitioner has no understanding on the differences between models, their limits, where they agree and where they don't. But his/her choice will result in concrete consequences for instances to be classified in the discrepancy zone, since the final decision will be based on the selected classification pattern. Besides the arbitrary nature of the result, a bad choice could have further negative consequences such as loss of opportunity or lack of fairness. This paper proposes to address this question by analyzing the prediction discrepancies in a pool of best-performing models trained on the same data. A model-agnostic algorithm, DIG, is proposed to capture and explain discrepancies locally, to enable the practitioner to make the best educated decision when selecting a model by anticipating its potential undesired consequences. All the code to reproduce the experiments is available.

8/1/2024

Unified Explanations in Machine Learning Models: A Perturbation Approach

Jacob Dineen, Don Kridel, Daniel Dolk, David Castillo

A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished in many tasks of intelligence, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.

5/31/2024

From Model Explanation to Data Misinterpretation: Uncovering the Pitfalls of Post Hoc Explainers in Business Research

Ronilo Ragodos, Tong Wang, Lu Feng, Yu (Jeffrey) Hu

Machine learning models have been increasingly used in business research. However, most state-of-the-art machine learning models, such as deep neural networks and XGBoost, are black boxes in nature. Therefore, post hoc explainers that provide explanations for machine learning models by, for example, estimating numerical importance of the input features, have been gaining wide usage. Despite the intended use of post hoc explainers being explaining machine learning models, we found a growing trend in business research where post hoc explanations are used to draw inferences about the data. In this work, we investigate the validity of such use. Specifically, we investigate with extensive experiments whether the explanations obtained by the two most popular post hoc explainers, SHAP and LIME, provide correct information about the true marginal effects of X on Y in the data, which we call data-alignment. We then identify what factors influence the alignment of explanations. Finally, we propose a set of mitigation strategies to improve the data-alignment of explanations and demonstrate their effectiveness with real-world data in an econometric context. In spite of this effort, we nevertheless conclude that it is often not appropriate to infer data insights from post hoc explanations. We articulate appropriate alternative uses, the most important of which is to facilitate the proposition and subsequent empirical investigation of hypotheses. The ultimate goal of this paper is to caution business researchers against translating post hoc explanations of machine learning models into potentially false insights and understanding of data.

9/2/2024