From Model Explanation to Data Misinterpretation: Uncovering the Pitfalls of Post Hoc Explainers in Business Research

Read original: arXiv:2408.16987 - Published 9/2/2024 by Ronilo Ragodos, Tong Wang, Lu Feng, Yu (Jeffrey) Hu

Overview

Machine learning models are increasingly used in business research, but many are "black boxes" that are difficult to understand.
Post hoc explainers have become popular for providing explanations of these models, such as estimating the importance of input features.
However, there is a trend in business research of using these post hoc explanations to draw conclusions about the underlying data, which may not be valid.

Plain English Explanation

Machine learning models are computer programs that learn patterns from data and make predictions. They are widely used in business research to support decisions. However, many of the most powerful machine learning models, such as deep neural networks and XGBoost, are "black boxes": it is hard for humans to understand exactly how they work and why they make the predictions they do.

To address this, researchers have developed "post hoc explainers", tools that analyze a machine learning model after it has been trained and produce explanations, such as estimates of how much each input feature contributes to the model's predictions. These explanations are intended to help users understand the model.
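To make "how much each input feature contributes" concrete, here is a minimal sketch of an exact Shapley value computation, the quantity that SHAP approximates. It is not the `shap` package's API. The function name `shapley_values`, the `toy_model`, and the convention of replacing absent features with a baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for a single input x.

    Features outside a coalition are replaced by their baseline value
    (an assumed convention for this sketch).
    """
    n = len(x)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical black box: an interaction term plus a linear term.
    return z[0] * z[1] + 3.0 * z[2]


x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("attributions:", phi)
print("sum:", sum(phi), "prediction gap:", toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline. They describe how the model uses its inputs, which is not necessarily how the inputs relate to the outcome in the data.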

However, the paper's authors found that business researchers increasingly use these post hoc explanations not just to understand the model but to draw conclusions about the underlying data, for example by claiming that an input feature has a particular "effect" on the output. The paper investigates whether this practice is valid, that is, whether the explanations produced by popular post hoc tools such as SHAP and LIME actually align with the true relationships in the data.

Technical Explanation

The researchers conducted extensive experiments to evaluate whether the explanations produced by SHAP and LIME correctly reflect the true marginal effects of the inputs (X) on the output (Y) in the underlying data, a property the paper calls "data-alignment". They identified several factors that influence how well the explanations align with the data, such as the complexity of the machine learning model and the distribution of the data.
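The following sketch illustrates what a data-alignment failure can look like. It is not the paper's experimental setup. The simulated data, the hypothetical fitted model `f(x) = 1.2*x1 + 0.8*x2`, and the choice of feature means as the baseline are all assumptions. The second feature is correlated with the first but has a true marginal effect of zero.

```python
import random

random.seed(0)
n = 2000

# Simulated data (an assumption of this sketch, not the paper's setup):
# x2 is correlated with x1 but has no effect on y.
data = []
for _ in range(n):
    x1 = random.gauss(0.0, 1.0)
    x2 = x1 + random.gauss(0.0, 0.5)
    y = 2.0 * x1 + random.gauss(0.0, 1.0)
    data.append((x1, x2, y))

true_effects = [2.0, 0.0]  # dy/dx1 and dy/dx2 in the data-generating process
weights = [1.2, 0.8]       # hypothetical fitted model that leans on the proxy x2


def model(x1, x2):
    return weights[0] * x1 + weights[1] * x2


# The model still predicts reasonably well, because x2 acts as a proxy for x1.
mse_model = sum((y - model(x1, x2)) ** 2 for x1, x2, y in data) / n
mse_truth = sum((y - 2.0 * x1) ** 2 for x1, x2, y in data) / n

# For a linear model whose absent features are replaced by their means,
# the Shapley attribution of feature j is w_j * (x_j - mean_j).
means = [sum(row[j] for row in data) / n for j in range(2)]
importance = [
    sum(abs(weights[j] * (row[j] - means[j])) for row in data) / n
    for j in range(2)
]

print("MSE of model:", round(mse_model, 3))
print("MSE of true function:", round(mse_truth, 3))
print("mean |attribution| per feature:", [round(v, 3) for v in importance])
print("true marginal effects:", true_effects)
```

In this setup the explanation is faithful to the model, which really does use `x2`, yet reading the attribution for `x2` as an effect in the data would be wrong.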

Based on their findings, the researchers propose mitigation strategies to improve the alignment between post hoc explanations and the true data relationships, and demonstrate them on real-world data in an econometric context. These include using a perturbation-based approach to generate the explanations and weighing each explainer's formal foundations and priorities when choosing which one to use.

Critical Analysis

The researchers acknowledge that despite these mitigation strategies, it is often still not appropriate to infer insights about the data directly from post hoc explanations. The explanations can be misleading, especially for complex machine learning models and non-linear relationships in the data.

The researchers encourage business researchers to use post hoc explanations more cautiously - not to draw conclusions about the data, but rather to propose hypotheses that can then be empirically tested. They also suggest focusing on other use cases for post hoc explainers, such as facilitating model debugging and feature engineering.
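As a sketch of the "propose a hypothesis, then test it" workflow, suppose an explainer has flagged a feature `x2` as important. One simple empirical check is an ordinary least squares regression with standard errors. The simulated data and all variable names below are assumptions for illustration. In practice a statistics library and a model specification suited to the research question would replace this hand-rolled version.

```python
import random
from math import sqrt

random.seed(1)
n = 2000

# Simulated data (assumed): x2 is a correlated proxy with no effect on y.
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + random.gauss(0.0, 0.5) for a in x1]
y = [2.0 * a + random.gauss(0.0, 1.0) for a in x1]


def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


# Centering absorbs the intercept, leaving a 2x2 system for the slopes.
c1, c2, cy = center(x1), center(x2), center(y)
s11 = sum(a * a for a in c1)
s22 = sum(b * b for b in c2)
s12 = sum(a * b for a, b in zip(c1, c2))
s1y = sum(a * t for a, t in zip(c1, cy))
s2y = sum(b * t for b, t in zip(c2, cy))
det = s11 * s22 - s12 * s12

# OLS slope estimates from the normal equations.
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Residual variance with n - 3 degrees of freedom (two slopes plus intercept).
resid = [t - b1 * a - b2 * b for a, b, t in zip(c1, c2, cy)]
sigma2 = sum(r * r for r in resid) / (n - 3)

# Standard errors from sigma^2 * (X'X)^{-1}.
se1 = sqrt(sigma2 * s22 / det)
se2 = sqrt(sigma2 * s11 / det)

print("x1: coef", round(b1, 3), "se", round(se1, 3), "t", round(b1 / se1, 2))
print("x2: coef", round(b2, 3), "se", round(se2, 3), "t", round(b2 / se2, 2))
```

Here the hypothesis "x2 affects y" is judged by the estimated coefficient and its standard error, not by the size of a feature attribution.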

Overall, the paper raises important concerns about the validity of using post hoc explanations to gain insights about the underlying data, and provides guidance on more appropriate ways to leverage these tools in business research.

Conclusion

This research paper cautions against the growing practice of using post hoc explanations of machine learning models to directly infer insights about the underlying data. While these explainers can be useful for understanding the models themselves, the researchers demonstrate that the explanations do not necessarily align with the true relationships in the data.

The paper provides strategies to improve the data-alignment of post hoc explanations, but ultimately concludes that their primary purpose should be to facilitate hypothesis generation, model debugging, and other model-centric tasks - not to draw conclusions about the data. Business researchers are encouraged to be cautious when interpreting post hoc explanations, and to focus on empirical investigation of hypotheses rather than relying on the explanations alone.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

From Model Explanation to Data Misinterpretation: Uncovering the Pitfalls of Post Hoc Explainers in Business Research

Ronilo Ragodos, Tong Wang, Lu Feng, Yu (Jeffrey) Hu

Machine learning models have been increasingly used in business research. However, most state-of-the-art machine learning models, such as deep neural networks and XGBoost, are black boxes in nature. Therefore, post hoc explainers that provide explanations for machine learning models by, for example, estimating numerical importance of the input features, have been gaining wide usage. Despite the intended use of post hoc explainers being explaining machine learning models, we found a growing trend in business research where post hoc explanations are used to draw inferences about the data. In this work, we investigate the validity of such use. Specifically, we investigate with extensive experiments whether the explanations obtained by the two most popular post hoc explainers, SHAP and LIME, provide correct information about the true marginal effects of X on Y in the data, which we call data-alignment. We then identify what factors influence the alignment of explanations. Finally, we propose a set of mitigation strategies to improve the data-alignment of explanations and demonstrate their effectiveness with real-world data in an econometric context. In spite of this effort, we nevertheless conclude that it is often not appropriate to infer data insights from post hoc explanations. We articulate appropriate alternative uses, the most important of which is to facilitate the proposition and subsequent empirical investigation of hypotheses. The ultimate goal of this paper is to caution business researchers against translating post hoc explanations of machine learning models into potentially false insights and understanding of data.

9/2/2024

Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference

Catherine Huang, Martin Pawelczyk, Himabindu Lakkaraju

Predictive machine learning models are becoming increasingly deployed in high-stakes contexts involving sensitive personal data; in these contexts, there is a trade-off between model explainability and data privacy. In this work, we push the boundaries of this trade-off: with a focus on foundation models for image classification fine-tuning, we reveal unforeseen privacy risks of post-hoc model explanations and subsequently offer mitigation strategies for such risks. First, we construct VAR-LRT and L1/L2-LRT, two new membership inference attacks based on feature attribution explanations that are significantly more successful than existing explanation-leveraging attacks, particularly in the low false-positive rate regime that allows an adversary to identify specific training set members with confidence. Second, we find empirically that optimized differentially private fine-tuning substantially diminishes the success of the aforementioned attacks, while maintaining high model accuracy. We carry out a systematic empirical investigation of our 2 new attacks with 5 vision transformer architectures, 5 benchmark datasets, 4 state-of-the-art post-hoc explanation methods, and 4 privacy strength settings.

7/29/2024

Unified Explanations in Machine Learning Models: A Perturbation Approach

Jacob Dineen, Don Kridel, Daniel Dolk, David Castillo

A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished in many tasks of intelligence, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.

5/31/2024

Revisiting the robustness of post-hoc interpretability methods

Jiawen Wei, Hugues Turbé, Gianmarco Mengaldo

Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint portions of data that a trained deep learning model deemed important to make a decision. However, different post-hoc interpretability methods often provide different results, casting doubts on their accuracy. For this reason, several evaluation strategies have been proposed to understand the accuracy of post-hoc interpretability. Many of these evaluation strategies provide a coarse-grained assessment -- i.e., they evaluate how the performance of the model degrades on average by corrupting different data points across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level, also referred to as fine-grained, assessment. In other words, they do not measure the robustness of post-hoc interpretability methods. We propose an approach and two new metrics to provide a fine-grained assessment of post-hoc interpretability methods. We show that the robustness is generally linked to its coarse-grained performance.

7/30/2024