Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects

David Kohler (Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn), David Rugamer (Department of Statistics, LMU Munich, Munich Center for Machine Learning), Matthias Schmid (Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn)

Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.

7/29/2024

🤯

Scientific Inference With Interpretable Machine Learning: Analyzing Models to Learn About Real-World Phenomena

Timo Freiesleben, Gunnar Konig, Christoph Molnar, Alvaro Tejero-Cantero

To learn about real world phenomena, scientists have traditionally used models with clearly interpretable elements. However, modern machine learning (ML) models, while powerful predictors, lack this direct elementwise interpretability (e.g. neural network weights). Interpretable machine learning (IML) offers a solution by analyzing models holistically to derive interpretations. Yet, current IML research is focused on auditing ML models rather than leveraging them for scientific inference. Our work bridges this gap, presenting a framework for designing IML methods-termed 'property descriptors' -- that illuminate not just the model, but also the phenomenon it represents. We demonstrate that property descriptors, grounded in statistical learning theory, can effectively reveal relevant properties of the joint probability distribution of the observational data. We identify existing IML methods suited for scientific inference and provide a guide for developing new descriptors with quantified epistemic uncertainty. Our framework empowers scientists to harness ML models for inference, and provides directions for future IML research to support scientific understanding.

7/16/2024

Integrating White and Black Box Techniques for Interpretable Machine Learning

Eric M. Vernon, Naoki Masuyama, Yusuke Nojima

In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).

7/15/2024

A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection

Omer Subasi, Johnathan Cree, Joseph Manzano, Elena Peterson

There has been a large number of studies in interpretable and explainable ML for cybersecurity, in particular, for intrusion detection. Many of these studies have significant amount of overlapping and repeated evaluations and analysis. At the same time, these studies overlook crucial model, data, learning process, and utility related issues and many times completely disregard them. These issues include the use of overly complex and opaque ML models, unaccounted data imbalances and correlated features, inconsistent influential features across different explanation methods, the inconsistencies stemming from the constituents of a learning process, and the implausible utility of explanations. In this work, we empirically demonstrate these issues, analyze them and propose practical solutions in the context of feature-based model explanations. Specifically, we advise avoiding complex opaque models such as Deep Neural Networks and instead using interpretable ML models such as Decision Trees as the available intrusion datasets are not difficult for such interpretable models to classify successfully. Then, we bring attention to the binary classification metrics such as Matthews Correlation Coefficient (which are well-suited for imbalanced datasets. Moreover, we find that feature-based model explanations are most often inconsistent across different settings. In this respect, to further gauge the extent of inconsistencies, we introduce the notion of cross explanations which corroborates that the features that are determined to be impactful by one explanation method most often differ from those by another method. Furthermore, we show that strongly correlated data features and the constituents of a learning process, such as hyper-parameters and the optimization routine, become yet another source of inconsistent explanations. Finally, we discuss the utility of feature-based explanations.

7/8/2024

Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects

Overview

Plain English Explanation

Technical Explanation

Critical Analysis

Conclusion

Related Papers