Probabilistically Plausible Counterfactual Explanations with Normalizing Flows

2405.17640

Published 5/29/2024 by Patryk Wielopolski, Oleksii Furman, Jerzy Stefanowski, Maciej Zik{e}ba

Probabilistically Plausible Counterfactual Explanations with Normalizing Flows

Abstract

We present PPCEF, a novel method for generating probabilistically plausible counterfactual explanations (CFs). PPCEF advances beyond existing methods by combining a probabilistic formulation that leverages the data distribution with the optimization of plausibility within a unified framework. Compared to reference approaches, our method enforces plausibility by directly optimizing the explicit density function without assuming a particular family of parametrized distributions. This ensures CFs are not only valid (i.e., achieve class change) but also align with the underlying data's probability density. For that purpose, our approach leverages normalizing flows as powerful density estimators to capture the complex high-dimensional data distribution. Furthermore, we introduce a novel loss that balances the trade-off between achieving class change and maintaining closeness to the original instance while also incorporating a probabilistic plausibility term. PPCEF's unconstrained formulation allows for efficient gradient-based optimization with batch processing, leading to orders of magnitude faster computation compared to prior methods. Moreover, the unconstrained formulation of PPCEF allows for the seamless integration of future constraints tailored to specific counterfactual properties. Finally, extensive evaluations demonstrate PPCEF's superiority in generating high-quality, probabilistically plausible counterfactual explanations in high-dimensional tabular settings. This makes PPCEF a powerful tool for not only interpreting complex machine learning models but also for improving fairness, accountability, and trust in AI systems.

Create account to get full access

Overview

This paper proposes a new method for generating probabilistically plausible counterfactual explanations using normalizing flows.
Counterfactual explanations are a type of interpretable machine learning that aim to provide explanations for model predictions by describing how the input would need to change to get a different output.
The authors address the challenge of generating counterfactual examples that are both plausible (realistic) and probabilistically valid (high likelihood under the data distribution).
Their approach uses normalizing flows, a powerful class of generative models, to capture the underlying data distribution and sample realistic counterfactual instances.

Plain English Explanation

Imagine you have a machine learning model that predicts whether someone will get approved for a loan. You want to understand

why

the model made a certain prediction - for example, why it denied a loan application. Counterfactual explanations provide a way to do this by showing how the input would need to change to get a different outcome.

For example, the counterfactual explanation might say: "If the applicant's income was $5,000 higher, the loan would have been approved." This gives the applicant a concrete understanding of what they could change to get approved.

The challenge is generating counterfactual examples that are

realistic

- they need to be plausible instances that could actually occur in the real world. Prior work has addressed this by imposing constraints or using generative models.

This paper takes a different approach using

normalizing flows

- a powerful type of generative model that can accurately capture the underlying distribution of the data. By sampling from this distribution, the authors can generate counterfactual examples that are both plausible and have a high likelihood under the true data distribution.

Technical Explanation

The key innovation of this paper is the use of normalizing flows to generate probabilistically plausible counterfactual explanations. Normalizing flows are a class of generative models that can learn a bijective mapping between the data space and a latent space with a simple distribution (e.g. Gaussian).

The authors train a normalizing flow model on the input data, allowing them to sample realistic counterfactual instances that have high likelihood under the true data distribution. This addresses the challenge of generating plausible counterfactuals, which has been a key limitation of prior work.

Alongside the normalizing flow model, the authors also train a separate classifier model to predict the outcome of interest (e.g. loan approval). They then use an optimization procedure to find counterfactual examples that achieve the desired outcome while remaining close to the original input in the latent space of the normalizing flow.

This approach has several advantages over previous methods, including causal-based approaches and structured prediction techniques. By leveraging the powerful density estimation capabilities of normalizing flows, the authors can generate counterfactuals that are both plausible and statistically valid.

Critical Analysis

The authors present a compelling approach for generating high-quality counterfactual explanations. However, there are a few potential limitations and areas for further research:

Interpretability: While the counterfactual examples generated by this method are probabilistically plausible, they may not be easily interpretable to end-users. Providing additional context or visualization around the changes required could improve the explanatory power.
Causal Considerations: The authors acknowledge that their approach does not explicitly model causal relationships, which can be important for generating meaningful counterfactuals. Integrating causal models or constraints could further improve the validity and usefulness of the generated explanations.
Scalability: The optimization procedure used to find counterfactuals may not scale well to high-dimensional or complex models. Investigating more efficient optimization techniques or alternative formulations could help improve the practicality of this approach.
Evaluation Metrics: The authors rely on proxy metrics like latent space distance to assess the plausibility of the counterfactuals. Developing more direct and comprehensive evaluation methods could help validate the effectiveness of this approach.

Overall, this paper presents an important step forward in the field of interpretable machine learning by addressing the challenge of generating realistic and probabilistically valid counterfactual explanations. The use of normalizing flows is a promising direction that merits further exploration and refinement.

Conclusion

This paper introduces a new method for generating probabilistically plausible counterfactual explanations using normalizing flows. By training a normalizing flow model to capture the underlying data distribution, the authors can sample realistic counterfactual instances that have high likelihood under the true data distribution.

This approach addresses a key limitation of prior work on counterfactual explanations, which struggled to generate examples that were both plausible and statistically valid. The authors demonstrate the effectiveness of their method on several benchmark datasets, showing that it can produce high-quality counterfactual explanations.

While the proposed technique has some potential limitations, it represents an important advancement in the field of interpretable machine learning. By providing users with realistic and probabilistically grounded explanations, this work has the potential to improve trust and transparency in complex AI systems across a variety of application domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Patryk Wielopolski, Oleksii Furman, Jerzy Stefanowski, Maciej Zik{e}ba

Growing regulatory and societal pressures demand increased transparency in AI, particularly in understanding the decisions made by complex machine learning models. Counterfactual Explanations (CFs) have emerged as a promising technique within Explainable AI (xAI), offering insights into individual model predictions. However, to understand the systemic biases and disparate impacts of AI models, it is crucial to move beyond local CFs and embrace global explanations, which offer a~holistic view across diverse scenarios and populations. Unfortunately, generating Global Counterfactual Explanations (GCEs) faces challenges in computational complexity, defining the scope of global, and ensuring the explanations are both globally representative and locally plausible. We introduce a novel unified approach for generating Local, Group-wise, and Global Counterfactual Explanations for differentiable classification models via gradient-based optimization to address these challenges. This framework aims to bridge the gap between individual and systemic insights, enabling a deeper understanding of model decisions and their potential impact on diverse populations. Our approach further innovates by incorporating a probabilistic plausibility criterion, enhancing actionability and trustworthiness. By offering a cohesive solution to the optimization and plausibility challenges in GCEs, our work significantly advances the interpretability and accountability of AI models, marking a step forward in the pursuit of transparent AI.

5/29/2024

cs.LG cs.AI

🧠

Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni

Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging on robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performances against metrics on three evaluation aspects.

4/5/2024

cs.LG cs.AI

Model-Based Counterfactual Explanations Incorporating Feature Space Attributes for Tabular Data

Yuta Sumiya, Hayaru shouno

Machine-learning models, which are known to accurately predict patterns from large datasets, are crucial in decision making. Consequently, counterfactual explanations-methods explaining predictions by introducing input perturbations-have become prominent. These perturbations often suggest ways to alter the predictions, leading to actionable recommendations. However, the current techniques require resolving the optimization problems for each input change, rendering them computationally expensive. In addition, traditional encoding methods inadequately address the perturbations of categorical variables in tabular data. Thus, this study propose FastDCFlow, an efficient counterfactual explanation method using normalizing flows. The proposed method captures complex data distributions, learns meaningful latent spaces that retain proximity, and improves predictions. For categorical variables, we employed TargetEncoding, which respects ordinal relationships and includes perturbation costs. The proposed method outperformed existing methods in multiple metrics, striking a balance between trade offs for counterfactual explanations. The source code is available in the following repository: https://github.com/sumugit/FastDCFlow.

4/23/2024

cs.LG

A Framework for Feasible Counterfactual Exploration incorporating Causality, Sparsity and Density

Kleopatra Markou, Dimitrios Tomaras, Vana Kalogeraki, Dimitrios Gunopulos

The imminent need to interpret the output of a Machine Learning model with counterfactual (CF) explanations - via small perturbations to the input - has been notable in the research community. Although the variety of CF examples is important, the aspect of them being feasible at the same time, does not necessarily apply in their entirety. This work uses different benchmark datasets to examine through the preservation of the logical causal relations of their attributes, whether CF examples can be generated after a small amount of changes to the original input, be feasible and actually useful to the end-user in a real-world case. To achieve this, we used a black box model as a classifier, to distinguish the desired from the input class and a Variational Autoencoder (VAE) to generate feasible CF examples. As an extension, we also extracted two-dimensional manifolds (one for each dataset) that located the majority of the feasible examples, a representation that adequately distinguished them from infeasible ones. For our experimentation we used three commonly used datasets and we managed to generate feasible and at the same time sparse, CF examples that satisfy all possible predefined causal constraints, by confirming their importance with the attributes in a dataset.

4/23/2024

cs.LG cs.AI