Enhancing Counterfactual Explanation Search with Diffusion Distance and Directional Coherence

2404.12810

Published 4/22/2024 by Marharyta Domnich, Raul Vicente

Abstract

A pressing issue in the adoption of AI models is the increasing demand for more human-centric explanations of their predictions. To advance towards more human-centric explanations, understanding how humans produce and select explanations has been beneficial. In this work, inspired by insights of human cognition we propose and test the incorporation of two novel biases to enhance the search for effective counterfactual explanations. Central to our methodology is the application of diffusion distance, which emphasizes data connectivity and actionability in the search for feasible counterfactual explanations. In particular, diffusion distance effectively weights more those points that are more interconnected by numerous short-length paths. This approach brings closely connected points nearer to each other, identifying a feasible path between them. We also introduce a directional coherence term that allows the expression of a preference for the alignment between the joint and marginal directional changes in feature space to reach a counterfactual. This term enables the generation of counterfactual explanations that align with a set of marginal predictions based on expectations of how the outcome of the model varies by changing one feature at a time. We evaluate our method, named Coherent Directional Counterfactual Explainer (CoDiCE), and the impact of the two novel biases against existing methods such as DiCE, FACE, Prototypes, and Growing Spheres. Through a series of ablation experiments on both synthetic and real datasets with continuous and mixed-type features, we demonstrate the effectiveness of our method.

Overview

The paper proposes enhancements to counterfactual explanation search for interpretable machine learning models.
Key innovations include using diffusion distance and directional coherence to improve the feasibility and interpretability of counterfactual explanations.
The approach is model-agnostic and tested on tabular data, demonstrating improved performance over existing methods.

Plain English Explanation

Counterfactual explanations are an important tool in interpretable machine learning. They show how small changes to an input could lead to a different model prediction, helping users understand the reasoning behind a model's decision.

However, finding good counterfactual explanations can be challenging. The authors of this paper tackled two key issues: feasibility and interpretability. Feasibility means the counterfactual explanation should actually be possible in the real world, not just a hypothetical scenario. Interpretability means the explanation should be easy for a human to understand.

The authors addressed these challenges by incorporating two new ideas into the counterfactual search process:

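Diffusion distance: This measures how strongly two points are connected through the data, giving more weight to pairs that are linked by many short paths. The authors used it so that the counterfactual is reachable from the original input along a well-connected route, which makes it more feasible than one that is merely close in a straight line.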
Directional coherence: This measures whether the combined feature changes in the counterfactual point the same way as the changes one would expect from varying each feature on its own. The authors used it to make the counterfactual more interpretable, since it then matches one-feature-at-a-time expectations of how the model's output moves.

By adding these concepts, the authors developed a more practical and insightful approach to finding counterfactual explanations. They tested it on tabular data and showed it outperformed previous methods. This work helps advance the field of interpretable AI by making counterfactual explanations more useful in real-world applications.

Technical Explanation

The authors propose a novel approach to counterfactual explanation search that leverages two key ideas: diffusion distance and directional coherence.

Diffusion distance measures how strongly two points are connected through the data: points joined by many short paths are treated as close, which identifies a feasible path between them. The authors use it to favor counterfactuals that are well connected to the original input and therefore more feasible and actionable. A straight-line distance, by contrast, ignores whether any data lie between the two points.
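The following is a minimal sketch of how a diffusion distance can be computed, assuming the standard diffusion-map construction (Gaussian kernel, random-walk normalization, spectral embedding). The summary does not state the authors' exact construction, so the function names, the parameters `sigma`, `t`, and `n_components`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def diffusion_coordinates(X, sigma=1.0, t=1, n_components=2):
    """Embed rows of X so that Euclidean distance in the embedding
    approximates diffusion distance at time t (up to a constant scale
    and truncation to n_components)."""
    # Pairwise squared Euclidean distances and Gaussian affinities
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the random walk P = D^-1 K
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1)
    return psi[:, 1:n_components + 1] * (eigvals[1:n_components + 1] ** t)

def diffusion_distance(coords, i, j):
    """Distance between points i and j in diffusion coordinates."""
    return float(np.linalg.norm(coords[i] - coords[j]))

# Toy data (assumed): two tight clusters, 20 points each
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
cluster_b = rng.normal(loc=[3.0, 0.0], scale=0.1, size=(20, 2))
X = np.vstack([cluster_a, cluster_b])

coords = diffusion_coordinates(X, sigma=1.0, t=2)
within = diffusion_distance(coords, 0, 1)    # same cluster
across = diffusion_distance(coords, 0, 20)   # different clusters
print(within < across)
```

Points in the same cluster are joined by many short paths, so their diffusion distance is small; points in different clusters are joined only through weak links, so their distance is larger.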

Directional coherence measures how well the joint change in feature space needed to reach a counterfactual aligns with the marginal directional changes, that is, with expectations of how the model's outcome varies when one feature is changed at a time. The authors add it as a term in the search so that a preference for this alignment can be expressed. Counterfactuals that agree with these marginal predictions are easier to interpret, because they do not contradict what a user would expect from each feature individually.
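One simplified way to read this is as a sign-agreement score. The sketch below is not the paper's formula: it assumes the marginal direction of each feature is estimated by nudging that feature alone and checking whether the target-class probability rises or falls. The names `marginal_directions` and `directional_coherence`, the `step` parameter, and the toy logistic model are hypothetical.

```python
import numpy as np

def marginal_directions(predict_proba, x, step=0.1):
    """For each feature, the sign of the change in target probability
    when that feature alone is increased by `step`."""
    base = predict_proba(x)
    signs = np.zeros(len(x))
    for k in range(len(x)):
        nudged = x.copy()
        nudged[k] += step
        signs[k] = np.sign(predict_proba(nudged) - base)
    return signs

def directional_coherence(x, x_cf, marginal_signs):
    """Fraction of changed features whose direction of change in the
    counterfactual agrees with the marginal direction."""
    delta = np.sign(x_cf - x)
    changed = delta != 0
    if not changed.any():
        return 1.0
    return float(np.mean(delta[changed] == marginal_signs[changed]))

# Toy model (assumed): logistic regression with fixed weights
weights = np.array([2.0, -1.0, 0.5])
predict_proba = lambda v: 1.0 / (1.0 + np.exp(-(v @ weights)))

x = np.array([0.0, 0.0, 0.0])
signs = marginal_directions(predict_proba, x)

coherent_cf = np.array([1.0, -1.0, 0.0])   # raises feature 0, lowers feature 1
mixed_cf = np.array([1.0, 0.5, 0.0])       # raises feature 1 against its marginal effect
print(directional_coherence(x, coherent_cf, signs))
print(directional_coherence(x, mixed_cf, signs))
```

A score of 1.0 means every changed feature moves in the direction its marginal effect suggests; lower scores flag counterfactuals that change some feature against expectation.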

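The authors integrate these concepts into a model-agnostic counterfactual search procedure named Coherent Directional Counterfactual Explainer (CoDiCE). They compare it against DiCE, FACE, Prototypes, and Growing Spheres, and run ablation experiments on synthetic and real tabular datasets with continuous and mixed-type features to isolate the effect of each of the two terms. They report that the approach improves both the feasibility and the interpretability of the generated counterfactuals.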
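As a rough illustration of what a model-agnostic search with these two terms might look like, the sketch below draws random candidates, keeps those the model assigns to the target class, and picks the one with the lowest cost. The objective (distance plus a weighted coherence penalty), the random-search strategy, the weight `lam`, and the toy model are assumptions made for illustration, not the procedure from the paper. Euclidean distance is used as a stand-in; a diffusion distance function could be passed instead.

```python
import numpy as np

def predict_proba(x):
    """Toy black-box model (hypothetical): probability of the target class."""
    w, b = np.array([2.0, -1.0]), -1.0
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def coherence(x, x_cf, marginal_signs):
    """Fraction of changed features that move in their marginal direction."""
    delta = np.sign(x_cf - x)
    changed = delta != 0
    if not changed.any():
        return 1.0
    return float(np.mean(delta[changed] == marginal_signs[changed]))

def search_counterfactual(x, distance, marginal_signs, lam=1.0,
                          n_samples=500, seed=0):
    """Random search: among candidates classified as the target class,
    return the one minimizing distance + lam * (1 - coherence)."""
    rng = np.random.default_rng(seed)
    candidates = x + rng.normal(scale=1.0, size=(n_samples, len(x)))
    best, best_cost = None, np.inf
    for c in candidates:
        if predict_proba(c) < 0.5:   # only the model's output is used
            continue
        cost = distance(x, c) + lam * (1.0 - coherence(x, c, marginal_signs))
        if cost < best_cost:
            best, best_cost = c, cost
    return best, best_cost

x = np.array([0.0, 0.0])
marginal_signs = np.array([1.0, -1.0])   # assumed expectations per feature
euclidean = lambda a, b: float(np.linalg.norm(a - b))

x_cf, cost = search_counterfactual(x, euclidean, marginal_signs)
print(x_cf, cost, predict_proba(x_cf))
```

Because the search only calls `predict_proba`, it needs no access to gradients or model internals, which is what makes this kind of procedure model-agnostic.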

Critical Analysis

The authors acknowledge several limitations and areas for future work. First, the approach is currently limited to tabular data and may require adaptation for other data modalities like images or text. Additionally, the diffusion distance and directional coherence metrics, while useful, may not capture all aspects of feasibility and interpretability.

An interesting avenue for further research would be to explore how these concepts could be combined with other recent advances in counterfactual explanation, such as global counterfactual directions or generating counterfactual trajectories using latent diffusion models. Additionally, a comparative study of graph-based counterfactual explanations could provide further insights.

More broadly, the field of adapting counterfactual explanations to different use cases and user needs is an active area of research. The presented work is a valuable contribution, but continued innovation will be needed to fully realize the potential of counterfactual explanations in real-world interpretable AI systems.

Conclusion

This paper proposes an enhanced approach to counterfactual explanation search that addresses key challenges of feasibility and interpretability. By incorporating diffusion distance and directional coherence into the search process, the authors have developed a more practical and insightful method for generating counterfactual explanations.

The model-agnostic nature of the approach and the demonstrated improvements over existing techniques make this work an important contribution to the field of interpretable machine learning. As the use of AI systems becomes more widespread, tools like this that can help users understand and trust the reasoning behind model predictions will only grow in importance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

CoLa-DCE -- Concept-guided Latent Diffusion Counterfactual Explanations

Franz Motzkus, Christian Hellert, Ute Schmid

Recent advancements in generative AI have introduced novel prospects and practical implementations. Especially diffusion models show their strength in generating diverse and, at the same time, realistic features, positioning them well for generating counterfactual explanations for computer vision models. Answering what if questions of what needs to change to make an image classifier change its prediction, counterfactual explanations align well with human understanding and consequently help in making model behavior more comprehensible. Current methods succeed in generating authentic counterfactuals, but lack transparency as feature changes are not directly perceivable. To address this limitation, we introduce Concept-guided Latent Diffusion Counterfactual Explanations (CoLa-DCE). CoLa-DCE generates concept-guided counterfactuals for any classifier with a high degree of control regarding concept selection and spatial conditioning. The counterfactuals comprise an increased granularity through minimal feature changes. The reference feature visualization ensures better comprehensibility, while the feature localization provides increased transparency of where changed what. We demonstrate the advantages of our approach in minimality and comprehensibility across multiple image classification models and datasets and provide insights into how our CoLa-DCE explanations help comprehend model errors like misclassification cases.

6/5/2024

cs.LG cs.AI

Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space

Yukai Zhang, Ao Xu, Zihao Li, Tieru Wu

In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of image classification models, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method maximizes the use of data to address feature counterfactual explanations in the feature space. Subsequently, we utilize an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces. Additionally, our method outperforms established baselines, achieving impressive experimental results.

6/3/2024

cs.LG cs.CV

Global Counterfactual Directions

Bartlomiej Sobieski, Przemysław Biecek

Despite increasing progress in development of methods for generating visual counterfactual explanations, especially with the recent rise of Denoising Diffusion Probabilistic Models, previous works consider them as an entirely local technique. In this work, we take the first step at globalizing them. Specifically, we discover that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions. We propose a novel proxy-based approach that discovers two types of these directions with the use of only single image in an entirely black-box manner. Precisely, g-directions allow for flipping the decision of a given classifier on an entire dataset of images, while h-directions further increase the diversity of explanations. We refer to them in general as Global Counterfactual Directions (GCDs). Moreover, we show that GCDs can be naturally combined with Latent Integrated Gradients resulting in a new black-box attribution method, while simultaneously enhancing the understanding of counterfactual explanations. We validate our approach on existing benchmarks and show that it generalizes to real-world use-cases.

4/22/2024

cs.LG cs.AI cs.CV

Navigating Explanatory Multiverse Through Counterfactual Path Geometry

Kacper Sokol, Edward Small, Yueqing Xuan

Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints -- such as density-based feasibility, and attribute (im)mutability or directionality of change -- that aim to maximise their real-life utility. In addition to desiderata with respect to the counterfactual instance itself, existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature neglects the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys. We then show how to navigate, reason about and compare the geometry of these trajectories with two methods: vector spaces and graphs. To this end, we overview their spacial properties -- such as affinity, branching, divergence and possible future convergence -- and propose an all-in-one metric, called opportunity potential, to quantify them. Implementing this (possibly interactive) explanatory process grants explainees agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences. We show the flexibility, benefit and efficacy of such an approach through examples and quantitative evaluation on the German Credit and MNIST data sets.

5/7/2024

cs.LG cs.AI