Interval Abstractions for Robust Counterfactual Explanations

2404.13736

Published 4/23/2024 by Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni

📊

Abstract

Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, when slight changes occur in the parameters of the underlying model, CEs found by existing methods often become invalid for the updated models. The literature lacks a way to certify deterministic robustness guarantees for CEs under model changes, in that existing methods to improve CEs' robustness are heuristic, and the robustness performances are evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees of CEs under the possibly infinite set of plausible model changes $Delta$. We formalise our robustness notion as the $Delta$-robustness for CEs, in both binary and multi-class classification settings. We formulate procedures to verify $Delta$-robustness based on Mixed Integer Linear Programming, using which we further propose two algorithms to generate CEs that are $Delta$-robust. In an extensive empirical study, we demonstrate how our approach can be used in practice by discussing two strategies for determining the appropriate hyperparameter in our method, and we quantitatively benchmark the CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.

Create account to get full access

Overview

This paper proposes a novel technique to obtain provable robustness guarantees for counterfactual explanations (CEs) of machine learning models, even when the underlying model parameters change.
Existing methods for generating CEs often produce explanations that become invalid when the model is slightly modified, limiting their practical usefulness.
The authors introduce an "interval abstraction" technique that can certify the robustness of CEs under a potentially infinite set of plausible model changes.
They formulate procedures to verify this robustness using mixed-integer linear programming, and propose algorithms to generate provably robust CEs.
The paper includes an extensive empirical evaluation demonstrating the effectiveness of the proposed approach.

Plain English Explanation

Counterfactual explanations (CEs) are a way to help users understand why a machine learning model made a particular decision about them. CEs provide recommendations for how a user's attributes would need to change in order to get a different outcome from the model.

However, a key limitation of existing CE methods is that they don't consider what happens when the underlying machine learning model is updated or changed slightly. In these cases, the CEs generated for the original model may no longer be valid or applicable.

To address this, the researchers developed a new technique that can provide provable robustness guarantees for CEs, even when the model parameters change. Their "interval abstraction" approach allows them to certify the robustness of CEs across a potentially infinite set of possible model updates.

This means that the CEs generated by their method will remain valid and useful, even if the machine learning model is refined or updated over time. The authors demonstrate the effectiveness of their approach through extensive experiments, showing how it outperforms other CE generation techniques in terms of robustness.

Overall, this research represents an important step forward in making counterfactual explanations more practical and reliable for real-world use cases, where model updates are common.

Technical Explanation

The key technical innovation in this paper is the introduction of an "interval abstraction" technique for parametric machine learning models. This allows the authors to obtain provable robustness guarantees for the counterfactual explanations (CEs) generated by their approach.

Specifically, the authors formalize a notion of "Δ-robustness" for CEs, which captures the idea that the CE should remain valid across a potentially infinite set of plausible model changes (represented by the parameter set Δ). They then develop procedures to verify Δ-robustness using mixed-integer linear programming.

Based on this robustness verification framework, the authors propose two algorithms to generate Δ-robust CEs. The first algorithm uses a binary search approach to find the most actionable CE that satisfies the Δ-robustness constraint. The second algorithm casts CE generation as a multi-objective optimization problem, jointly optimizing for actionability and robustness.

Through extensive experiments on several benchmark datasets, the authors demonstrate the effectiveness of their Δ-robust CE generation approach. They explore different strategies for selecting the appropriate hyperparameters in their method, and provide a quantitative comparison to 11 other CE generation techniques, highlighting the superior robustness of their algorithms.

Critical Analysis

The key strength of this research is the formal treatment of CE robustness, which addresses an important practical limitation of existing CE methods. By providing provable guarantees on the validity of CEs under model changes, the authors' approach can make counterfactual explanations more reliable and trustworthy for real-world applications.

That said, the paper does not discuss the computational complexity of the proposed robustness verification and CE generation procedures, which could be a potential concern for large-scale or high-dimensional problems. Additionally, the authors only consider a specific notion of robustness (Δ-robustness) and it's unclear how this relates to other robustness criteria that may be of interest in different contexts.

Another limitation is that the empirical evaluation is primarily focused on standard benchmark datasets, and it would be valuable to see how the method performs on more diverse and real-world datasets, where the assumptions of the underlying model may be more complex or challenging.

Finally, while the paper provides a strong technical foundation, further research could explore ways to make the Δ-robust CE generation more interpretable and user-friendly, perhaps by incorporating human preferences or domain knowledge into the optimization process.

Conclusion

This paper presents a novel technique for generating counterfactual explanations that are provably robust to changes in the underlying machine learning model. By introducing an "interval abstraction" approach and formalizing the notion of Δ-robustness, the authors have made an important contribution to the field of explainable AI.

The ability to provide reliable and trustworthy counterfactual explanations, even as models are updated over time, can significantly enhance the practical usefulness of these techniques for real-world applications. The extensive empirical evaluation demonstrates the effectiveness of the proposed approach, and highlights opportunities for further research to improve the interpretability and usability of Δ-robust CEs.

Overall, this work represents a significant step forward in making counterfactual explanations more robust and deployable in dynamic machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🧠

Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni

Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging on robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performances against metrics on three evaluation aspects.

4/5/2024

cs.LG cs.AI

Do Counterfactual Examples Complicate Adversarial Training?

Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li

We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.

4/17/2024

cs.LG cs.CV

Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms

Ao Xu, Tieru Wu

Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input instances. However, the robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals. Meanwhile, we designed experiments to verify the effectiveness of WRC-Test. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which gives a sufficient condition for PAC WRC-Approximability.

6/3/2024

cs.LG

Counterfactual Explanations for Linear Optimization

Jannis Kurtz, c{S}. .Ilker Birbil, Dick den Hertog

The concept of counterfactual explanations (CE) has emerged as one of the important concepts to understand the inner workings of complex AI systems. In this paper, we translate the idea of CEs to linear optimization and propose, motivate, and analyze three different types of CEs: strong, weak, and relative. While deriving strong and weak CEs appears to be computationally intractable, we show that calculating relative CEs can be done efficiently. By detecting and exploiting the hidden convex structure of the optimization problem that arises in the latter case, we show that obtaining relative CEs can be done in the same magnitude of time as solving the original linear optimization problem. This is confirmed by an extensive numerical experiment study on the NETLIB library.

5/27/2024

cs.LG