Designing Decision Support Systems Using Counterfactual Prediction Sets

Read original: arXiv:2306.03928 - Published 7/17/2024 by Eleni Straitouri, Manuel Gomez Rodriguez

Overview

Traditional decision support systems focus on predicting ground truth labels, but struggle to help human experts understand when and how to use these predictions.
An alternative approach is to provide a set of label prediction values constructed using a conformal predictor, and ask experts to choose a label from this prediction set.
This paper revisits the design of these "prediction set" systems, developing a new methodology that does not require or assume an expert model.
The proposed methodology leverages the nested structure of the prediction sets and a counterfactual monotonicity assumption to achieve exponential improvements in regret compared to standard bandit algorithms.
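The authors conducted a large-scale human subject study (n = 2,751) to compare their methodology against several baselines, finding that, for systems based on prediction sets, limiting experts' level of agency leads to greater performance than allowing experts to always exercise their own agency.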

Plain English Explanation

Traditional AI-based decision support systems are designed to predict the true label or value for a given task. However, these systems often make mistakes, and it can be challenging for human experts to understand when and how to use the system's predictions to update their own judgments.

An alternative approach is to have the AI system provide a set of possible label predictions, rather than a single prediction. This set of predictions is constructed using a special technique called a "conformal predictor." The system then asks the human expert to choose a label from this prediction set, rather than just accepting the system's single prediction.

This paper takes a fresh look at the design of these "prediction set" decision support systems. Earlier designs and evaluations relied on stylized models of how the human expert behaves; the researchers instead developed a methodology that doesn't require modeling the expert at all. Their approach leverages the way the prediction sets are nested inside one another, together with an assumption about how an expert's success would change had they been shown a different set: roughly, an expert who picks the right label from a given set would still have picked it from a smaller set that contains it.

The researchers tested their new methodology in a large experiment with 2,751 human participants. They found that when the decision support system limits experts' agency by requiring them to choose from the prediction set, the overall performance of the combined human-AI system is better than when experts are always allowed to exercise their own agency.

Technical Explanation

The paper proposes a new methodology for designing decision support systems that provide a set of label predictions, rather than a single prediction, and force the human expert to choose a label from this prediction set.
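For readers unfamiliar with conformal prediction, the following is a minimal sketch of how such a prediction set can be built from a classifier's predicted probabilities. It assumes the common split-conformal recipe with the non-conformity score 1 - p(label); the paper's own score and implementation may differ, and the names `conformal_threshold` and `prediction_set` are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on the scores 1 - p(true label).

    cal_probs: (n, k) array of predicted probabilities on calibration data.
    cal_labels: (n,) array of true label indices.
    alpha: target miscoverage level in (0, 1).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected quantile (1-based).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_set(probs, q_hat):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return set(np.flatnonzero(1.0 - probs <= q_hat).tolist())

# Toy calibration data (illustrative values only).
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
cal_labels = np.array([0, 1, 2, 1])
new_probs = np.array([0.5, 0.45, 0.05])

for alpha in (0.5, 0.1):
    q_hat = conformal_threshold(cal_probs, cal_labels, alpha)
    print(alpha, prediction_set(new_probs, q_hat))
```

Lowering `alpha` raises the threshold, so the set can only grow. In a system of this kind, the expert would then be asked to pick a label from the returned set.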

The key innovations are:

Leveraging the nested structure of the prediction sets produced by any conformal predictor.
Exploiting a counterfactual monotonicity assumption: roughly, an expert who succeeds at predicting the ground-truth label from a prediction set would also have succeeded had they been given a smaller set that still contains that label.
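To see why these two ingredients are useful together, here is a small sketch of how a single observed expert prediction could be turned into outcomes for other, unshown prediction sets. It rests on one reading of counterfactual monotonicity, stated as an assumption: success with a set implies success with any smaller set containing the true label, and failure with a set containing the true label implies failure with any larger set. The helper `infer_outcomes` is hypothetical and is not taken from the authors' code.

```python
def infer_outcomes(nested_sets, shown, success, true_label):
    """Infer expert outcomes for every candidate set from one observation.

    nested_sets: list of label sets, one per candidate configuration,
                 assumed nested (each is a subset or superset of the others).
    shown: index of the set actually shown to the expert.
    success: True if the expert picked the true label from that set.
    Returns a list with True / False where the outcome is known under the
    assumption, and None where it cannot be inferred.
    """
    shown_set = nested_sets[shown]
    inferred = []
    for other in nested_sets:
        if true_label not in other:
            # The expert must pick from the set, so they cannot succeed.
            inferred.append(False)
        elif success and other <= shown_set:
            # Smaller set that still contains the true label: still a success.
            inferred.append(True)
        elif not success and true_label in shown_set and other >= shown_set:
            # Larger set than one the expert already failed with: still a failure.
            inferred.append(False)
        else:
            inferred.append(None)
    return inferred

# Four nested candidate sets for one sample whose true label is 1.
sets = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1}, {0}]
print(infer_outcomes(sets, shown=1, success=True, true_label=1))
print(infer_outcomes(sets, shown=2, success=False, true_label=1))
```

One shown set can therefore yield feedback about several candidate configurations at once, which is what a standard bandit algorithm cannot exploit.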

By incorporating these two elements, the authors develop a decision support system that achieves an exponential improvement in regret (the cumulative performance lost relative to the best fixed choice in hindsight) compared to vanilla bandit algorithms.
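As a rough illustration of where the gain over a standard bandit comes from, the sketch below runs successive elimination over a handful of candidate configurations (arms), but lets each round update every arm whose outcome is known, whether observed or inferred counterfactually. This is a simplified sketch under assumed choices (a Hoeffding-style confidence radius, a hypothetical `CounterfactualElimination` class); it is not the authors' algorithm or their regret analysis.

```python
import math

class CounterfactualElimination:
    """Successive elimination that accepts feedback for several arms per round."""

    def __init__(self, n_arms, horizon):
        self.active = set(range(n_arms))
        self.counts = [0] * n_arms      # number of known outcomes per arm
        self.sums = [0.0] * n_arms      # number of known successes per arm
        self.horizon = horizon

    def next_arm(self):
        # Show the active arm with the fewest known outcomes so far.
        return min(self.active, key=lambda a: (self.counts[a], a))

    def update(self, outcomes):
        """outcomes[a] is True/False when known for arm a, or None otherwise."""
        for arm, outcome in enumerate(outcomes):
            if outcome is not None:
                self.counts[arm] += 1
                self.sums[arm] += float(outcome)
        self._eliminate()

    def _mean(self, arm):
        return self.sums[arm] / self.counts[arm] if self.counts[arm] else 0.0

    def _radius(self, arm):
        if self.counts[arm] == 0:
            return math.inf
        return math.sqrt(2.0 * math.log(self.horizon) / self.counts[arm])

    def _eliminate(self):
        # Drop arms whose upper bound falls below the best lower bound.
        best_lower = max(self._mean(a) - self._radius(a) for a in self.active)
        self.active = {a for a in self.active
                       if self._mean(a) + self._radius(a) >= best_lower}

# Toy run: arm 0 always succeeds, arm 1 always fails, both known every round.
bandit = CounterfactualElimination(n_arms=2, horizon=100)
for _ in range(50):
    bandit.update([True, False])
print(bandit.active)
```

A vanilla bandit would pass `None` for every arm except the one it played, so each arm's confidence interval would shrink only when that arm is pulled; feeding in inferred outcomes shrinks many intervals per round.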

The authors evaluate their methodology through a large-scale human subject study involving 2,751 participants. They compare their approach to several baselines, including allowing the human expert full control over the final label choice. The results show that limiting the expert's level of agency leads to greater overall system performance than giving the expert unconstrained control.

Critical Analysis

The paper presents a novel and promising approach to designing decision support systems that can effectively complement human experts. The key strength is the authors' focus on leveraging the structure of conformal prediction sets, rather than relying on a model of how the human expert will behave.

However, the human subject study, while large in scale, was conducted in a relatively simplified, stylized setting. It remains to be seen how well this approach would scale and perform in more complex, real-world decision-making scenarios. Additionally, the assumption of counterfactual monotonicity, while reasonable, may not always hold in practice.

Further research is needed to better understand the limitations of this approach and explore ways to make it more robust. For example, the authors mention the potential for adversarial attacks on the conformal predictor, which could undermine the system's performance.

Overall, this paper represents an important step towards improving human-AI complementarity in decision-making tasks. The authors' innovative approach and thorough evaluation provide valuable insights for the design of future decision support systems.

Conclusion

This paper presents a new methodology for designing decision support systems that provide a set of label predictions, rather than a single prediction, and constrain the human expert to choose a label from this prediction set. By leveraging the nested structure of conformal prediction sets and a counterfactual monotonicity assumption, the authors achieve an exponential improvement in regret over vanilla bandit algorithms.

The large-scale human subject study conducted by the researchers demonstrates the promise of this approach, showing that limiting the expert's level of agency can lead to better overall system performance than giving the expert full control. While the study was conducted in a simplified setting, the authors' innovative approach represents an important step towards developing more effective decision support systems that can better complement human expertise.

Related Papers

Designing Decision Support Systems Using Counterfactual Prediction Sets

Eleni Straitouri, Manuel Gomez Rodriguez

Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, since their predictions are not perfect, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has been proven challenging. In this context, it has been recently argued that an alternative type of decision support systems may circumvent this challenge. Rather than providing a single label prediction, these systems provide a set of label prediction values constructed using a conformal predictor, namely a prediction set, and forcefully ask experts to predict a label value from the prediction set. However, the design and evaluation of these systems have so far relied on stylized expert models, questioning their promise. In this paper, we revisit the design of this type of systems from the perspective of online learning and develop a methodology that does not require, nor assumes, an expert model. Our methodology leverages the nested structure of the prediction sets provided by any conformal predictor and a natural counterfactual monotonicity assumption to achieve an exponential improvement in regret in comparison to vanilla bandit algorithms. We conduct a large-scale human subject study ($n = 2{,}751$) to compare our methodology to several competitive baselines. The results show that, for decision support systems based on prediction sets, limiting experts' level of agency leads to greater performance than allowing experts to always exercise their own agency. We have made available the data gathered in our human subject study as well as an open source implementation of our system at https://github.com/Networks-Learning/counterfactual-prediction-sets.

7/17/2024

Controlling Counterfactual Harm in Decision Support Systems Based on Prediction Sets

Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez

Decision support systems based on prediction sets help humans solve multiclass classification tasks by narrowing down the set of potential label values to a subset of them, namely a prediction set, and asking them to always predict label values from the prediction sets. While this type of systems have been proven to be effective at improving the average accuracy of the predictions made by humans, by restricting human agency, they may cause harm: a human who has succeeded at predicting the ground-truth label of an instance on their own may have failed had they used these systems. In this paper, our goal is to control how frequently a decision support system based on prediction sets may cause harm, by design. To this end, we start by characterizing the above notion of harm using the theoretical framework of structural causal models. Then, we show that, under a natural, albeit unverifiable, monotonicity assumption, we can estimate how frequently a system may cause harm using only predictions made by humans on their own. Further, we also show that, under a weaker monotonicity assumption, which can be verified experimentally, we can bound how frequently a system may cause harm again using only predictions made by humans on their own. Building upon these assumptions, we introduce a computational framework to design decision support systems based on prediction sets that are guaranteed to cause harm less frequently than a user-specified value using conformal risk control. We validate our framework using real human predictions from two different human subject studies and show that, in decision support systems based on prediction sets, there is a trade-off between accuracy and counterfactual harm.

6/12/2024

Towards Human-AI Complementarity with Predictions Sets

Giovanni De Toni, Nastaran Okati, Suhas Thejaswi, Eleni Straitouri, Manuel Gomez-Rodriguez

Decision support systems based on prediction sets have proven to be effective at helping human experts solve classification tasks. Rather than providing single-label predictions, these systems provide sets of label predictions constructed using conformal prediction, namely prediction sets, and ask human experts to predict label values from these sets. In this paper, we first show that the prediction sets constructed using conformal prediction are, in general, suboptimal in terms of average accuracy. Then, we show that the problem of finding the optimal prediction sets under which the human experts achieve the highest average accuracy is NP-hard. More strongly, unless P = NP, we show that the problem is hard to approximate to any factor less than the size of the label set. However, we introduce a simple and efficient greedy algorithm that, for a large class of expert models and non-conformity scores, is guaranteed to find prediction sets that provably offer equal or greater performance than those constructed using conformal prediction. Further, using a simulation study with both synthetic and real expert predictions, we demonstrate that, in practice, our greedy algorithm finds near-optimal prediction sets offering greater performance than conformal prediction.

5/29/2024

Conformal Prediction Sets Improve Human Decision Making

Jesse C. Cresswell, Yi Sui, Bhargava Kumar, Noel Vouitsis

In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.

6/11/2024