Conformal Prediction with Learned Features

arXiv:2404.17487

Published 4/29/2024 by Shayan Kiyani, George Pappas, Hamed Hassani

Abstract

In this paper, we focus on the problem of conformal prediction with conditional guarantees. Prior work has shown that it is impossible to construct nontrivial prediction sets with full conditional coverage guarantees. A wealth of research has considered relaxations of full conditional guarantees, relying on some predefined uncertainty structures. Departing from this line of thinking, we propose Partition Learning Conformal Prediction (PLCP), a framework to improve conditional validity of prediction sets through learning uncertainty-guided features from the calibration data. We implement PLCP efficiently with alternating gradient descent, utilizing off-the-shelf machine learning models. We further analyze PLCP theoretically and provide conditional guarantees for infinite and finite sample sizes. Finally, our experimental results over four real-world and synthetic datasets show the superior performance of PLCP compared to state-of-the-art methods in terms of coverage and length in both classification and regression scenarios.

Overview

This paper focuses on the problem of conformal prediction with conditional guarantees.
Prior research has shown it is impossible to construct nontrivial prediction sets with full conditional coverage guarantees.
The authors propose a new framework called Partition Learning Conformal Prediction (PLCP) to improve conditional validity of prediction sets by learning uncertainty-guided features from calibration data.
PLCP is implemented efficiently using alternating gradient descent and off-the-shelf machine learning models.
Theoretical analysis provides conditional guarantees for PLCP in both the infinite-sample and finite-sample regimes.
Experiments on real-world and synthetic datasets show PLCP outperforms state-of-the-art methods in terms of coverage and length for both classification and regression.

Plain English Explanation

The paper tackles the challenge of building prediction sets that reliably quantify uncertainty for new data points, even when the data has complex, unknown patterns. Prior work has shown it is impossible to build nontrivial prediction sets with full "conditional" coverage, that is, sets guaranteed to contain the true answer at the target rate for every individual input rather than only on average.

To address this, the authors propose a new framework called Partition Learning Conformal Prediction (PLCP). PLCP aims to learn helpful features from calibration data that can guide the construction of more accurate, conditional prediction sets. The key insight is that by partitioning the data based on learned uncertainty-related features, PLCP can tailor the prediction sets to each data partition.

The PLCP method is implemented efficiently using alternating gradient descent and standard machine learning models. Theoretically, the authors prove that PLCP provides conditional guarantees on the prediction set coverage, even with finite sample sizes.

Experiments on real-world and synthetic datasets show PLCP outperforming other state-of-the-art conformal prediction methods. Its prediction sets achieve better coverage and shorter lengths in both classification and regression problems.

Technical Explanation

The paper introduces Partition Learning Conformal Prediction (PLCP), a new framework for improving the conditional validity of prediction sets. Unlike prior work that relies on predefined uncertainty structures, PLCP learns uncertainty-guided features from calibration data so that the prediction sets can be tailored to different regions of the input space.
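For reference, the difference between marginal and conditional coverage can be written out as below. The notation is assumed here for illustration and may differ from the paper's: C is the prediction set, alpha the miscoverage level, and h a hypothetical map from an input to one of m partition cells. The third line is a standard group-conditional relaxation, shown only to indicate the kind of guarantee a partition makes possible.

```latex
\begin{align}
% Marginal coverage: holds on average over test inputs
\mathbb{P}\big(Y \in C(X)\big) &\ge 1 - \alpha \\
% Full conditional coverage: holds at every input x
% (known to be impossible for nontrivial distribution-free sets)
\mathbb{P}\big(Y \in C(X) \mid X = x\big) &\ge 1 - \alpha \quad \text{for all } x \\
% Partition-conditional coverage: holds within each cell of a partition
\mathbb{P}\big(Y \in C(X) \mid h(X) = i\big) &\ge 1 - \alpha, \quad i = 1, \dots, m
\end{align}
```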

The core idea behind PLCP is to partition the data based on these learned features, and then construct separate prediction sets for each partition. This allows the method to adapt the prediction sets to the local uncertainty characteristics of the data, rather than assuming a one-size-fits-all model.
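The per-partition step can be illustrated with ordinary split conformal calibration applied separately inside each cell. This is a minimal sketch, not the authors' code: the partition is hand-picked by thresholding x (PLCP would learn it), the data are synthetic, and the function names `conformal_quantile` and `calibrate_per_group` are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile used in split conformal prediction."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: only the trivial set is valid
    return np.sort(scores)[k - 1]

def calibrate_per_group(scores, groups, num_groups, alpha):
    """One threshold per partition cell, computed from that cell's calibration scores."""
    return np.array(
        [conformal_quantile(scores[groups == g], alpha) for g in range(num_groups)]
    )

# Synthetic regression data whose noise level depends on the sign of x (assumption).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, size=n)
y = x + np.where(x > 0, 2.0, 0.5) * rng.normal(size=n)

pred = x                          # stand-in for a fitted model's predictions
scores = np.abs(y - pred)         # absolute-residual conformity score
groups = (x > 0).astype(int)      # hand-picked partition; PLCP would learn this

q = calibrate_per_group(scores, groups, num_groups=2, alpha=0.1)
print("per-cell thresholds:", q)  # interval for a test point: pred +/- q[cell]
```

The noisier cell receives a larger threshold, so its intervals are wider, while the quieter cell is not forced to inherit that width.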

The authors implement PLCP efficiently using an alternating gradient descent procedure, which iterates between learning the partitioning features and optimizing the prediction set parameters. Off-the-shelf machine learning models are used for the underlying predictive tasks.
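The alternating scheme can be sketched as follows. The objective is an assumption of this sketch, since the summary does not state it: a pinball (quantile) loss between each cell's threshold and the conformity score, weighted by a soft assignment of each point to a cell. The assignment model is a plain linear softmax written in NumPy, standing in for whatever off-the-shelf model one would use, and `fit_partition` and its parameters are hypothetical names.

```python
import numpy as np

def pinball(q, s, tau):
    """Pinball loss of threshold q against score s at quantile level tau."""
    diff = s - q
    return np.where(diff >= 0, tau * diff, (tau - 1) * diff)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_partition(X, s, m=2, alpha=0.1, steps=2000, lr_q=0.05, lr_w=0.5, seed=0):
    """Alternate gradient steps on the thresholds q and the assignment weights (W, b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    tau = 1 - alpha
    W = 0.01 * rng.normal(size=(d, m))
    b = np.zeros(m)
    q = np.quantile(s, np.linspace(0.5, 0.95, m))  # distinct starting thresholds
    for _ in range(steps):
        h = softmax(X @ W + b)  # soft cell memberships, shape (n, m)
        # Step 1: subgradient step on q with the assignment held fixed.
        dq = np.where(s[:, None] >= q[None, :], -tau, 1 - tau)
        q = q - lr_q * (h * dq).mean(axis=0)
        # Step 2: gradient step on the assignment with q held fixed.
        loss = pinball(q[None, :], s[:, None], tau)            # shape (n, m)
        dz = h * (loss - (h * loss).sum(axis=1, keepdims=True))  # softmax backprop
        W = W - lr_w * (X.T @ dz) / n
        b = b - lr_w * dz.mean(axis=0)
    return W, b, q

# Synthetic scores whose scale depends on the sign of x (assumption).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=2000)
s = np.abs(np.where(x > 0, 2.0, 0.5) * rng.normal(size=2000))
X = x[:, None]

W, b, q = fit_partition(X, s)
cells = softmax(X @ W + b).argmax(axis=1)  # hard assignment for each point
accuracy = np.mean(cells == (x > 0).astype(int))
print("learned thresholds:", q, "agreement with true noise regime:", accuracy)
```

In this toy setting the learned cells should roughly recover the two noise regimes without being told where the boundary is. In practice the thresholds would then be recalibrated on held-out data within each learned cell.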

Theoretically, the paper provides conditional coverage guarantees for PLCP in both the infinite and finite sample regimes. The finite-sample result means the guarantees do not rest on asymptotic arguments alone, although how tight they are with limited calibration data depends on the details of the analysis.

Experiments on four real-world and synthetic datasets show PLCP outperforming state-of-the-art conformal prediction methods, with better coverage and shorter prediction sets in both classification and regression problems.
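Coverage and length are usually reported per subgroup as well as overall, because a method can meet the target on average while undercovering one region. The sketch below compares a single global threshold with per-cell thresholds on synthetic data. The data generator, the fixed two-cell partition, and the helper names are assumptions for illustration and do not reproduce the paper's experiments.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def make_data(n, rng):
    """Toy regression data: noise is four times larger when x > 0 (assumption)."""
    x = rng.uniform(-1, 1, size=n)
    y = x + np.where(x > 0, 2.0, 0.5) * rng.normal(size=n)
    return x, y

def report(covered, lengths, groups):
    """Map each group id to (empirical coverage, mean interval length)."""
    return {
        int(g): (covered[groups == g].mean(), lengths[groups == g].mean())
        for g in np.unique(groups)
    }

rng = np.random.default_rng(1)
alpha = 0.1
x_cal, y_cal = make_data(4000, rng)
x_te, y_te = make_data(4000, rng)

# The predictor is taken to be f(x) = x, so the score is the absolute residual.
s_cal, s_te = np.abs(y_cal - x_cal), np.abs(y_te - x_te)
g_cal, g_te = (x_cal > 0).astype(int), (x_te > 0).astype(int)

q_global = conformal_quantile(s_cal, alpha)
q_group = np.array([conformal_quantile(s_cal[g_cal == g], alpha) for g in (0, 1)])

# Intervals are pred +/- threshold, so the length is twice the threshold.
global_report = report(s_te <= q_global, np.full(len(s_te), 2 * q_global), g_te)
group_report = report(s_te <= q_group[g_te], 2 * q_group[g_te], g_te)

print("global threshold  :", global_report)
print("per-cell threshold:", group_report)
```

With one global threshold, the noisy cell tends to fall below the 90% target while the quiet cell gets needlessly wide intervals. Per-cell thresholds bring both cells close to the target.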

Critical Analysis

The paper makes a compelling case for the PLCP framework as a practical approach to improving conditional conformal prediction. By learning uncertainty-related features, PLCP is able to construct more targeted prediction sets that outperform prior methods.

However, the paper does acknowledge some limitations. The theoretical analysis assumes the partitioning features are learned perfectly, which may not hold in practice. Additionally, the experiments are limited to relatively small-scale datasets, and it's unclear how PLCP would scale to truly large-scale or high-dimensional problems.

Further research could explore ways to make the partitioning more robust, perhaps by incorporating uncertainty quantification directly into the feature learning process. Investigating PLCP's performance on a broader range of real-world applications would also help validate its practical utility.

Overall, the PLCP framework represents a promising direction for enhancing the practical applicability of conformal prediction methods. By adaptively learning uncertainty structures, it offers a path towards more reliable and informative prediction sets in complex, real-world scenarios.

Conclusion

This paper introduces Partition Learning Conformal Prediction (PLCP), a novel framework for improving the conditional validity of prediction sets. PLCP learns uncertainty-guided features from calibration data to partition the input space and construct tailored prediction sets for each partition.

The authors provide efficient implementations of PLCP using alternating gradient descent and off-the-shelf machine learning models. Theoretical analysis establishes conditional coverage guarantees for PLCP in both infinite and finite sample regimes.

Experimental results on real-world and synthetic datasets show PLCP outperforming state-of-the-art conformal prediction methods, with better coverage and shorter prediction sets for both classification and regression tasks.

The PLCP framework represents an important step forward in enhancing the practical applicability of conformal prediction. By adaptively learning uncertainty structures, it offers a promising path towards more reliable and informative prediction sets in complex, real-world scenarios.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

An Information Theoretic Perspective on Conformal Prediction

Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos, Arash Behboodi

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information theoretical inequalities. Moreover, we demonstrate two direct and useful applications of such connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.

6/27/2024

cs.LG cs.IT stat.ML

Robust Conformal Prediction Using Privileged Information

Shai Feldman, Yaniv Romano

We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, they are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.

6/11/2024

cs.LG

Verifiably Robust Conformal Prediction

Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti

Conformal Prediction (CP) is a popular uncertainty quantification method that provides distribution-free, statistically valid prediction sets, assuming that training and test data are exchangeable. In such a case, CP's prediction sets are guaranteed to cover the (unknown) true test output with a user-specified probability. Nevertheless, this guarantee is violated when the data is subjected to adversarial attacks, which often result in a significant loss of coverage. Recently, several approaches have been put forward to recover CP guarantees in this setting. These approaches leverage variations of randomised smoothing to produce conservative sets which account for the effect of the adversarial perturbations. They are, however, limited in that they only support $\ell^2$-bounded perturbations and classification tasks. This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages recent neural network verification methods to recover coverage guarantees under adversarial attacks. Our VRCP method is the first to support perturbations bounded by arbitrary norms including $\ell^1$, $\ell^2$, and $\ell^\infty$, as well as regression tasks. We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and regression tasks for deep reinforcement learning environments. In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.

6/7/2024

cs.LO cs.AI cs.LG

Length Optimization in Conformal Prediction

Shayan Kiyani, George Pappas, Hamed Hassani

Conditional validity and length efficiency are two crucial aspects of conformal prediction (CP). Achieving conditional validity ensures accurate uncertainty quantification for data subpopulations, while proper length efficiency ensures that the prediction sets remain informative and non-trivial. Despite significant efforts to address each of these issues individually, a principled framework that reconciles these two objectives has been missing in the CP literature. In this paper, we develop Conformal Prediction with Length-Optimization (CPL) - a novel framework that constructs prediction sets with (near-) optimal length while ensuring conditional validity under various classes of covariate shifts, including the key cases of marginal and group-conditional coverage. In the infinite sample regime, we provide strong duality results which indicate that CPL achieves conditional validity and length optimality. In the finite sample regime, we show that CPL constructs conditionally valid prediction sets. Our extensive empirical evaluations demonstrate the superior prediction set size performance of CPL compared to state-of-the-art methods across diverse real-world and synthetic datasets in classification, regression, and text-related settings.

6/28/2024

stat.ML cs.AI cs.LG