Robust prediction under missingness shifts

2406.16484

Published 6/26/2024 by Patrick Rockenschaub, Zhicong Xian, Alireza Zamanian, Marta Piperno, Octavia-Andreea Ciora, Elisabeth Pachl, Narges Ahmidi

stat.ML cs.LG

Robust prediction under missingness shifts

Abstract

Prediction becomes more challenging with missing covariates. What method is chosen to handle missingness can greatly affect how models perform. In many real-world problems, the best prediction performance is achieved by models that can leverage the informative nature of a value being missing. Yet, the reasons why a covariate goes missing can change once a model is deployed in practice. If such a missingness shift occurs, the conditional probability of a value being missing differs in the target data. Prediction performance in the source data may no longer be a good selection criterion, and approaches that do not rely on informative missingness may be preferable. However, we show that the Bayes predictor remains unchanged by ignorable shifts for which the probability of missingness only depends on observed data. Any consistent estimator of the Bayes predictor may therefore result in robust prediction under those conditions, although we show empirically that different methods appear robust to different types of shifts. If the missingness shift is non-ignorable, the Bayes predictor may change due to the shift. While neither approach recovers the Bayes predictor in this case, we found empirically that disregarding missingness was most beneficial when it was highly informative.

Create account to get full access

Overview

This paper explores the challenge of making robust predictions when there are shifts in the underlying data distribution, particularly related to missing data.
The researchers propose a method to train predictive models that can maintain performance even when the patterns of missing data change between the training and testing data.
The paper builds on related work in areas like learning when concepts shift, invariant probabilistic prediction, and estimating model performance under covariate shift.

Plain English Explanation

Predictive models are often trained on historical data, but in the real world, the underlying patterns in the data can change over time. One common issue is that certain data points may be missing from the training data, but present in the test data, or vice versa. This can cause the model to perform poorly when deployed, as it was not trained to handle these "missingness shifts."

The researchers in this paper propose a new approach to train models that are more robust to changes in missingness patterns. The key idea is to explicitly model the process by which data can be missing, and then use this model to train the predictive model to be less sensitive to these shifts. This allows the model to maintain good performance even when the missingness patterns change between training and testing.

By making predictive models more resilient to real-world data changes, this work can help improve the reliability and robustness of AI systems deployed in high-stakes applications like healthcare or finance.

Technical Explanation

The paper formulates the problem of "robust prediction under missingness shifts" - the challenge of training models that can make accurate predictions even when the patterns of missing data change between the training and test distributions.

The researchers propose a two-stage approach to address this problem. First, they learn a model of the missingness process - how certain features become missing as a function of the observed data. They then use this missingness model to train the final predictive model, encouraging it to be less sensitive to changes in the missingness patterns.

Specifically, the method consists of:

Learning a model of the missingness process using the training data.
Augmenting the predictive model with the missingness model, and training the combined model end-to-end.
At test time, using the learned missingness model to reason about and correct for the impact of missing data.

The authors demonstrate the effectiveness of their approach on both synthetic and real-world datasets, showing that it can significantly outperform standard techniques when there are shifts in missingness between training and test.

Critical Analysis

The paper provides a thoughtful and principled approach to the important problem of handling missingness shifts in predictive modeling. By explicitly modeling the missingness process, the method goes beyond simple imputation or ignoring missing data.

One limitation noted by the authors is that their approach relies on the missingness process being "not too complex" - if the true missingness mechanism is highly nonlinear or interactive, the proposed model may struggle to capture it accurately. Additionally, the paper only considers shifts in the missingness pattern, but not other types of distribution shift that may occur in practice.

Further research could explore ways to make the missingness modeling more flexible, or to jointly model other types of distribution shift beyond just missingness. It would also be valuable to test the approach on a wider range of real-world datasets and applications to fully understand its strengths and weaknesses.

Overall, this work represents an important step towards building more reliable and robust predictive algorithms that can maintain performance in the face of real-world data changes.

Conclusion

This paper introduces a new approach for training predictive models that are robust to shifts in the patterns of missing data between the training and test distributions. By explicitly modeling the missingness process and incorporating that into the training of the predictive model, the method can help maintain good performance even when the missingness changes.

The proposed technique builds on related work in areas like invariant probabilistic prediction and training conditional coverage bounds under covariate shift. By making predictive models more resilient to real-world data changes, this research can contribute to the development of more reliable and trustworthy AI systems for high-stakes applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛸

Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding

Ashesh Rambachan, Amanda Coston, Edward Kennedy

Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed given choices made by human decision makers. We propose a unified framework for the robust design and evaluation of predictive algorithms in selectively observed data. We impose general assumptions on how much the outcome may vary on average between unselected and selected units conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. We develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome, a predictive algorithm's mean square error, true/false positive rate, and many others, under these assumptions. In an administrative dataset from a large Australian financial institution, we illustrate how varying assumptions on unobserved confounding leads to meaningful changes in default risk predictions and evaluations of credit scores across sensitive groups.

5/21/2024

cs.CY cs.LG

Invariant Probabilistic Prediction

Alexander Henzi, Xinwei Shen, Michael Law, Peter Buhlmann

In recent years, there has been a growing interest in statistical methods that exhibit robust performance under distribution changes between training and test data. While most of the related research focuses on point predictions with the squared error loss, this article turns the focus towards probabilistic predictions, which aim to comprehensively quantify the uncertainty of an outcome variable given covariates. Within a causality-inspired framework, we investigate the invariance and robustness of probabilistic predictions with respect to proper scoring rules. We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions, in contrast to the setting of point prediction. We illustrate how to choose evaluation metrics and restrict the class of distribution shifts to allow for identifiability and invariance in the prototypical Gaussian heteroscedastic linear model. Motivated by these findings, we propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters. Finally, we demonstrate the empirical performance of our proposed procedure on simulated as well as on single-cell data.

6/18/2024

cs.LG stat.ML

Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Kulunu Dharmakeerthi, YoonHaeng Hur, Tengyuan Liang

Practitioners often deploy a learned prediction model in a new environment where the joint distribution of covariate and response has shifted. In observational data, the distribution shift is often driven by unobserved confounding factors lurking in the environment, with the underlying mechanism unknown. Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. This motivates us to study the domain adaptation problem with observational data: given labeled covariate and response pairs from a source environment, and unlabeled covariates from a target environment, how can one predict the missing target response reliably? We root the adaptation problem in a linear structural causal model to address endogeneity and unobserved confounding. We study the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction. This further motivates a new representation learning method for adaptation that optimizes for a lower-dimensional linear subspace and, subsequently, a prediction model confined to that subspace. The procedure operates on a non-convex objective-that naturally interpolates between predictability and stability/invariance-constrained on the Stiefel manifold. We study the optimization landscape and prove that, when the regularization is sufficient, nearly all local optima align with an invariant linear subspace resilient to both concept and covariate shift. In terms of predictability, we show a model that uses the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk. Three real-world data sets are investigated to validate our method and theory.

6/26/2024

cs.LG stat.ML

🌀

Training-Conditional Coverage Bounds under Covariate Shift

Mehrdad Pournaderi, Yu Xiang

Training-conditional coverage guarantees in conformal prediction concern the concentration of the error distribution, conditional on the training data, below some nominal level. The conformal prediction methodology has recently been generalized to the covariate shift setting, namely, the covariate distribution changes between the training and test data. In this paper, we study the training-conditional coverage properties of a range of conformal prediction methods under covariate shift via a weighted version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality tailored for distribution change. The result for the split conformal method is almost assumption-free, while the results for the full conformal and jackknife+ methods rely on strong assumptions including the uniform stability of the training algorithm.

5/28/2024

stat.ML cs.LG