Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Read original: arXiv:2404.02141 - Published 8/15/2024 by Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

Overview

Researchers often analyze how an outcome of interest varies with different combinations of observable factors (covariates).
Existing approaches either search for a single "optimal" partition of the covariate space or sample from all possible partitions.
These methods ignore the reality that many different partitions may be statistically indistinguishable, despite having very different implications.
The paper proposes an alternative approach called Rashomon Partition Sets (RPSs) to address this issue.

Plain English Explanation

Imagine you're a doctor trying to understand how different drug combinations affect a patient's health. Or you're a sociologist studying how technology adoption depends on factors like incentives and demographics. In situations like these, researchers often want to understand how an outcome of interest (like health or technology use) varies with different combinations of observable factors (like drugs or demographics).

Existing methods for analyzing this type of data have limitations. Some approaches try to find a single "best" way to partition the space of factors, making assumptions about how the factors are related. Others simply sample from all the possible ways to divide up the factor space. But the reality is that there are often many statistically equivalent ways to partition the factor space, even though those partitions have very different real-world implications.

The new approach proposed in this paper, called Rashomon Partition Sets (RPSs), aims to address this issue. Instead of searching for a single optimal partition or randomly sampling from all possible partitions, RPSs incorporate all the partitions whose posterior values are close to that of the best one. This gives a more complete picture of how the outcome varies with the different factor combinations, without making assumptions about the relationships between the factors.

The key innovation is using a specific type of statistical prior (called an $\ell_0$ prior) that makes no assumptions about how the factors are related. This prior helps the researchers identify all the plausible partitions of the factor space, rather than just one. Once they have this set of partitions, they can calculate the probability of any interesting outcome (like the effect of a drug combination) across all the partitions in the set.

By considering multiple statistically equivalent partitions, this approach allows for more robust conclusions than conventional methods that focus on a single "optimal" partition. The researchers demonstrate the benefits of this approach through simulations and real-world case studies on topics like charitable giving, chromosomal structure, and microfinance.

Technical Explanation

The paper addresses the common statistical problem of understanding how an outcome of interest varies with combinations of observable covariates. This could involve questions like: How do various drug combinations affect health outcomes? Or how does technology adoption depend on incentives and demographics?

Existing approaches to this problem fall into two main categories:

(i) Searching for a single "optimal" partition of the covariate space, under assumptions about the associations between covariates.
(ii) Sampling from the entire set of possible partitions.

Both of these approaches have limitations, as they ignore the reality that many statistically equivalent partitions may exist, even though they have very different implications for policy or science.
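To get a feel for how many candidate partitions exist even in a tiny design, the following minimal Python sketch enumerates every way to pool the feature combinations of a hypothetical 2x2 factorial experiment. The factor names `drug_a` and `drug_b` and the helper `set_partitions` are illustrative assumptions, not taken from the paper. The sketch also enumerates all set partitions, whereas the paper restricts attention to partitions with a tree-like geometry, which is not reproduced here.

```python
from itertools import product

def set_partitions(items):
    """Yield every way to split `items` into non-empty, unordered pools."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing pool in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a pool of its own.
        yield [[first]] + partition

# Hypothetical 2x2 factorial design: two factors with two levels each.
levels = {"drug_a": [0, 1], "drug_b": [0, 1]}
combos = list(product(*levels.values()))   # the feature combinations
partitions = list(set_partitions(combos))  # every possible pooling

print(len(combos), "combinations,", len(partitions), "candidate partitions")
```

The number of unrestricted partitions grows as the Bell numbers, so adding factors or levels quickly makes exhaustive comparison impractical, and many of those partitions can fit the data about equally well.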

To address this, the authors propose an alternative framework called Rashomon Partition Sets (RPSs). An RPS collects every partition of the covariate space whose posterior value is near that of the maximum a posteriori (MAP) partition, even if those partitions offer substantively different explanations. Each partition in the set divides the covariate space using a tree-like geometry. Crucially, the prior used to define the RPS makes no assumptions about associations between covariates.
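One way to make this definition concrete is to compare each partition's posterior with that of the maximum a posteriori (MAP) partition. The notation below is an illustrative assumption rather than the paper's own: $\Pi$ denotes a partition, $|\Pi|$ its number of pools, $\lambda > 0$ a penalty strength, and $\epsilon \in (0, 1)$ a tolerance. The paper's exact prior and threshold convention may differ.

```latex
% Assumed form of an \ell_0-style prior: penalize only the number of pools.
\Pr(\Pi) \;\propto\; \exp\bigl(-\lambda\,|\Pi|\bigr)

% Assumed form of a Rashomon Partition Set: all partitions whose posterior
% is within a factor (1 - \epsilon) of the MAP partition's posterior.
\mathcal{P}_{\epsilon}
  \;=\; \Bigl\{\, \Pi \;:\;
      \Pr(\Pi \mid \text{data}) \;\ge\; (1-\epsilon)\,
      \max_{\Pi'} \Pr(\Pi' \mid \text{data}) \,\Bigr\}
```

Because a prior of this form depends only on how many pools a partition has, it does not favor any particular pattern of association between covariates.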

Specifically, the authors use the $\ell_0$ prior, which they show is minimax optimal for this problem. Given the RPS, they can then calculate the posterior probability of any measurable function of the feature effects on the outcome, conditional on being in the RPS.
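The conditional calculation described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy outcomes in `data`, the scoring rule in `log_score` (squared error plus a per-pool penalty `lam`, standing in for a posterior under an $\ell_0$-style prior), and the cutoff `ratio`. It is not the authors' implementation, and it enumerates all set partitions rather than only the tree-like ones the paper considers.

```python
import math

def set_partitions(items):
    """Yield every way to split `items` into non-empty, unordered pools."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

# Hypothetical outcomes observed for each (drug_a, drug_b) combination.
data = {
    (0, 0): [1.0, 1.2, 0.9],
    (0, 1): [1.1, 1.0, 1.3],
    (1, 0): [2.0, 2.1, 1.8],
    (1, 1): [2.2, 1.9, 2.3],
}

def log_score(partition, lam=0.5):
    """Assumed unnormalized log-posterior: -(squared error) - lam * (#pools)."""
    sse = 0.0
    for pool in partition:
        ys = [y for combo in pool for y in data[combo]]
        mean = sum(ys) / len(ys)
        sse += sum((y - mean) ** 2 for y in ys)
    return -sse - lam * len(partition)

partitions = list(set_partitions(list(data)))
scores = [log_score(p) for p in partitions]
best = max(scores)  # score of the MAP partition

# Rashomon set: partitions whose unnormalized posterior is at least
# `ratio` times that of the MAP partition (illustrative cutoff).
ratio = 0.5
rps = [(p, s) for p, s in zip(partitions, scores) if s >= best + math.log(ratio)]

# Posterior weights renormalized within the Rashomon set.
total = sum(math.exp(s - best) for _, s in rps)
weights = [math.exp(s - best) / total for _, s in rps]

def pooled_together(partition, a, b):
    """True if combinations `a` and `b` fall in the same pool."""
    return any(a in pool and b in pool for pool in partition)

# Posterior probability, conditional on the Rashomon set, that adding
# drug_b on top of drug_a makes no difference to the outcome.
prob = sum(w for (p, _), w in zip(rps, weights)
           if pooled_together(p, (1, 0), (1, 1)))
print(len(rps), "partitions in the set; conditional probability:", round(prob, 3))
```

The same renormalized weights can be reused for any other function of the partition or of the pooled effects, which is the sense in which conclusions are drawn across the whole set rather than from the MAP partition alone.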

The authors also characterize the approximation error of the RPS relative to the entire posterior distribution, and provide bounds on the size of the RPS. Simulations demonstrate that this framework allows for more robust conclusions than conventional regularization techniques.

The authors apply their RPS method to three empirical case studies: analyzing price effects on charitable giving, investigating chromosomal structure (telomere length), and studying the introduction of microfinance.

Critical Analysis

The authors acknowledge several caveats and limitations of their approach. First, the computational complexity of identifying the RPS can be challenging, especially for high-dimensional covariate spaces. They note that future research is needed to develop more scalable algorithms.

Additionally, the $\ell_0$ prior used to define the RPS may be sensitive to the specific choice of tuning parameters. The authors suggest exploring alternative priors that can incorporate subject-matter knowledge about the relationships between covariates.

While the RPS framework offers a more comprehensive view of plausible partitions compared to single-partition methods, it does not guarantee that the true data-generating process is contained within the RPS. There may still be partitions outside the RPS that are also statistically viable.

Finally, the authors emphasize that the RPS should be interpreted as a sensitivity analysis, rather than a definitive answer. Researchers should still critically examine the substantive implications of the different partitions within the RPS and how they align with domain knowledge.

Conclusion

This paper introduces a novel statistical framework called Rashomon Partition Sets (RPSs) to address the challenge of understanding how an outcome of interest varies with combinations of observable covariates. By considering a set of statistically equivalent partitions of the covariate space, rather than a single "optimal" partition, the RPS approach allows for more robust conclusions that account for the inherent uncertainty in these types of analyses.

The key innovation is the use of a prior that makes no assumptions about the relationships between covariates, enabling the identification of a diverse set of plausible partitions. This is an important advance over existing methods that either search for a single partition or randomly sample from all possibilities.

While the RPS framework has some computational and methodological limitations, it offers a more comprehensive and nuanced view of how outcomes vary with complex combinations of factors. As researchers continue to grapple with such questions across various domains, tools like RPSs could prove valuable for informing policy decisions and advancing scientific knowledge.

Related Papers

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

Many statistical analyses, in both observational data and randomized control trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? Our goal is to partition this factorial space into pools of covariate combinations where the outcome differs across the pools (but not within a pool). Existing approaches (i) search for a single optimal partition under assumptions about the association between covariates or (ii) sample from the entire set of possible partitions. Both these approaches ignore the reality that, especially with correlation structure in covariates, many ways to partition the covariate space may be statistically indistinguishable, despite very different implications for policy or science. We develop an alternative perspective, called Rashomon Partition Sets (RPSs). Each item in the RPS partitions the space of covariates using a tree-like geometry. RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations, and do so using a prior that makes no assumptions about associations between covariates. This prior is the $\ell_0$ prior, which we show is minimax optimal. Given the RPS we calculate the posterior of any measurable function of the feature effects vector on outcomes, conditional on being in the RPS. We also characterize approximation error relative to the entire posterior and provide bounds on the size of the RPS. Simulations demonstrate this framework allows for robust conclusions relative to conventional regularization techniques. We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance.

8/15/2024

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

Jacqueline Maasch, Weishen Pan, Shantanu Gupta, Volodymyr Kuleshov, Kyra Gan, Fei Wang

Causal discovery is crucial for causal inference in observational studies, as it can enable the identification of valid adjustment sets (VAS) for unbiased effect estimation. However, global causal discovery is notoriously hard in the nonparametric setting, with exponential time and sample complexity in the worst case. To address this, we propose local discovery by partitioning (LDP): a local causal discovery method that is tailored for downstream inference tasks without requiring parametric and pretreatment assumptions. LDP is a constraint-based procedure that returns a VAS for an exposure-outcome pair under latent confounding, given sufficient conditions. The total number of independence tests performed is worst-case quadratic with respect to the cardinality of the variable set. Asymptotic theoretical guarantees are numerically validated on synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baseline discovery algorithms, with LDP outperforming on confounder recall, runtime, and test count for VAS discovery. Notably, LDP ran at least 1300x faster than baselines on a benchmark.

6/4/2024

The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance

Jon Donnelly, Srikar Katta, Cynthia Rudin, Edward P. Browne

Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.

4/3/2024

Amazing Things Come From Having Many Good Models

Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. We address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) uncertainty in predictions, fairness, and explanations, (4) reliable variable importance, (5) algorithm choice, specifically, providing advanced knowledge of which algorithms might be suitable for a given problem, and (6) public policy. We also discuss a theory of when the Rashomon Effect occurs and why. Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.

7/11/2024