Adaptive Experimentation When You Can't Experiment

Read original: arXiv:2406.10738 - Published 6/18/2024 by Yao Zhao, Kwang-Sung Jun, Tanner Fiez, Lalit Jain

🌿

Overview

Introduces the "confounded pure exploration transductive linear bandit" (CPET-LB) problem
Motivates the problem using the example of online services that can't directly assign users to control or treatment groups
Proposes an adaptive experimental design approach for learning the best-performing treatment in "encouragement designs"
Considers a linear structural equation model and formulates pure exploration linear bandits in this setting
Presents elimination-style algorithms with sample complexity upper bounds matching a minimax lower bound
Demonstrates the efficacy of the approach through experiments

Plain English Explanation

Online services often can't directly assign users to different experiences, like control or treatment groups, due to business or practical reasons. If users self-select into these groups, it can lead to biased estimates of the actual treatment effects.

Instead, these services can use "encouragement designs" - a randomized approach that incentivizes users toward a specific treatment. The CPET-LB methodology provides an adaptive experimental design to help these services learn the best-performing treatment in such settings.

The researchers consider a more general linear model and formulate the problem as a "pure exploration linear bandit." While pure exploration has been well-studied in standard adaptive experimental design, this is the first work to consider the setting where the noise is "confounded" - meaning it's influenced by other factors.

The paper presents elimination-style algorithms that use experimental design methods and a novel confidence interval estimator. These algorithms have strong theoretical guarantees, with sample complexity bounds that nearly match a lower bound for the best possible performance.

The researchers also demonstrate the effectiveness of their approach through experiments, showing how it can help online services make better-informed decisions about user experiences.

Technical Explanation

The CPET-LB problem considers a setting where online services can't directly assign users to control or treatment groups, but can use a randomized "encouragement design" to incentivize users toward a specific treatment.

The researchers model this as a linear structural equation, which captures a more general underlying relationship than the standard adaptive experimental design settings that have been extensively studied. In this confounded setting, the noise is influenced by other factors, making it a challenge to accurately estimate the true treatment effects.

The paper presents elimination-style algorithms that combine experimental design methods with a novel finite-time confidence interval on an instrumental variable-style estimator. These algorithms have theoretical guarantees, with sample complexity upper bounds that nearly match a minimax lower bound - indicating they're performing close to the best possible.

The experimental results demonstrate the efficacy of the CPET-LB approach, showing how it can help online services learn the best-performing treatment in these "encouragement design" settings where direct user assignment is not possible.

Critical Analysis

The paper acknowledges some limitations, such as the assumption of a linear structural equation model, and suggests exploring more general nonlinear relationships as an area for future research.

Additionally, the paper does not address how the proposed algorithms might scale to larger, more complex real-world settings, or how they might handle practical considerations like computational constraints or the need for interpretable models.

While the theoretical guarantees are strong, it would be valuable to see further empirical evaluation of the algorithms' performance in realistic online service scenarios, including comparisons to alternative approaches like data-driven switchback experiments or targeted sequential indirect experiment design.

Overall, the CPET-LB work represents an important step forward in enabling continuous improvement through adaptive experiments for online services with limitations on direct user assignment. Further research and real-world evaluation will be crucial to assess the broader applicability and practical impact of this approach.

Conclusion

The CPET-LB paper introduces a novel problem and solution for online services that can't directly assign users to control or treatment groups. By using a randomized "encouragement design" and a novel adaptive experimental approach, the researchers demonstrate a way to learn the best-performing treatment even in the presence of confounding factors.

This work advances the state of the art in adaptive experimental design for causal discovery, providing online services with a principled methodology to make more informed decisions about user experiences when direct assignment is not feasible. The strong theoretical guarantees and promising experimental results suggest this approach could have significant real-world impact, though further research and evaluation will be needed to fully understand its limitations and broader applicability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🌿

Adaptive Experimentation When You Can't Experiment

Yao Zhao, Kwang-Sung Jun, Tanner Fiez, Lalit Jain

This paper introduces the emph{confounded pure exploration transductive linear bandit} (texttt{CPET-LB}) problem. As a motivating example, often online services cannot directly assign users to specific control or treatment experiences either for business or practical reasons. In these settings, naively comparing treatment and control groups that may result from self-selection can lead to biased estimates of underlying treatment effects. Instead, online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment. Our methodology provides online services with an adaptive experimental design approach for learning the best-performing treatment for such textit{encouragement designs}. We consider a more general underlying model captured by a linear structural equation and formulate pure exploration linear bandits in this setting. Though pure exploration has been extensively studied in standard adaptive experimental design settings, we believe this is the first work considering a setting where noise is confounded. Elimination-style algorithms using experimental design methods in combination with a novel finite-time confidence interval on an instrumental variable style estimator are presented with sample complexity upper bounds nearly matching a minimax lower bound. Finally, experiments are conducted that demonstrate the efficacy of our approach.

6/18/2024

Adaptive Online Experimental Design for Causal Discovery

Muhammad Qasim Elahi, Lai Wei, Murat Kocaoglu, Mahsa Ghasemi

Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs by leveraging observational, interventional data, or their combination. The majority of existing causal discovery methods are developed assuming infinite interventional data. We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning, inspired by pure exploration in bandit problems. A graph separating system, consisting of interventions that cut every edge of the graph at least once, is sufficient for learning causal graphs when infinite interventional data is available, even in the worst case. We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system via allocation matching and learns the causal graph based on sampling history. Given any desired confidence value, the algorithm determines a termination condition and runs until it is met. We analyze the algorithm to establish a problem-dependent upper bound on the expected number of required interventional samples. Our proposed algorithm outperforms existing methods in simulations across various randomly generated causal graphs. It achieves higher accuracy, measured by the structural hamming distance (SHD) between the learned causal graph and the ground truth, with significantly fewer samples.

6/26/2024

Asymmetric Graph Error Control with Low Complexity in Causal Bandits

Chen Peng, Di Zhang, Urbashi Mitra

In this paper, the causal bandit problem is investigated, in which the objective is to select an optimal sequence of interventions on nodes in a causal graph. It is assumed that the graph is governed by linear structural equations; it is further assumed that both the causal topology and the distribution of interventions are unknown. By exploiting the causal relationships between the nodes whose signals contribute to the reward, interventions are optimized. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed, which strongly reduces sample complexity relative to the prior art by learning sub-graphs. Under the assumption of Gaussian exogenous inputs and minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound based intervention selection to optimize the reward. To cope with non-stationary bandits, a sub-graph change detection mechanism is proposed, with high sample efficiency. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in both stationary and non-stationary settings. Compared to existing approaches, the proposed scheme takes 67% fewer samples to learn the causal structure and achieves an average reward gain of 85%.

8/22/2024

🌀

Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs

Ruoxuan Xiong, Alex Chin, Sean J. Taylor

We study the design and analysis of switchback experiments conducted on a single aggregate unit. The design problem is to partition the continuous time space into intervals and switch treatments between intervals, in order to minimize the estimation error of the treatment effect. We show that the estimation error depends on four factors: carryover effects, periodicity, serially correlated outcomes, and impacts from simultaneous experiments. We derive a rigorous bias-variance decomposition and show the tradeoffs of the estimation error from these factors. The decomposition provides three new insights in choosing a design: First, balancing the periodicity between treated and control intervals reduces the variance; second, switching less frequently reduces the bias from carryover effects while increasing the variance from correlated outcomes, and vice versa; third, randomizing interval start and end points reduces both bias and variance from simultaneous experiments. Combining these insights, we propose a new empirical Bayes design approach. This approach uses prior data and experiments for designing future experiments. We illustrate this approach using real data from a ride-sharing platform, yielding a design that reduces MSE by 33% compared to the status quo design used on the platform.

6/12/2024