Causally Abstracted Multi-armed Bandits

2404.17493

Published 4/29/2024 by Fabio Massimo Zennaro, Nicholas Bishop, Joel Dyer, Yorgos Felekis, Anisoara Calinescu, Michael Wooldridge, Theodoros Damoulas

cs.LG cs.AI

🖼️

Abstract

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. The majority of prior work typically studies and solves individual MAB and CMAB in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.

Get summaries of the top AI research delivered straight to your inbox:

Overview

Multi-armed bandits (MAB) and causal MABs (CMAB) are frameworks for decision-making problems
Typically, prior work studies and solves individual MAB and CMAB problems in isolation
Decision-makers often face multiple related problems and multi-scale observations, where joint formulations are needed to exploit problem structures and data dependencies
This work extends transfer learning for CMABs to setups involving CMABs defined on different variables, with varying granularity, and related via an abstraction map

Plain English Explanation

Multi-armed bandits (MABs) and causal MABs (CMABs) are tools used to make decisions in complex situations. Typically, researchers study and solve individual MAB or CMAB problems separately for a given problem and data.

However, decision-makers often face multiple related problems and observations at different scales. In these cases, it would be better to consider the problems jointly, to take advantage of the relationships between them and the structure of the data.

This paper extends a technique called "transfer learning" for CMABs to more complex setups. Transfer learning allows models trained on one problem to be adapted to a related problem, even if the causal relationships between variables are slightly different.

The authors generalize this idea to situations where the related problems are defined on different variables, at different levels of detail, but are connected through an "abstraction map" that explains how the variables relate to each other. They call this new framework "causally abstracted MABs" (CAMABs).

The paper proposes algorithms for learning in a CAMAB setting and analyzes their performance. It also shows how these techniques can be applied to a real-world problem in online advertising.

Technical Explanation

The authors introduce the problem of causally abstracted multi-armed bandits (CAMABs), which extends transfer learning for CMABs to setups where the CMAB problems are defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map.

Formally, the authors rely on the theory of causal abstraction to express a rigorous abstraction map that connects the different CMAB problems. They propose algorithms to learn in a CAMAB setting and analyze their regret, which is a measure of how much the algorithms underperform compared to the optimal decisions.

The authors illustrate the strengths and limitations of their algorithms on a real-world scenario related to online advertising, where there are multiple related decision-making problems at different levels of granularity.

Critical Analysis

The paper introduces an interesting generalization of transfer learning for CMABs, allowing the technique to be applied to a broader range of decision-making problems. The authors' use of causal abstraction theory provides a principled way to connect the different CMAB problems.

However, the paper does not extensively discuss the potential limitations of the CAMAB framework. For example, the quality of the abstraction map and the assumptions required for it to be valid are not explored in depth. Additionally, the computational complexity of the proposed algorithms is not analyzed, which could be an important consideration for real-world applications.

It would also be valuable to see the CAMAB framework applied to a wider range of problem domains beyond online advertising, to better understand its broader applicability and limitations.

Overall, the paper presents a promising extension of transfer learning for CMABs, but further research is needed to fully evaluate the strengths, weaknesses, and practical implications of the CAMAB approach.

Conclusion

This paper introduces the concept of causally abstracted multi-armed bandits (CAMABs), which extends transfer learning techniques for causal multi-armed bandits (CMABs) to settings where the underlying problems are defined on different variables and at different levels of granularity.

The authors leverage the theory of causal abstraction to formally connect these related CMAB problems and propose algorithms for learning in a CAMAB setting. While the paper demonstrates the potential of this approach on a real-world online advertising problem, further research is needed to fully understand the limitations and broader applicability of the CAMAB framework.

Overall, this work represents an important step forward in adapting decision-making tools like MABs and CMABs to the complex, interconnected problems faced by real-world decision-makers.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels

Osama A. Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments, leading to the advancement of collaborative MAB algorithms. In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise. This results in agents possibly not receiving the intended action from the learner, subsequently leading to misguided feedback. In this paper, we introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels with different action erasure probabilities. We illustrate that, in contrast to existing bandit algorithms, which experience linear regret, our algorithms assure sub-linear regret guarantees. Our proposed solutions are founded on a meticulously crafted repetition protocol and scheduling of learning across heterogeneous channels. To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels. We substantiate the superior performance of our algorithm through numerical experiments, emphasizing their practical significance in addressing issues related to communication constraints and delays in multi-agent environments.

4/30/2024

cs.LG cs.DC cs.MA

🏅

Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Guojun Xiong, Jian Li

Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems under an instantaneous activation constraint that at most B arms can be activated at any decision epoch. Each restless arm is endowed with a state that evolves independently according to a Markov decision process regardless of being activated or not. In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards, which can change arbitrarily across episodes. Further, we consider a challenging but natural bandit feedback setting that only adversarial rewards of activated arms are revealed to the decision maker (DM). The goal of the DM is to maximize its total adversarial rewards during the learning process while the instantaneous activation constraint must be satisfied in each decision epoch. We develop a novel reinforcement learning algorithm with two key contributors: a novel biased adversarial reward estimator to deal with bandit feedback and unknown transitions, and a low-complexity index policy to satisfy the instantaneous activation constraint. We show $tilde{mathcal{O}}(Hsqrt{T})$ regret bound for our algorithm, where $T$ is the number of episodes and $H$ is the episode length. To our best knowledge, this is the first algorithm to ensure $tilde{mathcal{O}}(sqrt{T})$ regret for adversarial RMAB in our considered challenging settings.

5/3/2024

cs.LG cs.AI stat.ML

📶

Sequential Monte Carlo Bandits

I~nigo Urteaga, Chris H. Wiggins

We extend Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision making problem where the goal is to learn a policy that maximizes long term payoff, where only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except for simple, stationary cases. We here utilize SMC for estimation of the statistics Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems: i.e., MABs with nonlinear, stateless- and context-dependent reward distributions that evolve over time. We showcase how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., in non-stationary bandits with nonlinear rewards.

4/8/2024

stat.ML cs.LG

🤯

Federated Combinatorial Multi-Agent Multi-Armed Bandits

Fares Fourati, Mohamed-Slim Alouini, Vaneet Aggarwal

This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(alpha-epsilon)$-approximation algorithm, having a complexity of $tilde{mathcal{O}}(frac{psi}{epsilon^beta})$, where the logarithm is omitted, for some function $psi$ and constant $beta$, into an online multi-agent algorithm with $m$ communicating agents and an $alpha$-regret of no more than $tilde{mathcal{O}}(m^{-frac{1}{3+beta}} psi^frac{1}{3+beta} T^frac{2+beta}{3+beta})$. This approach not only eliminates the $epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $tilde{mathcal{O}}left(psi T^frac{beta}{beta+1}right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach to a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.

5/10/2024

cs.LG cs.AI cs.DM cs.MA stat.ML