Federated Combinatorial Multi-Agent Multi-Armed Bandits

Read original: arXiv:2405.05950 - Published 5/10/2024 by Fares Fourati, Mohamed-Slim Alouini, Vaneet Aggarwal

🤯

Overview

This paper introduces a federated learning framework for online combinatorial optimization with bandit feedback.
The framework transforms any offline resilient single-agent algorithm into an online multi-agent algorithm with sublinear regret and communication-efficient properties.
The approach is applied to online stochastic submodular maximization, yielding the first theoretical guarantees for both single-agent and multi-agent settings.
The effectiveness of the proposed framework is empirically validated on a stochastic data summarization problem.

Plain English Explanation

In this paper, the researchers present a new federated learning framework for solving online combinatorial optimization problems. In this setting, multiple agents (e.g., devices or users) need to select subsets of "arms" (options) and observe noisy rewards for these subsets, without knowing the individual rewards of each arm. The agents can cooperate and share information at specific intervals.

The key innovation of the framework is that it can transform any existing single-agent resilient optimization algorithm into an online multi-agent algorithm with several desirable properties:

It eliminates the approximation error (epsilon) of the original algorithm.
It ensures sublinear growth in the time horizon and a linear speedup with more communicating agents.
It requires only a sublinear number of communication rounds, making it communication-efficient.

The researchers apply this framework to the problem of online stochastic submodular maximization, which has applications in areas like data summarization. They show that their approach can recover specialized theoretical guarantees for both single-agent and multi-agent settings, outperforming previous methods.

Technical Explanation

The paper introduces a federated learning framework for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.

The framework transforms any offline resilient single-agent $(alpha-epsilon)$-approximation algorithm, with a complexity of $\tilde{\mathcal{O}}(\frac{\psi}{\epsilon^{\beta}})$, into an online multi-agent algorithm with $m$ communicating agents and an $\alpha$-regret of no more than $\tilde{\mathcal{O}}(m^{-\frac{1}{3+\beta}} \psi^{\frac{1}{3+\beta}} T^{\frac{2+\beta}{3+\beta}})$. This approach eliminates the $\epsilon$ approximation error and ensures sublinear growth with respect to the time horizon $T$ and a linear speedup with an increasing number of communicating agents.

Additionally, the algorithm is communication-efficient, requiring only a sublinear number of communication rounds, quantified as $\tilde{\mathcal{O}}(\psi T^{\frac{\beta}{\beta+1}})$.

The framework is applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees.

The researchers empirically validate their approach on a stochastic data summarization problem, demonstrating the effectiveness of the proposed framework, even in single-agent scenarios.

Critical Analysis

The paper presents a comprehensive federated learning framework for online combinatorial optimization with bandit feedback, which is a significant contribution to the field. The theoretical guarantees provided, including the elimination of approximation error and the communication-efficient properties, are impressive.

However, the framework relies on the availability of an offline resilient single-agent algorithm, which may not always be the case. Additionally, the framework assumes that the agents can cooperate and share information at specific intervals, which may not be feasible in all real-world scenarios, especially in adversarial settings.

Further research could explore the performance of the framework in more challenging settings, such as when the agents have limited communication or face adversarial environments. Additionally, investigating the practical implementation and deployment of the framework in real-world applications would be valuable.

Conclusion

This paper introduces a powerful federated learning framework for online combinatorial optimization with bandit feedback. By transforming any offline resilient single-agent algorithm into an online multi-agent algorithm, the framework achieves sublinear regret and communication-efficient properties, which are crucial for practical applications.

The successful application of the framework to online stochastic submodular maximization, with theoretical guarantees for both single-agent and multi-agent settings, demonstrates its versatility and potential impact on a wide range of combinatorial optimization problems. The empirical validation on a stochastic data summarization problem further showcases the effectiveness of the proposed approach.

While the framework has some limitations, such as the reliance on offline algorithms and the assumption of cooperative agents, the paper's contributions represent a significant advancement in the field of federated learning and online combinatorial optimization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🤯

Federated Combinatorial Multi-Agent Multi-Armed Bandits

Fares Fourati, Mohamed-Slim Alouini, Vaneet Aggarwal

This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(alpha-epsilon)$-approximation algorithm, having a complexity of $tilde{mathcal{O}}(frac{psi}{epsilon^beta})$, where the logarithm is omitted, for some function $psi$ and constant $beta$, into an online multi-agent algorithm with $m$ communicating agents and an $alpha$-regret of no more than $tilde{mathcal{O}}(m^{-frac{1}{3+beta}} psi^frac{1}{3+beta} T^frac{2+beta}{3+beta})$. This approach not only eliminates the $epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $tilde{mathcal{O}}left(psi T^frac{beta}{beta+1}right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach to a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.

5/10/2024

A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

Jingwen Tong, Xinran Li, Liqun Fu, Jun Zhang, Khaled B. Letaief

Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we study the cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework to mitigate the communication overhead and data privacy issue by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys a high communication and computation efficiency, and a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $mathcal{O}(sqrt{Tlog(T)})$ and better performance compared with baselines. More importantly, its sample complexity decreases with the number of agents.

6/13/2024

Federated $mathcal{X}$-armed Bandit with Flexible Personalisation

Ali Arabzadeh, James A. Grant, David S. Leslie

This paper introduces a novel approach to personalised federated learning within the $mathcal{X}$-armed bandit framework, addressing the challenge of optimising both local and global objectives in a highly heterogeneous environment. Our method employs a surrogate objective function that combines individual client preferences with aggregated global knowledge, allowing for a flexible trade-off between personalisation and collective learning. We propose a phase-based elimination algorithm that achieves sublinear regret with logarithmic communication overhead, making it well-suited for federated settings. Theoretical analysis and empirical evaluations demonstrate the effectiveness of our approach compared to existing methods. Potential applications of this work span various domains, including healthcare, smart home devices, and e-commerce, where balancing personalisation with global insights is crucial.

9/12/2024

🐍

Federated Multi-Armed Bandits Under Byzantine Attacks

Artun Saday, .Ilker Demirel, Yiu{g}it Y{i}ld{i}r{i}m, Cem Tekin

Multi-armed bandits (MAB) is a sequential decision-making model in which the learner controls the trade-off between exploration and exploitation to maximize its cumulative reward. Federated multi-armed bandits (FMAB) is an emerging framework where a cohort of learners with heterogeneous local models play an MAB game and communicate their aggregated feedback to a server to learn a globally optimal arm. Two key hurdles in FMAB are communication-efficient learning and resilience to adversarial attacks. To address these issues, we study the FMAB problem in the presence of Byzantine clients who can send false model updates threatening the learning process. We analyze the sample complexity and the regret of $beta$-optimal arm identification. We borrow tools from robust statistics and propose a median-of-means (MoM)-based online algorithm, Fed-MoM-UCB, to cope with Byzantine clients. In particular, we show that if the Byzantine clients constitute less than half of the cohort, the cumulative regret with respect to $beta$-optimal arms is bounded over time with high probability, showcasing both communication efficiency and Byzantine resilience. We analyze the interplay between the algorithm parameters, a discernibility margin, regret, communication cost, and the arms' suboptimality gaps. We demonstrate Fed-MoM-UCB's effectiveness against the baselines in the presence of Byzantine attacks via experiments.

6/18/2024