Stochastic Bandits Robust to Adversarial Attacks

Read original: arXiv:2408.08859 - Published 8/19/2024 by Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili

Stochastic Bandits Robust to Adversarial Attacks

Overview

This paper proposes a new approach for stochastic multi-armed bandits that is robust to adversarial attacks.
The authors introduce a novel algorithm called Adversarial-Robust Stochastic Bandits (ARSB) that can effectively handle adversarial corruptions of reward signals.
Experiments show that ARSB outperforms existing methods in terms of regret minimization under adversarial attacks.

Plain English Explanation

The paper focuses on a type of machine learning problem called the multi-armed bandit problem. In this problem, a learner must choose which "arm" (option) to pull from a set of options in order to maximize the total reward received over time.

However, the authors consider a scenario where the rewards for each arm can be corrupted by an adversary who is trying to mislead the learner. This is known as an "adversarial attack" on the bandit problem.

To address this challenge, the researchers propose a new algorithm called Adversarial-Robust Stochastic Bandits (ARSB). ARSB is designed to be resistant to these adversarial attacks, allowing it to still perform well and minimize regret (the difference between the optimal reward and the actual reward received) even when the rewards are being manipulated.

Through experiments, the authors demonstrate that ARSB outperforms existing bandit algorithms when the rewards are subject to adversarial corruption. This makes ARSB a promising approach for applications where the data may be untrustworthy or subject to malicious tampering.

Technical Explanation

The paper starts by formally defining the multi-armed bandit (MAB) problem with adversarial attacks. In this setting, the learner must choose which arm to pull at each round, and the rewards for each arm can be corrupted by an adversary.

The authors then introduce their Adversarial-Robust Stochastic Bandits (ARSB) algorithm. ARSB works by maintaining confidence intervals for the mean rewards of each arm, and then choosing the arm with the highest upper confidence bound. Crucially, ARSB also incorporates a "robustification" step that adjusts the confidence intervals to be more conservative in the face of potential adversarial attacks.

The paper provides a regret analysis of ARSB, showing that it achieves sublinear regret bounds even in the presence of adversarial corruptions. This is a significant improvement over existing bandit algorithms that can suffer linear regret when the rewards are manipulated.

The authors also conduct experiments comparing ARSB to other bandit algorithms on synthetic datasets with adversarial attacks. The results demonstrate that ARSB consistently outperforms the baselines in terms of cumulative regret, validating the effectiveness of their approach.

Critical Analysis

The paper makes a valuable contribution by addressing the important problem of adversarial attacks in multi-armed bandit settings. The proposed ARSB algorithm is a novel and principled approach that provides strong theoretical guarantees and empirical performance.

One potential limitation is that the analysis and experiments focus on the classic stochastic multi-armed bandit problem. It would be interesting to see how ARSB performs in more complex bandit settings, such as contextual bandits or combinatorial bandits, where the adversary may have additional opportunities to manipulate the environment.

Additionally, the paper does not consider the case of imprecise or uncertain rewards, which could be another important real-world challenge. Extending the ARSB approach to handle such uncertainty could further enhance its practical applicability.

Overall, this paper presents a compelling solution to a critical problem in the field of multi-armed bandits. The ARSB algorithm and its theoretical and empirical analysis make significant advances in the state of the art and open up promising directions for future research.

Conclusion

This paper introduces a novel algorithm called Adversarial-Robust Stochastic Bandits (ARSB) that is designed to be robust to adversarial attacks in multi-armed bandit problems. The authors provide a rigorous analysis of ARSB's regret bounds and demonstrate its superior performance compared to existing methods through experiments.

The work addresses an important practical challenge in machine learning, where the data used to train models may be subject to malicious manipulation. By developing algorithms like ARSB that can effectively handle adversarial corruptions, the research community can work towards building more trustworthy and reliable machine learning systems. This has wide-ranging implications for applications where the input data may be untrustworthy or adversarial in nature.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Stochastic Bandits Robust to Adversarial Attacks

Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili

This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and {then} alter their reward observation. We study two cases of this model, with or without the knowledge of an attack budget $C$, defined as an upper bound of the summation of the difference between the actual and altered rewards. For both cases, we devise two types of algorithms with regret bounds having additive or multiplicative $C$ dependence terms. For the known attack budget case, we prove our algorithms achieve the regret bound of ${O}((K/Delta)log T + KC)$ and $tilde{O}(sqrt{KTC})$ for the additive and multiplicative $C$ terms, respectively, where $K$ is the number of arms, $T$ is the time horizon, $Delta$ is the gap between the expected rewards of the optimal arm and the second-best arm, and $tilde{O}$ hides the logarithmic factors. For the unknown case, we prove our algorithms achieve the regret bound of $tilde{O}(sqrt{KT} + KC^2)$ and $tilde{O}(KCsqrt{T})$ for the additive and multiplicative $C$ terms, respectively. In addition to these upper bound results, we provide several lower bounds showing the tightness of our bounds and the optimality of our algorithms. These results delineate an intrinsic separation between the bandits with attacks and corruption models [Lykouris et al., 2018].

8/19/2024

🌀

Blocking Bandits

Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically $(1-1/e)$ optimal. When the rewards are unknown, we design a UCB based algorithm which is shown to have $c log T + o(log T)$ cumulative regret against the greedy algorithm, leveraging the free exploration of arms due to the unavailability. Finally, when all the delays are equal the problem reduces to Combinatorial Semi-bandits providing us with a lower bound of $c' log T+ omega(log T)$.

7/31/2024

🏷️

Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco

We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods, and require some stringent assumptions, namely the Slater's condition, and in adversarial settings, they either assume knowledge of a lower bound on the Slater's parameter, or impose strong requirements on the primal and dual regret minimizers such as requiring weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with an UCB-like approach guarantees optimal performances. Our algorithm consists of two main components: (i) a regret minimizer working on emph{moving strategy sets} and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples. The key challenge in this approach is designing adaptive weights that meet the different requirements for stochastic and adversarial constraints. Our algorithm is significantly simpler than previous approaches, and has a cleaner analysis. Moreover, ours is the first best-of-both-worlds algorithm providing bounds logarithmic in the number of constraints. Additionally, in stochastic settings, it provides $widetilde O(sqrt{T})$ regret emph{without} Slater's condition.

5/28/2024

🗣️

Adversarial Attacks on Combinatorial Multi-Armed Bandits

Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao

We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion to capture the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.

6/5/2024