Symmetric Linear Bandits with Hidden Symmetry

Read original: arXiv:2405.13899 - Published 5/24/2024 by Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

💬

Overview

The paper explores high-dimensional linear bandits with hidden symmetry structure, where the reward is invariant under certain transformations on the set of arms.
The authors propose a method based on model selection within the collection of low-dimensional subspaces to learn the correct symmetry in an online setting.
The algorithm achieves regret bounds that depend on the dimension of the true low-dimensional subspace, rather than the ambient dimension, which can be much larger.

Plain English Explanation

In high-dimensional linear bandit problems, the learner is trying to maximize the reward by selecting the best action (or "arm") from a large set of possible actions. These problems often have some hidden structure, such as sparsity or [symmetry], that can be exploited to achieve better performance.

The paper focuses on the case where the reward is symmetric, meaning it doesn't change when the set of arms is transformed in certain ways. This type of structure can be found in many practical applications, but it may not be known to the learner in advance. The authors' goal is to develop an algorithm that can learn the correct symmetry structure online, without being explicitly told what it is.

Their proposed method works by considering a collection of possible low-dimensional subspaces that could represent the hidden symmetry. The algorithm then selects the best subspace based on the data it observes, and uses this to guide its decisions and achieve good regret (i.e., the difference between the optimal reward and the reward obtained by the algorithm).

The key advantage of this approach is that the regret bounds depend on the dimension of the true low-dimensional subspace, rather than the full ambient dimension, which can be much larger. This means the algorithm can perform well even in high-dimensional settings where the underlying structure is relatively simple.

Technical Explanation

The paper studies the problem of high-dimensional linear bandits with hidden symmetry structure. In this setting, the reward function is assumed to be invariant under certain groups of transformations on the set of arms. This type of structure is more general than the commonly studied sparsity assumption, and covers many standard structures, including sparsity, as special cases.

The authors propose a method based on model selection within the collection of low-dimensional subspaces to learn the correct symmetry in an online setting. Specifically, they consider a collection of possible low-dimensional subspaces that could represent the hidden symmetry, and use a multi-armed bandit algorithm to select the best subspace based on the observed data.

The key technical contributions of the paper include:

Characterizing the structure of the collection of hidden symmetry subspaces, and showing that it can be represented as the intersection of a small number of low-dimensional subspaces.
Developing a model selection-based algorithm that can efficiently explore this collection and identify the correct subspace.
Providing regret bounds for the proposed algorithm, which depend on the dimension of the true low-dimensional subspace, rather than the ambient dimension.

Under an additional assumption of well-separated models, the authors further improve the regret bound to $O(d_0\sqrt{T\log(d)})$, where $d_0$ is the dimension of the true low-dimensional subspace and $T$ is the time horizon.

Critical Analysis

The paper makes a valuable contribution by studying a more general class of high-dimensional linear bandits with hidden symmetry structure, which encompasses many practical scenarios beyond the commonly studied sparsity assumption.

One potential limitation is the reliance on the assumption of well-separated models, which may not always hold in practice. It would be interesting to see if the authors' techniques can be extended to the more general case without this assumption, potentially at the cost of slightly worse regret bounds.

Additionally, the paper does not explore the robustness of the proposed method to model misspecification or other forms of uncertainty, which could be an important consideration in real-world applications. [Extensions to the robust and locally private settings could be valuable future research directions.

Overall, the paper presents a solid theoretical foundation for learning high-dimensional linear bandits with hidden symmetry structure, and the authors' approach of leveraging model selection techniques seems promising for practical deployment in a variety of applications.

Conclusion

This paper addresses the important problem of high-dimensional linear bandits with hidden symmetry structure, which can arise in many real-world scenarios. The authors propose a model selection-based algorithm that can efficiently learn the correct symmetry structure in an online setting, and provide regret bounds that depend on the dimension of the true low-dimensional subspace rather than the ambient dimension.

The key takeaway is that by exploiting the underlying structure of the problem, even in cases where it is not known in advance, it is possible to achieve strong performance in high-dimensional settings. This has important implications for the design of practical bandit algorithms that can scale to large, complex decision-making problems.

Overall, the paper makes a valuable contribution to the field of high-dimensional bandit optimization and opens up interesting directions for further research, such as exploring robustness to model misspecification and extensions to the locally private setting.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

💬

Symmetric Linear Bandits with Hidden Symmetry

Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner, and the correct symmetry needs to be learned in an online setting. We examine the structure of a collection of hidden symmetry and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $ O(d_0^{1/3} T^{2/3} log(d))$, where $d$ is the ambient dimension which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 ll d$. With an extra assumption on well-separated models, we can further improve the regret to $ O(d_0sqrt{Tlog(d)} )$.

5/24/2024

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Tianyuan Jin, Kyoungseok Jang, Nicol`o Cesa-Bianchi

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds, that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.

6/4/2024

Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

Yassir Jedra, William R'eveillard, Stefan Stojanovic, Alexandre Proutiere

We study contextual bandits with low-rank structure where, in each round, if the (context, arm) pair $(i,j)in [m]times [n]$ is selected, the learner observes a noisy sample of the $(i,j)$-th entry of an unknown low-rank reward matrix. Successive contexts are generated randomly in an i.i.d. manner and are revealed to the learner. For such bandits, we present efficient algorithms for policy evaluation, best policy identification and regret minimization. For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal. For instance, the number of samples required to return an $varepsilon$-optimal policy with probability at least $1-delta$ typically scales as ${r(m+n)over varepsilon^2}log(1/delta)$. Our regret minimization algorithm enjoys minimax guarantees typically scaling as $r^{7/4}(m+n)^{3/4}sqrt{T}$, which improves over existing algorithms. All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix. We show that these estimates enjoy tight error guarantees in the two-to-infinity norm. This in turn allows us to reformulate our problems as a misspecified linear bandit problem with dimension roughly $r(m+n)$ and misspecification controlled by the subspace recovery error, as well as to design the second phase of our algorithms efficiently.

7/8/2024

⛏️

Linear bandits with polylogarithmic minimax regret

Josep Lumbreras, Marco Tomamichel

We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $lambda_{min} ( V_t ) = Omega (sqrt{lambda_{max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(frac1{t})$, leading to the logarithmic scaling of the cumulative regret.

5/30/2024