Active clustering with bandit feedback

Read original: arXiv:2406.11485 - Published 6/18/2024 by Victor Thuot (MISTEA), Alexandra Carpentier (CELESTE), Christophe Giraud (CELESTE), Nicolas Verzelen (MISTEA)

🔗

Overview

The paper investigates the Active Clustering Problem (ACP), where a learner interacts with a stochastic multi-armed bandit with high-dimensional feedback
The arms are divided into hidden groups, and the learner's task is to uncover this hidden partition with the smallest budget (i.e., fewest observations) and a low probability of error
The paper (i) derives a lower bound for the budget required to solve the ACP, and (ii) introduces the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes

Plain English Explanation

The researchers are studying a challenging problem called the Active Clustering Problem (ACP). Imagine you have a slot machine with many different arms (options), and each arm gives you a different reward when you pull it. However, the arms are secretly divided into groups, and the arms within the same group have the same average reward.

Your goal is to figure out how the arms are grouped together, but you want to do this as efficiently as possible - by pulling the arms as few times as possible. The researchers show that there is a lower limit on how few pulls you need to do this with a high chance of getting the grouping right. They also introduce a new algorithm called ACB that can usually achieve this lower limit, doing better than a simple strategy of pulling each arm the same number of times.

Importantly, the researchers find that in this active setting, where you can choose which arms to pull, there is no gap between the computational power you need and the information you need to solve the problem. This is different from some other related problems, where there can be a gap.

Technical Explanation

The researchers investigate the Active Clustering Problem (ACP), where a learner interacts with an N-armed stochastic bandit with d-dimensional subGaussian feedback. There exists a hidden partition of the arms into K groups, such that arms within the same group share the same mean vector.

The learner's task is to uncover this hidden partition with the smallest budget (i.e., the least number of observations) and a probability of error smaller than a prescribed constant δ. The researchers (i) derive a non-asymptotic lower bound for the budget required to solve the ACP, and (ii) introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes.

The researchers show that ACB outperforms a uniform sampling strategy, where each arm is pulled the same number of times. Importantly, the researchers establish that there is no computation-information gap in the active setting, unlike in the batch setting (where all observations are collected before analysis). This is in contrast to other related problems, such as federated combinatorial multi-agent multi-armed bandits and neural active learning beyond bandits, where such gaps can exist.

Critical Analysis

The researchers acknowledge some limitations of their work. For example, they assume that the arms' rewards are subGaussian, which may not always hold in practice. Additionally, the researchers focus on the setting where the number of groups K is known in advance, which may not always be the case in real-world applications.

Further research could explore relaxing these assumptions, or investigate the performance of the ACB algorithm in more complex scenarios, such as the tree bandits with generative Bayes setting. It would also be interesting to see how the ACB algorithm compares to other active clustering approaches, such as the data-driven upper confidence bounds method.

Conclusion

In this paper, the researchers make significant contributions to the understanding of the Active Clustering Problem (ACP). They derive a non-asymptotic lower bound for the budget required to solve the problem and introduce the computationally efficient ACB algorithm, which matches this lower bound in most regimes. Importantly, the researchers establish that there is no computation-information gap in the active setting, in contrast to other related problems.

These findings have important implications for the design of efficient exploration strategies in high-dimensional bandit problems, with potential applications in areas such as recommendation systems, online advertising, and interactive learning. The insights from this research could also inform the development of new active learning algorithms that can more effectively uncover hidden structures in complex data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔗

Active clustering with bandit feedback

Victor Thuot (MISTEA), Alexandra Carpentier (CELESTE), Christophe Giraud (CELESTE), Nicolas Verzelen (MISTEA)

We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed stochastic bandit with $d$-dimensional subGaussian feedback. There exists a hidden partition of the arms into $K$ groups, such that arms within the same group, share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget - i.e., the least number of observation - and with a probability of error smaller than a prescribed constant $delta$. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the active setting.

6/18/2024

🔗

Optimal Clustering with Bandit Feedback

Junwen Yang, Zixin Zhong, Vincent Y. F. Tan

This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant $delta$. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.

5/16/2024

A General Framework for Clustering and Distribution Matching with Bandit Feedback

Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett

We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a $K$-armed bandit model where some subset of $K$ arms is partitioned into $M$ groups. Within each group, the random variable associated to each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes its outcome from the random variable associated to that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or the underlying partitions. The task is to devise an online algorithm to learn the underlying partition of arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value $delta$. Several existing problems fall under our general framework, including finding $M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms belong to our general framework. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$. Furthermore, we develop a computationally-efficient online algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches that of the lower bound. Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $delta$ vanishes.

9/11/2024

🏷️

Fast Rates for Bandit PAC Multiclass Classification

Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$-PAC version of the problem, with sample complexity of $Obig( (operatorname{poly}(K) + 1 / varepsilon^2) log (|H| / delta) big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $varepsilon$, this improves upon existing bounds for the problem, that are of the form $O(K/varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $log |H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011) who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $Theta(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $varepsilon to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.

6/19/2024