Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification

Read original: arXiv:2109.14412 - Published 4/23/2024 by James A. Grant, David S. Leslie

🏷️

Overview

Considers an online binary classification problem where the learner can observe the true label if they assign a positive (1) label
Learner faces a trade-off between classification accuracy and information gain
This problem is studied as a partial monitoring problem with side information, assuming a logistic regression model linking features to true classes
Main contribution is a study of the performance of Thompson Sampling (TS) for this problem, showing improved Bayesian regret bounds compared to previous approaches
Also experimentally verifies that efficient approximations to TS and Information Directed Sampling via Pólya-Gamma augmentation outperform existing methods

Plain English Explanation

In this research, the authors look at a variation of an online binary classification problem, where a learner (like a machine learning model) is tasked with assigning labels of 0 or 1 to a series of items. The twist is that if the learner chooses to label an item as 1, they immediately get to see the item's true label. This creates a trade-off for the learner - they can either focus on getting the short-term classifications right, or they can choose to label some items as 1 in order to gain more information about the underlying patterns and improve their long-term performance.

The authors frame this problem as a "partial monitoring problem with side information," meaning the learner doesn't have full information about the true labels, but they do have some additional data (the "side information") that can help them make better decisions. In this case, the side information is a logistic regression model that links the features of each item to its true class.

The main contribution of the paper is a detailed analysis of how a technique called Thompson Sampling performs on this problem. Thompson Sampling is a way of making decisions under uncertainty that has been shown to work well in other, similar problems. The authors use some advanced mathematical tools to prove that Thompson Sampling can achieve better performance guarantees (in terms of "Bayesian regret") compared to previous approaches.

They also experimentally show that efficient approximations to Thompson Sampling and another related technique called Information Directed Sampling tend to outperform existing methods when applied to this problem in practice.

Technical Explanation

The authors consider an online binary classification problem where a learner sequentially assigns labels (0 or 1) to items with unknown true classes. If the learner chooses label 1, they immediately observe the true label of the item. This setup creates a trade-off between short-term classification accuracy and long-term information gain, which the authors frame as a partial monitoring problem with side information.

Specifically, the authors assume a logistic regression model linking the item features to the true classes. Their main contribution is a theoretical analysis of the performance of Thompson Sampling (TS) for this problem. Using recent information-theoretic tools, they show that TS achieves a Bayesian regret bound of improved order compared to previous approaches.

The authors also experimentally evaluate efficient approximations to TS and Information Directed Sampling (IDS) via Pólya-Gamma augmentation. They find that these methods outperform existing techniques in terms of empirical performance on the "apple tasting" problem.

Critical Analysis

The paper provides a thorough theoretical and empirical analysis of the "apple tasting" problem, a novel online binary classification task with partial feedback. The authors' focus on leveraging Thompson Sampling and related techniques is well-justified, as these methods have shown strong performance in other contextual bandit and online learning settings.

One potential limitation of the research is the reliance on the specific logistic regression model for linking features to true classes. While this assumption allows for a more tractable analysis, it may not capture more complex relationships that could arise in real-world applications. Exploring extensions to more flexible models would be an interesting direction for future work.

Additionally, the authors do not discuss potential practical challenges in implementing the proposed algorithms, such as computational complexity or sensitivity to hyperparameter choices. A more in-depth discussion of these implementation details and their impact on the performance of the methods would help readers better understand the practical tradeoffs.

Overall, this paper makes a valuable contribution to the literature on partial monitoring problems and demonstrates the effectiveness of Thompson Sampling and related techniques in this setting. The clear explanations and analyses provide a solid foundation for further research in this area.

Conclusion

This research explores a variant of online binary classification where the learner can observe the true label of an item if they choose to assign a positive (1) label. The authors frame this as a partial monitoring problem with side information in the form of a logistic regression model linking item features to true classes.

The key contribution of the paper is a thorough analysis of the performance of Thompson Sampling (TS) for this problem, showing improved Bayesian regret bounds compared to previous approaches. The authors also demonstrate that efficient approximations to TS and Information Directed Sampling outperform existing methods in empirical evaluations.

These findings have important implications for the design of online learning and decision-making systems, particularly in scenarios where there is a trade-off between short-term performance and long-term information gain. The insights from this research can help inform the development of more effective and adaptable classification models in a variety of real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🏷️

Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification

James A. Grant, David S. Leslie

We consider a variant of online binary classification where a learner sequentially assigns labels ($0$ or $1$) to items with unknown true class. If, but only if, the learner chooses label $1$ they immediately observe the true label of the item. The learner faces a trade-off between short-term classification accuracy and long-term information gain. This problem has previously been studied under the name of the `apple tasting' problem. We revisit this problem as a partial monitoring problem with side information, and focus on the case where item features are linked to true classes via a logistic regression model. Our principal contribution is a study of the performance of Thompson Sampling (TS) for this problem. Using recently developed information-theoretic tools, we show that TS achieves a Bayesian regret bound of an improved order to previous approaches. Further, we experimentally verify that efficient approximations to TS and Information Directed Sampling via P'{o}lya-Gamma augmentation have superior empirical performance to existing methods.

4/23/2024

🎲

Apple Tasting: Combinatorial Dimensions and Minimax Rates

Vinod Raman, Unique Subedi, Ananth Raman, Ambuj Tewari

In online binary classification under emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1. First studied by cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a emph{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner, under apple tasting feedback, can be $Theta(1), Theta(sqrt{T})$, or $Theta(T)$. This is in contrast to the full-information realizable setting where only $Theta(1)$ and $Theta(T)$ are possible.

6/21/2024

➖

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Hongseok Namkoong, Samuel Daulton, Eytan Bakshy

Thompson sampling (TS) has emerged as a robust technique for contextual bandit problems. However, TS requires posterior inference and optimization for action generation, prohibiting its use in many online platforms where latency and ease of deployment are of concern. We operationalize TS by proposing a novel imitation-learning-based algorithm that distills a TS policy into an explicit policy representation, allowing fast decision-making and easy deployment in mobile and server-based environments. Using batched data collected under the imitation policy, our algorithm iteratively performs offline updates to the TS policy, and learns a new explicit policy representation to imitate it. Empirically, our imitation policy achieves performance comparable to batch TS while allowing more than an order of magnitude reduction in decision-time latency. Buoyed by low latency and simplicity of implementation, our algorithm has been successfully deployed in multiple video upload systems for Meta. Using a randomized controlled trial, we show our algorithm resulted in significant improvements in video quality and watch time.

7/23/2024

Online Learning of Decision Trees with Thompson Sampling

Ayman Chaouki, Jesse Read, Albert Bifet

Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are of heuristic nature, they rely on greedy splits offering no guarantees of global optimality and often leading to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs addressed this suboptimality issue in the batch setting, but no such work has considered the online setting with data arriving in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, all while presenting the practical advantage of being tailored to the online setting.

4/10/2024