Graph Neural Thompson Sampling

Read original: arXiv:2406.10686 - Published 6/24/2024 by Shuang Wu, Arash A. Amini

Overview

This paper introduces a new algorithm called "Graph Neural Thompson Sampling" (GNTS) for solving online learning problems on graph-structured data.
The GNTS algorithm combines the power of graph neural networks (GNNs) to capture complex dependencies in the data with the exploration-exploitation trade-off of the Thompson Sampling (TS) algorithm.
The authors demonstrate the effectiveness of GNTS on several graph-based online learning tasks, including node classification, link prediction, and recommendation systems.

Plain English Explanation

The paper proposes a new algorithm called "Graph Neural Thompson Sampling" (GNTS) to tackle online learning problems on graph-structured data. Online learning is when a system needs to make decisions and learn from them in real-time, rather than having all the data upfront.

Graphs are a way of representing data where things (called nodes) are connected to each other (called edges). For example, a social network can be represented as a graph, where each person is a node, and the connections between them are the edges.

The key insight of the GNTS algorithm is to combine the power of graph neural networks (GNNs) with the exploration-exploitation trade-off of the Thompson Sampling algorithm. GNNs are good at capturing the complex dependencies in graph-structured data, while Thompson Sampling helps the algorithm explore new options while also exploiting what it has already learned.

The authors show that GNTS outperforms other state-of-the-art algorithms on various graph-based online learning tasks, such as node classification (predicting the type of a node in the graph), link prediction (predicting new connections between nodes), and recommendation systems (suggesting new items to users based on the graph of past interactions).

Technical Explanation

The paper formulates the online learning problem on graph-structured data as a multi-armed bandit problem, where each arm corresponds to a possible action (e.g., classifying a node, recommending an item) and the goal is to maximize the cumulative reward over time.

The key components of the GNTS algorithm are:

Graph Neural Network: The authors use a GNN to learn a representation of the graph structure and node features. This allows GNTS to capture the complex dependencies in the data.
Thompson Sampling: GNTS uses the Thompson Sampling algorithm to balance the exploration of new actions and the exploitation of actions that have performed well in the past.
Efficient Sampling: To make the sampling process efficient, the authors propose a variational inference approach that approximates the posterior distribution of the model parameters.

The authors evaluate GNTS on several graph-based online learning tasks, including node classification, link prediction, and recommendation systems. They show that GNTS outperforms other state-of-the-art algorithms, such as ε-greedy Thompson Sampling and Feel-Good Thompson Sampling, by a significant margin.

Critical Analysis

The paper provides a novel and promising approach to solving online learning problems on graph-structured data. The combination of GNNs and Thompson Sampling is a clever idea that allows the algorithm to effectively capture the complex dependencies in the data while also balancing exploration and exploitation.

However, the paper does not address several potential limitations and areas for further research:

The authors do not discuss the computational complexity of the GNTS algorithm, which could be a concern for large-scale, real-world problems.
The paper only evaluates GNTS on a limited set of tasks and datasets. It would be valuable to see how the algorithm performs on a wider range of graph-based online learning problems.
The authors do not provide a theoretical analysis of the regret bounds or convergence properties of GNTS, which could be an important area for future work.

Overall, the GNTS algorithm presents an exciting new approach to graph-based online learning, and the authors have made a valuable contribution to the field. However, there is still room for further research and improvements to address the potential limitations of the current work.

Conclusion

The "Graph Neural Thompson Sampling" (GNTS) algorithm introduced in this paper represents an important step forward in solving online learning problems on graph-structured data. By combining the representational power of graph neural networks with the exploration-exploitation trade-off of Thompson Sampling, GNTS is able to effectively capture the complex dependencies in the data and make informed decisions over time.

The authors demonstrate the effectiveness of GNTS on several real-world graph-based online learning tasks, including node classification, link prediction, and recommendation systems. The results show that GNTS outperforms other state-of-the-art algorithms, making it a promising approach for a wide range of applications that involve graph-structured data.

While the paper does not address all the potential limitations of the GNTS algorithm, it represents an important contribution to the field of online learning and graph neural networks. As the research in this area continues to evolve, the insights and techniques presented in this paper are likely to have a lasting impact on the development of more sophisticated and effective algorithms for solving complex, real-world problems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Graph Neural Thompson Sampling

Shuang Wu, Arash A. Amini

We consider an online decision-making problem with a reward function defined over graph-structured data. We formally formulate the problem as an instance of graph action bandit. We then propose texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm which employs a GNN approximator for estimating the mean reward function and the graph neural tangent features for uncertainty estimation. We prove that, under certain boundness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound which is (1) sub-linear of order $tilde{mathcal{O}}((tilde{d} T)^{1/2})$ in the number of interaction rounds, $T$, and a notion of effective dimension $tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that our proposed texttt{GNN-TS} exhibits competitive performance and scales well on graph action bandit problems.

6/24/2024

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

6/19/2024

➖

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Hongseok Namkoong, Samuel Daulton, Eytan Bakshy

Thompson sampling (TS) has emerged as a robust technique for contextual bandit problems. However, TS requires posterior inference and optimization for action generation, prohibiting its use in many online platforms where latency and ease of deployment are of concern. We operationalize TS by proposing a novel imitation-learning-based algorithm that distills a TS policy into an explicit policy representation, allowing fast decision-making and easy deployment in mobile and server-based environments. Using batched data collected under the imitation policy, our algorithm iteratively performs offline updates to the TS policy, and learns a new explicit policy representation to imitate it. Empirically, our imitation policy achieves performance comparable to batch TS while allowing more than an order of magnitude reduction in decision-time latency. Buoyed by low latency and simplicity of implementation, our algorithm has been successfully deployed in multiple video upload systems for Meta. Using a randomized controlled trial, we show our algorithm resulted in significant improvements in video quality and watch time.

7/23/2024

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias L'ecuyer, Nidhi Hegde

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term %from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$alpha$), where $alpha in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve $O left(Kln^{alpha+1}(T)/Delta right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $Delta$ denotes the single round performance loss when pulling a sub-optimal arm.

5/3/2024