The Update-Equivalence Framework for Decision-Time Planning

Read original: arXiv:2304.13138 - Published 5/14/2024 by Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

🖼️

Overview

The paper explores a new approach to decision-time planning in imperfect-information games, which are games where players have incomplete information about the game state.
Traditional methods for decision-time planning in imperfect-information games involve solving subgames, but this becomes computationally infeasible as the amount of non-public information grows.
The paper introduces an alternative framework called "update equivalence" that doesn't rely on solving subgames, allowing for scalability to games with large amounts of non-public information.
The authors derive two search algorithms based on this framework: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent.
The algorithms are validated on the game of Hanabi, a standard benchmark for search in fully cooperative imperfect-information games, where the mirror descent approach outperforms or matches public information-based search while using significantly less computation time.

Plain English Explanation

In many games, players have incomplete information about the game state, such as the cards held by their opponents. Decision-time planning involves revising or constructing a strategy at the time of play, rather than pre-computing a fixed strategy. This has been key to achieving superhuman performance in games like chess and Go.

However, existing decision-time planning methods for imperfect-information games involve solving subgames, which becomes computationally infeasible as the amount of hidden information grows. To address this, the researchers introduce a new framework called "update equivalence." Instead of solving subgames, the algorithms in this framework replicate the updates of last-iterate algorithms, which don't rely on public information.

This allows the algorithms to scale to games with large amounts of hidden information. The authors derive two specific algorithms using this framework: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent.

The researchers test these algorithms on the game of Hanabi, a challenging fully cooperative imperfect-information game. They find that their mirror descent approach exceeds or matches the performance of public information-based search, while using two orders of magnitude less computation time. This is the first time a non-public-information-based algorithm has outperformed public-information-based approaches in a domain they have historically dominated.

Technical Explanation

The paper introduces a new framework for decision-time planning in imperfect-information games called "update equivalence." Rather than solving subgames, as in previous approaches, the algorithms in this framework replicate the updates of last-iterate algorithms, which do not rely on public information.

This allows the algorithms to scale to games with large amounts of non-public information, where solving subgames becomes computationally infeasible. The authors derive two specific algorithms using this framework: one for fully cooperative games based on mirror descent, and one for adversarial games based on magnetic mirror descent.

The mirror descent algorithm is provably sound and is validated on the game of Hanabi, a standard benchmark for search in fully cooperative imperfect-information games. The authors find that their approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.

Critical Analysis

The paper presents a promising new approach to decision-time planning in imperfect-information games, addressing a key limitation of existing methods. By avoiding the need to solve subgames, the update-equivalence framework allows for scalability to games with large amounts of non-public information.

However, the paper does not explore the limits of this scalability. It would be valuable to understand how the performance of the algorithms scales as the amount of non-public information grows, both in terms of computational requirements and solution quality. Additionally, the paper focuses on fully cooperative and adversarial games, but many real-world imperfect-information scenarios involve more nuanced relationships between agents.

Further research could investigate the applicability of the update-equivalence framework to mixed-motive games or other game types where the relationship between agents is more complex. Lastly, the paper does not provide a detailed analysis of the limitations or failure modes of the proposed algorithms, which would be valuable for understanding their practical deployment.

Conclusion

The paper introduces a new framework for decision-time planning in imperfect-information games that addresses a key limitation of existing approaches. By replicating the updates of last-iterate algorithms instead of solving subgames, the update-equivalence framework allows for scalability to games with large amounts of non-public information.

The authors demonstrate the effectiveness of this approach through the development of two specific algorithms: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent. These algorithms are shown to outperform or match public information-based search in the benchmark game of Hanabi, while using significantly less computational resources.

This research represents an important step forward in the field of decision-time planning for imperfect-information games, with the potential to enable superhuman performance in a wider range of real-world applications where information is incomplete or asymmetric.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🖼️

The Update-Equivalence Framework for Decision-Time Planning

Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence. In this update-equivalence framework, decision-time planning algorithms replicate the updates of last-iterate algorithms, which need not rely on public information. This facilitates scalability to games with large amounts of non-public information. Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent. We validate the performance of these algorithms in cooperative and adversarial domains, notably in Hanabi, the standard benchmark for search in fully cooperative imperfect-information games. Here, our mirror descent approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.

5/14/2024

Tree Search for Simultaneous Move Games via Equilibrium Approximation

Ryan Yu, Alex Olshevsky, Peter Chin

Neural network supported tree-search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on partial information games has generally been below competing approaches. Here we study the class of simultaneous-move games, which are a subclass of partial information games which are most similar to perfect information games: both agents know the game state with the exception of the opponent's move, which is revealed only after each agent makes its own move. Simultaneous move games include popular benchmarks such as Google Research Football and Starcraft. In this study we answer the question: can we take tree search algorithms trained through self-play from perfect information settings and adapt them to simultaneous move games without significant loss of performance? We answer this question by deriving a practical method that attempts to approximate a coarse correlated equilibrium as a subroutine within a tree search. Our algorithm works on cooperative, competitive, and mixed tasks. Our results are better than the current best MARL algorithms on a wide range of accepted baseline environments.

6/18/2024

🔍

Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer

We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $Sigma_2^P$ , $exists$R, and $exists forall$R.

6/26/2024

Polynomial-time Approximation Scheme for Equilibriums of Games

Hongbo Sun, Chongkun Xia, Junbo Tan, Bo Yuan, Xueqian Wang, Bin Liang

Whether a PTAS (polynomial-time approximation scheme) exists for game equilibriums has been an open question, and the absence of this polynomial-time algorithm has indications and consequences in three fields, such as the practicality of methods in algorithmic game theory, non-stationarity and curse of multiagency in MARL (multi-agent reinforcement learning), and the tractability of PPAD in computational complexity theory. In this paper, we introduce a geometric object called equilibrium bundle, which leads to a fundamental leap in the understanding of game equilibriums. Regarding the equilibrium bundle, first, we formalize perfect equilibriums of dynamic games as the zero points of its canonical section, second, we formalize a hybrid iteration of dynamic programming and interior point method as a line search on it, such that the method is an FPTAS (fully PTAS) for any perfect equilibrium of any dynamic game, implying PPAD=FP, third, we give the existence and oddness theorems of it as an extension of those of Nash equilibriums. As intermediate results, we introduce a concept called policy cone to give the sufficient and necessary condition for dynamic programming to converge to perfect equilibriums, and introduce two concepts called unbiased barrier problem and unbiased KKT conditions to make the interior point method to approximate Nash equilibriums. In experiment, the line search process is animated, and the method is tested on 2000 randomly generated dynamic games where it converges to a perfect equilibrium in every single case.

9/10/2024