The Update-Equivalence Framework for Decision-Time Planning
0
🖼️
Sign in to get full access
Overview
- The paper explores a new approach to decision-time planning in imperfect-information games, which are games where players have incomplete information about the game state.
- Traditional methods for decision-time planning in imperfect-information games involve solving subgames, but this becomes computationally infeasible as the amount of non-public information grows.
- The paper introduces an alternative framework called "update equivalence" that doesn't rely on solving subgames, allowing for scalability to games with large amounts of non-public information.
- The authors derive two search algorithms based on this framework: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent.
- The algorithms are validated on the game of Hanabi, a standard benchmark for search in fully cooperative imperfect-information games, where the mirror descent approach outperforms or matches public information-based search while using significantly less computation time.
Plain English Explanation
In many games, players have incomplete information about the game state, such as the cards held by their opponents. Decision-time planning involves revising or constructing a strategy at the time of play, rather than pre-computing a fixed strategy. This has been key to achieving superhuman performance in games like chess and Go.
However, existing decision-time planning methods for imperfect-information games involve solving subgames, which becomes computationally infeasible as the amount of hidden information grows. To address this, the researchers introduce a new framework called "update equivalence." Instead of solving subgames, the algorithms in this framework replicate the updates of last-iterate algorithms, which don't rely on public information.
This allows the algorithms to scale to games with large amounts of hidden information. The authors derive two specific algorithms using this framework: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent.
The researchers test these algorithms on the game of Hanabi, a challenging fully cooperative imperfect-information game. They find that their mirror descent approach exceeds or matches the performance of public information-based search, while using two orders of magnitude less computation time. This is the first time a non-public-information-based algorithm has outperformed public-information-based approaches in a domain they have historically dominated.
Technical Explanation
The paper introduces a new framework for decision-time planning in imperfect-information games called "update equivalence." Rather than solving subgames, as in previous approaches, the algorithms in this framework replicate the updates of last-iterate algorithms, which do not rely on public information.
This allows the algorithms to scale to games with large amounts of non-public information, where solving subgames becomes computationally infeasible. The authors derive two specific algorithms using this framework: one for fully cooperative games based on mirror descent, and one for adversarial games based on magnetic mirror descent.
The mirror descent algorithm is provably sound and is validated on the game of Hanabi, a standard benchmark for search in fully cooperative imperfect-information games. The authors find that their approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.
Critical Analysis
The paper presents a promising new approach to decision-time planning in imperfect-information games, addressing a key limitation of existing methods. By avoiding the need to solve subgames, the update-equivalence framework allows for scalability to games with large amounts of non-public information.
However, the paper does not explore the limits of this scalability. It would be valuable to understand how the performance of the algorithms scales as the amount of non-public information grows, both in terms of computational requirements and solution quality. Additionally, the paper focuses on fully cooperative and adversarial games, but many real-world imperfect-information scenarios involve more nuanced relationships between agents.
Further research could investigate the applicability of the update-equivalence framework to mixed-motive games or other game types where the relationship between agents is more complex. Lastly, the paper does not provide a detailed analysis of the limitations or failure modes of the proposed algorithms, which would be valuable for understanding their practical deployment.
Conclusion
The paper introduces a new framework for decision-time planning in imperfect-information games that addresses a key limitation of existing approaches. By replicating the updates of last-iterate algorithms instead of solving subgames, the update-equivalence framework allows for scalability to games with large amounts of non-public information.
The authors demonstrate the effectiveness of this approach through the development of two specific algorithms: one for fully cooperative games using mirror descent, and one for adversarial games using magnetic mirror descent. These algorithms are shown to outperform or match public information-based search in the benchmark game of Hanabi, while using significantly less computational resources.
This research represents an important step forward in the field of decision-time planning for imperfect-information games, with the potential to enable superhuman performance in a wider range of real-world applications where information is incomplete or asymmetric.
This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
Related Papers
🖼️
0
The Update-Equivalence Framework for Decision-Time Planning
Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown
The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence. In this update-equivalence framework, decision-time planning algorithms replicate the updates of last-iterate algorithms, which need not rely on public information. This facilitates scalability to games with large amounts of non-public information. Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent. We validate the performance of these algorithms in cooperative and adversarial domains, notably in Hanabi, the standard benchmark for search in fully cooperative imperfect-information games. Here, our mirror descent approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.
Read more5/14/2024
0
Tree Search for Simultaneous Move Games via Equilibrium Approximation
Ryan Yu, Alex Olshevsky, Peter Chin
Neural network supported tree-search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on partial information games has generally been below competing approaches. Here we study the class of simultaneous-move games, which are a subclass of partial information games which are most similar to perfect information games: both agents know the game state with the exception of the opponent's move, which is revealed only after each agent makes its own move. Simultaneous move games include popular benchmarks such as Google Research Football and Starcraft. In this study we answer the question: can we take tree search algorithms trained through self-play from perfect information settings and adapt them to simultaneous move games without significant loss of performance? We answer this question by deriving a practical method that attempts to approximate a coarse correlated equilibrium as a subroutine within a tree search. Our algorithm works on cooperative, competitive, and mixed tasks. Our results are better than the current best MARL algorithms on a wide range of accepted baseline environments.
Read more6/18/2024
🔍
0
Imperfect-Recall Games: Equilibrium Concepts and Their Complexity
Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $Sigma_2^P$ , $exists$R, and $exists forall$R.
Read more6/26/2024
0
Polynomial-time Approximation Scheme for Equilibriums of Games
Hongbo Sun, Chongkun Xia, Junbo Tan, Bo Yuan, Xueqian Wang, Bin Liang
Whether a PTAS (polynomial-time approximation scheme) exists for game equilibriums has been an open question, and the absence of this polynomial-time algorithm has indications and consequences in three fields, such as the practicality of methods in algorithmic game theory, non-stationarity and curse of multiagency in MARL (multi-agent reinforcement learning), and the tractability of PPAD in computational complexity theory. In this paper, we introduce a geometric object called equilibrium bundle, which leads to a fundamental leap in the understanding of game equilibriums. Regarding the equilibrium bundle, first, we formalize perfect equilibriums of dynamic games as the zero points of its canonical section, second, we formalize a hybrid iteration of dynamic programming and interior point method as a line search on it, such that the method is an FPTAS (fully PTAS) for any perfect equilibrium of any dynamic game, implying PPAD=FP, third, we give the existence and oddness theorems of it as an extension of those of Nash equilibriums. As intermediate results, we introduce a concept called policy cone to give the sufficient and necessary condition for dynamic programming to converge to perfect equilibriums, and introduce two concepts called unbiased barrier problem and unbiased KKT conditions to make the interior point method to approximate Nash equilibriums. In experiment, the line search process is animated, and the method is tested on 2000 randomly generated dynamic games where it converges to a perfect equilibrium in every single case.
Read more9/10/2024