Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

2106.09435

Published 4/19/2024 by Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

🏋️

Abstract

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.

Create account to get full access

Overview

The paper proposes an algorithm called Joint Policy-Space Response Oracles (JPSRO) for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium.
The authors suggest correlated equilibria (CE) as promising meta-solvers and propose a novel solution concept called Maximum Gini Correlated Equilibrium (MGCE), a computationally efficient family of solutions for solving the correlated equilibrium selection problem.
The paper presents several experiments using CE meta-solvers for JPSRO and demonstrates convergence on n-player, general-sum games.

Plain English Explanation

The paper focuses on a type of game called n-player, general-sum extensive form games. These are complex games with multiple players and where the overall outcome is not necessarily a zero-sum game (i.e., one player's gain is not the other's loss). The authors propose an algorithm called Joint Policy-Space Response Oracles (JPSRO) that can train agents to play these games and guaranteed to converge to an equilibrium, where no player can improve their outcome by changing their strategy.

The authors also suggest using correlated equilibria (CE) as a way to solve these games. CE is a type of solution where the players' strategies are correlated, rather than independent. The paper introduces a new solution concept called Maximum Gini Correlated Equilibrium (MGCE), which is a computationally efficient way to find a CE solution.

The researchers conduct experiments using CE meta-solvers with the JPSRO algorithm, and demonstrate that this approach can converge on n-player, general-sum games.

Technical Explanation

The paper proposes the Joint Policy-Space Response Oracles (JPSRO) algorithm for training agents in n-player, general-sum extensive form games. JPSRO works by having each agent maintain a set of policy oracles, which are models that can predict the best responses of the other agents. The agents then use these oracles to update their own policies, with the goal of converging to an equilibrium.

The authors also suggest using correlated equilibria (CE) as a meta-solver for these games. CE is a solution concept where the players' strategies are correlated, rather than independent. The paper introduces a novel solution concept called Maximum Gini Correlated Equilibrium (MGCE), which is a computationally efficient way to find a CE solution.

The researchers conduct several experiments using CE meta-solvers with the JPSRO algorithm and demonstrate convergence on n-player, general-sum games. The experiments show that the JPSRO algorithm, combined with CE meta-solvers, can effectively train agents to play these complex games and reach an equilibrium.

Critical Analysis

The paper presents a promising approach for training agents in n-player, general-sum extensive form games, which is a significant advancement beyond the well-studied two-player, constant-sum setting. The authors' use of correlated equilibria (CE) and the introduction of the Maximum Gini Correlated Equilibrium (MGCE) solution concept appear to be valuable contributions.

However, the paper does not address the potential limitations of the JPSRO algorithm, such as the computational complexity of maintaining and updating the policy oracles, or the sensitivity of the algorithm to the initial conditions and hyperparameters. Additionally, the paper does not provide a comprehensive analysis of the performance of the MGCE solution concept compared to other CE solution methods.

Further research could explore the scalability of the JPSRO algorithm to larger-scale games, as well as investigate the robustness of the approach to noisy or incomplete information, which are common challenges in real-world applications. Comparisons to other equilibrium-finding algorithms, such as No-Regret Learning or Best Response Shaping, could also provide valuable insights.

Conclusion

The paper presents a significant advancement in the field of multi-agent reinforcement learning by proposing the JPSRO algorithm for training agents in n-player, general-sum extensive form games. The authors' use of correlated equilibria and the introduction of the MGCE solution concept are promising approaches for solving these complex games.

The experimental results demonstrate the potential of this approach, but further research is needed to address the algorithm's limitations and explore its broader applicability. As the field of multi-agent systems continues to evolve, contributions like this paper will be crucial in developing more robust and scalable solutions for real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

Jiesong Lian

A popular approach for solving zero-sum games is to maintain populations of policies to approximate the Nash Equilibrium (NE). Previous studies have shown that Policy Space Response Oracle (PSRO) algorithm is an effective multi-agent reinforcement learning framework for solving such games. However, repeatedly training new policies from scratch to approximate Best Response (BR) to opponents' mixed policies at each iteration is both inefficient and costly. While some PSRO variants initialize a new policy by inheriting from past BR policies, this approach limits the exploration of new policies, especially against challenging opponents. To address this issue, we propose Fusion-PSRO, which employs policy fusion to initialize policies for better approximation to BR. By selecting high-quality base policies from meta-NE, policy fusion fuses the base policies into a new policy through model averaging. This approach allows the initialized policies to incorporate multiple expert policies, making it easier to handle difficult opponents compared to inheriting from past BR policies or initializing from scratch. Moreover, our method only modifies the policy initialization phase, allowing its application to nearly all PSRO variants without additional training overhead. Our experiments on non-transitive matrix games, Leduc Poker, and the more complex Liars Dice demonstrate that Fusion-PSRO enhances the performance of nearly all PSRO variants, achieving lower exploitability.

6/24/2024

cs.GT cs.AI cs.LG cs.MA

🛠️

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.

5/3/2024

cs.LG cs.GT

🤖

Towards Principled Superhuman AI for Multiplayer Symmetric Games

Jiawei Ge, Yuanhao Wang, Wenzhe Li, Chi Jin

Multiplayer games, when the number of players exceeds two, present unique challenges that fundamentally distinguish them from the extensively studied two-player zero-sum games. These challenges arise from the non-uniqueness of equilibria and the risk of agents performing highly suboptimally when adopting equilibrium strategies. While a line of recent works developed learning systems successfully achieving human-level or even superhuman performance in popular multiplayer games such as Mahjong, Poker, and Diplomacy, two critical questions remain unaddressed: (1) What is the correct solution concept that AI agents should find? and (2) What is the general algorithmic framework that provably solves all games within this class? This paper takes the first step towards solving these unique challenges of multiplayer games by provably addressing both questions in multiplayer symmetric normal-form games. We also demonstrate that many meta-algorithms developed in prior practical systems for multiplayer games can fail to achieve even the basic goal of obtaining agent's equal share of the total reward.

6/7/2024

cs.LG cs.MA stat.ML

Policy Space Response Oracles: A Survey

Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.

5/28/2024

cs.GT cs.AI cs.MA