Partially Observable Stochastic Games with Neural Perception Mechanisms

Read original: arXiv:2310.11566 - Published 7/2/2024 by Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

Overview

Stochastic games are a well-established model for multi-agent decision-making under uncertainty
In practical applications, agents often have only partial observability of their environment
Agents increasingly use data-driven approaches like neural networks to perceive their environment
This paper proposes a new model called neuro-symbolic partially-observable stochastic games (NS-POSGs) to address these challenges

Plain English Explanation

The paper discusses a new model called neuro-symbolic partially-observable stochastic games (NS-POSGs) that can be used to study multi-agent decision-making in real-world scenarios with uncertainty and incomplete information. In many practical applications, the agents (e.g., robots, autonomous vehicles) don't have full visibility of their surroundings and instead rely on data-driven approaches like neural networks to perceive their environment.

The NS-POSG model aims to capture this type of setting, where one agent has complete information about the environment, while the other agent only has partial, data-driven observations. The authors present a new method called one-sided NS-HSVI to approximately solve these types of games, which leverages the structure of the model to represent beliefs and make decisions efficiently.

The paper demonstrates the practical applicability of this approach through examples in pedestrian-vehicle and pursuit-evasion scenarios. These types of multi-agent interactions with partial observability are increasingly common in real-world AI applications like self-driving cars and robotics.

Technical Explanation

The paper introduces the neuro-symbolic partially-observable stochastic game (NS-POSG) model, which extends continuous-space concurrent stochastic games to incorporate neural network-based perception mechanisms. The authors focus on a one-sided setting, where one agent has complete information about the environment, while the other agent relies on discrete, data-driven observations.
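
To make the idea of a perception mechanism with discrete observations concrete, here is a minimal sketch; it is not the paper's implementation. It assumes a hypothetical one-dimensional state (a distance in metres) and replaces the trained neural network with a tiny hand-written ReLU classifier. The names `perceive` and `preimage_intervals` are illustrative, and the grid scan only approximates what exact pre-image analysis would compute. The point is that such a mechanism splits the continuous state space into finitely many regions on which the percept is constant.

```python
from typing import List, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def perceive(distance: float) -> int:
    """Stand-in perception function (assumption, not from the paper).

    A single ReLU hidden unit followed by an argmax over two scores.
    Returns the discrete percept 0 ("near") or 1 ("far").
    """
    hidden = relu(distance - 5.0)   # activates beyond 5 m
    score_near = 1.0 - hidden
    score_far = hidden
    return 1 if score_far > score_near else 0


def preimage_intervals(lo: float, hi: float, steps: int) -> List[Tuple[float, float, int]]:
    """Approximate the regions of constant percept by scanning a uniform grid.

    Returns (start, end, percept) triples; region boundaries are only
    accurate up to the grid spacing.
    """
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    regions = []
    start, prev, current = xs[0], xs[0], perceive(xs[0])
    for x in xs[1:]:
        percept = perceive(x)
        if percept != current:
            # The percept changed, so close the current region here.
            regions.append((start, prev, current))
            start, current = x, percept
        prev = x
    regions.append((start, prev, current))
    return regions
```

Calling `preimage_intervals(0.0, 10.0, 100)` on this toy classifier yields two intervals, one per percept. In higher dimensions the analogous regions for a ReLU network are polyhedra, which is what makes a finite representation possible.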

To solve these games approximately, the authors present a new method called one-sided NS-HSVI, which builds on heuristic search value iteration (HSVI), an algorithm for approximating solutions to partially observable models. The key innovations in one-sided NS-HSVI include:

Using neural network pre-image analysis to construct finite polyhedral representations of the continuous state space
Employing particle-based representations for beliefs to handle the continuous state space
Exploiting the piecewise constant structure of the NS-POSG model to enable efficient computation
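
The particle-based belief representation listed above can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper: a one-dimensional state, a deterministic transition, no opponent, and a threshold function standing in for the neural perception mechanism. The names `update_belief`, `step_closer`, and `perceive_threshold` are hypothetical.

```python
from typing import Callable, List, Tuple

# A belief is a finite list of (state, weight) particles whose weights sum to 1.
Particle = Tuple[float, float]


def perceive_threshold(state: float) -> int:
    """Stand-in for a neural perception function: percept 1 if far, else 0."""
    return 1 if state > 5.5 else 0


def step_closer(state: float) -> float:
    """Assumed deterministic transition: the state decreases by one unit."""
    return state - 1.0


def update_belief(
    belief: List[Particle],
    step: Callable[[float], float],
    perceive: Callable[[float], int],
    observed: int,
) -> List[Particle]:
    """Move every particle, keep those consistent with the percept, renormalise.

    Because the perception function is deterministic, each particle's
    observation likelihood is either 0 or 1, so the update is a filter.
    """
    moved = [(step(state), weight) for state, weight in belief]
    kept = [(state, weight) for state, weight in moved if perceive(state) == observed]
    total = sum(weight for _, weight in kept)
    if total == 0.0:
        raise ValueError("observed percept has zero probability under this belief")
    return [(state, weight / total) for state, weight in kept]
```

For example, starting from `[(4.0, 0.5), (7.0, 0.25), (9.0, 0.25)]` and observing percept 1 after one step, the particle that moves to 3.0 is discarded and the remaining two share the probability mass equally.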

The authors demonstrate the practical applicability of their approach through experiments in pedestrian-vehicle and pursuit-evasion scenarios, which are common multi-agent settings with partial observability.

Critical Analysis

The paper presents a novel and relevant extension of stochastic game models to handle partial observability and data-driven perception mechanisms. The proposed NS-POSG model and one-sided NS-HSVI solution method address important practical challenges in multi-agent decision-making under uncertainty.

One potential limitation mentioned in the paper is the computational complexity of the approach, particularly for larger state spaces. The authors note that the particle-based belief representation and pre-image analysis can become expensive as the dimensionality increases. Exploring more efficient belief tracking and decision-making algorithms could be an area for further research.

Additionally, the paper focuses on a one-sided setting, where one agent has complete information. Extending the approach to handle mutual partial observability or more complex information asymmetries could broaden the applicability of the framework. Validating the method on a wider range of real-world multi-agent scenarios would also help demonstrate its practical utility.

Overall, the paper makes a valuable contribution by introducing the NS-POSG model and a novel solution technique. The work highlights the importance of accounting for partial observability and data-driven perception in multi-agent systems, an area that will become increasingly crucial as AI agents are deployed in complex, real-world environments.

Conclusion

This paper presents a new model called neuro-symbolic partially-observable stochastic games (NS-POSGs) that can be used to study multi-agent decision-making in scenarios with uncertainty and incomplete information. The authors introduce a solution method called one-sided NS-HSVI that leverages the structure of the model to efficiently represent beliefs and make decisions.

The practical applicability of the approach is demonstrated through examples in pedestrian-vehicle and pursuit-evasion scenarios, which are common settings with partial observability. This work highlights the importance of accounting for data-driven perception mechanisms and partial information in multi-agent systems, an area that will become increasingly relevant as AI technologies are deployed in complex, real-world environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Partially Observable Stochastic Games with Neural Perception Mechanisms

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

Stochastic games are a well established model for multi-agent sequential decision making under uncertainty. In practical applications, though, agents often have only partial observability of their environment. Furthermore, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. We propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates neural perception mechanisms. We focus on a one-sided setting with a partially-informed agent using discrete, data-driven observations and another, fully-informed agent. We present a new method, called one-sided NS-HSVI, for approximate solution of one-sided NS-POSGs, which exploits the piecewise constant structure of the model. Using neural network pre-image analysis to construct finite polyhedral representations and particle-based representations for beliefs, we implement our approach and illustrate its practical applicability to the analysis of pedestrian-vehicle and pursuit-evasion scenarios.

7/2/2024

HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially-informed agent, we propose a continual resolving approach which uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully-informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially-informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.

4/17/2024

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural perception mechanism and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the $\alpha$-functions of Porta et al. (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies.

8/9/2024

Partially Observable Multi-Agent Reinforcement Learning with Information Sharing

Xiangyu Liu, Kaiqing Zhang

We study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information-sharing among agents, a common practice in empirical multi-agent RL, and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for efficiently solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., of quasi-polynomial-time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the team-optimal solution in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a much more challenging goal. We establish concrete computational and sample complexities under several common structural assumptions of the model. We hope our study could open up the possibilities of leveraging and even designing different information structures, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.

9/5/2024