Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints

Read original: arXiv:2405.05372 - Published 5/10/2024 by Burak M. Gonultas, Volkan Isler

Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints

Overview

This paper explores how to train agents to play a pursuit-evasion game, where one agent (the pursuer) tries to catch another agent (the evader), under dynamic and sensor constraints.
The authors propose a method that combines reinforcement learning and model-based control to enable the agents to learn effective strategies for the pursuit-evasion game.
The method is evaluated in simulation experiments, demonstrating its ability to train effective pursuers and evaders under different environmental conditions.

Plain English Explanation

In this research, the authors are looking at how to train computer-controlled agents to play a game where one agent (the "pursuer") tries to catch another agent (the "evader"). The catch is that these agents have to obey certain rules or constraints, like limits on their speed and the information they can gather about their environment.

The key idea is to use a combination of two different AI techniques - reinforcement learning and model-based control. Reinforcement learning allows the agents to learn effective strategies by trial and error, while model-based control helps them plan their moves more efficiently given the constraints they face.

The researchers tested this approach in computer simulations, and found that it could train both effective pursuers (the agents trying to catch the evaders) and effective evaders (the agents trying to get away). This suggests the method could be useful for a variety of applications, like training robots or autonomous vehicles to navigate and interact in complex, constrained environments.

Technical Explanation

The paper proposes a method for training agents to play a pursuit-evasion game under dynamic and sensor constraints. The method combines reinforcement learning and model-based control to enable the agents to learn effective strategies.

The reinforcement learning component allows the agents to learn through trial and error, exploring different actions and being rewarded for successful outcomes. The model-based control component uses a predictive model of the environment to help the agents plan their moves more efficiently given the dynamic and sensor constraints they face.

The authors evaluate their method in simulation experiments, training both pursuers and evaders to play the pursuit-evasion game. The results demonstrate the ability of the proposed approach to learn effective strategies under different environmental conditions, including variations in the agents' speed, sensor range, and field of view.

Critical Analysis

The paper presents a promising approach to training agents for pursuit-evasion games with dynamic and sensor constraints. However, the authors acknowledge that the method has some limitations. For example, the simulation experiments do not capture the full complexity of real-world environments, and the agents' performance may degrade when deployed in the physical world.

Additionally, the paper does not address the potential for the trained agents to be used in malicious ways, such as for intercepting or evading real-world vehicles. Further research would be needed to understand and mitigate these risks.

Overall, the paper presents an interesting and potentially impactful approach to a challenging problem in AI and robotics. However, it is important to consider the broader implications and potential applications of this technology, both positive and negative.

Conclusion

This paper introduces a novel method for training agents to play a pursuit-evasion game under dynamic and sensor constraints. By combining reinforcement learning and model-based control, the authors demonstrate the ability to train effective pursuers and evaders in simulation experiments.

The proposed approach has the potential to be useful in a variety of applications, such as robotics and autonomous vehicle navigation, where agents need to navigate and interact in complex, constrained environments. However, it is important to consider the broader implications and potential risks of this technology, and to continue exploring ways to ensure it is developed and deployed responsibly.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Learning to Play Pursuit-Evasion with Dynamic and Sensor Constraints

Burak M. Gonultas, Volkan Isler

We present a multi-agent reinforcement learning approach to solve a pursuit-evasion game between two players with car-like dynamics and sensing limitations. We develop a curriculum for an existing multi-agent deterministic policy gradient algorithm to simultaneously obtain strategies for both players, and deploy the learned strategies on real robots moving as fast as 2 m/s in indoor environments. Through experiments we show that the learned strategies improve over existing baselines by up to 30% in terms of capture rate for the pursuer. The learned evader model has up to 5% better escape rate over the baselines even against our competitive pursuer model. We also present experiment results which show how the pursuit-evasion game and its results evolve as the player dynamics and sensor constraints are varied. Finally, we deploy learned policies on physical robots for a game between the F1TENTH and JetRacer platforms and show that the learned strategies can be executed on real-robots. Our code and supplementary material including videos from experiments are available at https: //gonultasbu.github.io/pursuit-evasion/.

5/10/2024

A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments

Jiayu Chen, Guosheng Li, Chao Yu, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang

This paper addresses multi-UAV pursuit-evasion, where a group of drones cooperates to capture a fast evader in a confined environment with obstacles. Existing heuristic algorithms, which simplify the pursuit-evasion problem, often lack expressive coordination strategies and struggle to capture the evader in extreme scenarios, such as when the evader moves at high speeds. In contrast, reinforcement learning (RL) has been applied to this problem and has the potential to obtain highly cooperative capture strategies. However, RL-based methods face challenges in training for complex 3-dimensional scenarios with diverse task settings due to the vast exploration space. The dynamics constraints of drones further restrict the ability of reinforcement learning to acquire high-performance capture strategies. In this work, we introduce a dual curriculum learning framework, named DualCL, which addresses multi-UAV pursuit-evasion in diverse environments and demonstrates zero-shot transfer ability to unseen scenarios. DualCL comprises two main components: the Intrinsic Parameter Curriculum Proposer, which progressively suggests intrinsic parameters from easy to hard to improve the capture capability of drones, and the External Environment Generator, tasked with exploring unresolved scenarios and generating appropriate training distributions of external environment parameters. The simulation experimental results show that DualCL significantly outperforms baseline methods, achieving over 90% capture rate and reducing the capture timestep by at least 27.5% in the training scenarios. Additionally, it exhibits the best zero-shot generalization ability in unseen environments. Moreover, we demonstrate the transferability of our pursuit strategy from simulation to real-world environments. Further details can be found on the project website at https://sites.google.com/view/dualcl.

5/1/2024

A Game Between Two Identical Dubins Cars: Evading a Conic Sensor in Minimum Time

Ubaldo Ruiz

A fundamental task in mobile robotics is keeping an intelligent agent under surveillance with an autonomous robot as it travels in the environment. This work studies a version of that problem involving one of the most popular vehicle platforms in robotics. In particular, we consider two identical Dubins cars moving on a plane without obstacles. One of them plays as the pursuer, and it is equipped with a limited field-of-view detection region modeled as a semi-infinite cone with its apex at the pursuer's position. The pursuer aims to maintain the other Dubins car, which plays as the evader, as much time as possible inside its detection region. On the contrary, the evader wants to escape as soon as possible. In this work, employing differential game theory, we find the time-optimal motion strategies near the game's end. The analysis of those trajectories reveals the existence of at least two singular surfaces: a Transition Surface and an Evader's Universal Surface. We also found that the barrier's standard construction produces a surface that partially lies outside the playing space and fails to define a closed region, implying that an additional procedure is required to determine all configurations where the evader escapes.

6/14/2024

Grasper: A Generalist Pursuer for Pursuit-Evasion Problems

Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An

Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO to improve scalability in solving large-scale PEGs. However, these methods primarily focus on specific PEGs with fixed initial conditions that may vary substantially in real-world scenarios, which significantly hinders the applicability of the traditional methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold: First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising critical components such as (i) a graph neural network (GNN) to encode PEGs into hidden vectors, and (ii) a hypernetwork to generate pursuer policies based on these hidden vectors. As a second contribution, we develop an efficient three-stage training method involving (i) a pre-pretraining stage for learning robust PEG representations through self-supervised graph learning techniques like GraphMAE, (ii) a pre-training stage utilizing heuristic-guided multi-task pre-training (HMP) where heuristic-derived reference policies (e.g., through Dijkstra's algorithm) regularize pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies on designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in terms of solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.

4/22/2024