Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction

Read original: arXiv:2404.15557 - Published 9/10/2024 by Shili Sheng, Pian Yu, David Parker, Marta Kwiatkowska, Lu Feng


Overview

This paper presents a method for safe online planning in partially observable Markov decision processes (POMDPs) among dynamic agents such as pedestrians.
The key innovation is the use of adaptive conformal prediction (ACP) to quantify the uncertainty in predicted agent trajectories and to build safety shields that block unsafe actions during planning.
The approach is evaluated in simulated environments driven by real-world pedestrian trajectory data, where it maintains probabilistic safety guarantees with up to hundreds of dynamic agents.

Plain English Explanation

The paper describes a new way to plan actions in complex, uncertain environments that contain moving obstacles or other unpredictable agents, such as pedestrians. Problems like this, in which the planner cannot fully observe the state of the world, are commonly modeled as partially observable Markov decision processes (POMDPs).

The main challenge in these environments is ensuring that the planned actions are safe, that is, that they avoid collisions or other hazardous outcomes. The researchers build on an existing technique called adaptive conformal prediction (ACP) to obtain mathematical guarantees about the safety of the planned actions, even as the environment changes.

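In essence, a data-driven model predicts where the other agents will move, and ACP keeps track of how far off those predictions have recently been. It uses these errors to draw an uncertainty region around each predicted position, widening it after mistakes and shrinking it when predictions are accurate. The planner then rules out actions that could enter these regions, so that, in a probabilistic sense, the chance of an unsafe outcome stays below a user-specified threshold.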
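To make the threshold idea concrete, here is a minimal sketch of standard split conformal prediction applied to trajectory-prediction errors. It assumes that the error score is the Euclidean distance between a predicted and an observed agent position, and that past errors are exchangeable with the next one. The function name `conformal_radius` and the sample numbers are hypothetical; the paper's adaptive variant updates this quantity online rather than computing it once.

```python
import math
from typing import List

def conformal_radius(errors: List[float], delta: float) -> float:
    """Split conformal prediction: return a radius R such that a new
    prediction error is <= R with probability at least 1 - delta,
    assuming the calibration errors and the new error are exchangeable."""
    n = len(errors)
    # The conformal quantile is the ceil((n + 1) * (1 - delta))-th smallest error.
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        # Too few calibration errors to certify this delta: region is unbounded.
        return math.inf
    return sorted(errors)[k - 1]

# Hypothetical distances (in meters) between predicted and observed positions.
past_errors = [0.1, 0.4, 0.2, 0.3, 0.5, 0.25, 0.15, 0.35, 0.45, 0.05]
print(conformal_radius(past_errors, delta=0.2))   # 0.45 (the 9th smallest error)
print(conformal_radius(past_errors, delta=0.05))  # inf (needs more calibration data)
```

A circle of radius R around each predicted position then contains the agent's true position with probability at least 1 - delta. A smaller delta demands both a larger radius and more calibration data.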

The researchers tested their approach in simulated environments populated with dynamic agents whose motion comes from real-world pedestrian trajectory data. It maintained its probabilistic safety guarantees while handling up to hundreds of dynamic agents. This suggests the technique could be useful for real-world applications like autonomous driving, robotics, or other domains where safe decision-making under uncertainty is critical.

Technical Explanation

The paper presents a safe POMDP online planning method that uses adaptive conformal prediction to provide safety guarantees during planning among dynamic agents.

The key components are:

POMDP Formulation: The planning problem is modeled as a POMDP, which captures the partial observability and stochasticity of the environment.
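Adaptive Conformal Prediction: A data-driven trajectory prediction model forecasts the future positions of the dynamic agents, and ACP quantifies the uncertainty in these forecasts by turning each predicted position into a prediction region that should contain the true position with a user-specified target probability. These regions are used to construct safety shields on the fly, which remove unsafe actions during planning.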
Receding Horizon Planning: The system plans a sequence of actions over a finite horizon, then executes the first action and replans at the next time step.
Evaluations: The approach is evaluated in simulated dynamic environments built from real-world pedestrian trajectory data, where it maintains probabilistic safety guarantees with up to hundreds of dynamic agents.
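The way these components fit together can be sketched as a single planning step: inflate each agent's predicted position by its conformal radius, use a safety shield to discard actions that would enter an inflated region, and let the planner choose among the remaining actions. This is a simplified illustration under stated assumptions: a 2D point robot with five hypothetical discrete moves, circular prediction regions, and a greedy one-step choice standing in for the POMDP online planner (the paper plans over beliefs, which this sketch does not model). In a receding-horizon loop, this step would be repeated at every time step with fresh predictions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical discrete action set: unit moves plus "stay".
ACTIONS: Dict[str, Point] = {
    "up": (0.0, 1.0), "down": (0.0, -1.0),
    "left": (-1.0, 0.0), "right": (1.0, 0.0), "stay": (0.0, 0.0),
}

def shield(robot: Point, predicted: List[Point], radii: List[float],
           safe_dist: float) -> List[str]:
    """Keep only actions whose next position stays outside every agent's
    predicted position inflated by its conformal radius plus safe_dist."""
    allowed = []
    for name, (dx, dy) in ACTIONS.items():
        nxt = (robot[0] + dx, robot[1] + dy)
        if all(math.dist(nxt, p) > r + safe_dist
               for p, r in zip(predicted, radii)):
            allowed.append(name)
    return allowed

def choose_action(robot: Point, goal: Point, allowed: List[str]) -> str:
    """Greedy stand-in for the POMDP online planner: pick the allowed
    action whose next position is closest to the goal."""
    if not allowed:
        # Hypothetical fallback; a real system needs a verified safe backup.
        return "stay"
    return min(allowed, key=lambda a: math.dist(
        (robot[0] + ACTIONS[a][0], robot[1] + ACTIONS[a][1]), goal))

robot, goal = (0.0, 0.0), (5.0, 0.0)
allowed = shield(robot, predicted=[(1.0, 0.0)], radii=[0.3], safe_dist=0.5)
print(allowed)                              # ['up', 'down', 'left', 'stay']
print(choose_action(robot, goal, allowed))  # stay
```

Here the shield removes "right" because it would move the robot into the agent's inflated region, and the planner waits rather than detouring. The key design point is that the planner never sees unsafe actions, so its objective can focus on task reward.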

The novelty lies in applying adaptive conformal prediction within POMDP online planning. Because ACP updates its prediction regions based on the prediction errors it has recently observed, the planner's safety constraints tighten or relax as the behavior of the other agents changes. This is in contrast to fixed, conservative safety constraints that may overly restrict the agent's behavior.
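The adaptive behavior can be illustrated with the standard adaptive conformal inference update of Gibbs and Candès, alpha_{t+1} = alpha_t + gamma * (delta - err_t), where err_t is 1 if the observed error exceeded the current radius and 0 otherwise. A miss lowers alpha_t, which pushes the radius toward a larger quantile of recent errors; a hit raises alpha_t and lets the radius shrink. The sketch below tracks a single prediction horizon; the class name, the sliding window, and the parameter values are assumptions for illustration, not the paper's settings.

```python
import math
from collections import deque

class AdaptiveConformal:
    """Adaptive conformal inference for one prediction horizon.

    The working miscoverage level alpha_t is adjusted after every observation
    so that the long-run fraction of misses tracks the target delta."""

    def __init__(self, delta: float, gamma: float, window: int) -> None:
        self.delta = delta                  # target miss rate, e.g. 0.1
        self.gamma = gamma                  # step size of the alpha update
        self.alpha = delta                  # working level alpha_t
        self.scores = deque(maxlen=window)  # most recent prediction errors

    def radius(self) -> float:
        """Conformal radius from the recent errors at level alpha_t."""
        if self.alpha >= 1:
            return 0.0                      # degenerate case: region shrinks to a point
        if self.alpha <= 0 or not self.scores:
            return math.inf                 # no usable information: be maximally cautious
        n = len(self.scores)
        k = math.ceil((n + 1) * (1 - self.alpha))
        if k > n:
            return math.inf
        return sorted(self.scores)[k - 1]

    def update(self, error: float) -> None:
        """Record the realised error and apply alpha += gamma * (delta - err_t)."""
        err_t = 1.0 if error > self.radius() else 0.0
        self.alpha += self.gamma * (self.delta - err_t)
        self.scores.append(error)

acp = AdaptiveConformal(delta=0.1, gamma=0.05, window=100)
for _ in range(30):
    acp.update(1.0)           # predictions consistently off by 1.0
print(acp.radius())           # 1.0
before = acp.alpha
acp.update(5.0)               # a surprise: the error exceeds the radius
print(acp.alpha < before)     # True: alpha_t dropped after the miss
```

One instance per prediction horizon is a natural way to use this with multi-step trajectory forecasts, but whether the paper organizes its updates exactly this way is an assumption here.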

Critical Analysis

The paper presents a well-designed and thorough evaluation of the proposed safe POMDP planning approach. The use of adaptive conformal prediction to provide safety guarantees is a clever innovation that addresses a key challenge in these types of planning problems.

However, the paper does not extensively discuss the potential limitations or caveats of the approach. For example, the experiments reach hundreds of dynamic agents, but it is unclear how the method would scale to high-dimensional state spaces or to even larger crowds. The reliance on data-driven trajectory prediction models could also be a weakness: conformal prediction keeps its coverage guarantee even when the predictor is inaccurate, but poor predictions produce large prediction regions that can make the safety shield overly conservative, and good predictors may be difficult to obtain in complex real-world settings.

Additionally, the paper does not explore the computational complexity of the approach or provide a detailed analysis of its runtime performance. This information would be valuable for understanding the practical applicability of the method.

Overall, the research represents a promising step forward in safe decision-making for POMDPs, but further work is needed to fully understand the strengths, weaknesses, and limitations of the adaptive conformal prediction technique in more realistic and challenging scenarios.

Conclusion

This paper introduces a novel safe POMDP online planning method that leverages adaptive conformal prediction to provide probabilistic safety guarantees during decision-making among dynamic agents. The key innovation is using ACP to adjust the safety constraints online, based on how accurate the trajectory predictions for the other agents have recently been.

The method maintains its safety guarantees in simulated environments with up to hundreds of dynamic agents driven by real-world pedestrian data, suggesting it could be a valuable tool for applications like autonomous navigation, robotics, and other domains where safe decision-making under uncertainty is critical. While the paper does not fully explore the limitations of the approach, it represents an important contribution to safe planning and decision-making under partial observability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction

Shili Sheng, Pian Yu, David Parker, Marta Kwiatkowska, Lu Feng

Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that maximizes expected returns while providing probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) to quantify the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.

9/10/2024

ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints

Robert J. Moss, Arec Jamgochian, Johannes Fischer, Anthony Corso, Mykel J. Kochenderfer

To plan safely in uncertain environments, agents must balance utility with safety constraints. Safe planning problems can be modeled as a chance-constrained partially observable Markov decision process (CC-POMDP) and solutions often use expensive rollouts or heuristics to estimate the optimal value and action-selection policy. This work introduces the ConstrainedZero policy iteration algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy with an additional network head that estimates the failure probability given a belief. This failure probability guides safe action selection during online Monte Carlo tree search (MCTS). To avoid overemphasizing search based on the failure estimates, we introduce $\Delta$-MCTS, which uses adaptive conformal inference to update the failure threshold during planning. The approach is tested on a safety-critical POMDP benchmark, an aircraft collision avoidance system, and the sustainability problem of safe CO$_2$ storage. Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.

5/2/2024

Learning Online Belief Prediction for Efficient POMDP Planning in Autonomous Driving

Zhiyu Huang, Chen Tang, Chen Lv, Masayoshi Tomizuka, Wei Zhan

Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online belief-update-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a Transformer-based prediction model, enhanced with a recurrent neural memory model, to dynamically update latent belief state and infer the intentions of other agents. The model can also integrate the ego vehicle's intentions to reflect closed-loop interactions among agents, and it learns from both offline data and online interactions. For planning, we employ a Monte-Carlo Tree Search (MCTS) planner with macro actions, which reduces computational complexity by searching over temporally extended action steps. Inside the MCTS planner, we use predicted long-term multi-modal trajectories to approximate future updates, which eliminates iterative belief updating and improves the running efficiency. Our approach also incorporates deep Q-learning (DQN) as a search prior, which significantly improves the performance of the MCTS planner. Experimental results from simulated environments validate the effectiveness of our proposed method. The online belief update model can significantly enhance the accuracy and temporal consistency of predictions, leading to improved decision-making performance. Employing DQN as a search prior in the MCTS planner considerably boosts its performance and outperforms an imitation learning-based prior. Additionally, we show that the MCTS planning with macro actions substantially outperforms the vanilla method in terms of performance and efficiency.

6/19/2024


Recursively-Constrained Partially Observable Markov Decision Processes

Qi Heng Ho, Tyler Becker, Benjamin Kraske, Zakariya Laouar, Martin S. Feather, Federico Rossi, Morteza Lahijanian, Zachary N. Sunberg

Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDP) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.

6/6/2024