Planning Domain Model Acquisition from State Traces without Action Parameters

Read original: arXiv:2402.10726 - Published 8/21/2024 by Tomáš Balyo, Martin Suda, Lukáš Chrpa, Dominik Šafránek, Stephan Gocht, Filip Dvořák, Roman Barták, G. Michael Youngblood

Overview

This paper explores learning action models in STRIPS planning domains when the parameters of the actions are not provided.
It defines two levels of trace quality based on the information available in the state traces.
It presents algorithms for learning action models in each of these settings and evaluates them experimentally.

Plain English Explanation

In the field of automated planning, researchers often work with STRIPS domain models, which describe the actions an agent can take and the effects those actions have on the world. Traditionally, when learning these domain models from data, researchers have assumed that the names and parameters of the actions are already known. This means their main task is to figure out the preconditions and effects of the given actions.
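To make the idea of a STRIPS action concrete, here is a minimal sketch of how a grounded action with preconditions, add effects, and delete effects changes a state. The names `GroundAction` and `apply_action`, and the blocks-world example, are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass

# An atom is a predicate name followed by its object arguments,
# e.g. ("on", "a", "b"). This encoding is an assumption for illustration.
Atom = tuple[str, ...]


@dataclass(frozen=True)
class GroundAction:
    name: str
    preconditions: frozenset[Atom]
    add_effects: frozenset[Atom]
    del_effects: frozenset[Atom]


def apply_action(state: frozenset[Atom], action: GroundAction) -> frozenset[Atom]:
    """Apply a grounded STRIPS action: check preconditions, then delete and add."""
    if not action.preconditions <= state:
        raise ValueError(f"{action.name} is not applicable in this state")
    return (state - action.del_effects) | action.add_effects


# Hypothetical blocks-world action: pick block a up from block b.
unstack_a_b = GroundAction(
    name="unstack",
    preconditions=frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
    add_effects=frozenset({("holding", "a"), ("clear", "b")}),
    del_effects=frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
)

state = frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",), ("ontable", "b")})
next_state = apply_action(state, unstack_a_b)
print(sorted(next_state))
```

Learning a domain model is the reverse problem: given observed states before and after each action, recover the precondition and effect sets.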

This paper explores a more challenging scenario, where the parameters of the actions are not provided. It defines two levels of trace quality based on how much information is given:

Level 1 (L1): The state traces are labeled with action names, so we know the number and names of the actions, but we don't know the number or types of parameters.
Level 2 (L2): The state traces are labeled with both action names and the objects that serve as parameters for the grounded actions. Here we still need to figure out the types of parameters in the learned actions.

The paper presents algorithms for learning action models in each of these settings and compares them to the state-of-the-art tool FAMA on a variety of planning benchmarks. The results show that the new algorithms are faster than FAMA, can handle larger inputs, and learn action models that are closer to the reference models.

Technical Explanation

The paper tackles the problem of learning STRIPS domain models from state traces when the parameters of the actions are not provided. This setting is harder than the one addressed by previous approaches, which assume that the names and parameters of the actions are known.

The authors define two levels of trace quality:

Level 1 (L1): The state traces are labeled with action names, so the number and names of the actions can be deduced, but the number and types of parameters are unknown.
Level 2 (L2): The state traces are labeled with both action names and the objects that serve as parameters for the grounded actions. Here the types of parameters in the learned actions still need to be deduced.
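The difference between the two levels can be pictured as a difference in what each trace step records. The sketch below is one possible in-memory representation, not the paper's input format; `Step`, `Trace`, `trace_level`, and `strip_args` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

Atom = tuple[str, ...]  # assumed encoding: predicate name, then objects


@dataclass(frozen=True)
class Step:
    action_name: str                  # available at both L1 and L2
    args: Optional[tuple[str, ...]]   # argument objects: only at L2, None at L1
    next_state: frozenset[Atom]       # state observed after the action


@dataclass(frozen=True)
class Trace:
    initial_state: frozenset[Atom]
    steps: tuple[Step, ...]


def trace_level(trace: Trace) -> str:
    """Return 'L2' if the trace is non-empty and every step has arguments, else 'L1'."""
    has_args = bool(trace.steps) and all(s.args is not None for s in trace.steps)
    return "L2" if has_args else "L1"


def strip_args(trace: Trace) -> Trace:
    """Downgrade a trace to L1 by discarding the argument objects."""
    steps = tuple(Step(s.action_name, None, s.next_state) for s in trace.steps)
    return Trace(trace.initial_state, steps)


s0 = frozenset({("at", "truck1", "depot")})
s1 = frozenset({("at", "truck1", "market")})
l2_trace = Trace(s0, (Step("drive", ("truck1", "depot", "market"), s1),))
l1_trace = strip_args(l2_trace)
print(trace_level(l2_trace), trace_level(l1_trace))
```

In both cases the learner sees only objects, never parameter types or lifted variables, which is why type deduction remains necessary even at L2.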

For each level, the paper presents an algorithm for learning the action models. In the L1 setting, the algorithm must work out the number and types of each action's parameters from the state changes in the traces before it can infer the preconditions and effects. In the L2 setting, the argument objects of each grounded action are already given, so the algorithm only needs to deduce the parameter types in addition to the preconditions and effects.
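The summary does not spell out the algorithms, so the following is only a rough sketch of two ideas that fit the description, not the authors' method. The first function guesses candidate arguments of an action from the atoms that changed across a transition (relevant to L1). The second collects the types seen at each argument position (relevant to L2), under the assumption that a mapping from objects to types is available. Both function names are hypothetical.

```python
Atom = tuple[str, ...]  # assumed encoding: predicate name, then objects


def changed_objects(before: frozenset[Atom], after: frozenset[Atom]) -> list[str]:
    """Objects mentioned in atoms that were added or deleted by a transition.

    These are only candidates: an object that appears solely in a
    precondition would not show up in the state change.
    """
    changed = (after - before) | (before - after)
    seen: list[str] = []
    for atom in sorted(changed):      # sorted for a deterministic order
        for obj in atom[1:]:          # skip the predicate name
            if obj not in seen:
                seen.append(obj)
    return seen


def observed_param_types(observations, object_types):
    """For each action name, collect the set of types seen at each argument position.

    observations: iterable of (action_name, args) pairs from L2 traces.
    object_types: dict mapping object name to type name (assumed to be known).
    """
    types: dict[str, list[set[str]]] = {}
    for name, args in observations:
        slots = types.setdefault(name, [set() for _ in args])
        if len(slots) != len(args):
            raise ValueError(f"inconsistent arity for action {name}")
        for slot, obj in zip(slots, args):
            slot.add(object_types[obj])
    return types


before = frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",), ("ontable", "b")})
after = frozenset({("ontable", "b"), ("holding", "a"), ("clear", "b")})
candidates = changed_objects(before, after)

observations = [
    ("drive", ("truck1", "depot", "market")),
    ("drive", ("truck2", "market", "depot")),
]
object_types = {"truck1": "truck", "truck2": "truck",
                "depot": "location", "market": "location"}
param_types = observed_param_types(observations, object_types)
print(candidates, param_types)
```

A real learner would have to reconcile candidates across many transitions and handle type hierarchies; this sketch only shows where the raw evidence comes from.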

The algorithms are evaluated on a large collection of IPC planning benchmarks and compared to the state-of-the-art tool FAMA. The results show that the new algorithms are faster than FAMA, can handle larger inputs, and learn action models that are closer to the reference models.
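The summary does not say how similarity to the reference model is measured. One common choice for comparing a learned action schema with a reference schema is syntactic precision and recall over the sets of precondition or effect atoms; the sketch below assumes that metric and is not necessarily the one used in the paper. The function name `precision_recall` and the example schemas are illustrative.

```python
def precision_recall(learned: set, reference: set) -> tuple[float, float]:
    """Syntactic precision and recall of a learned atom set against a reference set."""
    true_positives = len(learned & reference)
    # By convention here, an empty set scores 1.0 to avoid division by zero.
    precision = true_positives / len(learned) if learned else 1.0
    recall = true_positives / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical lifted preconditions of a blocks-world "unstack" action.
reference_pre = {("on", "?x", "?y"), ("clear", "?x"), ("handempty",)}
learned_pre = {("on", "?x", "?y"), ("clear", "?x"), ("clear", "?y")}

precision, recall = precision_recall(learned_pre, reference_pre)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the learned model has one spurious precondition and misses one, so both scores are two thirds. Comparing lifted atoms this way presumes that the parameters of the learned and reference schemas have already been aligned.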

Critical Analysis

The paper presents an important advance in the field of automated planning by addressing the more challenging scenario where the parameters of the actions are not provided. This is a realistic setting that can arise in many real-world applications, so the ability to learn accurate domain models in this context is valuable.

One potential limitation of the approach is that it relies on the availability of high-quality state traces, which may not always be easy to obtain in practice. The authors acknowledge this and suggest that incorporating additional background knowledge or using active learning techniques could be a fruitful direction for future research.

Another area for further investigation is the scalability of the algorithms, especially for larger and more complex planning domains. While the experiments demonstrate the superior performance of the new algorithms compared to FAMA, it would be interesting to see how they fare on even larger and more challenging benchmarks.

Overall, this paper makes a significant contribution by expanding the capabilities of STRIPS domain model acquisition and paves the way for more realistic and practical planning systems. The technical explanations are clear and the experimental results are compelling, suggesting that this work is a valuable addition to the field.

Conclusion

This paper presents novel algorithms for learning STRIPS domain models from state traces when the parameters of the actions are not provided. By defining two levels of trace quality and developing tailored algorithms for each, the authors have made an important advancement in the field of automated planning.

The experimental evaluation shows that the new algorithms outperform the state-of-the-art tool FAMA in terms of speed, scalability, and the accuracy of the learned action models. This suggests that these techniques could be valuable for building more robust and practical planning systems, particularly in real-world scenarios where the full details of the actions may not be known a priori.

While the paper highlights some potential limitations and areas for future research, it represents a significant step forward in the challenging task of learning domain models from limited information. By making these advances, the authors have contributed to the ongoing efforts to develop more powerful and versatile planning systems that can adapt to a wide range of real-world challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Planning Domain Model Acquisition from State Traces without Action Parameters

Tomáš Balyo, Martin Suda, Lukáš Chrpa, Dominik Šafránek, Stephan Gocht, Filip Dvořák, Roman Barták, G. Michael Youngblood

Previous STRIPS domain model acquisition approaches that learn from state traces start with the names and parameters of the actions to be learned. Therefore their only task is to deduce the preconditions and effects of the given actions. In this work, we explore learning in situations when the parameters of learned actions are not provided. We define two levels of trace quality based on which information is provided and present an algorithm for each. In one level (L1), the states in the traces are labeled with action names, so we can deduce the number and names of the actions, but we still need to work out the number and types of parameters. In the other level (L2), the states are additionally labeled with objects that constitute the parameters of the corresponding grounded actions. Here we still need to deduce the types of the parameters in the learned actions. We experimentally evaluate the proposed algorithms and compare them with the state-of-the-art learning tool FAMA on a large collection of IPC benchmarks. The evaluation shows that our new algorithms are faster, can handle larger inputs and provide better results in terms of learning action models more similar to reference models.

8/21/2024

Learning Planning Abstractions from Language

Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.

5/8/2024

Learning Abstract World Model for Value-preserving Planning with Options

Rafael Rodriguez-Sanchez, George Konidaris

General-purpose agents require fine-grained controls and rich sensory inputs to perform a wide range of tasks. However, this complexity often leads to intractable decision-making. Traditionally, agents are provided with task-specific action and observation spaces to mitigate this challenge, but this reduces autonomy. Instead, agents must be capable of building state-action spaces at the correct abstraction level from their sensorimotor experiences. We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs) that operate at a higher level of temporal and state granularity. We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP. We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.

6/26/2024

Learning Object States from Actions via Large Language Models

Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato

Temporally localizing the presence of object states in videos is crucial in understanding human activities beyond actions and objects. This task has suffered from a lack of training data due to object states' inherent ambiguity and variety. To avoid exhaustive annotation, learning from transcribed narrations in instructional videos would be intriguing. However, object states are less described in narrations compared to actions, making them less effective. In this work, we propose to extract the object state information from action information included in narrations, using large language models (LLMs). Our observation is that LLMs include world knowledge on the relationship between actions and their resulting object states, and can infer the presence of object states from past action sequences. The proposed LLM-based framework offers flexibility to generate plausible pseudo-object state labels against arbitrary categories. We evaluate our method with our newly collected Multiple Object States Transition (MOST) dataset including dense temporal annotation of 60 object state categories. Our model trained by the generated pseudo-labels demonstrates significant improvement of over 29% in mAP against strong zero-shot vision-language models, showing the effectiveness of explicitly extracting object state information from actions through LLMs.

5/3/2024