Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

2402.03678

Published 4/4/2024 by Yash Shukla, Tanushree Burman, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov


Abstract

Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.


Overview

Reinforcement learning (RL) has enabled artificial agents to learn diverse behaviors, but often requires a large number of environment interactions.
Recent approaches have used high-level task specifications, like Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress and reduce sample complexity.
This paper proposes a novel approach called Logical Specifications-guided Dynamic Task Sampling (LSTS) that learns a set of RL policies to guide an agent to a goal state based on a high-level task specification, while minimizing environment interactions.
Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies.

Plain English Explanation

Reinforcement learning (RL) is a technique that allows artificial agents to learn how to perform tasks by interacting with their environment and receiving feedback in the form of rewards or penalties. This has led to significant progress in enabling these agents to learn a wide variety of behaviors. However, the downside is that it often requires a large number of these environmental interactions before the agent can learn an effective policy (a set of rules for how to behave) to accomplish the desired task.
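To make "interactions" and "policy" concrete, here is plain tabular Q-learning on a hypothetical five-state corridor. The environment, reward, and hyperparameters are illustrative assumptions and are unrelated to the paper's experiments. The function returns the learned greedy policy together with the number of environment steps it consumed. That step count is the quantity LSTS and similar methods try to reduce.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4; the goal is state 4.
# Actions: 0 = move left, 1 = move right. Reaching the goal gives reward 1, otherwise 0.
N_STATES, GOAL = 5, 4


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the greedy policy and the number of environment steps used."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    interactions = 0  # every call to step() is one environment interaction
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: act randomly to explore (or to break ties), else take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            interactions += 1
            # Move Q(s, a) toward the one-step target r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES)]
    return policy, interactions


policy, interactions = q_learning()
print(policy, interactions)
```

Even in this tiny problem, the agent needs hundreds of environment steps before its policy reliably moves toward the goal. Larger, multi-stage tasks make this cost grow quickly.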

To address this issue of high sample complexity, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning process and help the agent learn more efficiently. These specifications describe the overall goal in a structured form, for example as a sequence of intermediate steps that must be completed in order. As a result, the agent does not have to discover that structure through trial and error alone.
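The sketch below shows the kind of structure such a specification provides. It hand-encodes a hypothetical two-stage task as a deterministic finite automaton (DFA) and uses it as a sparse, reward-machine-style reward. Several details are assumptions for illustration only: the propositions `key` and `door`, the state names, and the simplification of one proposition per step. Real LTL$_f$ tooling labels transitions with sets of propositions.

```python
# Hypothetical task: "pick up the key, and afterwards reach the door",
# i.e. the LTLf formula F(key & F door), hand-translated into a small DFA.
# Simplification: each step emits at most one proposition (or None).
DFA_EDGES = {
    ("u0", "key"): "u1",      # key picked up
    ("u1", "door"): "u_acc",  # door reached after the key
}
ACCEPTING = {"u_acc"}


def run_dfa(trace, start="u0"):
    """Advance the DFA over a trace of labels; labels without an edge leave the state unchanged."""
    state = start
    for label in trace:
        state = DFA_EDGES.get((state, label), state)
    return state


def reward(trace):
    """Sparse, reward-machine-style signal: 1 if the trace completes the task, else 0."""
    return 1.0 if run_dfa(trace) in ACCEPTING else 0.0


print(reward(["key", None, "door"]))  # key before door: task satisfied
print(reward(["door", "key"]))        # wrong order: not satisfied
```

The automaton's intermediate state `u1` records progress ("key held") that a flat goal-only reward would not expose to the learner.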

This paper introduces a new method called Logical Specifications-guided Dynamic Task Sampling (LSTS) that builds on this idea. LSTS learns a set of RL policies that can guide the agent from an initial state to a goal state, based on the high-level task specification provided. Crucially, unlike some previous approaches, LSTS does not require any prior information about the environment dynamics or the Reward Machine. Instead, it dynamically identifies and focuses on the most promising tasks that are likely to lead to successful goal policies, further reducing the number of environment interactions needed.

Technical Explanation

The key innovation in this paper is the LSTS approach, which learns a set of RL policies to guide an agent towards a goal state based on a high-level task specification, without requiring any prior information about the environment dynamics or the Reward Machine.

The LSTS method works as follows:

It starts by decomposing the high-level task specification (an LTL$_f$ formula, typically represented as an automaton) into a set of smaller, intermediate subtasks.
It then learns a separate RL policy for each of these subtasks.
Finally, LSTS dynamically selects which subtasks to focus on during training, based on an estimate of how promising they are for ultimately reaching the goal state.
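The decomposition step can be sketched under one common assumption: the specification is available as an automaton, and each transition between two distinct automaton states is a subtask. In that reading, a subtask means "starting in state u, make proposition p true to reach state v". The automaton below offers two alternative routes to the goal. It and the helper names are hypothetical, and the paper's exact decomposition may differ.

```python
from collections import deque

# Hypothetical automaton with two ways to finish: key -> door, or pass -> door.
EDGES = [
    ("u0", "key", "u1"),
    ("u0", "pass", "u2"),
    ("u1", "door", "acc"),
    ("u2", "door", "acc"),
]


def subtasks(edges):
    """Each non-self-loop edge (u, p, v) becomes a subtask: from u, make p true to reach v."""
    return [(u, p, v) for (u, p, v) in edges if u != v]


def paths_to_goal(edges, start, accepting):
    """Breadth-first enumeration of acyclic subtask chains from start to an accepting state."""
    outgoing = {}
    for edge in subtasks(edges):
        outgoing.setdefault(edge[0], []).append(edge)
    results, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node in accepting:
            results.append(path)
            continue
        visited = {start} | {e[2] for e in path}  # avoid revisiting automaton states
        for edge in outgoing.get(node, []):
            if edge[2] not in visited:
                queue.append((edge[2], path + [edge]))
    return results


for chain in paths_to_goal(EDGES, "u0", {"acc"}):
    print([p for (_, p, _) in chain])  # ['key', 'door'], then ['pass', 'door']
```

Each chain is a candidate sequence of subtask policies. Having several candidates is what gives a sampler room to prefer the more promising route.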

This dynamic task sampling approach allows LSTS to efficiently explore the most relevant parts of the state space, without wasting time on less productive subtasks.
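Dynamic sampling can be illustrated with a generic learning-progress teacher in the style of teacher-student curriculum learning. The teacher keeps a smoothed success rate for each subtask and, most of the time, samples the subtask whose success rate is changing fastest. This is a stand-in technique chosen for illustration, not the paper's exact sampling rule. The class name and the `epsilon` and `smoothing` parameters are hypothetical.

```python
import random


class LearningProgressSampler:
    """Hypothetical teacher that favors subtasks whose success rate is changing fastest."""

    def __init__(self, tasks, epsilon=0.1, smoothing=0.3, seed=0):
        self.tasks = list(tasks)
        self.epsilon = epsilon      # probability of sampling a subtask uniformly at random
        self.smoothing = smoothing  # weight given to the newest observation in each moving average
        self.success = {t: 0.0 for t in self.tasks}   # smoothed success rate per subtask
        self.progress = {t: 0.0 for t in self.tasks}  # smoothed change in success rate per subtask
        self.rng = random.Random(seed)

    def sample(self):
        """Epsilon-greedy choice over absolute learning progress."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tasks)
        return max(self.tasks, key=lambda t: abs(self.progress[t]))

    def update(self, task, succeeded):
        """Record the outcome of one training episode on `task`."""
        old = self.success[task]
        new = (1 - self.smoothing) * old + self.smoothing * float(succeeded)
        self.success[task] = new
        self.progress[task] = (1 - self.smoothing) * self.progress[task] + self.smoothing * (new - old)


# Toy usage: subtask "A" is learnable, subtask "B" never succeeds.
teacher = LearningProgressSampler(["A", "B"], epsilon=0.0)
for _ in range(5):
    teacher.update("A", succeeded=True)
    teacher.update("B", succeeded=False)
print(teacher.sample())  # "A": its success rate is rising, B's is flat
```

Subtasks that never improve get near-zero priority, so training effort shifts toward subtasks that are actually being learned. The `epsilon` term still revisits the others occasionally, in case they start to improve.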

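The authors evaluate LSTS on a gridworld, a partially observable robotic task, and a continuous control robotic manipulation task. On the gridworld, LSTS achieves better time-to-threshold performance than state-of-the-art RM and Automaton-guided RL baselines such as Q-Learning for Reward Machines and DIRL. On the two robotic tasks, it is more sample-efficient than RM and Automaton-guided baselines, meaning it learns effective policies with fewer environment interactions.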
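The time-to-threshold comparison in the abstract can be made concrete with a small helper. It assumes a learning curve recorded as (environment interactions, evaluated success rate) pairs. It reports the first interaction count at which the success rate reaches a chosen threshold. If a method trains on intermediate subtasks, a fair comparison counts those interactions as well. The function name, the curve format, and the numbers in the example are invented for illustration and are not results from the paper.

```python
from typing import Optional, Sequence, Tuple


def time_to_threshold(curve: Sequence[Tuple[int, float]], threshold: float) -> Optional[int]:
    """Return the first interaction count whose success rate reaches `threshold`, or None.

    `curve` holds (environment_interactions, success_rate) pairs sorted by interactions,
    where interactions include any steps spent on intermediate subtasks.
    """
    for interactions, success_rate in curve:
        if success_rate >= threshold:
            return interactions
    return None


# Invented curves for two hypothetical methods (not data from the paper).
method_a = [(10_000, 0.2), (20_000, 0.7), (30_000, 0.95)]
method_b = [(10_000, 0.1), (20_000, 0.4), (30_000, 0.6), (40_000, 0.92)]
print(time_to_threshold(method_a, 0.9))  # 30000
print(time_to_threshold(method_b, 0.9))  # 40000
```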

Critical Analysis

The paper provides a thorough evaluation of the LSTS approach and demonstrates its advantages over previous methods. However, a few potential limitations and areas for further research are worth noting:

The paper does not address how LSTS would scale to even more complex, real-world tasks with very large state spaces. The dynamic task sampling approach may become computationally infeasible in such cases.
The authors mention that LSTS does not require information about the environment dynamics or Reward Machine, but it's unclear how this information could be incorporated if available, and whether that would further improve performance.
The paper focuses on task specifications in the form of LTL$_f$ formulas or Reward Machines, but it would be interesting to explore how LSTS could be adapted to work with other types of high-level task representations.

Overall, the LSTS approach represents a promising step towards more sample-efficient reinforcement learning by leveraging structured task specifications. Further research into scalability and the integration of additional domain knowledge could help unlock the full potential of this technique.

Conclusion

This paper presents a novel reinforcement learning method called Logical Specifications-guided Dynamic Task Sampling (LSTS) that addresses the sample complexity challenge of traditional RL approaches. LSTS learns a set of policies guided by high-level task specifications and dynamically focuses on the most promising subtasks. In this way, it reduces the number of environment interactions required to learn effective behaviors.

The authors demonstrate the effectiveness of LSTS on a gridworld, a partially observable robotic task, and a continuous control robotic manipulation task. Across these settings, it outperforms state-of-the-art RM and Automaton-guided baselines in time-to-threshold and sample-efficiency. While some potential limitations and avenues for further research are identified, this work represents an important contribution towards developing more practical and scalable reinforcement learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
