Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning

2211.00759

Published 4/4/2024 by Robbert Reijnen, Yingqian Zhang, Hoong Chuin Lau, Zaharah Bukhsh

🤿

Abstract

The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.

Get summaries of the top AI research delivered straight to your inbox:

Overview

The paper introduces a Deep Reinforcement Learning (DRL) based approach called DR-ALNS to improve the performance of the Adaptive Large Neighborhood Search (ALNS) algorithm in solving combinatorial optimization problems (COPs).
ALNS is a successful algorithm for COPs, but its performance relies on properly configuring selection and acceptance parameters, which is a complex and resource-intensive task.
DR-ALNS aims to learn, based on the state of the search, how to configure ALNS for the next iteration to yield more effective solutions.
The method is evaluated on an orienteering problem with stochastic weights and time windows, and outperforms other ALNS-based approaches and state-of-the-art DRL methods.

Plain English Explanation

Combinatorial optimization problems (COPs) are complex problems that involve finding the best solution from a large number of possible options. One successful algorithm for solving COPs is called Adaptive Large Neighborhood Search (ALNS). ALNS works by making many small changes to a proposed solution, and then accepting or rejecting those changes based on certain rules.

However, the performance of ALNS depends heavily on how the selection of changes and the acceptance of changes are configured. Finding the right configuration is a difficult and time-consuming task. To address this, the researchers in this paper developed a new approach called DR-ALNS, which uses deep reinforcement learning (DRL) to automatically learn how to configure ALNS.

The idea behind DR-ALNS is that it can observe the current state of the search process and then decide what changes to make and how to accept or reject those changes in order to find better solutions. By learning from the search process, DR-ALNS can adapt its configuration to the specific problem being solved.

The researchers tested DR-ALNS on a challenging orienteering problem, where the goal is to visit as many locations as possible within a given time limit. The results showed that DR-ALNS outperformed both the standard ALNS algorithm and other state-of-the-art DRL approaches, while using significantly fewer training examples.

Technical Explanation

The paper introduces a Deep Reinforcement Learning (DRL) based approach called DR-ALNS to improve the performance of the Adaptive Large Neighborhood Search (ALNS) algorithm in solving combinatorial optimization problems (COPs).

ALNS is a successful algorithm for COPs, but its performance relies on properly configuring its selection and acceptance parameters, which is a complex and resource-intensive task. To address this, the authors propose DR-ALNS, which uses DRL to learn how to configure ALNS throughout the search process.

The DR-ALNS method consists of a neural network-based agent that observes the current state of the ALNS search and then selects operators, adjusts parameters, and controls the acceptance criterion for the next iteration. The goal is for the agent to learn, based on the search state, how to configure ALNS to yield more effective solutions.

The authors evaluate DR-ALNS on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that DR-ALNS outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition. Importantly, DR-ALNS achieves this with significantly fewer training observations.

Additionally, the authors demonstrate several desirable properties of DR-ALNS: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.

Critical Analysis

The paper presents a promising approach to improving the performance of the ALNS algorithm through the use of deep reinforcement learning. By automating the configuration of ALNS parameters, DR-ALNS addresses a key challenge in applying ALNS to new problems.

One potential limitation of the research is the focus on a single problem domain (orienteering with stochastic weights and time windows). While the authors show that the learned policies can be applied to different problem variants, it would be valuable to evaluate DR-ALNS on a broader range of COP benchmarks to better understand its generalization capabilities.

Additionally, the paper does not provide much insight into the learned policies or decision-making process of the DR-ALNS agent. A more detailed analysis of the agent's behavior and the factors it considers when configuring ALNS could yield useful design principles for future DRL-based optimization approaches.

Further research could also explore the scalability of DR-ALNS, both in terms of problem size and the number of agents involved. As COPs often involve large-scale problems, understanding how DR-ALNS handles increased complexity would be valuable.

Finally, it would be interesting to see how DR-ALNS compares to other DRL-based approaches for configuring optimization algorithms, such as those that use inverse reinforcement learning to infer the objective function.

Conclusion

The paper presents a novel DRL-based approach called DR-ALNS that aims to improve the performance of the Adaptive Large Neighborhood Search algorithm in solving combinatorial optimization problems. By learning how to configure the ALNS parameters based on the search state, DR-ALNS outperforms both standard ALNS and other state-of-the-art DRL methods on a challenging orienteering problem.

The research demonstrates the potential of using deep reinforcement learning to automatically adapt optimization algorithms to specific problem instances, reducing the need for manual parameter tuning. The strong results and desirable properties of DR-ALNS, such as its ability to generalize across problem variants, suggest that this approach could be a valuable tool for solving a wide range of complex optimization challenges, with potential applications in fields like logistics, transportation, and resource allocation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

Weizhe Chen, Sven Koenig, Bistra Dilkina

Cooperative multi-agent reinforcement learning (MARL) has been an increasingly important research topic in the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular centralized training decentralized execution framework requires a long time in training, yet still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, to algorithmically address these issues by training on alternating subsets of agents using existing deep MARL algorithms as low-level trainers, while not involving any additional parameters to be trained. Based on this framework, we provide three algorithm variants based on the framework: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that our algorithms can automatically reduce at least 10% of training time while reaching the same final skill level as the original algorithm.

4/5/2024

cs.MA cs.LG

Adaptive Reinforcement Learning for Robot Control

Yu Tang Liu, Nilaksh Singh, Aamir Ahmad

Deep reinforcement learning (DRL) has shown remarkable success in simulation domains, yet its application in designing robot controllers remains limited, due to its single-task orientation and insufficient adaptability to environmental changes. To overcome these limitations, we present a novel adaptive agent that leverages transfer learning techniques to dynamically adapt policy in response to different tasks and environmental conditions. The approach is validated through the blimp control challenge, where multitasking capabilities and environmental adaptability are essential. The agent is trained using a custom, highly parallelized simulator built on IsaacGym. We perform zero-shot transfer to fly the blimp in the real world to solve various tasks. We share our code at url{https://github.com/robot-perception-group/adaptive_agent/}.

4/30/2024

cs.RO cs.AI cs.SY eess.SY

Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning

Andrew Holliday, Ahmed El-Geneidy, Gregory Dudek

Transit agencies world-wide face tightening budgets. To maintain quality of service while cutting costs, efficient transit network design is essential. But planning a network of public transit routes is a challenging optimization problem. The most successful approaches to date use metaheuristic algorithms to search through the space of possible transit networks by applying low-level heuristics that randomly alter routes in a network. The design of these low-level heuristics has a major impact on the quality of the result. In this paper we use deep reinforcement learning with graph neural nets to learn low-level heuristics for an evolutionary algorithm, instead of designing them manually. These learned heuristics improve the algorithm's results on benchmark synthetic cities with 70 nodes or more, and obtain state-of-the-art results when optimizing operating costs. They also improve upon a simulation of the real transit network in the city of Laval, Canada, by as much as 54% and 18% on two key metrics, and offer cost savings of up to 12% over the city's existing transit network.

4/16/2024

cs.LG cs.AI cs.NE

↗️

Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks

Mehdi Heydari Shahna, Seyed Adel Alizadeh Kolagar, Jouni Mattila

In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability, which may pose challenges in ensuring stability and safety. To address these issues, we propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy, all while actively engaging in the learning phase through interactions with the environment. This approach circumvents the control performance and complexities associated with computations while addressing nonrepetitive reaching tasks in the presence of obstacles. First, a model-free DRL agent is employed to plan velocity-bounded motion for a manipulator with 'n' degrees of freedom (DoF), ensuring collision avoidance for the end-effector through joint-level reasoning. The generated reference motion is then input into a robust subsystem-based adaptive controller, which produces the necessary torques, while the cuckoo search optimization (CSO) algorithm enhances control gains to minimize the stabilization and tracking error in the steady state. This approach guarantees robustness and uniform exponential convergence in an unfamiliar environment, despite the presence of uncertainties and disturbances. Theoretical assertions are validated through the presentation of simulation outcomes.

5/16/2024

cs.RO cs.LG cs.SY eess.SY