Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Read original: arXiv:2403.20109 - Published 4/1/2024 by Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

Introduction

The paper discusses the development of optimization techniques to efficiently discover molecular structures with target properties, which is a critical challenge in AI-based drug discovery research. Traditional high-throughput screening (HTS) techniques to identify hit molecules have limitations in reducing the time and costs of drug discovery. Deep generative models have been applied to the efficient and effective exploration of molecular structures in drug discovery.

Goal-directed molecular generation using deep generative models faces two key challenges: representing and generating molecular structures, and directing the models to discover molecules with desired properties. Researchers have used string-based and graph-based representations, as well as various deep generative models like recurrent neural networks, transformers, and graph neural networks to handle molecular data. Bayesian optimization and reinforcement learning (RL) techniques have also been exploited for deep molecular generative models to create molecules with desired properties.

However, because the chemical space is vast, it is challenging for an RL agent to explore efficiently and find an optimal policy for generating desired molecular structures. To enhance the exploration capability of property-constrained RL, curiosity strategies with intrinsic rewards have been proposed. This study introduces a new RL-based framework that combines two types of intrinsic rewards, one based on random network distillation (RND) and one based on counting strategies. The proposed framework demonstrated superior performance in goal-directed molecular generation compared to existing approaches, especially in tasks related to identifying hit molecules with structural similarities.

Preliminaries

The paper discusses the representation of molecular structures and the use of reinforcement learning (RL) for goal-directed molecular generation.

Molecular structure representation is crucial in AI-driven drug discovery. Graph-based representations use atoms as nodes and bonds as edges, but lack a standardized convention for configuring node and edge features. String-based representations like SMILES and SELFIES are widely used. SELFIES has advantages over SMILES in handling branching and ring structures, and enables easier management of errors in generated molecular strings.
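To illustrate why error handling matters for string representations, the sketch below checks a SMILES string for two common syntax problems that appear when strings are generated token by token: unbalanced branch parentheses and unpaired ring-closure digits. The function name is hypothetical, and the check is deliberately partial (no valence or aromaticity checks, no two-digit ring labels). SELFIES is designed so that such malformed strings do not arise in the first place.

```python
from typing import List


def smiles_syntax_problems(smiles: str) -> List[str]:
    """Return basic syntax problems found in a SMILES string (empty if none).

    Partial check only: branch parentheses, single-digit ring closures,
    and bracket atoms. Valence and aromaticity are not verified.
    """
    problems = []
    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring digits seen once and not yet closed
    in_bracket = False   # True while inside a bracket atom such as [NH+]
    for ch in smiles:
        if in_bracket:
            # Digits inside bracket atoms are isotopes or charges, not rings.
            if ch == "]":
                in_bracket = False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing ')' without matching '('")
                depth = 0
        elif ch.isdigit():
            # A digit opens a ring the first time and closes it the second.
            open_rings.symmetric_difference_update({ch})
    if depth > 0:
        problems.append("unclosed branch '('")
    if open_rings:
        problems.append("unclosed ring label(s): " + ", ".join(sorted(open_rings)))
    if in_bracket:
        problems.append("unclosed bracket atom '['")
    return problems
```

A string such as "C1CCCCC1" (cyclohexane) passes this check, while a truncated string such as "C1CCCC(" is reported as having both an unclosed branch and an unclosed ring.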

The RL framework is defined as a Markov decision process, with SELFIES strings as states, SELFIES characters as actions, and a reward function that aims to maximize the expected sum of discounted rewards. Proximal Policy Optimization (PPO) is used as the policy gradient algorithm, which enhances training stability.
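In standard notation, the objective and the PPO surrogate can be written as follows. The symbols are assumptions chosen for illustration and are not taken from the paper: \pi_\theta is the policy, \gamma the discount factor, r_t the reward at step t, \hat{A}_t an advantage estimate, and \epsilon the clipping range.

```latex
% Expected sum of discounted rewards maximized by the policy
J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]

% PPO clipped surrogate objective, with probability ratio \rho_t
\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[
  \min\left( \rho_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\left(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
\right]
```

The clipping removes the incentive to move the new policy far from the old one within a single update, which is the source of the training stability mentioned above.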

The paper also discusses the exploration-exploitation trade-off in RL and the importance of intrinsic rewards to encourage effective exploration. Three types of intrinsic rewards are mentioned: prediction-based, count-based, and memory-based. Prediction-based methods, which use a forward dynamics model to define rewards based on prediction error, are highlighted as an efficient approach for exploring the vast chemical space.

Related Works

3.1 Reinforcement Learning for Molecular Generation

Numerous studies have used reinforcement learning (RL) to generate molecular structures. The RL framework defines an action space of symbol sets representing molecular structures, a state space of symbol substrings, a policy for predicting the next symbol to append, and an environment that evaluates the completed structure and provides rewards based on its properties. The policies use deep neural networks like RNNs to handle string-based molecular structures. Rewards are allocated based on the target chemical properties or pharmacological efficacy of the generated molecule. The policy network is updated using algorithms like REINFORCE and PPO, incentivizing the generation of molecules with higher rewards.
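The components described above (symbol actions, substring states, and a reward given only for the completed structure) can be sketched as a minimal environment. The class name, the stop token, and the toy scoring function are hypothetical placeholders; a real setup would score the decoded molecule with a chemistry toolkit.

```python
class StringMoleculeEnv:
    """Toy environment: states are token prefixes, actions are vocabulary indices."""

    def __init__(self, vocab, score_fn, max_len=20, stop_token="<eos>"):
        self.vocab = vocab            # list of symbols, including the stop token
        self.score_fn = score_fn      # maps the finished string to a reward
        self.max_len = max_len
        self.stop_token = stop_token
        self.tokens = []

    def reset(self):
        self.tokens = []
        return tuple(self.tokens)

    def step(self, action):
        token = self.vocab[action]
        if token != self.stop_token:
            self.tokens.append(token)
        # The episode ends on the stop token or when the length limit is hit.
        done = token == self.stop_token or len(self.tokens) >= self.max_len
        # Reward is sparse: only the completed structure is evaluated.
        reward = self.score_fn("".join(self.tokens)) if done else 0.0
        return tuple(self.tokens), reward, done


# Hypothetical scoring function for illustration: count carbon symbols.
env = StringMoleculeEnv(["C", "O", "<eos>"], lambda s: float(s.count("C")))
state = env.reset()
state, reward, done = env.step(0)   # append "C"
state, reward, done = env.step(1)   # append "O"
state, reward, done = env.step(2)   # stop and evaluate "CO"
```

Because the reward arrives only at the end of the episode, the agent receives no feedback for intermediate prefixes, which is one reason exploration in this setting is difficult.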

3.2 Intrinsic Rewards for Molecular Generation

RL-based molecular generation techniques face challenges in exploring the vast chemical space. Various learning strategies incorporating intrinsic rewards have been developed to enable efficient exploration.

3.2.1 Count-based Intrinsic Reward

This method calculates intrinsic rewards based on the frequency of encountered molecular structures. It uses Morgan fingerprints and locality-sensitive hashing to track occurrence frequencies and penalize the discovery of common molecules.
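A minimal sketch of the counting idea follows. It makes several assumptions that are not from the source: molecules are represented as sets of string features (a stand-in for Morgan fingerprint bits), SimHash is used as the locality-sensitive hash, and the reward decays as 1/sqrt(count), so frequently seen buckets earn less.

```python
import hashlib
from collections import Counter
from math import sqrt


def simhash(features, num_bits=16):
    """SimHash signature: similar feature sets tend to share a signature."""
    votes = [0] * num_bits
    for feature in features:
        h = int(hashlib.sha256(feature.encode("utf-8")).hexdigest(), 16)
        for i in range(num_bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return tuple(v > 0 for v in votes)


class CountBasedReward:
    """Tracks how often each hash bucket is visited."""

    def __init__(self, num_bits=16):
        self.num_bits = num_bits
        self.counts = Counter()

    def __call__(self, features):
        key = simhash(features, self.num_bits)
        self.counts[key] += 1
        # Assumed bonus shape: shrinks as the bucket is visited more often.
        return 1.0 / sqrt(self.counts[key])


count_reward = CountBasedReward()
first = count_reward({"ring:benzene", "group:carboxyl"})   # first visit
second = count_reward({"ring:benzene", "group:carboxyl"})  # repeat visit, smaller
```

Fewer hash bits merge more molecules into the same bucket, which trades precision of the counts for memory and generalization across similar structures.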

3.2.2 Memory-based Intrinsic Reward

This approach uses a fixed-size memory buffer to store previously generated molecules. It assigns negative intrinsic rewards based on the similarity between the current molecule and those in memory, encouraging the exploration of novel molecular structures.
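The buffer-and-similarity idea can be sketched as below. The assumptions are labeled here and in the comments: molecules are represented as feature sets, similarity is Tanimoto (Jaccard) similarity, and the penalty is the maximum similarity to any stored molecule. The source does not specify these choices.

```python
from collections import deque


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class MemoryBasedReward:
    """Fixed-size memory of past molecules; similar newcomers are penalized."""

    def __init__(self, capacity=100):
        # A deque with maxlen evicts the oldest entry once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def __call__(self, features):
        features = frozenset(features)
        # Assumed penalty: highest similarity to anything already stored.
        penalty = max((tanimoto(features, m) for m in self.memory), default=0.0)
        self.memory.append(features)
        return -penalty


memory_reward = MemoryBasedReward(capacity=2)
r_new = memory_reward({"a", "b"})      # empty memory, no penalty
r_repeat = memory_reward({"a", "b"})   # identical molecule, full penalty of -1.0
```

Because the buffer has a fixed size, old molecules are eventually forgotten, so a structure generated long ago can be revisited without penalty.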

3.2.3 Prediction-based Intrinsic Reward

This strategy calculates intrinsic rewards from the prediction error of a neural network model that predicts specific molecular properties. The aim is to align the exploration process with the discovery of molecules with desirable properties.

Limitations of traditional approaches

Prior work on intrinsic rewards for goal-directed molecular generation introduced three types of intrinsic rewards: count-based, memory-based, and prediction-based. Benchmark evaluations examined the efficacy of these methods, but discovering new and unknown molecules remains difficult because solutions become trapped in local optima.

History-based approaches, such as count-based and memory-based rewards, use predefined storage mechanisms to track visited states, but they require careful design and can lead to an imbalance between exploration and exploitation. The learning-based (prediction-based) approach leverages neural networks to automatically learn and remember states, promoting more efficient exploration, but it struggles to encourage exploration as training progresses, especially when the network generalizes well.

The limitations of the history-based and learning-based approaches are further demonstrated in experiments on generating celecoxib-like structures and optimizing pLogP. This motivates the development of a hybrid approach that combines the strengths of both methods to create a more robust framework for efficient exploration of the vast chemical space and molecular structure optimization.

Methods

The study introduces Mol-AIR, a molecular optimization framework that uses adaptive intrinsic rewards to efficiently explore and identify molecules with desired properties. Mol-AIR combines the strengths of history-based and learning-based intrinsic reward approaches.

The history-based intrinsic reward (HIR) encourages the discovery of new molecular structures by prioritizing less visited states. The learning-based intrinsic reward (LIR) adjusts the balance between exploration and exploitation using a Random Network Distillation (RND) method. RND computes intrinsic rewards based on the difference in outputs between two neural networks, enabling more efficient exploration.
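The RND mechanism can be illustrated with a deliberately small sketch. It assumes linear maps in place of neural networks, a plain-list state vector, and a single gradient step per call; the class name and hyperparameters are hypothetical. A fixed random target is never trained, a predictor is trained to imitate it, and the prediction error serves as the intrinsic reward.

```python
import random


class TinyRND:
    """Linear stand-in for random network distillation."""

    def __init__(self, in_dim, out_dim=4, lr=0.1, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.uniform(-1, 1) for _ in range(in_dim)]
                    for _ in range(out_dim)]

        self.target = init()      # fixed random map, never updated
        self.predictor = init()   # trained to match the target
        self.lr = lr

    @staticmethod
    def _forward(weights, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    def reward(self, x):
        t = self._forward(self.target, x)
        p = self._forward(self.predictor, x)
        err = [pi - ti for pi, ti in zip(p, t)]
        # Intrinsic reward: mean squared difference between the two outputs.
        value = sum(e * e for e in err) / len(err)
        # One gradient-descent step on the predictor for this state.
        for row, e in zip(self.predictor, err):
            for j, xj in enumerate(x):
                row[j] -= self.lr * 2.0 * e * xj / len(err)
        return value


rnd = TinyRND(in_dim=4)
state = [1.0, 0.0, 1.0, 0.0]
rewards = [rnd.reward(state) for _ in range(5)]  # shrinks on each revisit
```

Revisiting the same state lowers its reward because the predictor has already learned it, while unfamiliar states keep a high error. This also shows the weakness noted earlier: once the predictor generalizes well, the reward fades for most states.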

By combining HIR and LIR, Mol-AIR provides a powerful framework for navigating the complex landscape of molecular structures and efficiently identifying molecules with target properties.

Figure 4: Overview of Mol-AIR

Training RL-based models with Mol-AIR involves calculating two intrinsic rewards and one extrinsic reward.

The first intrinsic reward, called HIR, tracks the number of times each molecule is visited and assigns higher rewards to less-visited structures to encourage exploration. The second intrinsic reward, called LIR, uses random network distillation to define an intrinsic reward based on the difference in predictions between two neural networks.

The extrinsic reward, called PER, evaluates the target chemical properties of the generated molecular structures and provides a reward signal to guide the policy towards structures with improved properties.

The paper uses an actor-critic structure with two critic networks to estimate episodic and non-episodic advantages, which are then used to update the policy network using the PPO algorithm. The training algorithm is presented in detail.
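A rough sketch of how two advantage streams might be computed and merged is given below. It rests on assumptions not stated in this summary: generalized advantage estimation (GAE) is used for both streams, the extrinsic stream is treated as episodic and the intrinsic stream as non-episodic (the convention of the original RND work), and the two are merged by a weighted sum with hypothetical coefficients.

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95, episodic=True):
    """Generalized advantage estimation over one rollout.

    `values` has one more entry than `rewards` (bootstrap value of the last
    state). When `episodic` is False, episode boundaries in `dones` are ignored.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t] if episodic else 1.0
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


def combined_advantages(ext_adv, int_adv, coef_ext=1.0, coef_int=1.0):
    """Weighted sum of the two streams (coefficients are hypothetical)."""
    return [coef_ext * e + coef_int * i for e, i in zip(ext_adv, int_adv)]


# Two steps; the first step ends an episode.
rewards = [1.0, 1.0]
values = [0.0, 0.0, 0.0]   # one value per state plus a bootstrap value
dones = [1.0, 0.0]
ext = gae(rewards, values, dones, gamma=1.0, lam=1.0, episodic=True)
intr = gae(rewards, values, dones, gamma=1.0, lam=1.0, episodic=False)
total = combined_advantages(ext, intr)
```

In the episodic stream, credit stops at the episode boundary, whereas the non-episodic stream lets novelty found in later episodes flow back to earlier steps.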

Results

This section discusses the implementation details and benchmark tasks used to evaluate the proposed Mol-AIR method for molecular generation. Key points:

The Mol-AIR methodology was implemented in Python 3.7 using open-source libraries like PyTorch, CUDA, RDKit, and SELFIES.
Six benchmark tasks were used to evaluate the performance, including properties like pLogP, QED, similarity to celecoxib, GSK3B inhibition, JNK3 inhibition, and the average of GSK3B and JNK3 inhibition.
The proposed Mol-AIR method was compared to baseline methods using intrinsic rewards, and Mol-AIR was shown to outperform the baselines across all six benchmark tasks.
An analysis of the intrinsic reward patterns revealed that the combination of history-based and learning-based intrinsic rewards in Mol-AIR led to more effective exploration compared to using either approach alone.
A hyperparameter analysis was conducted, finding that a balance parameter β=0.01 between history-based and learning-based intrinsic rewards worked best across the different benchmark tasks.

Conclusion

The study proposes a new reinforcement learning (RL) framework for efficient exploration in goal-directed molecular generation. The framework utilizes a novel intrinsic reward function that combines the strengths of history-based and learning-based approaches. The results show that this hybrid approach, called Mol-AIR, is effective for efficient exploration in the chemical space, successfully discovering molecules better than those found by existing intrinsic reward methods.

An ablation study revealed that the two components of the proposed method, LIR (learning-based intrinsic reward) and HIR (history-based intrinsic reward), work synergistically to drive exploration. LIR facilitates strong early-phase exploration, while HIR ensures sustained exploration later on, guiding the RL agent towards optimal molecular structures.

While Mol-AIR performed well in discovering molecular structures similar to celecoxib compared to existing methods, the similarity was still low. The authors suggest that tasks sensitive to the exploration of new structures, such as discovering similar molecular structures, require more refined control of exploration. They also note that the current intrinsic reward methods, including Mol-AIR, calculate rewards independently of target property information, making fine-tuning exploration challenging. The authors plan to focus on developing effective intrinsic rewards and RL techniques to address this limitation and improve the ability to generate structurally similar molecular structures, which is crucial for drug development.

CRediT authorship contribution statement

Jinyeong Park was responsible for developing the methodology, creating the software, and drafting the original manuscript. Jaegyoon Ahn contributed to the conceptualization and methodology. Jonghwan Choi conducted validation and investigation, and reviewed and edited the writing. Jibum Kim provided supervision and project administration, and secured funding for the work.

Declaration of competing interest

The authors state they have no known financial interests or personal relationships that could have influenced the research reported in this paper. The work was supported in part by grants from the National Research Foundation of Korea and the Ministry of Science and ICT in Korea.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random distillation network and counting-based strategies. In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties without any prior knowledge, including penalized LogP, QED, and celecoxib similarity. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.

4/1/2024

Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement Learning

Salma J. Ahmed, Mustafa A. Elattar

Developing new drugs is laborious and costly, demanding extensive time investment. In this study, we introduce an innovative de-novo drug design strategy, which harnesses the capabilities of language models to devise targeted drugs for specific proteins. Employing a Reinforcement Learning (RL) framework utilizing Proximal Policy Optimization (PPO), we refine the model to acquire a policy for generating drugs tailored to protein targets. Our method integrates a composite reward function, combining considerations of drug-target interaction and molecular validity. Following RL fine-tuning, our approach demonstrates promising outcomes, yielding notable improvements in molecular validity, interaction efficacy, and critical chemical properties, achieving 65.37 for Quantitative Estimation of Drug-likeness (QED), 321.55 for Molecular Weight (MW), and 4.47 for Octanol-Water Partition Coefficient (logP), respectively. Furthermore, out of the generated drugs, only 0.041% do not exhibit novelty.

5/14/2024

Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry

Hyunseung Kim (Seoul National University), Haeyeon Choi (Ewha Womans University), Dongju Kang (Seoul National University), Won Bo Lee (Seoul National University), Jonggeol Na (Ewha Womans University)

The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, which is a rule-based molecular designer driven by trained policy for selecting subsequent molecular fragments to get a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 of all target-hitting molecules and 7,629 of five target-hitting molecules out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, it has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.

5/8/2024

Quantum-inspired Reinforcement Learning for Synthesizable Drug Design

Dannong Wang, Jintai Chen, Zhiding Liang, Tianfan Fu, Xiao-Yang Liu

Synthesizable molecular design (also known as synthesizable molecular optimization) is a fundamental problem in drug discovery, and involves designing novel molecular structures to improve their properties according to drug-relevant oracle functions (i.e., objective) while ensuring synthetic feasibility. However, existing methods are mostly based on random search. To address this issue, in this paper, we introduce a novel approach using the reinforcement learning method with quantum-inspired simulated annealing policy neural network to navigate the vast discrete space of chemical structures intelligently. Specifically, we employ a deterministic REINFORCE algorithm using policy neural networks to output transitional probability to guide state transitions and local search using genetic algorithm to refine solutions to a local optimum within each iteration. Our methods are evaluated with the Practical Molecular Optimization (PMO) benchmark framework with a 10K query budget. We further showcase the competitive performance of our method by comparing it against the state-of-the-art genetic algorithms-based method.

9/17/2024