Towards DNA-Encoded Library Generation with GFlowNets

2404.10094

Published 4/17/2024 by Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
Abstract

DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.

Overview

  • This paper introduces a new approach called DEL-GFlowNet for generating DNA-encoded libraries (DELs) using a type of machine learning model called a GFlowNet.
  • DELs are collections of small molecules that can be used to discover new drugs or other useful compounds.
  • The GFlowNet model is used to efficiently explore the vast space of possible DEL compounds and identify promising candidates.

Plain English Explanation

The paper presents a method for generating DNA-encoded libraries (DELs) using a machine learning technique called GFlowNets. DELs are collections of small molecules that can be used to discover new drugs or other useful compounds. The key challenge is that the space of possible DEL compounds is enormous, so finding the most promising ones is like searching for a needle in a haystack.

The researchers developed a GFlowNet model that can efficiently explore this vast search space and identify promising DEL compounds. GFlowNets are generative models that build an object through a sequence of actions and are trained so that each finished object is sampled with probability proportional to its reward, which favors many diverse high-reward candidates over a single best one. In this case, the GFlowNet is trained to generate DEL compounds with desirable properties, using a machine learning model that predicts protein-protein interaction (PPI) modulation as the reward.
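
To make the idea of sampling in proportion to reward concrete, here is a minimal sketch on a toy two-cycle library. It is not the paper's model: the building-block names and reward values are made up, and the flows are computed exactly by enumeration, which is only possible because the example is tiny. A trained GFlowNet would approximate these flows with a neural network instead.

```python
from itertools import product

# Hypothetical building blocks for a two-cycle library (illustrative names only).
BLOCKS = [["A1", "A2"], ["B1", "B2", "B3"]]

# Made-up positive rewards standing in for a learned PPI-modulation score.
REWARD = {
    ("A1", "B1"): 1.0, ("A1", "B2"): 3.0, ("A1", "B3"): 1.0,
    ("A2", "B1"): 2.0, ("A2", "B2"): 2.0, ("A2", "B3"): 1.0,
}

def flow(prefix):
    """Total reward of all complete compounds reachable from a partial compound."""
    remaining = BLOCKS[len(prefix):]
    return sum(REWARD[prefix + tail] for tail in product(*remaining))

def forward_policy(prefix):
    """P(next block | prefix) = flow(child state) / flow(current state)."""
    parent_flow = flow(prefix)
    return {b: flow(prefix + (b,)) / parent_flow for b in BLOCKS[len(prefix)]}

def terminal_probability(compound):
    """Probability of building `compound` by following the forward policy."""
    p = 1.0
    for i, block in enumerate(compound):
        p *= forward_policy(compound[:i])[block]
    return p

Z = flow(())  # flow through the empty state equals the sum of all rewards

for compound in product(*BLOCKS):
    # The two printed numbers agree: sampling probability equals reward / Z.
    print(compound, round(terminal_probability(compound), 3), round(REWARD[compound] / Z, 3))
```

Because every compound here is reached by exactly one sequence of choices, the flow into a state is simply the summed reward of everything below it, and following the policy reproduces the reward-proportional distribution exactly.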

By using the GFlowNet approach, the researchers were able to explore the DEL search space more effectively and identify promising candidates more efficiently than traditional methods. This could help accelerate the discovery of new drug candidates and other useful compounds.

Technical Explanation

The paper introduces DEL-GFlowNet, an approach for generating DNA-encoded libraries (DELs) using GFlowNets. Library design here means choosing the building blocks that are combinatorially combined to produce the final library, and the paper focuses on designs biased toward protein-protein interaction (PPI) modulation.

The key challenge in DEL generation is the vast search space of possible compounds. The DEL-GFlowNet model addresses this by using a GFlowNet, a generative model that constructs objects through sequences of actions and learns to sample them with probability proportional to a reward.
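
A short calculation shows why the search space grows so quickly. The sketch below uses hypothetical numbers (three synthesis cycles, 500 blocks per cycle, a catalogue of 5000 candidate blocks); none of these figures come from the paper.

```python
from itertools import product
from math import comb, prod

# Hypothetical number of building blocks used at each of three synthesis cycles.
blocks_per_cycle = [500, 500, 500]

# Every compound takes one block per cycle, so the counts multiply.
library_size = prod(blocks_per_cycle)
print(library_size)  # 125000000

# Choosing which blocks to include is itself combinatorial: the number of ways
# to pick 500 blocks for one cycle out of a hypothetical catalogue of 5000.
designs_per_cycle = comb(5000, 500)
print(len(str(designs_per_cycle)), "decimal digits")

def enumerate_library(selected_blocks):
    """List every compound (one block per cycle) for a chosen set of blocks."""
    return list(product(*selected_blocks))

tiny = enumerate_library([["A1", "A2"], ["B1", "B2", "B3"], ["C1", "C2"]])
print(len(tiny))  # 12
```

Exhaustive enumeration works for the tiny example but not at realistic sizes, which is the motivation for a learned sampler.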

The DEL-GFlowNet model is trained to generate DEL compounds with desirable properties. The reward comes from machine learning models that the authors evaluate on the PPI modulation task, and the paper also investigates using structural information about building blocks to design a hierarchical action space for the GFlowNet.
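
The abstract mentions a hierarchical action space built from structural information about building blocks. One plausible reading, sketched below, is a two-level choice: first pick a group of structurally related blocks, then pick a block within that group. The grouping, the block names, and the logits are all hypothetical, and the paper's actual design may differ.

```python
import math
import random

# Hypothetical grouping of building blocks by a shared structural feature.
GROUPS = {
    "amines": ["amine_1", "amine_2", "amine_3"],
    "acids": ["acid_1", "acid_2"],
    "aldehydes": ["aldehyde_1"],
}

def softmax(logits):
    """Turn a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def block_distribution(group_logits, block_logits):
    """Probability of each block: P(group) * P(block | group)."""
    group_probs = softmax(group_logits)
    dist = {}
    for group, blocks in GROUPS.items():
        within = softmax({b: block_logits[b] for b in blocks})
        for b in blocks:
            dist[b] = group_probs[group] * within[b]
    return dist

def sample_block(group_logits, block_logits, rng):
    """Sample a group first, then a block inside that group."""
    group_probs = softmax(group_logits)
    groups = list(group_probs)
    group = rng.choices(groups, weights=[group_probs[g] for g in groups], k=1)[0]
    within = softmax({b: block_logits[b] for b in GROUPS[group]})
    blocks = list(within)
    block = rng.choices(blocks, weights=[within[b] for b in blocks], k=1)[0]
    return group, block

# Placeholder logits; in a GFlowNet these would come from the policy network.
group_logits = {"amines": 1.0, "acids": 0.0, "aldehydes": -1.0}
block_logits = {b: 0.0 for blocks in GROUPS.values() for b in blocks}

rng = random.Random(0)
group, block = sample_block(group_logits, block_logits, rng)
print(group, block)
print(block_distribution(group_logits, block_logits))
```

The appeal of this factorization is that each decision is made over a small set (a few groups, then a few blocks) rather than over the full catalogue at once.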

The paper evaluates several machine learning algorithms on the PPI modulation task and uses them as the reward for the GFlowNet. According to the abstract, the observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.
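
The abstract emphasizes the diversity of the generated candidates. One common way to quantify diversity, sketched below, is the mean pairwise Tanimoto distance between molecular fingerprints. This summary does not say which metric the paper uses, and the fingerprints here are made-up sets of feature indices rather than output from a cheminformatics toolkit.

```python
from itertools import combinations

def tanimoto_similarity(a, b):
    """Shared features divided by all features present in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mean_pairwise_distance(fingerprints):
    """Average of (1 - Tanimoto similarity) over all unordered pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto_similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical fingerprints: each set holds the indices of features that are present.
fingerprints = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8, 9}]
diversity = mean_pairwise_distance(fingerprints)
print(round(diversity, 3))  # 0.889
```

A value near 1 means the candidates share few features, while a value near 0 means they are close to identical.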

Critical Analysis

The paper provides a promising approach for generating DNA-encoded libraries (DELs) using GFlowNets, but it also acknowledges several caveats and limitations that warrant further research.

One key limitation is that the DEL-GFlowNet model is trained on a relatively small dataset of known DEL compounds, which may limit its ability to generalize to the vast space of possible DEL compounds. The researchers suggest that expanding the training dataset or incorporating additional domain knowledge could help address this issue.

Additionally, the paper does not provide a detailed analysis of the runtime or computational efficiency of the DEL-GFlowNet approach compared to other DEL generation methods. This is an important consideration, as the ability to quickly and efficiently explore the DEL search space is a key requirement for practical applications.

Another potential area for further research is the interpretability of the DEL-GFlowNet model. Understanding the reasoning behind the model's decisions could help domain experts better understand the generated DEL compounds and potentially guide the discovery of new drug candidates.

Despite these limitations, the paper presents a compelling approach that could have significant implications for the field of drug discovery and other applications involving the generation of complex molecular structures. Further research and refinement of the DEL-GFlowNet method could lead to even more powerful and efficient tools for exploring the vast chemical space.

Conclusion

This paper introduces a novel approach called DEL-GFlowNet for generating DNA-encoded libraries (DELs) using a type of machine learning model called a GFlowNet. DELs are collections of small molecules that can be used to discover new drugs and other useful compounds, but the vast search space of possible DEL compounds makes it challenging to identify promising candidates.

The DEL-GFlowNet model addresses this challenge by using a GFlowNet, a generative model that samples candidates in proportion to a reward and can therefore explore the DEL search space while favoring promising compounds. The reward is supplied by machine learning models evaluated on the PPI modulation task, and the reported results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.

While the paper acknowledges some limitations and areas for further research, the DEL-GFlowNet approach represents an important step forward in the field of drug discovery and the generation of complex molecular structures. By leveraging the power of machine learning and flow-based generative models, this work could help accelerate the identification of new drug candidates and other valuable chemical compounds.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Geometric-informed GFlowNets for Structure-Based Drug Design

Grayson Lee, Tony Shen, Martin Ester

The rising cost of drug discovery and the current pace at which new drugs are discovered underscore the need for more efficient structure-based drug design (SBDD) methods. We employ Generative Flow Networks (GFlowNets) to effectively explore the vast combinatorial space of drug-like molecules, which traditional virtual screening methods fail to cover. We introduce a novel modification to the GFlowNet framework by incorporating trigonometrically consistent embeddings, previously utilized in tasks involving protein conformation and protein-ligand interactions, to enhance the model's ability to generate molecules tailored to specific protein pockets. We have modified the existing protein conditioning used by GFlowNets, blending geometric information from both protein and ligand embeddings to achieve more geometrically consistent embeddings. Experiments conducted using CrossDocked2020 demonstrated an improvement in the binding affinity between generated molecules and protein pockets for both single and multi-objective tasks, compared to previous work. Additionally, we propose future work aimed at further increasing the geometric information captured in protein-ligand interactions.

6/18/2024

Genetic-guided GFlowNets for Sample Efficient Molecular Optimization

Hyeonah Kim, Minsu Kim, Sanghyeok Choi, Jinkyoo Park

The challenge of discovering new molecules with desired properties is crucial in domains like drug discovery and material design. Recent advances in deep learning-based generative methods have shown promise but face the issue of sample efficiency due to the computational expense of evaluating the reward function. This paper proposes a novel algorithm for sample-efficient molecular optimization by distilling a powerful genetic algorithm into deep generative policy using GFlowNets training, the off-policy method for amortized inference. This approach enables the deep generative policy to learn from domain knowledge, which has been explicitly integrated into the genetic algorithm. Our method achieves state-of-the-art performance in the official molecular optimization benchmark, significantly outperforming previous methods. It also demonstrates effectiveness in designing inhibitors against SARS-CoV-2 with substantially fewer reward calls.

5/28/2024

RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski, Andrei Rekesh, Dmytro Shevchuk, Almer van der Sloot, Piotr Gaiński, Yoshua Bengio, Cheng-Hao Liu, Mike Tyers, Robert A. Batey

Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

6/14/2024

SynFlowNet: Towards Molecule Design with Guaranteed Synthesis Pathways

Miruna Cretu, Charles Harris, Julien Roy, Emmanuel Bengio, Pietro Liò

Recent breakthroughs in generative modelling have led to a number of works proposing molecular generation models for drug discovery. While these models perform well at capturing drug-like motifs, they are known to often produce synthetically inaccessible molecules. This is because they are trained to compose atoms or fragments in a way that approximates the training distribution, but they are not explicitly aware of the synthesis constraints that come with making molecules in the lab. To address this issue, we introduce SynFlowNet, a GFlowNet model whose action space uses chemically validated reactions and reactants to sequentially build new molecules. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool. SynFlowNet consistently samples synthetically feasible molecules, while still being able to find diverse and high-utility candidates. Furthermore, we compare molecules designed with SynFlowNet to experimentally validated actives, and find that they show comparable properties of interest, such as molecular weight, SA score and predicted protein binding affinity.

5/3/2024