Learning to Extend Molecular Scaffolds with Structural Motifs

Read original: arXiv:2103.03864 - Published 5/14/2024 by Krzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt


Overview

This paper proposes a new deep learning-based model called MoLeR for generating molecules with a specified scaffold or core structure.
Many drug discovery projects require a fixed scaffold, but incorporating this constraint has been challenging for existing generative models.
MoLeR is a graph-based model that naturally supports a scaffold as the initial seed of the generation process, which is possible because it is not conditioned on the generation history.
Experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches.
The paper also examines the impact of various design choices on the overall performance of the model.

Plain English Explanation

Developing new drug molecules is a complex and time-consuming process. Deep learning-based models have shown promise in accelerating this process by generating potential drug candidates in silico.

Many drug discovery projects require the generated molecules to have a specific core structure or "scaffold." Existing generative models have struggled to incorporate this scaffold constraint effectively. MoLeR, the new model proposed in this paper, is designed to naturally support the use of a scaffold as the starting point for generating new molecules.

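Existing models build molecules step by step, either atom-by-atom and bond-by-bond or fragment-by-fragment. MoLeR is graph-based and is not conditioned on the generation history: what it does next depends on the partial molecule in front of it, not on the steps that produced it. This allows it to start with the scaffold and build out the rest of the molecule around it.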

The researchers found that MoLeR performs comparably to state-of-the-art methods on open-ended molecular optimization tasks and outperforms them when the generated molecules must include a specific scaffold. Moreover, MoLeR is an order of magnitude faster to train and to sample from than existing approaches.

The paper also shows how a number of seemingly minor design choices affect the overall performance of the system.

Technical Explanation

The paper introduces MoLeR, a new graph-based generative model for molecular design that can naturally incorporate a fixed scaffold as the initial seed for the generation process.

Many existing methods generate molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. MoLeR's distinguishing property is that its generative procedure is not conditioned on the generation history. Because of this, a scaffold can be supplied as the initial seed and the model builds the rest of the molecule around it, which is a common requirement in drug discovery projects.
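The idea that each generation step looks only at the current partial graph, and not at how that graph was produced, can be illustrated with a toy sketch. This is not MoLeR's implementation: `PartialMolecule`, `next_step`, and `generate` are hypothetical names, and a random policy stands in for the learned decoder. The sketch only shows why a history-free step function accepts a scaffold as its starting state without any special handling.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PartialMolecule:
    """Toy molecular graph: atom symbols plus bonds between atom indices."""
    atoms: list = field(default_factory=list)
    bonds: set = field(default_factory=set)

    def copy(self):
        return PartialMolecule(list(self.atoms), set(self.bonds))


def next_step(graph, rng, max_atoms=12, stop_prob=0.2):
    """Hypothetical stand-in for a learned decoder step (random policy).

    The function receives only the current partial graph, never a list of
    past actions, so it cannot tell a seeded scaffold from a prefix it
    built itself. Returns (symbol, attach_index) or None to stop.
    """
    if len(graph.atoms) >= max_atoms:
        return None
    if graph.atoms and rng.random() < stop_prob:
        return None
    symbol = rng.choice(["C", "N", "O"])
    attach_to = rng.randrange(len(graph.atoms)) if graph.atoms else None
    return symbol, attach_to


def generate(seed_graph=None, rng_seed=0):
    """Grow a molecule from an empty graph or from a given scaffold."""
    rng = random.Random(rng_seed)
    graph = seed_graph.copy() if seed_graph is not None else PartialMolecule()
    while True:
        step = next_step(graph, rng)
        if step is None:
            return graph
        symbol, attach_to = step
        graph.atoms.append(symbol)
        if attach_to is not None:
            # Bond the new atom to an existing one; existing bonds are never edited.
            graph.bonds.add((attach_to, len(graph.atoms) - 1))


# A six-membered carbon ring used as an assumed example scaffold.
ring = PartialMolecule(["C"] * 6, {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)})
extended = generate(seed_graph=ring, rng_seed=3)
print(extended.atoms, sorted(extended.bonds))
```

Because the loop only ever appends atoms and bonds, the scaffold is preserved by construction in this toy version. A model whose steps were conditioned on the full action sequence would instead need a plausible history for the scaffold before it could continue from it.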

The researchers evaluate MoLeR on both unconstrained molecular optimization tasks and scaffold-based tasks. They find that MoLeR performs comparably to state-of-the-art methods on the unconstrained tasks, while outperforming them on the scaffold-based tasks. Moreover, MoLeR is an order of magnitude faster to train and sample from than existing approaches.

The paper also examines the influence of a number of seemingly minor design choices on the overall performance of the model. These insights can inform the development of future generative models for molecular design.

Critical Analysis

The paper makes a significant contribution by addressing the challenge of incorporating scaffold constraints into deep learning-based molecular generation models. The proposed MoLeR approach is a novel and promising solution that shows strong performance on both constrained and unconstrained tasks.

One potential limitation of the research is the relatively narrow scope of the experiments, which focus mainly on small molecule optimization. It would be valuable to see how MoLeR performs on a broader set of molecular design tasks, such as those involving larger, more complex molecules or different types of scaffolds.

Additionally, the paper does not provide much insight into the interpretability or explainability of the MoLeR model. Understanding the internal workings and decision-making processes of such generative models is an important area for further research, as it can help build trust and enable more informed decision-making in drug discovery applications.

"Generative Active Learning for Small Molecule-Protein Binding" and "Multimodal Learning for Predicting Molecular Properties" are examples of other recent approaches that could be compared or combined with MoLeR to further advance the state of the art in this field.

Conclusion

This paper presents a new deep learning-based generative model called MoLeR that can effectively incorporate scaffold constraints into the molecular design process. MoLeR outperforms state-of-the-art methods on scaffold-based tasks while being an order of magnitude faster to train and sample from.

The ability to generate molecules with a specified scaffold is a crucial capability for many drug discovery projects. MoLeR represents a significant step forward in addressing this challenge and could potentially accelerate the identification of new drug candidates through in silico screening.

Further research to expand the scope of MoLeR and improve its interpretability could lead to even greater advancements in the use of deep learning for computational drug design.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Learning to Extend Molecular Scaffolds with Structural Motifs

Krzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt

Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.

5/14/2024


Improved motif-scaffolding with SE(3) flow matching

Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola

Protein design often begins with the knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow without additional training. On a benchmark of 24 biologically meaningful motifs, we show our method achieves 2.5 times more designable and unique motif-scaffolds compared to state-of-the-art. Code: https://github.com/microsoft/protein-frame-flow

7/22/2024


Deep Lead Optimization: Leveraging Generative AI for Structural Modification

Odin Zhang, Haitao Lin, Hui Zhang, Huifeng Zhao, Yufei Huang, Yuansheng Huang, Dejun Jiang, Chang-yu Hsieh, Peichen Pan, Tingjun Hou

The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead optimization, which refines existing molecules into drug candidates. Among them, lead optimization plays an important role in real-world drug design. For example, it can enable the development of me-better drugs that are chemically distinct yet more effective than the original drugs. It can also facilitate fragment-based drug design, transforming virtual-screened small ligands with low affinity into first-in-class medicines. Despite its importance, automated lead optimization remains underexplored compared to the well-established de novo generative models, due to its reliance on complex biological and chemical knowledge. To bridge this gap, we conduct a systematic review of traditional computational methods for lead optimization, organizing these strategies into four principal sub-tasks with defined inputs and outputs. This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD. Additionally, we introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization. Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; inversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.

5/1/2024

Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets

Ulrich A. Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries P. Smit, Arnu Pretorius

A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.

7/22/2024