PROflow: An iterative refinement model for PROTAC-induced structure prediction

Read original: arXiv:2405.06654 - Published 5/14/2024 by Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu

Overview

PROTACs are small molecules that can trigger the breakdown of traditionally "undruggable" proteins by binding to both the target protein and degradation-associated proteins.
Designing effective PROTACs is challenging due to the lack of crystal structures, which has forced existing methods to simplify the problem into a distance-constrained protein-protein docking task.
The researchers develop a novel pseudo-data generation scheme and an iterative refinement model called PROflow to address these challenges and improve PROTAC structure prediction.

Plain English Explanation

PROTACs are a type of drug that can help break down proteins that are normally difficult to target with traditional medicines. These proteins are often referred to as "undruggable" because they are hard to reach or affect using standard drug design approaches.

PROTACs work by binding to both the target protein and proteins involved in breaking down other proteins. This allows them to essentially tag the target protein for destruction, causing it to be broken down and removed from the cell.

One of the key challenges in designing effective PROTACs is understanding how they interact with the proteins they bind to. This information is usually obtained from detailed 3D structures of the protein complexes, but there is a lack of these structures available for PROTACs.

To address this, the researchers developed a new method to generate simulated data on how PROTACs might interact with proteins. They then used this data to train a machine learning model called PROflow, which can predict the 3D structure of PROTAC-protein complexes.

PROflow outperforms existing methods on docking metrics and runtime, and it is fast enough to screen large numbers of potential PROTAC designs. Properties computed from its predicted structures also correlate, at a statistically significant level, with how well real-world PROTACs degrade their target proteins, suggesting it can provide useful insights to guide PROTAC development.

Technical Explanation

The researchers develop a novel pseudo-data generation scheme to address the lack of crystal structures available for PROTACs (only 18 in the Protein Data Bank). Their approach only requires binary protein-protein complex data, which is more readily available.
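This summary does not spell out the generation procedure, so the following is only a rough illustration of why a binary complex can stand in for a PROTAC-bound one, not the authors' scheme. It assumes that a training example can be derived by picking one "anchor" atom on each protein whose separation falls in a linker-compatible range. The function name `pseudo_anchor_pairs` and the 5 to 15 Å thresholds are hypothetical.

```python
import numpy as np

def pseudo_anchor_pairs(coords_a, coords_b, min_dist=5.0, max_dist=15.0):
    """Return (index_a, index_b, distance) for cross-protein atom pairs
    whose separation lies in [min_dist, max_dist].

    Illustrative assumption: such pairs could play the role of the two
    PROTAC attachment sites. The distance thresholds are placeholders.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # Pairwise distance matrix of shape (len(coords_a), len(coords_b)).
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    idx_a, idx_b = np.nonzero((dists >= min_dist) & (dists <= max_dist))
    return [(int(i), int(j), float(dists[i, j])) for i, j in zip(idx_a, idx_b)]
```

Each returned pair gives a distance that could serve as the constraint a linker would have to satisfy, which is the kind of signal a distance-constrained docking model can be trained on.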

This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction. PROflow models the full flexibility of the PROTAC molecule during constrained protein-protein docking, in contrast to previous methods that simplified the problem.
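To make "iterative refinement under a distance constraint" concrete, here is a toy sketch. It is not PROflow, which is a learned model that also accounts for linker flexibility. The sketch assumes a single anchor point on each protein and a maximum linker length, and it only translates the second protein in small steps until the anchors are close enough. The name `refine_pose` and its parameters are hypothetical.

```python
import numpy as np

def refine_pose(anchor_a, anchor_b, coords_b, max_linker_len, steps=50, step_size=0.2):
    """Translate protein B step by step toward satisfying the constraint
    |anchor_a - anchor_b| <= max_linker_len.

    Toy stand-in for a learned update: each step removes a fixed
    fraction (step_size) of the remaining constraint violation.
    """
    anchor_a = np.asarray(anchor_a, dtype=float)
    anchor_b = np.asarray(anchor_b, dtype=float).copy()
    coords_b = np.asarray(coords_b, dtype=float).copy()
    for _ in range(steps):
        gap = anchor_a - anchor_b
        dist = np.linalg.norm(gap)
        violation = dist - max_linker_len
        if violation <= 0:
            break  # constraint already satisfied
        # Rigid translation: every atom of B moves by the same vector.
        shift = step_size * violation * gap / dist
        coords_b += shift
        anchor_b += shift
    return coords_b, anchor_b
```

Because each step removes only a fraction of the violation, the pose approaches the constraint gradually rather than jumping to it, which mirrors the general idea of refining a structure over several small updates.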

PROflow outperforms the state-of-the-art across docking metrics and runtime. Its inference speed enables large-scale screening of PROTAC designs. The computed properties of the predicted structures also achieve statistically significant correlations with published degradation activities, suggesting PROflow can provide useful insights to guide PROTAC development.
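The summary does not say which correlation statistic or significance test the authors used. As one plausible way to check such a relationship, the sketch below computes a Spearman rank correlation and a permutation p-value using only NumPy. The function names and the choice of test are assumptions made for illustration.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled data correlates as strongly."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = abs(spearman(x, y))
    hits = sum(abs(spearman(x, rng.permutation(y))) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Made-up numbers for illustration only; not data from the paper.
predicted_property = [0.1, 0.4, 0.35, 0.8, 0.9, 0.5, 0.7, 0.2]
degradation_activity = [5.0, 20.0, 18.0, 70.0, 85.0, 30.0, 60.0, 9.0]
rho = spearman(predicted_property, degradation_activity)
p_value = permutation_pvalue(predicted_property, degradation_activity)
```

A rank-based statistic is a reasonable default here because it only asks whether the predicted property orders the PROTACs the same way their measured activity does, without assuming a linear relationship.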

Critical Analysis

The researchers acknowledge the lack of available crystal structures as a key limitation in the field of PROTAC design. Their pseudo-data generation approach is a clever solution to this problem, but it relies on the assumption that binary protein-protein complex data can adequately represent the more complex PROTAC-induced structures.

While the PROflow model outperforms existing methods, its ability to accurately predict real-world PROTAC activities is still limited by the quality and completeness of the training data. Further research may be needed to understand the full range of structural factors that influence PROTAC effectiveness.

Additionally, the researchers note that their method is currently optimized for a specific class of PROTACs and may not generalize perfectly to all PROTAC designs. Expanding the model's capabilities to handle greater structural diversity could be an area for future work.

Overall, the researchers have made a valuable contribution to the field of PROTAC design by developing a novel structure prediction approach that leverages machine learning. Their work demonstrates the potential for AI-driven methods to accelerate the development of this promising class of protein-targeting therapeutics.

Conclusion

This research addresses a key challenge in the design of PROTACs, small molecules that can trigger the breakdown of traditionally "undruggable" proteins. By developing a novel pseudo-data generation scheme and an iterative refinement model called PROflow, the researchers have demonstrated a more effective approach to predicting the structural basis of PROTAC activity.

PROflow's ability to model the full flexibility of the PROTAC and to screen large numbers of potential designs quickly makes it a valuable tool for guiding PROTAC development. The statistically significant correlations between properties of its predicted structures and published degradation activities suggest it can provide useful insights to accelerate the discovery of new PROTAC-based therapeutics.

Overall, this work represents an important step forward in the field of protein-targeted drug design, highlighting the potential of AI-driven methods to address complex challenges in pharmacology and drug discovery.

Related Papers

PROflow: An iterative refinement model for PROTAC-induced structure prediction

Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu

Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins. A key challenge in their rational design is understanding their structural basis of activity. Due to the lack of crystal structures (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms the state-of-the-art across docking metrics and runtime. Its inference speed enables the large-scale screening of PROTAC designs, and computed properties of predicted structures achieve statistically significant correlations with published degradation activities.

5/14/2024

A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design

Yossra Gharbi, Rocío Mercado

Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies, which leverage the ubiquitin-proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. As the field evolves, it becomes increasingly apparent that the traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we explore the impact of ML on de novo PROTAC design, an aspect of molecular design that has not been comprehensively reviewed despite its significance. We delve into the distinct characteristics of PROTAC linker design, underscoring the complexities required to create effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for researchers in their pursuit of better design strategies for this new modality.

6/26/2024

PPFlow: Target-aware Peptide Design with Torsional Flow Matching

Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li

Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called PPFlow, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure design. Besides, we establish a protein-peptide binding dataset named PPBench2024 to fill the void of massive data for the task of structure-based peptide drug design and to allow the training of deep learning methods. Extensive experiments show that PPFlow reaches state-of-the-art performance in tasks of peptide drug generation and optimization in comparison with baseline models, and can be generalized to other tasks including docking and side-chain packing.

6/18/2024

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Cheng-Hao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Avishek Joey Bose

Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 presents substantial new architectural features over the previous FoldFlow family of models including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase diversity and novelty of generated samples -- crucial for de-novo drug design -- we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works, containing both known proteins in PDB and high-quality synthetic structures achieved through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g. increasing secondary structures diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.

5/31/2024