PPFlow: Target-aware Peptide Design with Torsional Flow Matching

Read original: arXiv:2405.06642 - Published 6/18/2024 by Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li

Overview

  • Peptides have shown great promise in drug development, but leveraging AI for peptide drug discovery is not yet fully explored.
  • The researchers propose PPFlow, a target-aware method that designs peptides by modeling their torsion angles with conditional flow matching on torus manifolds.
  • They also introduce a new dataset, PPBench2024, to support structure-based peptide drug design research.
  • Experiments show that PPFlow outperforms baseline models on peptide drug generation and optimization tasks, and can be generalized to related tasks such as docking and side-chain packing.

Plain English Explanation

Peptides are small chains of amino acids that can have valuable pharmaceutical properties. Researchers have been exploring ways to use artificial intelligence (AI) to help discover new peptide-based drugs. However, this area has not been fully developed yet.

To address this, the researchers created a new method called PPFlow that designs peptides by modeling their internal 3D geometry. The key idea is to describe the peptide by its torsion angles, the rotations about the bonds along the chain. Each torsion angle is periodic (it wraps around after a full turn), so a set of such angles lies on a mathematical space called a "torus manifold." Working on that space lets the AI system learn the typical angle patterns of peptides, which can then be used to generate new peptide designs for a given target.
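To make the notion of a torsion angle concrete, here is a minimal sketch of how one such angle can be computed from four consecutive atom positions. It is a standard geometric formula, not code from the paper; the function name `dihedral` and the example coordinates are illustrative assumptions.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in radians defined by four 3D points.

    The angle describes the rotation about the central bond p1-p2.
    (Hypothetical helper for illustration; not taken from PPFlow.)
    """
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Remove the components along the central bond, leaving the parts
    # of b0 and b2 that lie in the plane perpendicular to it.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

# Made-up coordinates: the first and last points sit on opposite
# sides of the central bond (a "trans" arrangement).
atoms = [np.array(a, dtype=float) for a in
         [(1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)]]
angle = dihedral(*atoms)
```

In a peptide backbone, the same formula applied to the appropriate atom quadruples yields the familiar φ, ψ, and ω angles. Each result is defined only up to a full turn, which is why a collection of them is naturally treated as a point on a torus.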

Additionally, the researchers built a new dataset called PPBench2024 to help train and test AI models for peptide drug discovery. This dataset contains information about how peptides interact with proteins, which is important for designing drugs that can bind to their molecular targets.

When tested, the PPFlow method was found to outperform other AI approaches for generating and optimizing peptide drug candidates. The researchers also showed that PPFlow can be applied to other related problems, like predicting how peptides will dock with proteins or how the "side chains" of amino acids will pack together.

Technical Explanation

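The core of the PPFlow method is a technique called "conditional flow matching on torus manifolds." Rather than working only with atom coordinates, the method models the peptide's internal geometry through its torsion angles. Because each angle is periodic, a vector of n torsion angles is a point on an n-dimensional torus (a product of n circles). Flow matching trains a model to predict a velocity field that transports samples from a simple starting distribution to the distribution of torsion angles observed in the training data. The method is target-aware: generation is conditioned on the target protein.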
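The following is a minimal sketch of what a flow matching training target on a torus can look like, assuming the common choice of shortest-path (geodesic) interpolation between a noise sample and a data sample. It is not the authors' implementation; `wrap`, `geodesic_interpolate`, `flow_matching_loss`, and `predict_velocity` are hypothetical names, and the target-protein conditioning is omitted for brevity.

```python
import numpy as np

def wrap(angle):
    """Map angles in radians into the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def geodesic_interpolate(theta0, theta1, t):
    """Point at time t on the shortest path from theta0 to theta1.

    Returns the interpolated angles and the constant velocity along
    the path, which serves as the regression target.
    """
    delta = wrap(theta1 - theta0)  # shortest signed difference per angle
    return wrap(theta0 + t * delta), delta

def flow_matching_loss(predict_velocity, theta0, theta1, t):
    """Mean squared error between predicted and target velocities.

    predict_velocity(theta_t, t) stands in for a neural network
    (an assumption for illustration; a target-aware model would also
    receive the target protein as input).
    """
    theta_t, target = geodesic_interpolate(theta0, theta1, t)
    pred = predict_velocity(theta_t, t)
    return float(np.mean((pred - target) ** 2))

# Toy usage with random angles and a placeholder "model" that
# always predicts zero velocity.
rng = np.random.default_rng(0)
theta_noise = rng.uniform(-np.pi, np.pi, size=6)  # sample from a uniform prior
theta_data = rng.uniform(-np.pi, np.pi, size=6)   # stand-in for real torsions
loss = flow_matching_loss(lambda th, t: np.zeros_like(th),
                          theta_noise, theta_data, t=0.3)
```

The wrapping step is the part specific to the torus: without it, interpolating between 3.0 and -3.0 radians would sweep almost a full turn instead of taking the short path across the ±π boundary.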

This torus-based representation respects the periodicity of torsion angles: for example, -179° and 179° are only 2° apart on the circle, although they look far apart as plain numbers. According to the researchers, PPFlow built on this representation outperforms the baseline models they compare against.
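For readers who prefer symbols, one generic way to write a conditional flow matching objective on a torus is given below. This is a sketch of the general technique under stated assumptions, and the paper's exact formulation may differ: $\theta_0$ is a sample from a prior on the torus, $\theta_1$ is the torsion-angle vector of a training peptide, $c$ denotes the target protein, $v_\phi$ is the learned velocity field, and $\mathrm{wrap}$ is applied to each angle separately.

```latex
\mathrm{wrap}(x) = \big((x + \pi) \bmod 2\pi\big) - \pi,
\qquad
\theta_t = \mathrm{wrap}\big(\theta_0 + t \,\mathrm{wrap}(\theta_1 - \theta_0)\big),
\quad t \in [0, 1],

\mathcal{L}(\phi) = \mathbb{E}_{t,\,\theta_0,\,\theta_1}
\Big\| v_\phi(\theta_t, t, c) - \mathrm{wrap}(\theta_1 - \theta_0) \Big\|^2 .
```

Here $\mathrm{wrap}(\theta_1 - \theta_0)$ is the constant velocity of the shortest path from $\theta_0$ to $\theta_1$, so the model is trained to reproduce that velocity at every intermediate point $\theta_t$.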

To support the development of PPFlow and other peptide drug discovery AI models, the researchers also introduced the PPBench2024 dataset. This dataset contains information about how thousands of peptides interact with target proteins, providing a rich training and evaluation resource for AI systems.

The experimental results demonstrate that PPFlow achieves state-of-the-art performance on peptide drug generation and optimization tasks compared to baseline models. Importantly, the researchers also show that PPFlow can be applied to other related problems like docking and side-chain packing, suggesting that it generalizes beyond peptide generation.

Critical Analysis

The researchers acknowledge some limitations of their work. For example, the PPBench2024 dataset, while comprehensive, may not capture the full diversity of peptide-protein interactions observed in the real world. Additionally, the torus-based representation used in PPFlow, while powerful, may not be able to model all the nuances of peptide structure.

Further research could investigate ways to expand the PPBench2024 dataset, such as by incorporating data from additional experimental sources. Exploring alternative structural representations or hybrid approaches that combine multiple modeling techniques may also lead to further improvements in peptide drug design capabilities.

Overall, the PPFlow method and the PPBench2024 dataset represent significant advances in the field of AI-assisted peptide drug discovery. By enabling more accurate and efficient peptide design, these innovations have the potential to accelerate the development of new peptide-based therapeutics for a wide range of medical applications.

Conclusion

This research proposes a novel AI-based method called PPFlow for designing peptide-based drug candidates. PPFlow models the complex internal geometries of peptides using a torus-based representation, which allows it to outperform other approaches on peptide generation and optimization tasks.

The researchers also introduce the PPBench2024 dataset, a resource for training and evaluating AI systems for structure-based peptide drug discovery. Experiments show that PPFlow can be applied beyond peptide design, with potential use cases in docking and side-chain packing as well.

While the work has some limitations, it represents an important step forward in leveraging AI to accelerate the development of new peptide-based medicines. As the field continues to evolve, further advancements in peptide modeling and data resources could lead to even more transformative breakthroughs in the coming years.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PPFlow: Target-aware Peptide Design with Torsional Flow Matching

Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li

Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called PPFlow, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure design. Besides, we establish a protein-peptide binding dataset named PPBench2024 to fill the void of massive data for the task of structure-based peptide drug design and to allow the training of deep learning methods. Extensive experiments show that PPFlow reaches state-of-the-art performance in tasks of peptide drug generation and optimization in comparison with baseline models, and can be generalized to other tasks including docking and side-chain packing.

6/18/2024

Full-Atom Peptide Design based on Multi-modal Flow Matching

Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma

Peptides, short chains of amino acid residues, play a vital role in numerous biological processes by interacting with other target molecules, offering substantial potential in drug discovery. In this work, we present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. Drawing inspiration from the crucial roles of residue backbone orientations and side-chain dynamics in protein-peptide interactions, we characterize the peptide structure using rigid backbone frames within the $\mathrm{SE}(3)$ manifold and side-chain angles on high-dimensional tori. Furthermore, we represent discrete residue types in the peptide sequence as categorical distributions on the probability simplex. By learning the joint distributions of each modality using derived flows and vector fields on corresponding manifolds, our method excels in the fine-grained design of full-atom peptides. Harnessing the multi-modal paradigm, our approach adeptly tackles various tasks such as fix-backbone sequence design and side-chain packing through partial sampling. Through meticulously crafted experiments, we demonstrate that PepFlow exhibits superior performance in comprehensive benchmarks, highlighting its significant potential in computational peptide design and analysis.

6/4/2024

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Cheng-Hao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Avishek Joey Bose

Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 presents substantial new architectural features over the previous FoldFlow family of models including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase diversity and novelty of generated samples -- crucial for de-novo drug design -- we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works, containing both known proteins in PDB and high-quality synthetic structures achieved through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g. increasing secondary structures diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.

5/31/2024

PROflow: An iterative refinement model for PROTAC-induced structure prediction

Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu

Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins. A key challenge in their rational design is understanding their structural basis of activity. Due to the lack of crystal structures (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms the state-of-the-art across docking metrics and runtime. Its inference speed enables the large-scale screening of PROTAC designs, and computed properties of predicted structures achieve statistically significant correlations with published degradation activities.

5/14/2024