From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning

Read original: arXiv:2208.10230 - Published 9/4/2024 by Yaosen Min, Ye Wei, Peizhuo Wang, Xiaoting Wang, Han Li, Nian Wu, Stefan Bauer, Shuxin Zheng, Yu Shi, Yingheng Wang and 3 others

Overview

Accurately predicting the binding affinity between proteins and drug compounds (ligands) is crucial for structure-based drug design.
Existing data-driven methods have limited accuracy, partly because they use only static crystal structures, whereas binding affinities are generally determined by the thermodynamic ensemble of dynamic protein-ligand interactions.
Molecular dynamics (MD) simulations can better approximate the thermodynamic ensemble of these interactions.
This paper presents Dynaformer, a graph-based deep learning model that predicts binding affinities by learning from MD simulation trajectories.

Plain English Explanation

When developing new drugs, researchers need to understand how well a potential drug compound (ligand) will bind to a target protein. This binding affinity is a key factor in determining the drug's effectiveness. However, accurately predicting binding affinities has been challenging.

Existing computer models for this task mainly use static 3D structures of proteins and ligands, but the actual binding process involves dynamic, constantly changing interactions. To better capture this, the researchers turned to molecular dynamics (MD) simulations, which can model the movements and interactions of molecules over time.

The researchers created a large dataset of 3,218 protein-ligand complexes and their MD simulation trajectories. They then developed a new machine learning model called Dynaformer that can analyze these MD simulations and predict the binding affinities. Dynaformer uses a graph-based deep learning approach to learn the key geometric features of the protein-ligand interactions.

When tested on the CASF-2016 benchmark, Dynaformer outperformed previous methods at both scoring and ranking binding affinities. The researchers also used Dynaformer to virtually screen for new drug candidates that bind to heat shock protein 90 (HSP90). Of the 20 candidates they tested in the lab, 12 were confirmed as hits, two of them binding in the submicromolar range, and the hits include several novel chemical scaffolds.

Overall, this work demonstrates that incorporating dynamic MD simulation data can significantly improve the accuracy of computational methods for predicting protein-ligand binding, which is an important step in accelerating the early stages of the drug discovery process.

Technical Explanation

The researchers curated a large MD dataset containing 3,218 protein-ligand complexes, which they used to train their Dynaformer model. Dynaformer is a graph-based deep learning architecture that learns to predict binding affinities by analyzing the geometric characteristics of the protein-ligand interactions captured in the MD trajectories.
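To make the input concrete, here is a minimal sketch, not Dynaformer's actual featurization, of how an MD trajectory can be turned into a sequence of protein-ligand interaction graphs: atoms are nodes, and an edge joins each protein-ligand atom pair closer than a distance cutoff, recomputed for every frame. The `Atom` record, the 5 Å cutoff, and the contact-count summary are hypothetical choices for illustration; a learned model such as Dynaformer would consume far richer geometric features than a single count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom record for one MD frame (coordinates in angstroms)."""
    element: str
    xyz: tuple[float, float, float]
    is_ligand: bool  # True for ligand atoms, False for protein atoms


def interaction_edges(frame, cutoff=5.0):
    """Return (i, j, distance) for every protein-ligand atom pair within `cutoff`."""
    edges = []
    for i, a in enumerate(frame):
        for j in range(i + 1, len(frame)):
            b = frame[j]
            if a.is_ligand == b.is_ligand:
                continue  # skip intra-protein and intra-ligand pairs
            d = math.dist(a.xyz, b.xyz)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges


def contact_statistics(trajectory, cutoff=5.0):
    """Mean and population standard deviation of the contact count across frames."""
    counts = [len(interaction_edges(frame, cutoff)) for frame in trajectory]
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return mean, math.sqrt(variance)


# Toy two-frame "trajectory" (made-up coordinates): the ligand atom stays put
# while one protein atom moves into contact range.
frame_1 = [Atom("C", (0.0, 0.0, 0.0), True),
           Atom("N", (3.0, 0.0, 0.0), False),
           Atom("O", (10.0, 0.0, 0.0), False)]
frame_2 = [Atom("C", (0.0, 0.0, 0.0), True),
           Atom("N", (3.0, 0.0, 0.0), False),
           Atom("O", (4.0, 0.0, 0.0), False)]

print(interaction_edges(frame_1))              # [(0, 1, 3.0)]
print(contact_statistics([frame_1, frame_2]))  # (1.5, 0.5)
```

Even this crude statistic illustrates why trajectories carry more information than a single structure: the frame-to-frame spread (here 0.5 contacts) cannot be seen in any one snapshot.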

Through in silico experiments, the team demonstrated that Dynaformer achieves state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming previously reported methods for predicting and ranking protein-ligand binding affinities.
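For readers unfamiliar with the benchmark: CASF-2016 measures scoring power as the Pearson correlation between predicted and experimental affinities across all complexes, and ranking power by how well a method orders ligands that share the same target, summarized (among other statistics) by the Spearman correlation averaged over target clusters. The sketch below computes both from plain lists using only the Python standard library. The function names, the `target_ids` input, and the toy numbers are hypothetical, and the official CASF-2016 evaluation also reports statistics omitted here, such as Kendall's tau.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for constant input


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def scoring_power(predicted, experimental):
    """Pearson R over all complexes in the benchmark."""
    return pearson(predicted, experimental)


def ranking_power(predicted, experimental, target_ids):
    """Mean Spearman rho over groups of complexes that share a target."""
    groups = defaultdict(list)
    for p, e, t in zip(predicted, experimental, target_ids):
        groups[t].append((p, e))
    rhos = [spearman([p for p, _ in g], [e for _, e in g]) for g in groups.values()]
    return sum(rhos) / len(rhos)


# Made-up affinities (pK units) for two targets with three ligands each.
pred = [6.1, 7.0, 8.2, 5.5, 6.0, 4.9]
exp = [5.8, 7.3, 8.0, 6.2, 5.1, 4.0]
targets = ["A", "A", "A", "B", "B", "B"]

print(round(scoring_power(pred, exp), 3))
print(round(ranking_power(pred, exp, targets), 3))  # 0.75
```

Ranking power is computed per target because a method can correlate well globally yet still misorder the ligands of an individual protein, which is what matters when prioritizing compounds against one target.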

The researchers also applied Dynaformer to a virtual screen against the HSP90 protein, which yielded 20 candidate compounds. Experimental validation of these candidates' binding affinities confirmed 12 "hit" compounds, two of which bind in the submicromolar range, and the hits include several novel scaffolds.
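The screening figures reduce to simple arithmetic. A minimal sketch, assuming affinities are expressed as a dissociation constant Kd in mol/L (this summary does not say which assay quantity was measured): "submicromolar" means Kd below 1e-6 M, equivalently pKd above 6, and the reported counts give a 60% hit rate. The helper names are hypothetical, and the Kd value in the last line is an illustrative number, not a result from the paper.

```python
import math


def pkd(kd_molar):
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    return -math.log10(kd_molar)


def is_submicromolar(kd_molar):
    """True when the constant is below 1 micromolar (1e-6 M), i.e. pKd > 6."""
    return kd_molar < 1e-6


def hit_rate(n_hits, n_tested):
    """Fraction of experimentally tested candidates confirmed as hits."""
    return n_hits / n_tested


# Counts reported for the HSP90 screen: 12 hits among 20 tested candidates,
# two of the hits in the submicromolar range.
print(f"hit rate: {hit_rate(12, 20):.0%}")           # hit rate: 60%
print(f"submicromolar share of hits: {2 / 12:.0%}")  # submicromolar share of hits: 17%

# Illustrative (hypothetical) compound with Kd = 250 nM.
print(round(pkd(2.5e-7), 2), is_submicromolar(2.5e-7))  # 6.6 True
```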

Critical Analysis

The paper presents a promising approach for improving the accuracy of computational methods for predicting protein-ligand binding affinities, which is a key challenge in structure-based drug design. By incorporating dynamic MD simulation data, the Dynaformer model is able to better capture the thermodynamic ensemble of interactions that determine binding, leading to state-of-the-art performance on the CASF-2016 benchmark.

However, the authors acknowledge that their dataset, while large, still represents only a small fraction of the possible protein-ligand interactions. Expanding the diversity of the training data, as well as further optimizing the model architecture and training procedures, could lead to even better predictive performance.

Additionally, while the virtual screening results on HSP90 are encouraging, the authors do not provide a comparison to other virtual screening methods. It would be helpful to understand how Dynaformer's performance compares to more traditional structure-based or ligand-based approaches.

Overall, this work represents an important step forward in the field of computational drug discovery, and the Dynaformer model shows significant potential for accelerating the identification of new drug candidates. Further research and validation on a broader range of target proteins and ligands will be needed to fully assess the generalizability and practical impact of this approach.

Conclusion

This paper presents a novel deep learning model called Dynaformer that can accurately predict protein-ligand binding affinities by learning from molecular dynamics simulation data. Dynaformer outperformed previously reported methods on the CASF-2016 benchmark, and in a virtual screen against HSP90 it identified 20 candidates, 12 of which were experimentally confirmed as hits.

The incorporation of dynamic simulation data is a key innovation that allows Dynaformer to better capture the thermodynamic ensemble of protein-ligand interactions, leading to improved predictive power. While further research is needed to expand the model's capabilities, this work demonstrates the potential for data-driven approaches that leverage realistic physical simulations to accelerate the early stages of the drug discovery process.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning

Yaosen Min, Ye Wei, Peizhuo Wang, Xiaoting Wang, Han Li, Nian Wu, Stefan Bauer, Shuxin Zheng, Yu Shi, Yingheng Wang, Ji Wu, Dan Zhao, Jianyang Zeng

Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures, while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model, is further developed to predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories. In silico experiments demonstrated that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods hitherto reported. Moreover, in a virtual screening on heat shock protein 90 (HSP90) using Dynaformer, 20 candidates are identified and their binding affinities are further experimentally validated. Dynaformer displayed promising results in virtual drug screening, revealing 12 hit compounds (two are in the submicromolar range), including several novel scaffolds. Overall, these results demonstrated that the approach offers a promising avenue for accelerating the early drug discovery process.

9/4/2024

A hybrid quantum-classical fusion neural network to improve protein-ligand binding affinity predictions for drug discovery

L. Domingo, M. Chehimi, S. Banerjee, S. He Yuxun, S. Konakanchi, L. Ogunfowora, S. Roy, S. Selvaras, M. Djukic, C. Johnson

The field of drug discovery hinges on the accurate prediction of binding affinity between prospective drug molecules and target proteins, especially when such proteins directly influence disease progression. However, estimating binding affinity demands significant financial and computational resources. While state-of-the-art methodologies employ classical machine learning (ML) techniques, emerging hybrid quantum machine learning (QML) models have shown promise for enhanced performance, owing to their inherent parallelism and capacity to manage exponential increases in data dimensionality. Despite these advances, existing models encounter issues related to convergence stability and prediction accuracy. This paper introduces a novel hybrid quantum-classical deep learning model tailored for binding affinity prediction in drug discovery. Specifically, the proposed model synergistically integrates 3D and spatial graph convolutional neural networks within an optimized quantum architecture. Simulation results demonstrate a 6% improvement in prediction accuracy relative to existing classical models, as well as a significantly more stable convergence performance compared to previous classical approaches.

9/4/2024

Improved prediction of ligand-protein binding affinities by meta-modeling

Ho-Joon Lee, Prashant S. Emani, Mark B. Gerstein

The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling methods have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on structures, while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain improvement in binding affinity prediction.

5/21/2024

On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction

Nikolai Schapin, Carles Navarro, Albert Bou, Gianni De Fabritiis

Binding affinity optimization is crucial in early-stage drug discovery. While numerous machine learning methods exist for predicting ligand potency, their comparative efficacy remains unclear. This study evaluates the performance of classical tree-based models and advanced neural networks in protein-ligand binding affinity prediction. Our comprehensive benchmarking encompasses 2D models utilizing ligand-only RDKit embeddings and Large Language Model (LLM) ligand representations, as well as 3D neural networks incorporating bound protein-ligand conformations. We assess these models across multiple standard datasets, examining various predictive scenarios including classification, ranking, regression, and active learning. Results indicate that simpler models can surpass more complex ones in specific tasks, while 3D models leveraging structural information become increasingly competitive with larger training datasets containing compounds with labelled affinity data against multiple targets. Pre-trained 3D models, by incorporating protein pocket environments, demonstrate significant advantages in data-scarce scenarios for specific binding pockets. Additionally, LLM pretraining on 2D ligand data enhances complex model performance, providing versatile embeddings that outperform traditional RDKit features in computational efficiency. Finally, we show that combining 2D and 3D model strengths improves active learning outcomes beyond current state-of-the-art approaches. These findings offer valuable insights for optimizing machine learning strategies in drug discovery pipelines.

7/30/2024