Towards Evolutionary-based Automated Machine Learning for Small Molecule Pharmacokinetic Prediction

Read original: arXiv:2408.00421 - Published 8/2/2024 by Alex G. C. de Sá, David B. Ascher

Overview

This paper explores the use of evolutionary-based automated machine learning (AutoML) for predicting the pharmacokinetic properties of small molecules.
It proposes a grammar-based genetic programming (GBGP) approach that automatically selects machine learning algorithms and designs predictive pipelines for small molecule pharmacokinetic prediction.
The goal is an AutoML system that tailors pipelines to the input molecular data, instead of relying on manually crafted algorithms or pipelines.

Plain English Explanation

The paper uses evolutionary-based AutoML to predict the pharmacokinetic properties of small molecules. Pharmacokinetics describes how a drug moves through the body: absorption, distribution, metabolism, and excretion (ADME).

The key idea is to use a grammar-based genetic programming (GBGP) approach to automatically generate machine learning models for this task. The system searches for and assembles its own predictive pipelines, rather than relying on humans to design them.

The authors argue that this evolutionary-based AutoML approach can be more efficient than manually crafted pipelines for small molecule research in drug discovery. The core benefit is that it automatically explores a large space of candidate algorithms and pipelines and selects the one that best predicts a given pharmacokinetic property.

Technical Explanation

The paper proposes a grammar-based genetic programming (GBGP) approach for automated machine learning (AutoML) on small molecule pharmacokinetic prediction tasks.

The GBGP system uses an evolutionary algorithm to iteratively generate and evaluate candidate machine learning models. It starts with an initial population of candidate models ("individuals") encoded as trees, then applies genetic operators such as mutation and crossover to produce new variants. Each variant is evaluated on a pharmacokinetic dataset, and the highest-performing ones are selected to seed the next generation.
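The generate, evaluate, and select cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the search space (`SCALERS`, `MODELS`), the operators, and `toy_fitness` are all hypothetical stand-ins, and a real system would replace `toy_fitness` with a validation score computed on a pharmacokinetic dataset.

```python
import random

# Hypothetical search space: each individual is a tiny pipeline tree,
# written as a nested tuple (scaler, (model, hyperparameter)).
SCALERS = ["none", "standard", "minmax"]
MODELS = {"ridge": [0.1, 1.0, 10.0], "knn": [3, 5, 9], "forest": [50, 100, 200]}


def random_individual(rng):
    model = rng.choice(sorted(MODELS))
    return (rng.choice(SCALERS), (model, rng.choice(MODELS[model])))


def mutate(ind, rng):
    """Replace either the scaler or the model subtree with a fresh random one."""
    scaler, model_node = ind
    if rng.random() < 0.5:
        return (rng.choice(SCALERS), model_node)
    return (scaler, random_individual(rng)[1])


def crossover(a, b):
    """Recombine two parents: scaler from a, model subtree from b."""
    return (a[0], b[1])


def toy_fitness(ind):
    """Made-up score (higher is better); stands in for a validation metric."""
    scaler, (model, param) = ind
    score = {"ridge": 0.60, "knn": 0.55, "forest": 0.70}[model]
    score += 0.05 if scaler == "standard" else 0.0
    score += 0.02 if param == MODELS[model][1] else 0.0
    return score


def evolve(fitness, pop_size=20, generations=15, n_parents=5, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best individuals unchanged so the best score never drops.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best, round(toy_fitness(best), 2))
```

Carrying the parents over unchanged is one common design choice (elitism); the summary does not say which selection scheme the paper uses.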

Over many iterations, the system autonomously explores the space of possible models, converging on more accurate predictors of small molecule pharmacokinetics. The grammar-based encoding restricts the search to valid combinations of algorithms and pipeline steps while still allowing the resulting pipelines a rich, flexible structure.
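To make the grammar-based encoding concrete, here is a small Python sketch of how a context-free grammar can generate a pipeline as a derivation tree. The grammar below is purely illustrative: its non-terminals and terminals (`fingerprints`, `ridge`, and so on) are assumptions chosen for the example, not the grammar used in the paper.

```python
import random

# Hypothetical grammar. Keys are non-terminals; each maps to a list of
# productions, and each production is a sequence of symbols.
GRAMMAR = {
    "<pipeline>": [["<features>", "<scaling>", "<model>"]],
    "<features>": [["fingerprints"], ["descriptors"]],
    "<scaling>": [["no_scaling"], ["standard_scaler"]],
    "<model>": [["<linear>"], ["<tree_ensemble>"]],
    "<linear>": [["ridge"], ["lasso"]],
    "<tree_ensemble>": [["random_forest"], ["gradient_boosting"]],
}


def derive(symbol, rng):
    """Expand a symbol into a derivation tree.

    Non-terminals become (symbol, [children]); terminals stay plain strings.
    """
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return (symbol, [derive(s, rng) for s in production])


def leaves(tree):
    """Read the pipeline off the tree by collecting terminals left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]


rng = random.Random(42)
tree = derive("<pipeline>", rng)
print(leaves(tree))
```

Because every individual is derived from the grammar, any subtree can be regenerated from its non-terminal during mutation and the result is still a valid pipeline.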

The authors evaluate their GBGP-based AutoML approach on several pharmacokinetic endpoints, comparing it to conventional machine learning approaches. They find that the evolutionary-based method selects diverse ML algorithms and achieves comparable or, in some cases, improved predictive performance relative to these baselines.
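A like-for-like comparison of this kind is usually scored on held-out data. The sketch below shows one generic way to do that with k-fold cross-validation in plain Python. It is an assumption-laden illustration: the data are synthetic, the two "models" (`fit_mean`, `fit_line`) are toy stand-ins, and the summary does not state which metrics or validation protocol the paper actually uses.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Toy baseline: always predict the training mean."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Toy candidate: one-feature least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_mae(fit, xs, ys, k=5):
    """Mean absolute error, averaged over k held-out folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(model(xs[i]) - ys[i]) for i in fold]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Synthetic data only: y depends linearly on x, plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

print("baseline MAE :", round(cv_mae(fit_mean, xs, ys), 3))
print("candidate MAE:", round(cv_mae(fit_line, xs, ys), 3))
```

Using the same folds for every method, as `cv_mae` does with its fixed seed, keeps the comparison fair: differences in error come from the models, not from the data split.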

Critical Analysis

The paper presents a novel and promising application of evolutionary AutoML to the domain of small molecule pharmacokinetics. The GBGP approach allows the system to flexibly explore a wide range of model architectures, which could be valuable given the complexity of predicting how drugs move through biological systems.

One limitation is that the study only evaluates the method on a few pharmacokinetic endpoints. Further research would be needed to assess its broader applicability across a wider range of pharmacokinetic properties and small molecule datasets.

Additionally, the paper does not provide much insight into the specific model structures or features discovered by the GBGP system. Understanding the "why" behind the successful models could yield useful scientific insights, beyond just the empirical performance gains.

Overall, this work demonstrates the potential of evolutionary-based AutoML for advancing computational drug discovery and optimization. However, more research is needed to fully validate the approach and unpack its underlying mechanisms.

Conclusion

This paper explores the use of an evolutionary-based AutoML method, specifically grammar-based genetic programming (GBGP), for predicting the pharmacokinetic properties of small molecules. The key innovation is the ability of the system to automatically generate and refine predictive models, rather than relying on human-designed approaches.

The results indicate that this evolutionary AutoML strategy can match or, in some cases, outperform conventional machine learning approaches on pharmacokinetic endpoints. This suggests it could be a valuable tool for accelerating small molecule optimization and drug discovery workflows.

Further research is needed to fully understand the strengths and limitations of this approach, as well as explore its potential applications in other areas of computational biology and chemistry. But this work represents an important step towards more autonomous and effective machine learning for small molecule design and development.

Related Papers

Towards Evolutionary-based Automated Machine Learning for Small Molecule Pharmacokinetic Prediction

Alex G. C. de Sá, David B. Ascher

Machine learning (ML) is revolutionising drug discovery by expediting the prediction of small molecule properties essential for developing new drugs. These properties, including absorption, distribution, metabolism and excretion (ADME), are crucial in the early stages of drug development since they provide an understanding of the course of the drug in the organism, i.e., the drug's pharmacokinetics. However, existing methods lack personalisation and rely on manually crafted ML algorithms or pipelines, which can introduce inefficiencies and biases into the process. To address these challenges, we propose a novel evolutionary-based automated ML method (AutoML) specifically designed for predicting small molecule properties, with a particular focus on pharmacokinetics. Leveraging the advantages of grammar-based genetic programming, our AutoML method streamlines the process by automatically selecting algorithms and designing predictive pipelines tailored to the particular characteristics of input molecular data. Results demonstrate AutoML's effectiveness in selecting diverse ML algorithms, resulting in comparable or even improved predictive performances compared to conventional approaches. By offering personalised ML-driven pipelines, our method promises to enhance small molecule research in drug discovery, providing researchers with a valuable tool for accelerating the development of novel therapeutic drugs.

8/2/2024

Small Molecule Optimization with Large Language Models

Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan

Recent advancements in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. These models demonstrate strong performance in generating molecules with specified properties and predicting new molecular characteristics from limited samples. We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. It achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement on Practical Molecular Optimization compared to previous methods. We publicly release the training corpus, the language models and the optimization algorithm.

7/29/2024

Implementation of The Future of Drug Discovery: QuantumBased Machine Learning Simulation (QMLS)

Yifan Zhou, Yan Shing Liang, Yew Kee Wong, Haichuan Qiu, Yu Xi Wu, Bin He

The Research & Development (R&D) phase of drug development is a lengthy and costly process. To revolutionize this process, we introduce our new concept QMLS to shorten the whole R&D phase to three to six months and decrease the cost to merely fifty to eighty thousand USD. For Hit Generation, Machine Learning Molecule Generation (MLMG) generates possible hits according to the molecular structure of the target protein while the Quantum Simulation (QS) filters molecules from the primary assay based on the reaction and binding effectiveness with the target protein. Then, for Lead Optimization, the resultant molecules generated and filtered from MLMG and QS are compared, and molecules that appear as a result of both processes will be made into dozens of molecular variations through Machine Learning Molecule Variation (MLMV), while others will only be made into a few variations. Lastly, all optimized molecules would undergo multiple rounds of QS filtering with a high standard for reaction effectiveness and safety, creating a few dozen pre-clinical-trial-ready drugs. This paper is based on our first paper, where we pitched the concept of machine learning combined with quantum simulations. In this paper we will go over the detailed design and framework of QMLS, including MLMG, MLMV, and QS.

9/6/2024

Physical formula enhanced multi-task learning for pharmacokinetics prediction

Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

Artificial intelligence (AI) technology has demonstrated remarkable potential in drug discovery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. In this work, we develop a physical formula enhanced multi-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously. By incorporating physical formulas into the multi-task framework, PEMAL facilitates effective knowledge sharing and target alignment among the pharmacokinetic parameters, thereby enhancing the accuracy of prediction. Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks. Moreover, we demonstrate that PEMAL enhances the robustness to noise, an advantage that conventional Neural Networks do not possess. Another advantage of PEMAL is its high flexibility, which can be potentially applied to other multi-task machine learning scenarios. Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.

4/17/2024