Retro-fallback: retrosynthetic planning in an uncertain world

Read original: arXiv:2310.09270 - Published 4/16/2024 by Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato

Overview

Retrosynthesis is the process of planning a series of chemical reactions to create a desired molecule from simpler, available molecules.
Previous retrosynthesis algorithms aimed to find optimal solutions, but did not account for the fact that our knowledge of possible reactions is imperfect.
This paper proposes a new way of formulating retrosynthesis as a stochastic process to handle this uncertainty.
The authors also present a novel greedy algorithm called "retro-fallback" that maximizes the probability of being able to execute at least one synthesis plan in the lab.
Experiments show that retro-fallback generally produces better sets of synthesis plans compared to existing algorithms.

Plain English Explanation

Imagine you're a chemist trying to create a new drug molecule. You can't just start mixing random chemicals together - you need a plan. Retrosynthesis is the process of working backwards from the target molecule to figure out what simpler molecules you can start with and what reactions you need to perform to build up to the final product.

Previous retrosynthesis algorithms have tried to find the "optimal" plan, such as the shortest or cheapest sequence of reactions. The problem is that our knowledge of chemistry is imperfect. We don't know everything about how different chemicals will react, so a plan that looks good on paper may not actually work when you try it in the lab.

This new paper proposes a way to model retrosynthesis as a stochastic (probabilistic) process. Instead of just going for the single "best" plan, the authors' algorithm tries to find a set of plans that maximizes the chances that at least one of them will be executable in the real world. It's like having a backup plan, or a few different options to try.
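As a rough worked example of why backup plans help, suppose (as a simplifying assumption, not something the paper states) that n plans succeed or fail independently, and that plan i works with probability p_i. The symbols n and p_i are introduced here purely for illustration. The chance that at least one plan works is then:

```latex
P(\text{at least one plan works}) = 1 - \prod_{i=1}^{n} (1 - p_i)
```

For two plans with p_1 = 0.6 and p_2 = 0.5, this gives 1 - (0.4)(0.5) = 0.8, which is higher than either plan alone. Real plans often share reactions, so the independence assumption usually overstates the benefit.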

The results show that this new "retro-fallback" algorithm generally comes up with better sets of synthesis plans compared to other popular methods. It's a more realistic approach that accounts for the uncertainty in our chemical knowledge.

Technical Explanation

The key innovation in this paper is the formulation of retrosynthesis as a stochastic process. Previous works search for plans that are optimal under a fixed metric (e.g. shortest or lowest-cost), but they do not capture the reality that our knowledge of possible chemical reactions is imperfect.

The authors describe retrosynthesis in terms of stochastic processes, in which each reaction in a synthesis plan has some probability of working in the lab. They then propose a novel greedy algorithm called "retro-fallback" that aims to maximize the probability that at least one of the generated synthesis plans can be executed in the lab.
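To make the "at least one plan works" objective concrete, here is a minimal Python sketch that estimates it by sampling. It is an illustration under stated assumptions, not the authors' implementation: the reaction names, the feasibility values, the function name `estimate_success_probability`, and the assumption that reactions succeed independently of each other are all hypothetical.

```python
import random

def estimate_success_probability(plans, feasibility, n_samples=10_000, seed=0):
    """Estimate P(at least one plan is fully feasible) by sampling.

    plans: list of sets of reaction names (hypothetical identifiers).
    feasibility: dict mapping reaction name -> assumed probability it works.
    """
    rng = random.Random(seed)
    reactions = sorted(set().union(*plans))
    successes = 0
    for _ in range(n_samples):
        # Sample one possible "world": each reaction either works or fails.
        works = {r: rng.random() < feasibility[r] for r in reactions}
        # A plan succeeds only if every reaction in it works.
        if any(all(works[r] for r in plan) for plan in plans):
            successes += 1
    return successes / n_samples

# Two plans that share reaction "r1" (illustrative numbers only).
feasibility = {"r1": 0.9, "r2": 0.5, "r3": 0.6}
plans = [{"r1", "r2"}, {"r1", "r3"}]
estimate = estimate_success_probability(plans, feasibility)
```

In this toy case the exact answer is 0.9 × (1 − 0.5 × 0.4) = 0.72. Treating the two plans as independent would give about 0.747 instead, because it ignores that both plans fail together whenever the shared reaction fails. Sampling whole "worlds" handles that correlation automatically.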

Retro-fallback works by iteratively expanding the most promising partial synthesis plans, while maintaining a set of backup plans. At each step, the algorithm chooses the expansion that maximizes the probability of having at least one executable plan, rather than just optimizing for the single best plan.
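The idea of choosing whatever most increases the chance of at least one executable plan can be sketched in a much simpler setting. The Python below greedily selects from a fixed list of complete candidate plans; it is not the retro-fallback search itself, which expands partial plans. The names `success_probability` and `greedy_select`, the reaction identifiers, the feasibility values, and the independence of reactions are all assumptions made for illustration.

```python
from itertools import product

def success_probability(plans, feasibility):
    """Exact P(at least one plan fully works), by enumerating all outcomes."""
    if not plans:
        return 0.0
    reactions = sorted(set().union(*plans))
    total = 0.0
    for outcome in product([True, False], repeat=len(reactions)):
        works = dict(zip(reactions, outcome))
        # Probability of this particular combination of successes/failures.
        weight = 1.0
        for r in reactions:
            weight *= feasibility[r] if works[r] else 1.0 - feasibility[r]
        if any(all(works[r] for r in plan) for plan in plans):
            total += weight
    return total

def greedy_select(candidates, feasibility, k):
    """Pick up to k plans, each time adding the plan that raises the
    joint success probability of the chosen set the most."""
    chosen = []
    remaining = list(candidates)
    for _ in range(k):
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda p: success_probability(chosen + [p], feasibility),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Illustrative numbers only: plans A and B share reaction "r1".
feasibility = {"r1": 0.7, "r2": 0.9, "r3": 0.8, "r4": 0.5}
candidates = [
    frozenset({"r1", "r2"}),  # plan A, succeeds with probability 0.63
    frozenset({"r1", "r3"}),  # plan B, succeeds with probability 0.56
    frozenset({"r4"}),        # plan C, succeeds with probability 0.50
]
chosen = greedy_select(candidates, feasibility, k=2)
```

Here the greedy choice is plan A followed by plan C, even though plan B is individually more likely to work than C. Plan B shares reaction r1 with A, so it adds little as a fallback (joint probability 0.686), whereas the unrelated plan C raises the joint probability to 0.815.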

The authors evaluate retro-fallback against two other popular retrosynthesis algorithms, MCTS and retro*, using in-silico benchmarks. They demonstrate that retro-fallback generally produces better sets of synthesis plans, with higher probabilities of successful execution.

Critical Analysis

The key strength of this work is the incorporation of reaction uncertainty into the retrosynthesis problem formulation. By modeling retrosynthesis as a stochastic process, the authors have developed a more realistic approach that better reflects the challenges faced by chemists in the lab.

However, one limitation is that the paper does not address the issue of how to accurately estimate the probabilities of different reaction steps succeeding. In practice, these probabilities may be difficult to obtain, as they likely depend on a wide range of factors.

Additionally, the in-silico evaluation, while useful, does not fully capture the real-world complexities of laboratory experimentation. Further validation on actual experimental data would be valuable to assess the practical effectiveness of the retro-fallback algorithm.

Another potential area for improvement is the greedy nature of the retro-fallback algorithm. While it performs well in the benchmarks, there may be opportunities to incorporate more global optimization techniques to further improve the quality of the generated synthesis plans.

Conclusion

This paper presents a novel approach to the retrosynthesis problem that accounts for the uncertainty inherent in our knowledge of chemical reactions. By formulating retrosynthesis as a stochastic process and proposing the retro-fallback algorithm, the authors have developed a more realistic and effective tool for chemists seeking to plan the synthesis of target molecules.

The results demonstrate the potential benefits of this stochastic approach, which could have significant implications for fields such as drug discovery and materials design, where the ability to reliably plan and execute chemical syntheses is of critical importance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Retro-fallback: retrosynthetic planning in an uncertain world

Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato

Retrosynthesis is the task of planning a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by algorithms may not work in a laboratory. In this paper we propose a novel formulation of retrosynthesis in terms of stochastic processes to account for this uncertainty. We then propose a novel greedy algorithm called retro-fallback which maximizes the probability that at least one synthesis plan can be executed in the lab. Using in-silico benchmarks we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms.

4/16/2024

Retro-prob: Retrosynthetic Planning Based on a Probabilistic Model

Chengyang Tian, Yangpeng Zhang, Yang Liu

Retrosynthesis is a fundamental but challenging task in organic chemistry, with broad applications in fields such as drug design and synthesis. Given a target molecule, the goal of retrosynthesis is to find a series of reactions that can be assembled into a synthetic route that starts from purchasable molecules and ends at the target molecule. The uncertainty of reactions used in retrosynthetic planning, which is caused by hallucinations of backward models, has recently been noticed. In this paper we propose a succinct probabilistic model to describe such uncertainty. Based on the model, we propose a new retrosynthesis planning algorithm called retro-prob to maximize the successful synthesis probability of target molecules, which acquires high efficiency by utilizing the chain rule of derivatives. Experiments on the PaRoutes benchmark show that retro-prob outperforms previous algorithms, retro* and retro-fallback, both in speed and in the quality of synthesis plans.

5/28/2024

Re-evaluating Retrosynthesis Algorithms with Syntheseus

Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler

Automated Synthesis Planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques, and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called syntheseus, which promotes best practice by default, enabling consistent meaningful evaluation of single-step models and multi-step planning algorithms. We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future works in this area, and call the community to engage in the discussion on how to improve benchmarks for synthesis planning.

9/9/2024

Evolutionary Retrosynthetic Route Planning

Yan Zhang, Hao Hao, Xiao He, Shuanhu Gao, Aimin Zhou

Molecular retrosynthesis is a significant and complex problem in the field of chemistry. However, traditional manual synthesis methods not only need well-trained experts but are also time-consuming. With the development of big data and machine learning, artificial intelligence (AI) based retrosynthesis is attracting more attention and has become a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, this paper proposes a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of an Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem as an optimization problem and defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products and compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA reduces the number of calls to the single-step model by an average of 53.9%. The time required to search for three solutions decreases by an average of 83.9%, and the number of feasible search routes increases by 1.38 times. The source code is available at https://github.com/ilog-ecnu/EvoRRP.

7/16/2024