Adaptive Catalyst Discovery Using Multicriteria Bayesian Optimization with Representation Learning

2404.12445

Published 4/22/2024 by Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, Wei Chen

Abstract

High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.

Overview

Importance of high-performance catalysts for sustainable energy conversion and human health
Challenges in discovering new catalysts due to the vast and high-dimensional search space
Proposal of a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO)
Development of an uncertainty-aware atomistic machine learning model, UPNet, for automated representation learning and uncertainty quantification
Exploration of catalyst discovery for the CO2 reduction reaction
Significant reduction in computing power and time required for high-performance catalyst discovery

Plain English Explanation

Catalysts are essential for many important processes, like converting energy sustainably and improving human health. However, finding new and effective catalysts is challenging because there are so many possible chemical compositions and structures to explore. This study proposes a new computational approach to speed up the discovery of high-performance catalysts.

The researchers combined two powerful techniques: density functional theory (DFT), which can accurately simulate how catalysts work at the atomic level, and Bayesian Optimization (BO), a machine learning method that can efficiently search through large, complex spaces.

Within the BO framework, the researchers developed a new machine learning model called UPNet. This model can automatically learn useful representations directly from the complex 3D structures of potential catalysts and provide reliable estimates of the uncertainty in its predictions. By considering multiple performance criteria at once, the BO framework can then guide the search towards the most promising catalyst candidates.

The researchers used this approach to search for new catalysts for the CO2 reduction reaction, which is important for sustainable energy. The results show that this method can accurately predict catalyst performance, extract meaningful insights about the catalysts, and find high-performing candidates much more efficiently than traditional approaches, cutting the number of expensive DFT calculations needed tenfold.

Technical Explanation

The researchers proposed a high-throughput computational catalyst screening approach that integrates density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, they developed an uncertainty-aware atomistic machine learning model called UPNet, which can automatically learn useful representations directly from high-dimensional catalyst structures and provide principled uncertainty quantification.
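The screening loop described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it assumes each candidate structure has already been encoded as a fixed-length feature vector. Two stand-ins are used, and neither comes from the paper: a hypothetical `mock_dft` function replaces the expensive DFT calculation, and a plain Gaussian-process surrogate replaces UPNet. What the sketch does show is the generic BO pattern: fit a surrogate to the structures evaluated so far, score every unevaluated candidate with expected improvement, run the "DFT" only on the best-scoring one, and repeat until the budget is spent.

```python
import numpy as np
from math import erf, sqrt, pi

# Elementwise error function (NumPy has no built-in erf)
_erf = np.vectorize(erf, otypes=[float])

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between the rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Exact GP regression on standardized targets; returns predictive mean and std
    mu_y, sd_y = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mu_y) / sd_y
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    mean = k_s.T @ np.linalg.solve(k, y)
    var = 1.0 - (k_s * np.linalg.solve(k, k_s)).sum(axis=0)
    return mean * sd_y + mu_y, np.sqrt(np.clip(var, 1e-12, None)) * sd_y

def expected_improvement(mean, std, best):
    # Closed-form EI for maximization under a Gaussian predictive distribution
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (mean - best) * cdf + std * pdf

def mock_dft(x):
    # Hypothetical stand-in for an expensive DFT calculation (peak at x = 0.3)
    return float(-np.sum((x - 0.3) ** 2))

def bo_screen(pool, n_init=3, budget=15, seed=0):
    rng = np.random.default_rng(seed)
    evaluated = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    scores = [mock_dft(pool[i]) for i in evaluated]
    while len(evaluated) < budget:
        remaining = [i for i in range(len(pool)) if i not in evaluated]
        mean, std = gp_posterior(pool[evaluated], np.array(scores), pool[remaining])
        ei = expected_improvement(mean, std, max(scores))
        pick = remaining[int(np.argmax(ei))]   # most promising unevaluated candidate
        evaluated.append(pick)
        scores.append(mock_dft(pool[pick]))    # only these calls would cost DFT time
    return evaluated, scores

pool = np.random.default_rng(1).uniform(0.0, 1.0, size=(200, 2))
evaluated, scores = bo_screen(pool)
print(f"evaluated {len(evaluated)} of {len(pool)} candidates, best score {max(scores):.4f}")
```

Only `budget` of the 200 candidates are ever passed to `mock_dft`. The savings in this kind of workflow come from that budget being much smaller than the candidate pool.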

The UPNet model uses a graph neural network architecture to capture the complex 3D structure of catalyst materials. By learning directly from the atomic-scale details, UPNet can achieve high prediction accuracy without the need for extensive feature engineering. The model also outputs reliable estimates of the uncertainty in its predictions, which is crucial for guiding the BO optimization process.
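The summary gives no layer-level details of UPNet, so the sketch below only illustrates the general idea it describes: message passing over an atom graph built from interatomic distances, pooled into a structure-level prediction, with a second output for the predictive variance. Everything here is a hypothetical choice and not the paper's architecture, including the `TinyMPNN` class, the Gaussian radial basis, the softplus variance head, and the layer sizes. The weights are random, so the printed numbers mean nothing until the model is trained, for example by minimizing `gaussian_nll` over DFT-labeled structures.

```python
import numpy as np

def radius_graph(positions, cutoff):
    # Directed edges (src -> dst) between distinct atoms closer than `cutoff` (no periodic images)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))
    return src, dst, dist[src, dst]

def rbf_expand(dist, n_basis, cutoff):
    # Expand each edge length in evenly spaced Gaussian radial basis functions
    centers = np.linspace(0.0, cutoff, n_basis)
    return np.exp(-((dist[:, None] - centers[None, :]) ** 2) / 0.5)

class TinyMPNN:
    """Hypothetical invariant message-passing network with a mean/variance readout (not UPNet)."""

    def __init__(self, n_elements=100, hidden=16, n_basis=8, n_layers=2, cutoff=5.0, seed=0):
        rng = np.random.default_rng(seed)
        self.cutoff, self.n_basis = cutoff, n_basis
        self.embed = rng.normal(0.0, 0.1, (n_elements, hidden))   # one vector per element
        self.filters = [rng.normal(0.0, 0.1, (n_basis, hidden)) for _ in range(n_layers)]
        self.updates = [rng.normal(0.0, 0.1, (hidden, hidden)) for _ in range(n_layers)]
        self.readout = rng.normal(0.0, 0.1, (hidden, 2))           # -> (mean, raw variance)

    def __call__(self, numbers, positions):
        h = self.embed[numbers]
        src, dst, dist = radius_graph(positions, self.cutoff)
        basis = rbf_expand(dist, self.n_basis, self.cutoff)
        for w_filter, w_update in zip(self.filters, self.updates):
            messages = h[src] * (basis @ w_filter)   # neighbor features gated by distance
            agg = np.zeros_like(h)
            np.add.at(agg, dst, messages)            # sum incoming messages per atom
            h = h + np.tanh(agg @ w_update)          # residual atom-wise update
        out = h.sum(axis=0) @ self.readout           # sum-pool atoms into one structure vector
        mean = out[0]
        var = np.log1p(np.exp(out[1])) + 1e-6        # softplus keeps the variance positive
        return mean, var

def gaussian_nll(y, mean, var):
    # Training loss that fits the predicted mean and variance jointly
    return 0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

# Illustrative geometry only (not a relaxed structure): three Cu atoms with an adsorbed CO
numbers = np.array([29, 29, 29, 6, 8])
positions = np.array([
    [0.0, 0.0, 0.0],
    [2.55, 0.0, 0.0],
    [1.275, 2.21, 0.0],
    [1.275, 0.74, 1.9],
    [1.275, 0.74, 3.05],
])
model = TinyMPNN()
mean, var = model(numbers, positions)
print(f"predicted mean {mean:.4f}, variance {var:.4f}")
```

Because the network only sees element types and interatomic distances, its output does not change when the structure is rotated, translated, or its atoms are reordered. That is one reason distance-based graph models need no hand-built descriptors.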

The BO framework uses a constrained expected improvement acquisition function to simultaneously consider multiple evaluation criteria, such as catalytic activity and selectivity, during the optimization. This allows the framework to navigate the high-dimensional catalyst design space and identify candidates that excel across various performance metrics.
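The summary does not give the exact acquisition formula, so the following is a sketch of the common textbook form of constrained expected improvement, not necessarily the paper's. It multiplies the expected improvement of the objective by the probability that every constraint is satisfied, assuming the objective and each constrained quantity are predicted as independent Gaussians. Expressing constraints as windows `[lower, upper]` is also an assumption made for illustration, and all function names are hypothetical. `best_feasible` is meant to be the best objective value among already-evaluated candidates that satisfied all constraints.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(mean, std, best):
    # Closed-form EI for maximizing the objective under N(mean, std^2)
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * norm_cdf(z) + std * norm_pdf(z)

def prob_in_window(mean, std, lower, upper):
    # P(lower <= c <= upper) for a constrained quantity predicted as c ~ N(mean, std^2)
    if std <= 0.0:
        return 1.0 if lower <= mean <= upper else 0.0
    return norm_cdf((upper - mean) / std) - norm_cdf((lower - mean) / std)

def constrained_ei(obj_mean, obj_std, best_feasible, constraints):
    # cEI = EI(objective) * product of per-constraint feasibility probabilities,
    # assuming the objective and constraint predictions are independent Gaussians.
    # `constraints` is a list of (mean, std, lower, upper) tuples.
    p_feasible = 1.0
    for mean, std, lower, upper in constraints:
        p_feasible *= prob_in_window(mean, std, lower, upper)
    return expected_improvement(obj_mean, obj_std, best_feasible) * p_feasible

inf = float("inf")
# Candidate A: modest predicted objective, both constraints very likely satisfied
cei_a = constrained_ei(1.0, 0.2, 0.8, [(0.0, 0.1, -0.5, 0.5), (0.2, 0.05, -inf, 0.4)])
# Candidate B: higher predicted objective, but its first constraint is predicted to be violated
cei_b = constrained_ei(1.5, 0.2, 0.8, [(1.0, 0.1, -0.5, 0.5), (0.2, 0.05, -inf, 0.4)])
print(f"cEI(A) = {cei_a:.4f}, cEI(B) = {cei_b:.2e}")
```

Candidate B has the larger plain EI, but its feasibility probability is tiny, so the constrained score ranks A first. This is how a single acquisition value can weigh several criteria at once.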

The researchers applied this integrated DFT-BO approach to the discovery of catalysts for the CO2 reduction reaction. The results demonstrate that the method can achieve high prediction accuracy, facilitate interpretable feature extraction, and enable multicriteria design optimization. Importantly, this approach led to a significant reduction in the required computing power and time, requiring only 10% of the DFT calculations needed by traditional methods.
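A "10x reduction" and "only 10% of the calculations" are the same statement. Writing $N_{\text{ref}}$ for the number of DFT calculations the baseline screen requires and $N_{\text{BO}}$ for the number the BO workflow requires (symbols introduced here for illustration):

```latex
\frac{N_{\text{BO}}}{N_{\text{ref}}} = \frac{1}{10} = 10\%,
\qquad
1 - \frac{N_{\text{BO}}}{N_{\text{ref}}} = 90\% \text{ fewer DFT calculations}.
```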

Critical Analysis

The researchers have presented a compelling approach for accelerating the discovery of high-performance catalysts using a combination of DFT simulations and Bayesian Optimization. The development of the UPNet model, which can learn directly from catalyst structures, is a particularly notable contribution, as it avoids the need for manual feature engineering and provides reliable uncertainty estimates.

One potential limitation of the study is that it focuses on a single reaction, the CO2 reduction reaction. While this is an important application, it would be valuable to see how the approach performs on other reactions and catalyst discovery challenges. Additionally, the paper does not provide a detailed discussion of the computational cost and scalability of the BO framework, which could be an important consideration for real-world applications.

Another area that could be explored further is the interpretability and explainability of the UPNet model. While the paper mentions that the approach enables interpretable feature extraction, more details on how the model's predictions can be understood and related to the underlying chemistry would be valuable for building trust in the approach.

Conclusion

This study presents a novel high-throughput computational catalyst screening approach that combines DFT simulations and Bayesian Optimization. The key innovation is the development of the UPNet model, which can automatically learn useful representations from complex catalyst structures and provide reliable uncertainty estimates. This approach substantially improved the efficiency of catalyst discovery, cutting the number of required DFT calculations tenfold (about 90% fewer).

The potential impact of this work is substantial, as it could accelerate the development of high-performance catalysts for a wide range of applications, from sustainable energy conversion to improved human health. The incorporation of multiple evaluation criteria and the ability to navigate high-dimensional design spaces are particularly valuable features that could unlock new possibilities in catalyst discovery and optimization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury

The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and reaction energy barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.

6/10/2024

cs.AI cs.CE cs.LG

Lightweight Geometric Deep Learning for Molecular Modelling in Catalyst Discovery

Patrick Geitner

New technology for energy storage is necessary for the large-scale adoption of renewable energy sources like wind and solar. The ability to discover suitable catalysts is crucial for making energy storage more cost-effective and scalable. The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery, replacing Density Functional Theory-based (DFT) approaches that are computationally burdensome. Current approaches involve scaling GNNs to over 1 billion parameters, pushing the problem out of reach for a vast majority of machine learning practitioners around the world. This study aims to evaluate the performance and insights gained from using more lightweight approaches for this task that are more approachable for smaller teams to encourage participation from individuals from diverse backgrounds. By implementing robust design patterns like geometric and symmetric message passing, we were able to train a GNN model that reached a MAE of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions, rivaling established model architectures like SchNet and DimeNet++ while using only a fraction of trainable parameters.

4/17/2024

cs.LG

Diagnosing and fixing common problems in Bayesian optimization for molecule design

Austin Tripp, José Miguel Hernández-Lobato

Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al., 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.

6/13/2024

cs.LG stat.ML

MALIBO: Meta-learning for Likelihood-free Bayesian Optimization

Jiarong Pan, Stefan Falkner, Felix Berkenkamp, Joaquin Vanschoren

Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.

7/1/2024

cs.LG stat.ML