Lightweight Geometric Deep Learning for Molecular Modelling in Catalyst Discovery

2404.10003

Published 4/17/2024 by Patrick Geitner

Abstract

New technology for energy storage is necessary for the large-scale adoption of renewable energy sources like wind and solar. The ability to discover suitable catalysts is crucial for making energy storage more cost-effective and scalable. The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery, replacing Density Functional Theory-based (DFT) approaches that are computationally burdensome. Current approaches involve scaling GNNs to over 1 billion parameters, pushing the problem out of reach for a vast majority of machine learning practitioners around the world. This study aims to evaluate the performance and insights gained from using more lightweight approaches for this task that are more approachable for smaller teams to encourage participation from individuals from diverse backgrounds. By implementing robust design patterns like geometric and symmetric message passing, we were able to train a GNN model that reached a MAE of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions, rivaling established model architectures like SchNet and DimeNet++ while using only a fraction of trainable parameters.

Overview

Renewable energy sources like wind and solar require efficient energy storage for large-scale adoption.
Discovering suitable catalysts is crucial for making energy storage more cost-effective and scalable.
The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery, replacing computationally burdensome Density Functional Theory-based (DFT) approaches.
Current approaches involve scaling GNNs to over 1 billion parameters, which is out of reach for many machine learning practitioners.
This study evaluates the performance and insights gained from using more lightweight GNN approaches that are more approachable for smaller teams, encouraging participation from diverse backgrounds.

Plain English Explanation

Renewable energy sources like wind and solar power are crucial for a sustainable future, but they rely on effective energy storage to be widely adopted. Discovering the right catalysts, which are substances that speed up chemical reactions, is key to making energy storage more affordable and scalable.

The Open Catalyst Project is using a type of machine learning called graph neural networks (GNNs) to accelerate the search for these important catalysts. GNNs are powerful tools for analyzing the complex interactions between atoms and molecules, which is essential for predicting how potential catalysts will perform.

However, the current state-of-the-art GNN models are very large, with over 1 billion trainable parameters. This puts them out of reach for most researchers and engineers, because training models of that size takes a great deal of computing power and expertise.

This study explores using more compact, "lightweight" GNN models that are easier for smaller teams and individuals to work with. By designing the model carefully, the researchers were able to create a GNN that rivals established architectures such as SchNet and DimeNet++ while using only a fraction of their trainable parameters. This makes the technology more accessible and encourages participation from people with diverse backgrounds, which is crucial for driving innovation in this important area.

Technical Explanation

The researchers in this study aimed to develop a more lightweight graph neural network (GNN) model for predicting the interactions between adsorbates (molecules or atoms that attach to a surface) and catalytic surfaces. This is a crucial step in discovering new catalysts for energy storage applications.
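To make the setup concrete, the sketch below shows one common way to turn an adsorbate-surface system into a graph that a GNN can consume: atoms become nodes, and any two atoms closer than a cutoff radius are joined by an edge. This is an illustrative assumption, not the paper's preprocessing. The function name `radius_graph`, the 3.0 cutoff, and the toy coordinates are hypothetical, and periodic boundary conditions (which real surface slabs need) are left out for brevity.

```python
import numpy as np


def radius_graph(positions, cutoff):
    """Return directed edges (2, E) and their lengths (E,) for atom pairs
    closer than `cutoff`. Hypothetical helper; ignores periodic images."""
    positions = np.asarray(positions, dtype=float)
    # Pairwise displacement vectors, shape (N, N, 3).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs inside the cutoff, excluding self-loops on the diagonal.
    mask = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst]), dist[src, dst]


# Toy system: three "surface" atoms and one "adsorbate" atom (made-up coordinates).
positions = np.array([
    [0.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],
    [0.0, 2.5, 0.0],
    [1.0, 1.0, 1.5],
])
edge_index, edge_dist = radius_graph(positions, cutoff=3.0)
print(edge_index.shape, edge_dist.round(3))
```

Each undirected neighbour pair appears twice, once per direction, which is the usual convention for message passing: every atom both sends and receives along each bond-like edge.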

Current approaches to this task scale GNNs to over 1 billion trainable parameters. This makes them computationally expensive and inaccessible to many researchers and engineers.

To address this, the researchers implemented robust design patterns like geometric and symmetric message passing in their GNN model. This allowed them to achieve a mean absolute error (MAE) of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions, rivaling established architectures like SchNet and DimeNet++ while using only a fraction of the trainable parameters.
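The summary does not spell out what "geometric and symmetric message passing" looks like in the authors' model, so the following is only a minimal sketch of one common reading: messages depend on interatomic distances (geometric), and neighbour messages are combined with a sum (symmetric), so the result does not change when the system is rotated and follows the atoms when they are relabelled. All names here (`gaussian_rbf`, `message_passing_step`, `force_mae`), the layer sizes, and the random weights are assumptions made for illustration. The MAE helper averages over every force component of every atom, which is also an assumption about how the reported metric is computed.

```python
import numpy as np


def gaussian_rbf(dist, centers, width=0.5):
    """Expand scalar distances (E,) into smooth radial features (E, K)."""
    return np.exp(-((dist[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))


def message_passing_step(h, positions, edge_index, centers, w_filter, w_update):
    """One distance-based message-passing update (illustrative, not the paper's layer)."""
    src, dst = edge_index
    # Geometric input: only distances are used, so rotations cannot change it.
    dist = np.linalg.norm(positions[src] - positions[dst], axis=-1)
    filt = gaussian_rbf(dist, centers) @ w_filter      # (E, F) distance filter
    messages = h[src] * filt                           # (E, F) sender features, gated
    # Symmetric aggregation: a sum over neighbours ignores their ordering.
    agg = np.zeros_like(h)
    np.add.at(agg, dst, messages)
    return h + np.tanh(agg @ w_update)                 # residual update


def force_mae(pred, target):
    """Mean absolute error over all atoms and all three force components."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))


rng = np.random.default_rng(0)
n_atoms, n_feat, n_rbf = 4, 8, 6
positions = rng.normal(size=(n_atoms, 3))
h = rng.normal(size=(n_atoms, n_feat))
# Fully connected toy graph without self-loops.
edge_index = np.array(
    [(i, j) for i in range(n_atoms) for j in range(n_atoms) if i != j]
).T
centers = np.linspace(0.0, 4.0, n_rbf)
w_filter = rng.normal(size=(n_rbf, n_feat))
w_update = rng.normal(size=(n_feat, n_feat))

out = message_passing_step(h, positions, edge_index, centers, w_filter, w_update)

# Rotate (or reflect) the whole system with a random orthogonal matrix.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
out_rot = message_passing_step(h, positions @ q.T, edge_index, centers, w_filter, w_update)

# Relabel the atoms; the outputs should be relabelled in the same way.
perm = np.array([2, 0, 3, 1])
out_perm = message_passing_step(h[perm], positions[perm], edge_index, centers, w_filter, w_update)

print("rotation invariant:", np.allclose(out, out_rot))
print("permutation equivariant:", np.allclose(out[perm], out_perm))
print("toy force MAE:", force_mae([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]], np.zeros((2, 3))))
```

Building these symmetries into the layer means the network does not have to learn them from data, which is one plausible reason a small model can stay competitive. Note that forces are vectors and must rotate with the system, so a force-prediction head needs extra care beyond the invariant features shown here.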

Critical Analysis

The researchers acknowledge that their lightweight GNN model, while highly performant, may not be able to capture the full complexity of adsorbate-surface interactions compared to the more parameter-intensive models. They suggest that future work could explore ways to increase the model capacity without dramatically increasing the number of trainable parameters.

Additionally, the researchers note that their evaluation was limited to a specific set of adsorbate-surface systems, and further testing would be needed to assess the model's generalizability to a wider range of catalytic materials and reactions.

While the researchers have made significant progress in making GNN-based catalyst discovery more accessible, it's important to recognize that this is an active area of research, and there may be other approaches or techniques that could further improve the performance and scalability of these models.

Conclusion

This study demonstrates that it is possible to develop highly efficient GNN models for predicting adsorbate-surface interactions, a crucial step in catalyst discovery for energy storage. By focusing on robust design patterns and keeping the model size manageable, the researchers have created a solution that is more accessible to a wider range of researchers and engineers, potentially accelerating progress in this important field. As the development of renewable energy technologies continues, advances in catalyst discovery through approaches like this will be essential for making energy storage more cost-effective and scalable.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Hybrid Quantum Graph Neural Network for Molecular Property Prediction

Michael Vitz, Hamed Mohammadbagherpoor, Samarth Sandeep, Andrew Vlasic, Richard Padbury, Anh Pham

To accelerate the process of materials design, materials science has increasingly used data-driven techniques to extract information from collected data. Specifically, machine learning (ML) algorithms have demonstrated the ability to predict various properties of materials with a level of accuracy similar to explicit quantum mechanical calculations, but with significantly reduced run time and computational resources. Within ML, graph neural networks have emerged as an important class of algorithms, since they can accurately predict a wide range of important physical, chemical, and electronic properties thanks to their higher learning ability, which is based on the graph representation of material and molecular descriptors and the aggregation of information embedded within the graph. In parallel with the development of state-of-the-art classical machine learning applications, the fusion of quantum computing and machine learning has created a new paradigm in which classical machine learning models can be augmented with quantum layers that encode high-dimensional data more efficiently. Leveraging the structure of existing algorithms, we developed a novel gradient-free hybrid quantum-classical convoluted graph neural network (HyQCGNN) to predict formation energies of perovskite materials. The performance of our hybrid statistical model is competitive with the results obtained purely from a classical convoluted graph neural network and from other classical machine learning algorithms, such as XGBoost. Consequently, our study suggests a new pathway to explore how quantum feature encoding and parametric quantum circuits can yield drastic improvements in complex ML algorithms like graph neural networks.

5/9/2024

cs.LG

On the Scalability of GNNs for Molecular Graphs

Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, Dominique Beaini

Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets. We further demonstrate strong finetuning scaling behavior on 38 highly competitive downstream tasks, outclassing previous large models. This gives rise to MolGPS, a new graph foundation model that allows navigation of the chemical space, outperforming the previous state of the art on 26 out of the 38 downstream tasks. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.

5/3/2024

cs.LG

Adaptive Catalyst Discovery Using Multicriteria Bayesian Optimization with Representation Learning

Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, Wei Chen

High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.

4/22/2024

cs.LG cs.CE

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury

The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and reaction energy barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.

6/10/2024

cs.AI cs.CE cs.LG