Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System

Read original: arXiv:2405.07898 - Published 5/14/2024 by Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z Kalchev, Danny Perez, Robert Schreiber, Scott Pakin and 4 others


Overview

Molecular dynamics (MD) simulations are widely used in materials science, computational chemistry, biophysics, and drug design.
However, even on powerful supercomputers, these simulations can take an excessive amount of time to run.
This paper demonstrates a significant performance improvement for MD simulations using the Cerebras Wafer-Scale Engine, a specialized hardware platform.

Plain English Explanation

Molecular dynamics (MD) simulations are computer models that simulate the behavior of materials at the atomic scale. These simulations have been instrumental in advancing fields like materials science, chemistry, and biology by giving researchers a detailed look at what's happening at the nanoscale.

However, even with the most powerful supercomputers available today, these MD simulations can take a very long time to run - sometimes years for the systems and time scales that scientists are most interested in studying. This limits the usefulness of the simulations, as researchers may not have the time or resources to explore all the phenomena they'd like to.

In this paper, the researchers show how they dramatically sped up MD simulations by running them on a specialized hardware platform called the Cerebras Wafer-Scale Engine. By dedicating a separate processing core to each simulated atom, they achieved a 179-fold improvement in the number of simulation steps per second compared with Frontier, a GPU-based exascale supercomputer.

At that rate, every year of runtime on the GPU system shrinks to about two days. This opens up new areas of scientific discovery, as researchers will be able to study slow-moving transformations in materials that were previously inaccessible. The researchers' dataflow algorithm ran these simulations at over 270,000 steps per second for systems with up to 800,000 atoms, a level of performance the authors describe as unprecedented for general-purpose processing cores.

Technical Explanation

The researchers used the Cerebras Wafer-Scale Engine, a specialized hardware platform designed for highly parallel workloads, to run their MD simulations. By dedicating a single processor core to each simulated atom, they achieved a 179-fold improvement in simulation timesteps per second compared with the Frontier GPU-based exascale platform, along with a large improvement in timesteps per unit energy.

This performance boost was enabled by the Cerebras system's unique architecture, which avoids many of the bottlenecks that limit the scalability of traditional CPU and GPU-based systems for these types of simulations. The researchers' dataflow algorithm was able to run Embedded Atom Method (EAM) simulations at over 270,000 timesteps per second for systems with up to 800,000 atoms.
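To make the EAM workload concrete, here is a minimal sketch of the general EAM energy expression, E = sum_i F(rho_i) + 1/2 sum_{i != j} phi(r_ij), where rho_i is the electron density at atom i accumulated from its neighbors. The functions `density`, `pair`, and `embed` are toy placeholders chosen for illustration, and the cutoff value is an assumption. None of this is the potential or the dataflow implementation used in the paper.

```python
import math

# ASSUMPTION: toy functional forms for illustration only, not from the paper.
def density(r):
    """Electron density contributed by a neighbor at distance r."""
    return math.exp(-r)

def pair(r):
    """Pairwise interaction energy phi(r)."""
    return math.exp(-2.0 * r)

def embed(rho):
    """Embedding energy F(rho) of an atom in local density rho."""
    return -math.sqrt(rho)

def eam_energy(positions, cutoff=5.0):
    """Total EAM energy: sum_i F(rho_i) + 1/2 sum_{i != j} phi(r_ij)."""
    n = len(positions)
    rho = [0.0] * n
    pair_energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r >= cutoff:
                continue  # neighbors beyond the cutoff do not interact
            # Pass 1: accumulate the local density seen by each atom.
            rho[i] += density(r)
            rho[j] += density(r)
            # Visiting each pair once equals 1/2 of the sum over i != j.
            pair_energy += pair(r)
    # Pass 2: the embedding term needs the completed densities.
    return sum(embed(x) for x in rho) + pair_energy

if __name__ == "__main__":
    dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(eam_energy(dimer))
```

The embedding term depends on each atom's completed density, so an EAM step needs neighbor information twice: once to build the densities and again to evaluate energies or forces. That per-step neighbor exchange is the kind of communication that tends to limit strong scaling on conventional systems.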

Reducing every year of runtime to about two days unlocks new avenues of scientific discovery. Researchers will now be able to study slow microstructure transformation processes in materials that were previously inaccessible due to the prohibitive computational cost.
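The "one year to two days" figure follows directly from the 179-fold speedup, and the reported throughput can be turned into simulated physical time per day of wall-clock time. The sketch below does that arithmetic. The 1 fs timestep is an assumption made for illustration and is not taken from the summary above.

```python
SPEEDUP = 179               # reported improvement in timesteps per second
STEPS_PER_SECOND = 270_000  # reported lower bound on throughput
TIMESTEP_FS = 1.0           # ASSUMPTION: 1 fs per MD timestep (illustrative)

def accelerated_runtime_days(baseline_days, speedup=SPEEDUP):
    """Wall-clock days needed after applying the speedup."""
    return baseline_days / speedup

def simulated_microseconds_per_day(steps_per_second=STEPS_PER_SECOND,
                                   timestep_fs=TIMESTEP_FS):
    """Simulated physical time (in microseconds) per wall-clock day."""
    steps_per_day = steps_per_second * 86_400  # seconds in a day
    return steps_per_day * timestep_fs * 1e-9  # 1 fs = 1e-9 microseconds

if __name__ == "__main__":
    print(f"{accelerated_runtime_days(365):.2f} days")         # 2.04 days
    print(f"{simulated_microseconds_per_day():.1f} us per day")  # 23.3 us per day
```

Under the assumed 1 fs timestep, 270,000 steps per second corresponds to roughly 23 microseconds of simulated time per day. A different timestep scales that figure proportionally.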

Critical Analysis

While the performance improvements demonstrated in this paper are truly impressive, the researchers do note a few caveats and areas for further exploration. For example, the simulations were limited to the Embedded Atom Method (EAM) potential, which may not capture all the relevant physics for certain materials or applications.

Additionally, the researchers only tested their system on relatively small-scale simulations (up to 800,000 atoms). It remains to be seen whether the same level of performance and scalability can be achieved for much larger systems, which are often necessary to accurately model real-world materials and phenomena.

The paper reports a large improvement in timesteps per unit energy alongside the speedup, but further research would be needed to assess the cost-effectiveness of the Cerebras platform relative to traditional CPU and GPU-based systems, especially as the computational demands of MD simulations continue to grow.

Overall, this paper demonstrates an exciting leap forward in the capabilities of MD simulations, which could have far-reaching implications for materials science, chemistry, and beyond. However, as with any new technology, there are still some open questions and areas for improvement that will need to be explored.

Conclusion

This paper presents a significant advancement in the performance of molecular dynamics (MD) simulations, a critical tool for understanding materials and processes at the nanoscale. By running these simulations on the specialized Cerebras Wafer-Scale Engine, the researchers achieved a 179-fold improvement in timesteps per second compared with the Frontier GPU-based exascale platform.

This performance boost means that researchers will now be able to study slow microstructure transformation processes in materials that were previously inaccessible due to the prohibitive computational cost. The researchers' dataflow algorithm ran Embedded Atom Method (EAM) simulations at over 270,000 timesteps per second for systems with up to 800,000 atoms, a level of performance the authors describe as unprecedented for general-purpose processing cores.

While the paper highlights some caveats and areas for further research, the overall impact of this work could be transformative for fields like materials science, computational chemistry, and drug design, by unlocking new realms of scientific discovery and understanding.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System

Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A Leon, James H Laros III, Michael James, Sivasankaran Rajamanickam

Molecular dynamics (MD) simulations have transformed our understanding of the nanoscale, driving breakthroughs in materials science, computational chemistry, and several other fields, including biophysics and drug design. Even on exascale supercomputers, however, runtimes are excessive for systems and timescales of scientific interest. Here, we demonstrate strong scaling of MD simulations on the Cerebras Wafer-Scale Engine. By dedicating a processor core for each simulated atom, we demonstrate a 179-fold improvement in timesteps per second versus the Frontier GPU-based Exascale platform, along with a large improvement in timesteps per unit energy. Reducing every year of runtime to two days unlocks currently inaccessible timescales of slow microstructure transformation processes that are critical for understanding material behavior and function. Our dataflow algorithm runs Embedded Atom Method (EAM) simulations at rates over 270,000 timesteps per second for problems with up to 800k atoms. This demonstrated performance is unprecedented for general-purpose processing cores.

5/14/2024

Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and device levels to achieve unprecedented scalability for tensor networks. This enables the handling of large-scale tensor networks with memory capacities reaching tens of terabytes, surpassing memory space constraints on a single node. Our techniques enable accommodating large-scale tensor networks with up to tens of terabytes of memory, reaching up to 2304 GPUs with a peak computing power of 561 PFLOPS half-precision. Notably, we have achieved a time-to-solution of 14.22 seconds with energy consumption of 2.39 kWh which achieved fidelity of 0.002 and our most remarkable result is a time-to-solution of 17.18 seconds, with energy consumption of only 0.29 kWh which achieved a XEB of 0.002 after post-processing, outperforming Google's quantum processor Sycamore in both speed and energy efficiency, which recorded 600 seconds and 4.3 kWh, respectively.

7/2/2024

Trackable Agent-based Evolution Models at Wafer Scale

Matthew Andres Moreno, Connor Yang, Emily Dolson, Luis Zaman

Continuing improvements in computing hardware are poised to transform capabilities for in silico modeling of cross-scale phenomena underlying major open questions in evolutionary biology and artificial life, such as transitions in individuality, eco-evolutionary dynamics, and rare evolutionary events. Emerging ML/AI-oriented hardware accelerators, like the 850,000 processor Cerebras Wafer Scale Engine (WSE), hold particular promise. However, practical challenges remain in conducting informative evolution experiments that efficiently utilize these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from agent-based evolution on the WSE platform. This goal drove significant refinements to decentralized in silico phylogenetic tracking, reported here. These improvements yield order-of-magnitude performance improvements. We also present an asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million agents. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction, from wafer-scale simulation, of clear phylometric signals that differentiate runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable agent-based evolution simulation that is both efficient and observable. Developed capabilities will bring entirely new classes of previously intractable research questions within reach, benefiting further explorations within the evolutionary biology and artificial life communities across a variety of emerging high-performance computing platforms.

6/4/2024

Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures

Martin Karp, Estela Suarez, Jan H. Meinke, Måns I. Andersson, Philipp Schlatter, Stefano Markidis, Niclas Jansson

The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the modular supercomputing architecture (MSA). We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.

5/10/2024