Reliable edge machine learning hardware for scientific applications

Tommaso Baldi (Fermilab, University of Pisa), Javier Campos (Fermilab), Ben Hawks (Fermilab), Jennifer Ngadiuba (Fermilab), Nhan Tran (Fermilab), Daniel Diaz (UC San Diego), Javier Duarte (UC San Diego), Ryan Kastner (UC San Diego), Andres Meza (UC San Diego), Melissa Quinnan (UC San Diego), Olivia Weng (UC San Diego), Caleb Geniesse (UC Berkeley/LBNL/ICSI), Amir Gholami (UC Berkeley/LBNL/ICSI), Michael W. Mahoney (UC Berkeley/LBNL/ICSI), Vladimir Loncar (MIT), Philip Harris (MIT), Joshua Agar (Drexel University), Shuyu Qin (Drexel University)

Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to developing and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for developing robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of developing autonomous scientific experimentation methods for accelerated scientific discovery.

7/1/2024

👨‍🏫

Controlling Chaos Using Edge Computing Hardware

Robert M. Kent, Wendson A. S. Barbosa, Daniel J. Gauthier

Machine learning provides a data-driven approach for creating a digital twin of a system - a digital model used to predict the system behavior. Having an accurate digital twin can drive many applications, such as controlling autonomous systems. Often the size, weight, and power consumption of the digital twin or related controller must be minimized, ideally realized on embedded computing hardware that can operate without a cloud-computing connection. Here, we show that a nonlinear controller based on next-generation reservoir computing can tackle a difficult control problem: controlling a chaotic system to an arbitrary time-dependent state. The model is accurate, yet it is small enough to be evaluated on a field-programmable gate array typically found in embedded devices. Furthermore, the model only requires 25.0 $pm$ 7.0 nJ per evaluation, well below other algorithms, even without systematic power optimization. Our work represents the first step in deploying efficient machine learning algorithms to the computing edge.

6/21/2024

Ultrafast jet classification on FPGAs for the HL-LHC

Patrick Odagiu, Zhiqiang Que, Javier Duarte, Johannes Haller, Gregor Kasieczka, Artur Lobanov, Vladimir Loncar, Wayne Luk, Jennifer Ngadiuba, Maurizio Pierini, Philipp Rincke, Arpita Seksaria, Sioni Summers, Andre Sznajder, Alexander Tapper, Thea K. Aarrestad

Three machine learning models are used to perform jet origin classification. These models are optimized for deployment on a field-programmable gate array device. In this context, we demonstrate how latency and resource consumption scale with the input size and choice of algorithm. Moreover, the models proposed here are designed to work on the type of data and under the foreseen conditions at the CERN LHC during its high-luminosity phase. Through quantization-aware training and efficient synthetization for a specific field programmable gate array, we show that $O(100)$ ns inference of complex architectures such as Deep Sets and Interaction Networks is feasible at a relatively low computational resource cost.

7/8/2024

🏷️

Investigating Resource-efficient Neutron/Gamma Classification ML Models Targeting eFPGAs

Jyothisraj Johnson, Billy Boxer, Tarun Prakash, Carl Grace, Peter Sorensen, Mani Tripathi

There has been considerable interest and resulting progress in implementing machine learning (ML) models in hardware over the last several years from the particle and nuclear physics communities. A big driver has been the release of the Python package, hls4ml, which has enabled porting models specified and trained using Python ML libraries to register transfer level (RTL) code. So far, the primary end targets have been commercial FPGAs or synthesized custom blocks on ASICs. However, recent developments in open-source embedded FPGA (eFPGA) frameworks now provide an alternate, more flexible pathway for implementing ML models in hardware. These customized eFPGA fabrics can be integrated as part of an overall chip design. In general, the decision between a fully custom, eFPGA, or commercial FPGA ML implementation will depend on the details of the end-use application. In this work, we explored the parameter space for eFPGA implementations of fully-connected neural network (fcNN) and boosted decision tree (BDT) models using the task of neutron/gamma classification with a specific focus on resource efficiency. We used data collected using an AmBe sealed source incident on Stilbene, which was optically coupled to an OnSemi J-series SiPM to generate training and test data for this study. We investigated relevant input features and the effects of bit-resolution and sampling rate as well as trade-offs in hyperparameters for both ML architectures while tracking total resource usage. The performance metric used to track model performance was the calculated neutron efficiency at a gamma leakage of 10$^{-3}$. The results of the study will be used to aid the specification of an eFPGA fabric, which will be integrated as part of a test chip.

7/25/2024

Reliable edge machine learning hardware for scientific applications

Overview

Plain English Explanation

Technical Explanation

Critical Analysis

Conclusion

Related Papers