GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

Read original: arXiv:2404.16208 - Published 4/26/2024 by Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu


Overview

Open-source simulation tools are crucial for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon.
RANC (Reconfigurable Architecture for Neuromorphic Computing) is one such tool. It executes pre-trained Spiking Neural Network (SNN) models through both software-based simulation and FPGA-based emulation.
RANC provides a flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters, and modify neuron behavior based on application insights.

Plain English Explanation

Designing architectures for neuromorphic computing involves a vast number of configuration parameters, such as weight precision, neuron and axon counts, network topology, and neuron behavior. RANC is an open-source simulation tool that helps engineers and researchers explore these design choices before committing to building the actual hardware.
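To make the size of this design space concrete, here is a minimal Python sketch that enumerates combinations of a few parameters. The field names (`weight_bits`, `neurons_per_core`, `axons_per_core`, `num_cores`) and the candidate values are illustrative assumptions, not RANC's actual configuration schema.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical design point; field names are illustrative, not RANC's schema."""
    weight_bits: int
    neurons_per_core: int
    axons_per_core: int
    num_cores: int

    def synapses(self) -> int:
        # Upper bound on synapses if every axon may connect to every neuron in a core.
        return self.neurons_per_core * self.axons_per_core * self.num_cores


def enumerate_design_space(weight_bits, neurons, axons, cores):
    """Yield one CoreConfig per combination of the candidate values."""
    for w, n, a, c in product(weight_bits, neurons, axons, cores):
        yield CoreConfig(w, n, a, c)


# Assumed candidate values, chosen only for illustration.
space = list(enumerate_design_space(
    weight_bits=[4, 8, 9],
    neurons=[64, 128, 256],
    axons=[64, 128, 256],
    cores=[128, 256, 512],
))
print(len(space))  # 3 * 3 * 3 * 3 = 81 design points
```

Even four parameters with three candidate values each produce 81 design points, and every point needs its own full simulation run. Adding topology or neuron-behavior variants multiplies that count further.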

RANC allows users to run simulations of pre-trained Spiking Neural Networks (SNNs) in software and also test them on FPGA hardware. This flexibility enables studying the trade-offs between hardware performance and network accuracy. Engineers can use RANC to identify bottlenecks in their designs, tune architectural parameters, and even modify the behavior of individual neurons based on the needs of their application.

Exploring the vast design space of neuromorphic computing can be time-consuming, so the researchers have now developed a GPU-based implementation of RANC to dramatically speed up the simulation process. This allows for more rapid exploration and convergence on optimized neuromorphic architectures.

Technical Explanation

The researchers have introduced a GPU-based implementation of the RANC simulation tool to accelerate the exploration of neuromorphic architectures. RANC is designed to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation.

The researchers describe how they parallelize tick-accurate SNN simulation on the GPU and quantify the resulting gains across several use cases, reporting a speedup of up to 780 times over the serial version of the simulator.
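The summary does not spell out the parallelization scheme, so the following is only a sketch of what a tick-accurate simulation of a single core might look like, written in plain Python. The neuron model (integrate, add a leak, fire at a threshold, reset) and every name in the code (`simulate_core`, `connections`, `weights`, `leak`) are assumptions made for illustration; they are not taken from the RANC code base or from the authors' CUDA kernels.

```python
def simulate_core(spikes_in, connections, weights, threshold, leak, reset=0):
    """Simulate one hypothetical core tick by tick.

    spikes_in[t][a]   : 1 if axon a receives a spike at tick t, else 0
    connections[a][n] : 1 if axon a is connected to neuron n, else 0
    weights[a][n]     : signed integer weight applied when axon a spikes
    Returns spikes_out[t][n], the neuron output spikes for each tick.
    """
    num_neurons = len(connections[0])
    potential = [0] * num_neurons
    spikes_out = []
    for axon_spikes in spikes_in:  # ticks are processed strictly in order
        fired = [0] * num_neurons
        # Each neuron reads the shared axon spikes and updates only its own
        # state, so the iterations of this loop are independent of each other.
        for n in range(num_neurons):
            v = potential[n]
            for a, s in enumerate(axon_spikes):
                if s and connections[a][n]:
                    v += weights[a][n]  # integrate
            v += leak                   # leak
            if v >= threshold:          # fire and reset
                fired[n] = 1
                v = reset
            potential[n] = v
        spikes_out.append(fired)
    return spikes_out


# Two axons, two neurons, three ticks (made-up values).
connections = [[1, 0], [1, 1]]
weights = [[2, 0], [1, 3]]
spikes_in = [[1, 0], [0, 1], [1, 1]]
out = simulate_core(spikes_in, connections, weights, threshold=3, leak=0)
print(out)  # [[0, 0], [1, 1], [1, 1]]
```

The outer loop over ticks has to stay sequential because each tick depends on the membrane potentials left by the previous one. The inner loop over neurons carries no such dependency, which is the kind of structure a GPU implementation could plausibly map to one thread per neuron.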

The reported peak speedup was measured on an MNIST inference application mapped to 512 neuromorphic cores. The GPU-based implementation gives researchers a more practical way to explore optimizations for accelerating SNNs and to run richer studies of neuromorphic architectures.
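As a rough illustration of what a 780 times speedup means in wall-clock terms, the arithmetic below applies it to a hypothetical serial runtime. The 13-hour figure is invented for the example and does not come from the paper; 780 is the reported upper bound, so this is a best case.

```python
def gpu_time_seconds(serial_seconds: float, speedup: float) -> float:
    """Runtime after acceleration, from the definition speedup = t_serial / t_gpu."""
    return serial_seconds / speedup


serial = 13 * 3600  # hypothetical 13-hour serial run, in seconds
print(gpu_time_seconds(serial, 780))  # 60.0
```

Under these assumptions, a run that would occupy most of a working day finishes in about a minute, which changes how many design points can be tried in one sitting.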

Critical Analysis

The paper effectively demonstrates the value of the GPU-accelerated RANC simulator in accelerating the exploration of neuromorphic architectures. The researchers have provided a thorough evaluation of the speedup gains across various use cases, which is a strength of the work.

However, the paper does not delve into the potential limitations or caveats of the GPU-based approach. For example, it would be useful to understand the memory requirements and potential bottlenecks when scaling the simulations to larger network sizes or more complex applications.

Additionally, the paper does not address the potential impact of the GPU-based RANC simulator on the accuracy or fidelity of the SNN simulations compared to the original serial version. It would be valuable to understand any trade-offs between simulation speed and simulation accuracy.

Overall, the paper is a significant contribution to neuromorphic computing: the GPU-accelerated RANC simulator shortens the design exploration process considerably. A closer look at its limitations, particularly scaling behavior and fidelity relative to the serial simulator, would strengthen the work and give a more complete picture of the tool.

Conclusion

The paper introduces a GPU-based implementation of the RANC simulation tool, which supports the development of neuromorphic computing applications. By using the parallel processing power of GPUs, the researchers achieve significant speedups in tick-accurate Spiking Neural Network (SNN) simulation, enabling faster exploration of the vast design space of neuromorphic architectures.

The GPU-accelerated RANC simulator lets neuromorphic application engineers and hardware architects investigate performance bottlenecks, tune architectural parameters, and modify neuron behavior more efficiently. Faster design space exploration can lead to rapid convergence on optimized neuromorphic architectures, advancing neuromorphic computing and its real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights and study the trade space on hardware performance and network accuracy. In designing architectures for use in neuromorphic computing, there are an incredibly large number of configuration parameters such as number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with a streamlined productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to 780 times speedup compared to the serial version of the RANC simulator based on a 512 neuromorphic core MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue in the research of exploring different optimizations for accelerating SNNs and performing richer studies by enabling rapid convergence to optimized neuromorphic architectures.

4/26/2024

A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures

Fernando M. Quintana, Maryada, Pedro L. Galindo, Elisa Donati, Giacomo Indiveri, Fernando Perez-Peña

Developing dedicated neuromorphic computing platforms optimized for embedded or edge-computing applications requires time-consuming design, fabrication, and deployment of full-custom neuromorphic processors. To ensure that initial prototyping efforts, exploring the properties of different network architectures and parameter settings, lead to realistic results it is important to use simulation frameworks that match as best as possible the properties of the final hardware. This is particularly challenging for neuromorphic hardware platforms made using mixed-signal analog/digital circuits, due to the variability and noise sensitivity of their components. In this paper, we address this challenge by developing a software spiking neural network simulator explicitly designed to account for the properties of mixed-signal neuromorphic circuits, including device mismatch variability. The simulator, called ARCANA (A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures), is designed to reproduce the dynamics of mixed-signal synapse and neuron electronic circuits with autogradient differentiation for parameter optimization and GPU acceleration. We demonstrate the effectiveness of this approach by matching software simulation results with measurements made from an existing neuromorphic processor. We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software, once deployed in hardware. This framework enables the development and innovation of new learning rules and processing architectures in neuromorphic embedded systems.

9/24/2024

Fast Algorithms for Spiking Neural Network Simulation with FPGAs

Bjorn A. Lindqvist, Artur Podobas

Using OpenCL-based high-level synthesis, we create a number of spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA). Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory. Speed-wise they compare favorably to the state-of-the-art GPU-based simulators and their energy usage is lower than any other published result. This result is the first for simulating the circuit on a single hardware accelerator. We also extensively analyze the techniques and algorithms we implement our simulators with, many of which can be realized on other types of hardware. Thus, this article is of interest to any researcher or practitioner interested in efficient SNN simulation, whether they target FPGAs or not.

5/6/2024

NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

Kaustubh Shivdikar, Nicolas Bohm Agostini, Malith Jayaweera, Gilbert Jonatan, Jose L. Abellan, Ajay Joshi, John Kim, David Kaeli

Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows for independent exploitation of their unique data dependencies, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory as well as address the prevalent issue of memory bloat in sparse graph computations. Furthermore, the compute resource load balancing is achieved through a dynamic reseeding hash-based mapping, ensuring uniform utilization of computing resources agnostic of sparsity patterns. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. Overall, NeuraChip presents a significant improvement, yielding an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, and 1.5x over prior state-of-the-art SpGEMM accelerator and 1.3x over GNN accelerator. The source code for our open-sourced simulator and performance visualizer is publicly accessible on GitHub https://neurachip.us

4/30/2024