An Asynchronous Multi-core Accelerator for SNN inference

arXiv:2407.20947, published 7/31/2024 by Zhuo Chen, De Ma, Xiaofei Jin, Qinghui Xing, Ouwen Jin, Xin Du, Shuibing He, Gang Pan


Overview

The paper presents an asynchronous multi-core accelerator for spiking neural network (SNN) inference.
The accelerator removes the explicit synchronization among all cores that many-core SNN accelerators often perform, and instead lets each core advance based on inter-core dependencies determined at compile time.
The architecture incorporates several novel techniques to improve performance and energy efficiency.

Plain English Explanation

The paper describes a new type of hardware that can run a specific kind of artificial intelligence called a spiking neural network (SNN). SNNs are different from the more common artificial neural networks (ANNs) in that they use "spikes" of activity rather than continuous signals.

The researchers have designed a multi-core accelerator that runs SNN workloads efficiently. This means the hardware is specialized to execute SNN computations faster and with less power than a general-purpose computer. The key innovation is that the accelerator is asynchronous. In many existing multi-core SNN accelerators, every core must stop at the end of each timestep and wait until all other cores have finished before any of them can continue. In this design, each core moves on to the next timestep as soon as the cores it actually depends on are ready.

This asynchronous design suits SNNs, whose neurons fire spikes independently and whose cores often carry uneven amounts of work. Because no core has to wait for the slowest core at every timestep, the accelerator spends less time idle, which improves both speed and energy efficiency compared to designs that synchronize all cores. The paper also describes other techniques the researchers used to further optimize the accelerator.
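To make the cost of waiting concrete, here is a toy Python model that is not taken from the paper: the per-core, per-timestep work values are hypothetical, and the cores are assumed to have no dependencies on each other and no communication cost. With a global barrier after every timestep, each timestep costs as much as its slowest core; without it, each core simply works through its own timesteps.

```python
# Toy model (illustrative only): work[c][t] is the hypothetical time
# core c needs to process timestep t.

def barrier_makespan(work):
    """All cores wait at a global barrier after every timestep."""
    num_steps = len(work[0])
    return sum(max(core[t] for core in work) for t in range(num_steps))


def independent_makespan(work):
    """Idealized case: cores never wait for each other."""
    return max(sum(core) for core in work)


work = [
    [5, 1, 5, 1],  # core 0: heavy on even timesteps
    [1, 5, 1, 5],  # core 1: heavy on odd timesteps
]

print(barrier_makespan(work))      # 20
print(independent_makespan(work))  # 12
```

The gap between 20 and 12 comes entirely from workload imbalance; if every core had identical work at every timestep, the two numbers would match. Real cores exchange spikes and therefore depend on each other, so the independent case is an optimistic bound on the benefit.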

Technical Explanation

The paper presents an asynchronous multi-core accelerator for executing spiking neural network (SNN) inference workloads. The key innovation is the asynchronous design, which matches the inherent asynchronous nature of SNNs.

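The accelerator architecture consists of multiple processing cores, each responsible for executing the computations of a subset of the neurons in the SNN. Instead of enforcing a global synchronization step among all cores at every timestep, the design relies on the dependencies between cores, which are fixed when the SNN is compiled. Each core has a scheduler that monitors the status of the cores it depends on and advances to the next timestep as soon as it is safe to do so, without waiting for unrelated cores. This reduces core idle time when workloads are imbalanced.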
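The paper's abstract (listed under Related Papers) says each core's scheduler monitors the status of its dependencies, which are fixed at compile time, and advances to the next timestep without waiting for other cores. The sketch below models that idea as a simple timing recurrence. The rule that core `c` may start timestep `t` once it and every core in `deps[c]` have finished timestep `t-1` is an assumption made for illustration, as are the hypothetical `work` values; spike transfer latency is ignored.

```python
def dependency_schedule(work, deps):
    """Return the time at which the last core finishes its last timestep.

    work[c][t]: hypothetical time core c needs for timestep t.
    deps[c]:    cores whose timestep t-1 output core c needs before starting t
                (assumed rule; spike transfer latency is ignored).
    """
    num_cores, num_steps = len(work), len(work[0])
    finish = [[0] * num_steps for _ in range(num_cores)]
    for t in range(num_steps):
        for c in range(num_cores):
            if t == 0:
                start = 0
            else:
                # Wait for this core's own previous timestep and for each dependency's.
                start = max([finish[c][t - 1]] + [finish[p][t - 1] for p in deps[c]])
            finish[c][t] = start + work[c][t]
    return max(core[-1] for core in finish)


work = [
    [5, 1, 5, 1],  # core 0
    [1, 5, 1, 5],  # core 1
]
all_cores = {0: [0, 1], 1: [0, 1]}  # every core waits for every core: a global barrier
chain = {0: [], 1: [0]}             # core 1 consumes spikes from core 0 only

print(dependency_schedule(work, all_cores))  # 20
print(dependency_schedule(work, chain))      # 16
```

Making every core depend on every other core reproduces a global barrier, while a sparse dependency graph lets cores overlap their timesteps. How the real hardware scheduler learns dependency status and buffers spikes between timesteps is not modeled here.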

To further improve performance and efficiency, the accelerator incorporates several techniques:

Dynamic voltage and frequency scaling (DVFS): The frequency and voltage of each core are scaled independently based on its workload to minimize power consumption.
Spike event-driven execution: The cores only perform computations when they receive input spikes, avoiding unnecessary work.
Distributed spike routing: Spikes are routed between cores using a distributed, low-latency network-on-chip (NoC) rather than a centralized arbiter.
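Event-driven execution means the amount of work a core does depends on how many spikes it receives, not on how many neurons or synapses it stores. The following sketch processes one timestep this way and counts the synaptic updates performed. It assumes simple integrate-and-fire neurons without leak and a hypothetical `fanout` table mapping each presynaptic neuron to its (postsynaptic neuron, weight) pairs; it is not the paper's datapath.

```python
def event_driven_step(v, spikes_in, fanout, threshold=1.0):
    """One timestep of integrate-and-fire neurons, processed spike by spike.

    v:         membrane potentials indexed by neuron id (updated in place)
    spikes_in: ids of presynaptic neurons that fired in the previous timestep
    fanout:    hypothetical table, pre id -> list of (post id, weight)
    Returns (ids of neurons that fired, number of synaptic updates performed).
    """
    touched = set()
    ops = 0
    for pre in spikes_in:
        # Only synapses of neurons that actually spiked are visited.
        for post, weight in fanout.get(pre, []):
            v[post] += weight
            touched.add(post)
            ops += 1
    fired = []
    # Without leak, only neurons that received input can cross the threshold.
    for n in sorted(touched):
        if v[n] >= threshold:
            fired.append(n)
            v[n] = 0.0  # reset the membrane potential after a spike
    return fired, ops


fanout = {0: [(0, 0.6), (1, 0.3)], 1: [(1, 0.8)], 2: [(0, 0.5)]}
v = [0.0, 0.0]
print(event_driven_step(v, [0, 2], fanout))  # ([0], 3)
```

With no input spikes the function performs no synaptic updates at all. Leak is omitted because applying it to every neuron each timestep would reintroduce dense work; an event-driven design would have to handle it separately, for example lazily when a neuron next receives input.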

The researchers evaluated the accelerator on five SNN workloads and report a 1.86x speedup and a 1.55x improvement in energy efficiency compared to state-of-the-art architectures that rely on explicit synchronization.

Critical Analysis

The paper provides a comprehensive and technically detailed description of the asynchronous multi-core accelerator. The key strengths of the design are the asynchronous, event-driven execution and the use of DVFS to optimize power consumption.

One potential limitation is the complexity of the overall system, which may make it challenging to design and verify. The distributed spike routing mechanism, in particular, could introduce additional design challenges. The authors acknowledge this and suggest further research on simplified spike routing schemes.

Additionally, the paper does not provide much insight into the practical applicability of the accelerator. While the performance and efficiency improvements are significant, the authors do not discuss the feasibility of integrating the accelerator into real-world SNN-based systems.

Further research could explore the scalability of the design, the programmability for different SNN models, and the ease of integration with existing SNN-based applications.

Conclusion

The paper presents a novel asynchronous multi-core accelerator for efficient SNN inference. The key innovations are the asynchronous, event-driven execution model and the use of techniques like DVFS to optimize power consumption.

The reported results, a 1.86x speedup and a 1.55x gain in energy efficiency over state-of-the-art synchronization-based architectures, are substantial. While the complexity of the system may pose some challenges, the paper's contributions represent an important step towards the practical deployment of SNNs in real-world applications.

This summary was produced with help from an AI and may contain inaccuracies; check the original paper (arXiv:2407.20947) before relying on its details.


Related Papers

An Asynchronous Multi-core Accelerator for SNN inference

Zhuo Chen, De Ma, Xiaofei Jin, Qinghui Xing, Ouwen Jin, Xin Du, Shuibing He, Gang Pan

Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the accuracy of SNNs often necessitates frequent explicit synchronization among all cores, which presents a challenge to overall efficiency. In this paper, we propose an asynchronous architecture for Spiking Neural Networks (SNNs) that eliminates the need for inter-core synchronization, thus enhancing speed and energy efficiency. This approach leverages the pre-determined dependencies of neuromorphic cores established during compilation. Each core is equipped with a scheduler that monitors the status of its dependencies, allowing it to safely advance to the next timestep without waiting for other cores. This eliminates the necessity for global synchronization and minimizes core waiting time despite inherent workload imbalances. Comprehensive evaluations using five different SNN workloads show that our architecture achieves a 1.86x speedup and a 1.55x increase in energy efficiency compared to state-of-the-art synchronization architectures.

7/31/2024

Overcoming the Limitations of Layer Synchronization in Spiking Neural Networks

Roel Koopman, Amirreza Yousefzadeh, Mahyar Shahsavari, Guangzhi Tang, Manolis Sifalakis

Currently, neural-network processing in machine learning applications relies on layer synchronization, whereby neurons in a layer aggregate incoming currents from all neurons in the preceding layer, before evaluating their activation function. This is practiced even in artificial Spiking Neural Networks (SNNs), which are touted as consistent with neurobiology, in spite of processing in the brain being, in fact, asynchronous. A truly asynchronous system however would allow all neurons to evaluate concurrently their threshold and emit spikes upon receiving any presynaptic current. Omitting layer synchronization is potentially beneficial, for latency and energy efficiency, but asynchronous execution of models previously trained with layer synchronization may entail a mismatch in network dynamics and performance. We present a study that documents and quantifies this problem in three datasets on our simulation environment that implements network asynchrony, and we show that models trained with layer synchronization either perform sub-optimally in absence of the synchronization, or they will fail to benefit from any energy and latency reduction, when such a mechanism is in place. We then make ends meet and address the problem with unlayered backprop, a novel backpropagation-based training method, for learning models suitable for asynchronous processing. We train with it models that use different neuron execution scheduling strategies, and we show that although their neurons are more reactive, these models consistently exhibit lower overall spike density (up to 50%), reach a correct decision faster (up to 2x) without integrating all spikes, and achieve superior accuracy (up to 10% higher). Our findings suggest that asynchronous event-based (neuromorphic) AI computing is indeed more efficient, but we need to seriously rethink how we train our SNN models, to benefit from it.

8/12/2024

SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Sai Sanjeet, Bibhu Datta Sahoo, Keshab K. Parhi

Spiking Neural Networks (SNNs) have gained popularity due to their high energy efficiency. Prior works have proposed various methods for training SNNs, including backpropagation-based methods. Training SNNs is computationally expensive compared to their conventional counterparts and would benefit from multiprocessor hardware acceleration. This is the first paper to propose inter-layer pipelining to accelerate training in SNNs using systolic array-based processors and multiprocessor scheduling. The impact of training using delayed gradients is observed using three networks training on different datasets, showing no degradation for small networks and < 10% degradation for large networks. The mapping of various training tasks of the SNN onto systolic arrays is formulated, and the proposed scheduling method is evaluated on the three networks. The results are compared against standard pipelining algorithms. The results show that the proposed method achieves an average speedup of 1.6X compared to standard pipelining algorithms, with an upwards of 2X improvement in some cases. The incurred communication overhead due to the proposed method is less than 0.5% of the total required communication of training.

6/12/2024

Q-SNNs: Quantized Spiking Neural Networks

Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in resource-constrained and low-power edge devices. To address this challenge, we introduce a lightweight and hardware-friendly Quantized SNN (Q-SNN) that applies quantization to both synaptic weights and membrane potentials. By significantly compressing these two key elements, the proposed Q-SNNs substantially reduce both memory usage and computational complexity. Moreover, to prevent the performance degradation caused by this compression, we present a new Weight-Spike Dual Regulation (WS-DR) method inspired by information entropy theory. Experimental evaluations on various datasets, including static and neuromorphic, demonstrate that our Q-SNNs outperform existing methods in terms of both model size and accuracy. These state-of-the-art results in efficiency and efficacy suggest that the proposed method can significantly improve edge intelligent computing.

6/21/2024