SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Read original: arXiv:2302.13939 - Published 7/12/2024 by Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason K. Eshraghian

Overview

As large language models continue to grow, so do the computational resources required to run them.
Spiking Neural Networks (SNNs) offer an energy-efficient approach to deep learning by using sparse and event-driven activations to reduce computational overhead.
SNNs have become competitive with non-spiking models on many computer vision tasks, but they have proven more challenging to train, so their performance still lags behind modern deep learning.
The effectiveness of SNNs in language generation has yet to be fully explored.

Plain English Explanation

Large language models, which are AI systems that can understand and generate human-like text, require a lot of computing power to run. Spiking Neural Networks (SNNs) offer a potential solution by using a different type of "neuron" that is more energy-efficient. These neurons only "fire" (activate) when they need to, rather than constantly running like in traditional neural networks.

While SNNs have shown promising results in computer vision tasks, they have been more difficult to train effectively. This has meant their performance hasn't quite caught up to modern deep learning models. Researchers are still exploring how well SNNs can work for language generation tasks, like writing text.

In this paper, the authors take inspiration from the RWKV language model and develop a new SNN-based language model called SpikeGPT. They trained two versions of SpikeGPT, one with 45 million parameters and one with 216 million parameters, which they believe makes it the largest SNN trained with backpropagation to date.

The key innovation is that the authors modified the standard transformer architecture to use a more efficient attention mechanism. This allows SpikeGPT to process input tokens sequentially, like a typical SNN, while maintaining competitive performance with non-spiking models.

Technical Explanation

Inspired by the Receptance Weighted Key Value (RWKV) language model, the authors developed SpikeGPT, a generative language model that uses binary, event-driven spiking activation units. They trained two versions of the model, one with 45 million parameters and one with 216 million parameters; to the authors' knowledge, this makes SpikeGPT the largest backpropagation-trained SNN model to date.
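
To make "binary, event-driven spiking activation units" concrete, here is a minimal sketch of a generic leaky integrate-and-fire (LIF) neuron. This summary does not specify SpikeGPT's exact neuron model, so the function name `lif_neuron`, the decay factor `beta`, the `threshold`, and the soft-reset rule are all illustrative assumptions rather than the authors' implementation.

```python
def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    Returns a list of binary spikes (0 or 1), one per time step.
    beta and threshold are illustrative values, not taken from the paper.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak part of the stored potential, then integrate the new input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            # Fire a spike and subtract the threshold (soft reset).
            spikes.append(1)
            membrane -= threshold
        else:
            # Stay silent: downstream units have nothing to process this step.
            spikes.append(0)
    return spikes
```

Calling `lif_neuron([0.5, 0.5, 0.5, 0.0, 0.0, 1.2])` returns `[0, 0, 1, 0, 0, 1]`: the neuron stays silent until enough input has accumulated, which is the source of the sparsity the paper relies on.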

To achieve this, the authors modified the transformer block, replacing multi-head self-attention with a more efficient mechanism. Typical self-attention has quadratic computational complexity, O(N^2), in the sequence length N; their replacement scales linearly, O(N). Input tokens are streamed into the attention mechanism sequentially, as is typical for SNNs.
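
The difference between quadratic and linear cost can be sketched with a simplified, RWKV-style weighted average over past tokens. The code below is an assumption-laden illustration, not SpikeGPT's actual layer: it works on a single scalar channel, uses a hypothetical `decay` parameter, and omits the gating, current-token bonus, and spiking components of the real models. It only shows that a running state updated once per token reproduces the result of re-scanning the whole history at every step.

```python
import math


def wkv_quadratic(keys, values, decay):
    """Reference version: re-scan all past tokens at every step, O(N^2) total."""
    outputs = []
    for t in range(len(keys)):
        num = 0.0
        den = 0.0
        for i in range(t + 1):
            # Older tokens are down-weighted exponentially by their distance.
            weight = math.exp(-(t - i) * decay + keys[i])
            num += weight * values[i]
            den += weight
        outputs.append(num / den)
    return outputs


def wkv_recurrent(keys, values, decay):
    """Streaming version: keep two running sums, O(N) total."""
    num = 0.0
    den = 0.0
    shrink = math.exp(-decay)  # one step of decay applied to the stored state
    outputs = []
    for k, v in zip(keys, values):
        num = shrink * num + math.exp(k) * v
        den = shrink * den + math.exp(k)
        outputs.append(num / den)
    return outputs
```

Both functions produce the same outputs up to floating-point rounding, but the recurrent form needs only the previous state and the current token, which is what lets tokens be streamed in one at a time.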

The authors' preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while using 20 times fewer operations when processed on neuromorphic hardware that can leverage the sparse, event-driven activations of the SNN architecture.
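
Why sparse, binary activations reduce the operation count can be illustrated with a toy linear layer. This is a sketch under stated assumptions: the function names, weights, and spike pattern are made up for illustration, and this summary does not state how the paper's 20x figure was measured.

```python
def dense_matvec(weights, x):
    """Dense layer: every input costs one multiply-accumulate per output."""
    out = [0.0] * len(weights[0])
    ops = 0
    for i, xi in enumerate(x):
        for j, w in enumerate(weights[i]):
            out[j] += xi * w
            ops += 1
    return out, ops


def spiking_matvec(weights, spikes):
    """Event-driven layer: only inputs that spiked trigger any work.

    Because spikes are binary, each contribution is a plain addition.
    """
    out = [0.0] * len(weights[0])
    ops = 0
    for i, s in enumerate(spikes):
        if s == 0:
            continue  # silent input: skipped entirely
        for j, w in enumerate(weights[i]):
            out[j] += w
            ops += 1
    return out, ops
```

With a 4x2 weight matrix and the spike vector `[0, 1, 0, 0]`, the dense version performs 8 multiply-accumulates while the event-driven version performs 2 additions and yields the same output. Savings of this kind only materialize on hardware that can actually skip the silent inputs.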

Critical Analysis

The authors demonstrate that it is possible to train large-scale SNN language models that can compete with traditional deep learning approaches. This is an important step forward, as SNNs offer the potential for significant energy savings when deployed on specialized neuromorphic hardware.

However, the authors acknowledge that SNN models are still more challenging to train than non-spiking models, and their performance still lags behind the current state-of-the-art. Further research is needed to improve the training and performance of SNN language models, as well as to explore their suitability for a wider range of natural language processing tasks beyond just generation.

Additionally, the paper does not provide a detailed analysis of the energy efficiency benefits of SpikeGPT compared to non-spiking models. More work is needed to quantify the real-world energy savings and practical deployment considerations of SNN-based language models.

Conclusion

In this paper, the authors have made an important contribution by developing SpikeGPT, the largest backpropagation-trained SNN language model to date. By modifying the transformer architecture to use a more efficient attention mechanism, they have demonstrated that SNN-based language models can achieve competitive performance with traditional deep learning approaches.

If they can be realized in practice, the energy efficiency benefits of SNN models could have significant implications for running large language models in real-world applications, particularly on resource-constrained devices. [As the field of spike-based computation continues to advance, we may see more SNN-based models emerge as viable alternatives to traditional deep learning for natural language processing and beyond](https://aimodels.fyi/papers/arxiv/spikelm-towards-general-spike-driven-language-modeling).

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason K. Eshraghian

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement 'SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model: 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing quadratic computational complexity O(N^2) to linear complexity O(N) with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while requiring 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.

7/12/2024

Spiking Convolutional Neural Networks for Text Classification

Changze Lv, Jianhan Xu, Xiaoqing Zheng

Spiking neural networks (SNNs) offer a promising pathway to implement deep neural networks (DNNs) in a more energy-efficient manner since their neurons are sparsely activated and inferences are event-driven. However, there have been very few works that have demonstrated the efficacy of SNNs in language tasks partially because it is non-trivial to represent words in the forms of spikes and to deal with variable-length texts by SNNs. This work presents a conversion + fine-tuning two-step method for training SNNs for text classification and proposes a simple but effective way to encode pre-trained word embeddings as spike trains. We show empirically that after fine-tuning with surrogate gradients, the converted SNNs achieve comparable results to their DNN counterparts with much less energy consumption across multiple datasets for both English and Chinese. We also show that such SNNs are more robust to adversarial attacks than DNNs.

6/28/2024

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li

Towards energy-efficient artificial intelligence similar to the human brain, the bio-inspired spiking neural networks (SNNs) have advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models exhibit promising generalization capability, making it a valuable issue to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, placing technological challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition nature of SNNs. In a single time step, the spike is enhanced by direction and amplitude information; in spike frequency, a strategy to control spike firing rate is well designed. We plug this elastic bi-spiking mechanism in language modeling, named SpikeLM. It is the first time to handle general language tasks with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly bridges the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.

6/6/2024

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang

The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological neurons, exhibit significantly greater energy efficiency compared to LLMs with a similar number of parameters. Inspired by this, we redesign 7 to 70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model as recent LLMs termed SpikeLLM. Coupled with the proposed model, a novel spike-driven quantization framework named Optimal Brain Spiking is introduced to reduce the energy cost and accelerate inference speed via two essential approaches: first (second)-order differentiation-based salient channel detection, and per-channel salient outlier expansion with Generalized Integrate-and-Fire neurons. Our proposed spike-driven quantization can plug in main streams of quantization training methods. In the OmniQuant pipeline, SpikeLLM significantly reduces 25.51% WikiText2 perplexity and improves 3.08% average accuracy of 6 zero-shot datasets on a LLAMA2-7B 4A4W model. In the GPTQ pipeline, SpikeLLM realizes a sparse ternary quantization, which achieves additive in all linear layers. Compared with PB-LLM with similar operations, SpikeLLM also exceeds significantly. We will release our code on GitHub.

7/9/2024