BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

Read original: arXiv:2407.03738 - Published 7/8/2024 by Amro Eldebiky, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Ulf Schlichtmann, Bing Li

Overview

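BasisN is a framework for running deep neural networks on RRAM-based in-memory-computing accelerators with any number of available crossbars.
It removes the need to reprogram RRAM (resistive random-access memory) cells between layers, a step that is extremely slow and costly in energy because the cells must be written and verified.
The key idea is to represent the kernels of every layer as combinations of global basis vectors with quantized coefficients; the basis vectors are shared across all layers and written to the crossbars only once.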

Plain English Explanation

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks introduces a new way to run deep neural networks on accelerators built from a type of memory called RRAM. Such accelerators normally need enough on-chip crossbars to hold all of a network's weights; otherwise the RRAM cells must be rewritten (reprogrammed) as processing moves on to further layers, which is extremely slow and costly in energy. The main advantage of BasisN is that the cells never have to be reprogrammed during inference.

The key idea behind BasisN is to use a set of global "basis" vectors, which are like building blocks shared by all layers. Each kernel in the network is expressed as a combination of these building blocks, weighted by quantized coefficients. Because the basis vectors are written to the RRAM crossbars only once and reused for every layer, the network can run on whatever number of crossbars is available without the slow rewriting step.

By using this basis combination approach, BasisN aims to make it easier to deploy deep neural networks on specialized hardware like RRAM, which can perform computations directly in memory rather than having to move data back and forth between memory and a separate processor. This can improve the efficiency and speed of the neural network computations.

Technical Explanation

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks proposes a framework that accelerates deep neural networks on RRAM-based in-memory-computing hardware with any number of available crossbars, without reprogramming the RRAM cells.

The core concept behind BasisN is to represent the kernels in DNN layers as combinations of global basis vectors, shared between all layers, with quantized coefficients. The basis vectors act as building blocks that are combined in different proportions to construct each kernel. Because the same basis vectors serve every layer, they are written to the crossbars only once, and only marginal hardware modification is needed.
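To make the idea of building a kernel from shared basis vectors with quantized coefficients concrete, here is a minimal sketch. It is not the paper's training approach: it swaps in a plain least-squares projection followed by uniform quantization, and it assumes a hypothetical orthogonal (Hadamard) basis so that the projection has a simple closed form. All function names and parameters are illustrative.

```python
import random

def hadamard(n):
    """Rows of an n x n Sylvester-Hadamard matrix (n must be a power of two).
    The rows are mutually orthogonal; used here as a hypothetical shared basis."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_coefficients(kernel, basis):
    """Least-squares coefficients. This closed form is valid only because
    the basis rows are orthogonal (an assumption of this sketch)."""
    return [dot(kernel, b) / dot(b, b) for b in basis]

def quantize(coeffs, n_levels):
    """Uniform symmetric quantization to integer codes in [-half, half]."""
    half = n_levels // 2 - 1
    max_abs = max(abs(c) for c in coeffs)
    scale = max_abs / half if max_abs > 0 else 1.0
    codes = [max(-half, min(half, round(c / scale))) for c in coeffs]
    return codes, scale

def reconstruct(codes, scale, basis):
    """Rebuild the approximate kernel: scale * sum_i codes[i] * basis[i]."""
    d = len(basis[0])
    return [scale * sum(c * b[j] for c, b in zip(codes, basis)) for j in range(d)]

def relative_error(kernel, approx):
    num = sum((k - a) ** 2 for k, a in zip(kernel, approx)) ** 0.5
    den = sum(k ** 2 for k in kernel) ** 0.5
    return num / den

# Toy experiment: approximate one random kernel with 4, 8, and 16 basis vectors.
rng = random.Random(0)
d = 16
kernel = [rng.gauss(0.0, 1.0) for _ in range(d)]
full_basis = hadamard(d)

errors = []
for n_basis in (4, 8, 16):
    basis = full_basis[:n_basis]
    codes, scale = quantize(fit_coefficients(kernel, basis), n_levels=256)
    errors.append(relative_error(kernel, reconstruct(codes, scale, basis)))
    print(n_basis, errors[-1])
```

In this toy setting the approximation error shrinks as more basis vectors are used, and with a full basis only the quantization error of the coefficients remains. The basis, the fitting procedure, and the quantization scheme used in the paper may differ.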

BasisN also provides a training approach that enhances computation parallelization with the global basis vectors and optimizes the coefficients used to construct the kernels. The basis vectors are stored in the RRAM arrays. During inference, the kernels are realized as linear combinations of the stored basis vectors, so the crossbar contents stay fixed while the coefficients change from layer to layer.
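The reason fixed crossbar contents can serve many kernels follows from linearity: if a kernel is a weighted sum of basis vectors, its dot product with an input equals the same weighted sum of the basis vectors' dot products with that input. The sketch below checks this identity on small hand-picked numbers. The split into a `crossbar_pass` step and a `combine` step is an assumption made for illustration; this summary does not specify how the paper's hardware performs the combination.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def crossbar_pass(basis, x):
    """Stand-in for the in-memory MAC step: every stored basis vector
    is multiplied with the input and accumulated (hypothetical model)."""
    return [dot(b, x) for b in basis]

def combine(codes, scale, partial_sums):
    """Weight the shared partial sums by each kernel's integer coefficients.
    Where this step runs in real hardware is an assumption of this sketch."""
    return [scale * dot(row, partial_sums) for row in codes]

def explicit_kernels(codes, scale, basis):
    """Materialize the kernels, for comparison only; the point of the
    basis representation is that these never need to be written to RRAM."""
    d = len(basis[0])
    return [[scale * sum(c * b[j] for c, b in zip(row, basis)) for j in range(d)]
            for row in codes]

# Three shared basis vectors of length 4 (illustrative values).
basis = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, -1.0, 0.0, 0.0],
]
# Two kernels, each described by three integer coefficients and one scale.
codes = [
    [2, -1, 0],
    [0, 3, 1],
]
scale = 0.5
x = [1.0, 2.0, 3.0, 4.0]

partial = crossbar_pass(basis, x)            # computed once per input
y_basis = combine(codes, scale, partial)     # reused for every kernel
y_direct = [dot(w, x) for w in explicit_kernels(codes, scale, basis)]

print(y_basis, y_direct)
```

Both routes give the same outputs, but only the basis route keeps the stored vectors unchanged when the coefficients change, which is the property that makes reprogramming unnecessary.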

The reported experimental results show that cycles per inference and energy-delay product drop to below 1% of the values obtained when crossbars are reprogrammed, for large-scale DNNs such as DenseNet and ResNet on the ImageNet and CIFAR100 datasets, while the training and hardware costs are negligible. The authors also analyze the tradeoffs between approximation accuracy and the number of basis vectors used.

Critical Analysis

The paper presents a promising approach to addressing the reprogramming overhead that arises when deep neural networks are deployed on RRAM-based in-memory computing platforms with a limited number of crossbars. The basis combination method seems to be an effective way to avoid rewriting the RRAM cells between layers.

However, the headline results focus on cycles per inference and energy-delay product. It would be valuable to understand in more detail how representing kernels with shared basis vectors and quantized coefficients affects the network's accuracy and robustness compared with a network whose weights are stored directly in the crossbars.

Additionally, the reported experiments cover convolutional networks such as DenseNet and ResNet on image-classification datasets (ImageNet and CIFAR100). It would be helpful to see how the BasisN approach generalizes to other model families and application domains.

Conclusion

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks introduces a framework that can run deep neural networks on RRAM hardware without reprogramming the crossbars. The key innovation is the use of global basis vectors, written once and shared by all layers, that are combined with quantized coefficients to construct the kernels, avoiding the slow and energy-hungry rewriting of RRAM cells.

This work represents an important step towards making it easier to deploy deep learning models on specialized hardware like RRAM, which can potentially improve the efficiency and performance of these computations. Further research is needed to fully understand the trade-offs and generalizability of the BasisN approach, but it is a promising direction for advancing the field of in-memory computing for deep neural networks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

Amro Eldebiky, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Ulf Schlichtmann, Bing Li

Deep neural networks (DNNs) have made breakthroughs in various fields including image recognition and language processing. DNNs execute hundreds of millions of multiply-and-accumulate (MAC) operations. To efficiently accelerate such computations, analog in-memory-computing platforms have emerged leveraging emerging devices such as resistive RAM (RRAM). However, such accelerators face the hurdle of being required to have sufficient on-chip crossbars to hold all the weights of a DNN. Otherwise, RRAM cells in the crossbars need to be reprogrammed to process further layers, which causes huge time/energy overhead due to the extremely slow writing and verification of the RRAM cells. As a result, it is still not possible to deploy such accelerators to process large-scale DNNs in industry. To address this problem, we propose the BasisN framework to accelerate DNNs on any number of available crossbars without reprogramming. BasisN introduces a novel representation of the kernels in DNN layers as combinations of global basis vectors shared between all layers with quantized coefficients. These basis vectors are written to crossbars only once and used for the computations of all layers with marginal hardware modification. BasisN also provides a novel training approach to enhance computation parallelization with the global basis vectors and optimize the coefficients to construct the kernels. Experimental results demonstrate that cycles per inference and energy-delay product were reduced to below 1% compared with applying reprogramming on crossbars in processing large-scale DNNs such as DenseNet and ResNet on ImageNet and CIFAR100 datasets, while the training and hardware costs are negligible.

7/8/2024

Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends

Jeffry Victor, Chunguang Wang, Sumeet K. Gupta

Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM) and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities and comparatively analyze the response of each technology to such accuracy enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56% while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100) we found that ReRAM matches the performance of FeFET.

8/13/2024

A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu

Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption characteristics, holds promising potential for supporting the deployment of multi-tenant DNNs. However, direct deployment of complex multi-tenant DNNs on existing ReRAM-based PIM designs poses challenges. Resource contention among different tenants can result in severe under-utilization of on-chip computing resources. Moreover, area-intensive operators and computation-intensive operators require excessively large on-chip areas and long processing times, leading to high overall latency during parallel computing. To address these challenges, we propose a novel ReRAM-based in-memory computing framework that enables efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention problems by iteratively partitioning the PIM hardware at tenant level. In addition, we construct a fine-grained reconstructed processing pipeline at the operator level to handle area-intensive operators. Compared to the direct deployments on traditional ReRAM-based PIM designs, our proposed PIM computing framework achieves significant improvements in speed (ranging from 1.75x to 60.43x) and energy (up to 1.89x).

8/12/2024

A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface

Guodong Yin, Mufeng Zhou, Yiming Chen, Wenjun Tang, Zekun Yang, Mingyen Lee, Xirui Du, Jinshan Yue, Jiaxin Liu, Huazhong Yang, Yongpan Liu, Xueqing Li

Performing data-intensive tasks in the von Neumann architecture is challenging to achieve both high performance and power efficiency due to the memory wall bottleneck. Computing-in-memory (CiM) is a promising mitigation approach by enabling parallel in-situ multiply-accumulate (MAC) operations within the memory with support from the peripheral interface and datapath. SRAM-based charge-domain CiM (CD-CiM) has shown its potential of enhanced power efficiency and computing accuracy. However, existing SRAM-based CD-CiM faces scaling challenges to meet the throughput requirement of high-performance multi-bit-quantization applications. This paper presents an SRAM-based high-throughput ReLU-optimized CD-CiM macro. It is capable of completing MAC and ReLU of two signed 8b vectors in one CiM cycle with only one A/D conversion. Along with non-linearity compensation for the analog computing and A/D conversion interfaces, this work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.

4/3/2024