StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Read original: arXiv:2407.12378 - Published 7/18/2024 by Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang

Overview

The paper introduces StoX-Net, a novel approach for efficient in-memory computing of deep neural networks (DNNs) using stochastic processing of partial sums.
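StoX-Net targets a bottleneck in crossbar-based in-memory computing (IMC): the peripheral analog-to-digital converters (ADCs), whose overhead dominates energy and latency. Related work on IMC accelerators includes SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies and Analog or Digital? Memory Computing Benchmarking Through Neural Network Inference.
The proposed design processes array-level partial sums stochastically, using the probabilistic switching of spin-orbit torque magnetic tunnel junctions in place of ADCs. For ResNet20 on CIFAR-10, the authors report up to 22x, 30x, and 142x improvement in energy, latency, and area, respectively, compared to IMC with a standard ADC.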

Plain English Explanation

The paper introduces a new approach called StoX-Net that aims to make deep neural network (DNN) computations more efficient on specialized hardware. DNNs are complex algorithms that require a lot of mathematical operations, and running them on regular computers can be slow and power-hungry.

StoX-Net tackles this problem by using a technique called "stochastic computing". Instead of performing precise calculations, it approximates the results in a probabilistic way. This allows the hardware to be simpler and more energy-efficient, while still producing reasonably accurate results.

The key idea is to break down the DNN computations into smaller "partial sums" and process them stochastically. This approach takes advantage of the inherent resilience of DNNs to small errors, allowing for significant improvements in speed and power consumption compared to traditional methods.

Because stochastic processing simplifies the circuitry, StoX-Net fits naturally into in-memory computing hardware of the kind explored in related work such as Transverse Read-Assisted Valid Bit Collection to Improve In-Memory Computing and 65nm 8b Activation 8b Weight SRAM-Based In-Memory Computing for DNN Acceleration. This could lead to more powerful and energy-efficient DNN accelerators that can be deployed in a wider range of applications, from smartphones to data centers.

Technical Explanation

The StoX-Net approach leverages the concept of stochastic computing to perform efficient matrix-vector multiplications required for DNN inference. In traditional DNN accelerators, these computations are often realized using analog in-memory computing techniques, as described in Analog or Digital? Memory Computing Benchmarking Through Neural Network Inference.

StoX-Net, on the other hand, breaks down the matrix-vector multiplication into smaller "partial sums" and processes them stochastically. This allows for a more efficient hardware implementation, as the stochastic nature of the computations enables the use of simpler circuit designs and reduced precision requirements.
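To make "partial sums" concrete, here is a minimal sketch of how a matrix-vector multiplication splits across several crossbar arrays. It assumes the weight matrix is tiled row-wise; the function names and the `rows_per_array` parameter are hypothetical and chosen for illustration, not taken from the paper.

```python
def matvec(x, W):
    """Return x @ W for a vector x (length n) and an n-by-m weight matrix W."""
    n_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]


def array_partial_sums(x, W, rows_per_array):
    """Split W row-wise into crossbar-sized tiles and return one partial-sum
    vector per tile. rows_per_array is a hypothetical crossbar height."""
    partial = []
    for start in range(0, len(x), rows_per_array):
        stop = start + rows_per_array
        partial.append(matvec(x[start:stop], W[start:stop]))
    return partial


def accumulate(partial):
    """Add the per-array partial sums column by column."""
    return [sum(col) for col in zip(*partial)]


x = [1, 0, 1, 1]                         # input activations
W = [[1, -1], [2, 0], [0, 3], [-1, 1]]   # 4 inputs, 2 outputs
ps = array_partial_sums(x, W, rows_per_array=2)
print(ps)              # one partial-sum vector per 2-row array
print(accumulate(ps))  # equals matvec(x, W)
```

In a conventional IMC design each of these per-array partial sums would be digitized by an ADC before accumulation, which is the step StoX-Net aims to make cheaper.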

The key components of the StoX-Net architecture include:

Stochastic partial-sum conversion: Each crossbar array produces an analog partial sum, which is converted to a stochastic binary output through the probabilistic switching of spin-orbit torque magnetic tunnel junctions, eliminating the costly ADC.
PS-quantization-aware training: To mitigate accuracy loss, the network is trained with backward propagation across the stochastic partial sums.
Inhomogeneous sampling length: The sampling length of the stochastic conversion is varied rather than held fixed.
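The paper's abstract attributes the stochastic conversion to the probabilistic switching of spin-orbit torque magnetic tunnel junctions. A minimal sketch of that idea follows. It assumes the switching probability is a sigmoid of the partial sum with a hypothetical sensitivity `alpha`; the device model, function names, and parameters are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def switching_probability(ps, alpha=1.0):
    """Assumed sigmoid model: probability that one stochastic read returns 1."""
    return 1.0 / (1.0 + math.exp(-alpha * ps))


def stochastic_convert(ps, n_samples, rng, alpha=1.0):
    """Replace an ADC read with n_samples one-bit stochastic reads and
    return their average, mapped to the range [-1, 1]."""
    p = switching_probability(ps, alpha)
    ones = sum(1 for _ in range(n_samples) if rng.random() < p)
    return 2.0 * ones / n_samples - 1.0


rng = random.Random(0)
for n in (1, 4, 64):  # longer sampling length -> lower-variance estimate
    print(n, stochastic_convert(0.8, n, rng))
```

Under this assumed model the expected output is tanh(alpha * ps / 2), so a longer sampling length trades extra reads for a less noisy estimate. That trade-off is one plausible motivation for choosing different sampling lengths in different parts of a network.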

The authors evaluate StoX-Net on benchmark classification tasks, including ResNet20 on CIFAR-10. Their optimized configuration achieves a 666x (111x) improvement in energy-delay product compared to IMC with a full-precision ADC (sparse low-bit ADC), while maintaining near-software accuracy.
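The abstract also credits PS-quantization-aware training, which enables backward propagation across stochastic partial sums, for preserving accuracy. It does not spell out the method. One common way to backpropagate through a random binary read is to differentiate its expected value instead of the sample itself. The sketch below does this under an assumed sigmoid switching model with a hypothetical sensitivity `alpha`; it is an illustration of the general technique, not the authors' training procedure.

```python
import math


def expected_output(ps, alpha=1.0):
    """Mean of a +/-1 stochastic read when P(+1) = sigmoid(alpha * ps)."""
    return math.tanh(alpha * ps / 2.0)


def surrogate_grad(ps, alpha=1.0):
    """Derivative of expected_output with respect to ps, used in the backward
    pass in place of the undefined gradient of a random binary sample."""
    t = math.tanh(alpha * ps / 2.0)
    return 0.5 * alpha * (1.0 - t * t)


def backward(upstream_grad, ps, alpha=1.0):
    """Chain rule: pass the upstream gradient through the stochastic converter."""
    return upstream_grad * surrogate_grad(ps, alpha)


print(surrogate_grad(0.0))  # gradient is largest where the read is most random
print(backward(2.0, 0.0))
```

The forward pass would still use sampled binary outputs, so the network sees the same noise during training that it will see at inference.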

Critical Analysis

The authors of the paper have provided a novel and promising approach to improving the efficiency of DNN computations using in-memory computing techniques. The key strength of StoX-Net is its ability to leverage the inherent error resilience of DNNs to enable more efficient hardware implementation through stochastic processing.

However, some potential limitations merit further research. The reported results center on benchmark classification tasks such as ResNet20 on CIFAR-10, so the accuracy impact of stochastic processing on larger and more complex DNN models, and the scalability of the StoX-Net architecture to them, remain open questions.

It would also be valuable to see a more comprehensive comparison of StoX-Net's performance and energy efficiency against other state-of-the-art DNN accelerators, such as those discussed in Multi-Objective Neural Architecture Search for Memory-Centric Computing, to better understand its relative strengths and weaknesses.

Conclusion

The StoX-Net approach presented in this paper offers a promising solution for improving the efficiency of DNN computations through stochastic in-memory processing. By replacing costly ADCs with stochastic processing of partial sums and training the network to tolerate the resulting noise, the authors report significant improvements in energy, latency, and area compared to existing techniques.

The potential impact of StoX-Net could be transformative, as more efficient DNN accelerators could enable the deployment of advanced AI models in a wider range of applications, from edge devices to large-scale data centers. However, further research is needed to address the limitations and explore the scalability of the proposed architecture to ensure its long-term viability and widespread adoption.

Related Papers

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang

Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address such ADC bottleneck, here we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC, achieving significant improvement in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 22x, 30x, and 142x improvement in energy, latency, and area, respectively, compared to IMC with standard ADC. Our optimized design configuration using stochastic PS achieved 666x (111x) improvement in Energy-Delay-Product compared to IMC with full precision ADC (sparse low-bit ADC), while maintaining near-software accuracy at various benchmark classification tasks.

7/18/2024

Approximate ADCs for In-Memory Computing

Arkapravo Ghosh, Hemkar Reddy Sadana, Mukut Debnath, Panthadip Maji, Shubham Negi, Sumeet Gupta, Mrigank Sharad, Kaushik Roy

In-memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix vector multiplication (MVM) operations, implemented directly in memory arrays. Such IMC designs have been explored based on CMOS as well as emerging non-volatile memory (NVM) technologies like RRAM. IMC architectures generally involve a large number of cores consisting of memory arrays, storing the trained weights of the DL model. Peripheral units like DACs and ADCs are also used for applying inputs and reading out the output values. Recently reported designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and also dominate the area, thereby eschewing the benefits of the IMC scheme. Mitigation of imperfections in the ADCs, namely non-linearity and variations, incurs significant design overheads due to dedicated calibration units. In this work we present peripheral-aware design of IMC cores to mitigate such overheads. It involves incorporating the non-idealities of ADCs in the training of the DL models, along with that of the memory units. The proposed approach applies equally well to both current-mode and charge-mode MVM operations demonstrated in recent years, and can significantly simplify the design of mixed-signal IMC units.

8/14/2024

SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-Line Activation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistor-based crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively when 1 ADC is utilized per crossbar array.

9/19/2024

Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends

Jeffry Victor, Chunguang Wang, Sumeet K. Gupta

Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM) and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities and comparatively analyze the response of each technology to such accuracy enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56% while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100) we found that ReRAM matches the performance of FeFET.

8/13/2024