Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends

Read original: arXiv:2408.05857 - Published 8/13/2024 by Jeffry Victor, Chunguang Wang, Sumeet K. Gupta

Overview

Crossbar memory arrays are touted as a key component for in-memory computing (IMC) to accelerate deep neural networks (DNNs).
However, hardware non-idealities limit the effectiveness of crossbar arrays.
This paper analyzes various design choices and circuit solutions to address the impact of hardware non-idealities on DNN accuracy.

Plain English Explanation

Crossbar memory arrays are a type of computer hardware that has been proposed as a way to speed up deep neural networks. Deep neural networks are a powerful type of artificial intelligence that can be used for tasks like image recognition and language processing.

The idea behind using crossbar memory arrays for deep neural networks is that the arrays can perform calculations "in-memory", meaning they can do the math required for the neural network right where the data is stored, without having to move the data to a separate processor. This can make the process faster and more efficient.
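To make the "in-memory" idea concrete, here is a minimal sketch of an idealized crossbar in Python. It assumes the common textbook picture in which weights are stored as conductances, inputs are applied as voltages on the rows (wordlines), and each column sums the resulting currents. The function name `crossbar_mvm` and all numbers are illustrative and are not taken from the paper.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal crossbar: current on column j is sum_i voltages[i] * conductances[i][j]."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    assert len(voltages) == n_rows, "one input voltage per row (wordline)"
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Each cell contributes voltage * conductance to its column current.
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Hypothetical 3-row x 2-column array, arbitrary units.
G = [[1.0, 0.0],
     [0.5, 2.0],
     [0.0, 1.0]]
V = [1.0, 2.0, 3.0]
I = crossbar_mvm(V, G)
print(I)
```

The whole matrix-vector product comes out of the array in one step. A real array deviates from this ideal sum, which is what the "non-idealities" discussed next refer to.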

However, there are some issues with crossbar memory arrays that can limit their effectiveness for deep neural networks. The hardware itself has some "non-idealities" or imperfections that can introduce errors and reduce the accuracy of the neural network.

This paper looks at different ways to address these hardware problems. The researchers explore things like adjusting the size of the crossbar arrays and the number of bits they can store. They also test out some circuit design solutions that can help reduce the impact of the hardware non-idealities.

The goal is to find the right combination of crossbar array design and circuit tweaks that can maximize the accuracy of deep neural networks running on this type of hardware. The results show that some approaches, like a technique called "partial wordline activation," can significantly improve the neural network's accuracy.

Overall, the paper is about finding ways to make crossbar memory arrays work better as an underlying hardware platform for powerful AI systems like deep neural networks.

Technical Explanation

The researchers in this paper explore the use of crossbar memory arrays for in-memory computing (IMC) to accelerate deep neural networks (DNNs). However, they note that hardware non-idealities associated with crossbar arrays can limit their efficacy.

To address this, the paper analyzes various design choices and circuit-level solutions that can reduce the impact of hardware non-idealities on DNN accuracy. Specifically, the researchers examine how array size and bit-slice (number of bits per device) affect the inference accuracy of different memory technologies at the 7nm technology node: 8T SRAM, ferroelectric transistor (FeFET), resistive RAM (ReRAM), and spin-orbit-torque magnetic RAM (SOT-MRAM).
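The bit-slice knob can be illustrated with a short Python sketch. It assumes unsigned integer weights that are split into equal-width slices, with one slice stored per device and the per-slice results recombined by shift-and-add. The helper names `slice_weight` and `sliced_dot` are hypothetical, and the paper's actual weight mapping may differ.

```python
def slice_weight(weight, total_bits, bits_per_device):
    """Split an unsigned integer weight into slices, least significant slice first."""
    assert 0 <= weight < (1 << total_bits)
    assert total_bits % bits_per_device == 0
    mask = (1 << bits_per_device) - 1
    n_slices = total_bits // bits_per_device
    return [(weight >> (k * bits_per_device)) & mask for k in range(n_slices)]


def sliced_dot(inputs, weights, total_bits, bits_per_device):
    """Dot product computed one slice at a time, then combined by shift-and-add."""
    n_slices = total_bits // bits_per_device
    sliced = [slice_weight(w, total_bits, bits_per_device) for w in weights]
    total = 0
    for k in range(n_slices):
        # Each slice position corresponds to one column of devices in this sketch.
        partial = sum(x * s[k] for x, s in zip(inputs, sliced))
        total += partial << (k * bits_per_device)
    return total


slices = slice_weight(182, total_bits=8, bits_per_device=2)
result = sliced_dot([1, 2, 3], [182, 5, 255], total_bits=8, bits_per_device=2)
print(slices, result)
```

A higher bit-slice means fewer devices per weight but more levels that each device must keep distinguishable, which is why the knob interacts with the memory technology.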

The paper also evaluates the impact of circuit design solutions like partial wordline activation (PWA) and custom ADC reference levels, which are techniques to mitigate the effects of hardware non-idealities.
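Partial wordline activation can be sketched as follows: instead of turning on all rows at once, only a group of rows is activated per cycle, and the digitized partial sums are added up afterwards. In the Python sketch below, a saturating ADC stands in for any error that grows with the number of simultaneously active rows. That stand-in, the function names, and the numbers are assumptions for illustration and do not reproduce the paper's non-ideality model.

```python
def adc(value, max_code):
    """Hypothetical ADC that clips its input to the range [0, max_code]."""
    return min(max(value, 0), max_code)


def column_output(inputs, weights, rows_per_cycle, adc_max_code):
    """Accumulate one column's dot product, activating rows_per_cycle rows at a time."""
    total = 0
    for start in range(0, len(inputs), rows_per_cycle):
        stop = min(start + rows_per_cycle, len(inputs))
        # Analog partial sum over the rows that are active in this cycle.
        partial = sum(inputs[i] * weights[i] for i in range(start, stop))
        # Each partial sum is digitized separately, then added digitally.
        total += adc(partial, adc_max_code)
    return total


inputs = [1] * 16
weights = [1, 1, 0, 1] * 4
exact = sum(x * w for x, w in zip(inputs, weights))

full = column_output(inputs, weights, rows_per_cycle=16, adc_max_code=7)
pwa = column_output(inputs, weights, rows_per_cycle=4, adc_max_code=7)
print(exact, full, pwa)
```

In this toy setting the full activation saturates the ADC while the four-rows-per-cycle version recovers the exact sum. The price is more cycles per dot product, a cost this sketch does not model.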

The researchers evaluate these design choices and circuit solutions on ResNet-20 with CIFAR-10 and on ResNet-50 with CIFAR-100. On ResNet-20 with CIFAR-10, PWA increases accuracy by up to 32.56%, while custom ADC reference levels yield an accuracy enhancement of up to 31.62%.
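One plausible reading of custom ADC reference levels is that the ADC thresholds are placed between the output levels the array actually produces, rather than between the ideal, evenly spaced ones. The Python sketch below illustrates that reading only. The compressed `observed_levels` values are invented for illustration, and the paper's procedure for choosing references is not reproduced here.

```python
from bisect import bisect_right


def midpoints(levels):
    """Reference thresholds halfway between consecutive output levels."""
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def quantize(current, references):
    """Output code = number of reference thresholds at or below the current."""
    return bisect_right(references, current)


# Ideal column currents for dot-product values 0..4 (arbitrary units).
ideal_levels = [0.0, 1.0, 2.0, 3.0, 4.0]
# Hypothetical non-ideal currents for the same values: higher levels are compressed.
observed_levels = [0.0, 0.9, 1.7, 2.4, 3.0]

uniform_refs = midpoints(ideal_levels)
custom_refs = midpoints(observed_levels)

uniform_codes = [quantize(c, uniform_refs) for c in observed_levels]
custom_codes = [quantize(c, custom_refs) for c in observed_levels]
print(uniform_codes, custom_codes)
```

With uniform references, the two highest values are read one code too low. With references fitted to the observed levels, all five values map to the intended codes.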

Furthermore, the paper finds that FeFET technology is best suited for large crossbar arrays due to its small layout height and the high distinguishability of its memory states. However, for higher bit-slices and a more complex workload (ResNet-50 with CIFAR-100), ReRAM matches the performance of FeFET.

Critical Analysis

The paper provides a thorough analysis of various design knobs and circuit-level solutions for improving the efficacy of crossbar memory arrays in the context of DNN acceleration. The researchers have conducted a comprehensive comparative evaluation of different memory technologies, which is a strength of the work.

However, the paper does not address some potential limitations of the proposed approaches. For example, the implementation complexity and power/energy trade-offs of the circuit design solutions, such as PWA and custom ADC reference levels, are not discussed. These factors can be important in real-world deployment scenarios.

Additionally, the paper focuses on inference accuracy, but does not explore the implications of the design choices on other performance metrics like latency, throughput, or energy efficiency. These aspects could be important when considering the practical deployment of crossbar-based IMC systems for DNN acceleration.

Further research could also investigate the scalability of the proposed techniques to larger neural network models and more diverse datasets, as well as their robustness to variations in manufacturing or environmental conditions.

Conclusion

This paper presents a comprehensive analysis of design choices and circuit-level solutions to address the hardware non-idealities of crossbar memory arrays in the context of DNN acceleration. The researchers have explored the impact of array size, bit-slice, and various circuit techniques on the inference accuracy of different memory technologies.

The key findings suggest that careful co-optimization of the crossbar array design and circuit-level solutions can significantly improve the robustness of crossbar-based IMC systems for DNN workloads. The paper provides valuable insights for researchers and engineers working on the development of energy-efficient and high-performance AI hardware platforms.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends

Jeffry Victor, Chunguang Wang, Sumeet K. Gupta

Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM) and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at the 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities and comparatively analyze the response of each technology to such accuracy enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56% while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100) we found that ReRAM matches the performance of FeFET.

8/13/2024

SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-Line Activation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistor-based crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively, when 1 ADC is utilized per crossbar array.

6/24/2024

Energy Efficient Knapsack Optimization Using Probabilistic Memristor Crossbars

Jinzhan Li, Suhas Kumar, Su-in Yi

Constrained optimization underlies crucial societal problems (for instance, stock trading and bandwidth allocation), but is often computationally hard (complexity grows exponentially with problem size). The big-data era urgently demands low-latency and low-energy optimization at the edge, which cannot be handled by digital processors due to their non-parallel von Neumann architecture. Recent efforts using massively parallel hardware (such as memristor crossbars and quantum processors) employing annealing algorithms, while promising, have handled relatively easy and stable problems with sparse or binary representations (such as the max-cut or traveling salesman problems). However, most real-world applications embody three features, which are encoded in the knapsack problem, and cannot be handled by annealing algorithms - dense and non-binary representations, with destabilizing self-feedback. Here we demonstrate a post-digital-hardware-friendly randomized competitive Ising-inspired (RaCI) algorithm performing knapsack optimization, experimentally implemented on a foundry-manufactured CMOS-integrated probabilistic analog memristor crossbar. Our solution outperforms digital and quantum approaches by over 4 orders of magnitude in energy efficiency.

7/8/2024

Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on complementary metal-oxide-semiconductor chips that closely resemble market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2%. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.

5/15/2024