SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

Read original: arXiv:2406.14706 - Published 6/24/2024 by Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

Overview

Researchers propose a technique called SWANN to address the problem of interconnect resistance impairing the robustness of in-memory computing (IMC) in deep neural network (DNN) accelerators using crossbar arrays.
SWANN involves shuffling weights in crossbar arrays to alleviate the detrimental effect of wire resistance on IMC.
The technique is evaluated on 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, showing an increase in accuracy for ResNet-20/CIFAR-10 from 47.78% to 83.5%.
SWANN can be used in conjunction with Partial-Word-Line Activation to further boost accuracy.
The impact of SWANN is also evaluated for compact ferroelectric-transistor-based crossbar arrays.
SWANN incurs minimal hardware overhead: less than a 1% increase in energy consumption, and latency and area overheads of about 1% and 16%, respectively, when one ADC is used per crossbar array.

Plain English Explanation

Deep neural networks (DNNs) are powerful machine learning models that can excel at tasks like image recognition and natural language processing. To run these models efficiently, researchers have developed specialized hardware called DNN accelerators. These accelerators often use crossbar arrays, which are grid-like structures that can perform in-memory computing – a way of doing calculations directly within the memory, rather than moving data back and forth.

However, as these crossbar arrays are made smaller and more compact using advanced manufacturing techniques, the wires connecting the cells become more resistive. This interconnect resistance distorts the analog signals that carry the computation, leading to a drop in the accuracy of the DNN. This is a significant problem that needs to be solved.

The researchers propose a technique called SWANN to address this issue. The key idea behind SWANN is to rearrange, or "shuffle," the weights (the parameters that determine how the DNN model behaves) within the crossbar array in a specific way. This shuffling helps to counteract the negative effects of the wire resistance, ultimately improving the accuracy of the DNN.

The researchers show that SWANN can significantly boost the accuracy of a ResNet-20 DNN model running on CIFAR-10 data, from 47.78% to 83.5%. They also demonstrate that SWANN can be used together with another technique called Partial-Word-Line Activation to further improve the performance.

Importantly, SWANN achieves these benefits with minimal additional hardware overhead, requiring less than a 1% increase in energy consumption and around 1% in latency and 16% in area. This makes it a practical and efficient solution for improving the performance of DNN accelerators.

Technical Explanation

The researchers focus on the problem of interconnect resistance severely impairing the robustness of in-memory computing (IMC) in deep neural network (DNN) accelerators that employ crossbar arrays. In deeply scaled technologies, the high resistance of the interconnects between the components in the crossbar array can significantly degrade the accuracy of the DNN computations performed within the array.
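The effect of wire resistance on a single crossbar column can be illustrated with a small circuit model. The sketch below is not taken from the paper: it assumes ideal word lines, one conductance per cell, and a bit line split into equal resistive segments that end at a 0 V sense node. The function name `bitline_current` and all numeric values (conductances, voltage, segment resistance) are hypothetical choices made for illustration.

```python
import numpy as np


def bitline_current(g_cells, v_in, r_wire):
    """Current reaching the sense node of one crossbar column (toy model).

    Word lines are ideal voltage sources, each cell is a conductance from its
    word line to the bit line, and neighbouring bit-line nodes are joined by a
    wire segment of resistance r_wire. The 0 V sense node sits one segment
    beyond the last row.
    """
    g_cells = np.asarray(g_cells, dtype=float)
    v_in = np.asarray(v_in, dtype=float)
    n = g_cells.size
    g_w = 1.0 / r_wire
    # Kirchhoff's current law at every bit-line node gives a @ u = b.
    a = np.zeros((n, n))
    for i in range(n):
        a[i, i] = g_cells[i] + g_w      # cell + segment toward the sense node
        if i > 0:
            a[i, i] += g_w              # segment toward the previous row
            a[i, i - 1] = -g_w
        if i < n - 1:
            a[i, i + 1] = -g_w
    b = g_cells * v_in
    u = np.linalg.solve(a, b)           # bit-line node voltages
    return g_w * u[-1]                  # current through the final segment


rng = np.random.default_rng(0)
g_all = rng.uniform(1e-6, 1e-5, 128)    # hypothetical cell conductances (S)
v_all = np.full(128, 0.5)               # hypothetical word-line voltage (V)
R_WIRE = 2.0                            # hypothetical segment resistance (ohm)

errors = {}
for n in (32, 64, 128):
    g, v = g_all[:n], v_all[:n]
    ideal = g @ v                       # what a resistance-free column would sum
    sensed = bitline_current(g, v, R_WIRE)
    errors[n] = (ideal - sensed) / ideal
    print(f"{n:>3} rows: relative shortfall {errors[n]:.3%}")
```

In this toy model the sensed current always falls short of the ideal dot product, and the shortfall grows with the number of rows, which is one way to picture why large arrays in deeply scaled nodes are more exposed to interconnect resistance.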

To address this issue, the researchers propose SWANN, a technique based on shuffling weights in crossbar arrays. SWANN rearranges the weights within the crossbar array so that the detrimental effect of wire resistance on the in-memory computation is reduced, which improves the accuracy of the DNN.
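Why row placement can matter at all is easy to show in a toy setting. The sketch below is an illustration under stated assumptions, not the SWANN algorithm: this summary does not describe the actual shuffling rule, so the ordering used here (sorting rows by conductance) is a stand-in heuristic. The column model (ideal word lines, resistive bit-line segments, 0 V sense node after the last row), the helper names `column_current` and `relative_error`, and all numeric values are hypothetical.

```python
import numpy as np

R_WIRE = 2.0  # assumed resistance of one bit-line segment, in ohms


def column_current(g_cells, v_in, r_wire=R_WIRE):
    """Sense-node current of one column with resistive bit-line segments."""
    n = g_cells.size
    g_w = 1.0 / r_wire
    # The first node has one wire neighbour; every other node has two
    # (for the last node, the second neighbour is the 0 V sense node).
    diag = g_cells + g_w * np.r_[1.0, np.full(n - 1, 2.0)]
    a = np.diag(diag) - g_w * (np.eye(n, k=1) + np.eye(n, k=-1))
    u = np.linalg.solve(a, g_cells * v_in)
    return g_w * u[-1]


def relative_error(g_cells, v_in, order):
    """Relative shortfall of the sensed current for a given row order."""
    g_p, v_p = g_cells[order], v_in[order]  # weights and inputs move together
    ideal = g_p @ v_p
    return (ideal - column_current(g_p, v_p)) / ideal


rng = np.random.default_rng(1)
n_rows = 128
g = rng.uniform(1e-6, 1e-5, n_rows)   # hypothetical cell conductances (S)
v = rng.uniform(0.0, 0.5, n_rows)     # hypothetical word-line voltages (V)

identity = np.arange(n_rows)
heavy_near_sense = np.argsort(g)              # largest conductance in the last row
heavy_far_from_sense = heavy_near_sense[::-1]

# The ideal dot product does not depend on the row order.
assert np.isclose(g[heavy_near_sense] @ v[heavy_near_sense], g @ v)

orders = [
    ("original", identity),
    ("heavy near sense", heavy_near_sense),
    ("heavy far from sense", heavy_far_from_sense),
]
for name, order in orders:
    print(f"{name:>22}: {relative_error(g, v, order):.3%}")
```

Permuting weights together with their inputs leaves the ideal result unchanged, so a shuffle is mathematically free, yet in this model the same weights produce a different analog error depending on where they sit along the wire. A real crossbar shares one row order across all columns and also has word-line resistance, which this single-column sketch ignores.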

The researchers evaluate SWANN on 8T-SRAM-based 128x128 crossbar arrays in 7nm technology. For a ResNet-20 DNN model running on the CIFAR-10 dataset, SWANN enhances the accuracy from 47.78% to 83.5%. The researchers also demonstrate that SWANN can be used synergistically with another technique called Partial-Word-Line Activation, further boosting the accuracy.

The researchers also evaluate the implications of SWANN for compact ferroelectric-transistor-based crossbar arrays. In terms of hardware cost, SWANN incurs less than a 1% increase in energy consumption, and latency and area overheads of about 1% and 16%, respectively, when one ADC (analog-to-digital converter) is used per crossbar array.

Critical Analysis

The researchers have presented a promising technique, SWANN, to address the significant problem of interconnect resistance degrading the performance of in-memory computing in DNN accelerators. The key strength of SWANN is its ability to effectively mitigate the detrimental effects of wire resistance without incurring substantial hardware overhead.

However, the paper does not provide a detailed analysis of the limitations or potential downsides of the SWANN approach. For instance, the energy, latency, and area overheads are quoted for the case of one ADC per crossbar array; it would be valuable to understand how these overheads change for other configurations.

Additionally, the researchers focus primarily on evaluating SWANN on 8T-SRAM-based and ferroelectric-transistor-based crossbar arrays. It would be interesting to see how SWANN performs with other crossbar technologies, such as optical crossbar arrays built from symmetric silicon microring resonators, and how it combines with deployment techniques such as tiny shared blocks.

Furthermore, the paper does not discuss the potential implications of SWANN on the training process or the overall system design of DNN accelerators. It would be valuable to understand how SWANN might interact with other techniques, such as quantized spiking neural networks or measurement-driven neural network training, and how it could be integrated into a broader system-level optimization.

Despite these potential areas for further research, the SWANN technique represents a significant contribution to the field of DNN accelerators, demonstrating the ability to effectively mitigate the challenges posed by interconnect resistance in deeply scaled technologies.

Conclusion

The researchers have proposed a novel technique called SWANN to address the problem of interconnect resistance impairing the robustness of in-memory computing in deep neural network (DNN) accelerators using crossbar arrays. SWANN involves rearranging the weights within the crossbar array to counteract the negative effects of wire resistance, leading to a substantial improvement in DNN accuracy.

The authors have demonstrated the effectiveness of SWANN on 8T-SRAM-based and ferroelectric-transistor-based crossbar arrays, showing significant accuracy gains with minimal hardware overhead. The ability to integrate SWANN with other techniques, such as Partial-Word-Line Activation, further enhances its potential for improving the performance of DNN accelerators.

The SWANN approach represents an important advancement in the field of DNN hardware design, addressing a crucial challenge posed by the scaling of interconnect technologies. As the demand for efficient and high-performing DNN accelerators continues to grow, techniques like SWANN will be essential for enabling the deployment of these powerful models in real-world applications.

Related Papers

SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-Line Activation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistor-based crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively, when 1 ADC is utilized per crossbar array.

6/24/2024

Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays - Part 2: Design Knobs and DNN Accuracy Trends

Jeffry Victor, Chunguang Wang, Sumeet K. Gupta

Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM) and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities and comparatively analyze the response of each technology to such accuracy enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56% while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100), we found that ReRAM matches the performance of FeFET.

8/13/2024

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang

Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address such ADC bottleneck, here we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC, achieving significant improvement in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 22x, 30x, and 142x improvement in energy, latency, and area, respectively, compared to IMC with standard ADC. Our optimized design configuration using stochastic PS achieved 666x (111x) improvement in Energy-Delay-Product compared to IMC with full precision ADC (sparse low-bit ADC), while maintaining near-software accuracy at various benchmark classification tasks.

7/18/2024

Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on complementary metal-oxide-semiconductor chips that closely resemble market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2%. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.

5/15/2024