Approximate ADCs for In-Memory Computing

Read original: arXiv:2408.06390 - Published 8/14/2024 by Arkapravo Ghosh, Hemkar Reddy Sadana, Mukut Debnath, Panthadip Maji, Shubham Negi, Sumeet Gupta, Mrigank Sharad, Kaushik Roy

Overview

In-memory computing (IMC) architectures leverage energy-efficient and highly parallel matrix-vector multiplication (MVM) operations, implemented directly in memory arrays.
IMC designs have been explored using both CMOS and emerging non-volatile memory (NVM) technologies like RRAM.
IMC architectures involve a large number of cores consisting of memory arrays, storing the trained weights of the deep learning (DL) model.
Peripheral units like digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are used for applying inputs and reading out the output values.
Recent designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and dominate the area, reducing the benefits of the IMC scheme.
Mitigating imperfections in the ADCs, such as non-linearity and variations, incurs significant design overheads due to dedicated calibration units.
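
As a rough illustration of the signal chain described above, the following sketch models one IMC core: inputs pass through a DAC, each column of the memory array accumulates a weighted sum, and an ADC digitizes each column result. The function names (`quantize`, `imc_mvm`), the bit widths, the value ranges, and the choice of ADC full-scale range are assumptions made for illustration and do not come from the paper.

```python
def quantize(x, bits, lo, hi):
    """Uniformly quantize x onto 2**bits levels spanning [lo, hi]."""
    levels = 2 ** bits - 1
    x = min(max(x, lo), hi)                      # clip to the converter's range
    code = round((x - lo) / (hi - lo) * levels)  # nearest digital code
    return lo + code * (hi - lo) / levels        # value that code represents


def imc_mvm(weights, inputs, dac_bits=4, adc_bits=4):
    """One hypothetical core: DAC per input, analog column sums, ADC per output.

    Assumes inputs lie in [0, 1] and weights lie in [-1, 1].
    """
    rows, cols = len(weights), len(weights[0])
    # DAC: digital inputs become quantized analog levels on the rows
    v = [quantize(x, dac_bits, 0.0, 1.0) for x in inputs]
    # Analog MVM: every column accumulates weight * input over all rows
    sums = [sum(weights[r][c] * v[r] for r in range(rows)) for c in range(cols)]
    # ADC: full-scale range assumed to be [-rows, rows] (worst-case column sum)
    return [quantize(s, adc_bits, -float(rows), float(rows)) for s in sums]


if __name__ == "__main__":
    w = [[1.0, -1.0], [1.0, 1.0]]   # 2 rows (inputs) x 2 columns (outputs)
    print(imc_mvm(w, [1.0, 1.0]))
```

Every output column needs its own conversion in this model, which gives some intuition for why the ADCs can dominate power and area as arrays grow.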

Plain English Explanation

In-memory computing (IMC) is a way to run machine learning workloads such as deep learning (DL) more efficiently. Instead of performing the calculations on a separate processor chip, IMC architectures perform them directly inside the memory arrays where the model's trained weights are stored. This saves energy, and because many calculations happen at once, the work is highly parallel and runs faster.

Researchers have explored IMC using both standard CMOS chip technology and emerging non-volatile memory (NVM) technologies such as RRAM. An IMC architecture contains many small computing units, called cores, each with its own memory arrays that store the model's weights.

To get the inputs into the IMC system and read the outputs, peripheral units like digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are used. Recent IMC designs report that the ADCs, which read out the results of the calculations, consume more than 85% of the total compute power and take up most of the area. This erodes the efficiency benefits of using IMC.

The ADCs also have imperfections, such as non-linearity and variations, that are normally corrected with dedicated calibration units, which adds further complexity and cost.

Technical Explanation

The paper presents a peripheral-aware approach to designing IMC cores, one that takes the peripheral units, especially the ADCs, into account. The key idea is to incorporate the non-idealities of the ADCs, such as non-linearity and variations, into the training of the DL models, alongside the imperfections of the memory units.

This peripheral-aware design approach can be applied to both current-mode and charge-mode MVM operations that have been demonstrated in recent IMC architectures. By accounting for the ADC limitations during training, the proposed method can significantly simplify the design of the mixed-signal IMC units, reducing the need for dedicated calibration circuitry.
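
To make the idea of training with ADC non-idealities more concrete, here is a minimal sketch of a behavioural ADC model placed in the forward pass of a single dot product. It is not the authors' model: the class `NonIdealADC`, its parameters (`gain_sigma`, `offset_sigma`, `cubic`), the cubic distortion term, and the use of a straight-through estimator for the backward pass are all assumptions chosen for illustration.

```python
import random


class NonIdealADC:
    """Hypothetical behavioural ADC: gain error, offset, and a cubic bend."""

    def __init__(self, bits=4, full_scale=1.0, gain_sigma=0.0,
                 offset_sigma=0.0, cubic=0.0, seed=0):
        rng = random.Random(seed)
        self.levels = 2 ** bits - 1
        self.fs = full_scale
        # Per-instance variation, drawn once, as for one fabricated converter
        self.gain = 1.0 + rng.gauss(0.0, gain_sigma)
        self.offset = rng.gauss(0.0, offset_sigma)
        self.cubic = cubic                      # strength of the non-linearity

    def forward(self, x):
        """Map an analog value to the value represented by its output code."""
        u = x / self.fs
        u = self.gain * u + self.cubic * u ** 3 + self.offset  # distortion
        u = min(max(u, -1.0), 1.0)                              # saturation
        code = round((u + 1.0) / 2.0 * self.levels)             # quantization
        return (code / self.levels * 2.0 - 1.0) * self.fs

    def backward_ste(self, x, upstream):
        """Straight-through estimator: pass the gradient unless x is clipped."""
        return upstream if abs(x) <= self.fs else 0.0


def train_step(w, x, target, adc, lr=0.05):
    """One SGD step on a dot product that is read out through the ADC."""
    analog = sum(wi * xi for wi, xi in zip(w, x))   # analog column result
    y = adc.forward(analog)                          # non-ideal readout
    grad_y = 2.0 * (y - target)                      # d(squared error)/dy
    grad_analog = adc.backward_ste(analog, grad_y)   # gradient through the ADC
    new_w = [wi - lr * grad_analog * xi for wi, xi in zip(w, x)]
    return new_w, (y - target) ** 2


if __name__ == "__main__":
    adc = NonIdealADC(gain_sigma=0.05, offset_sigma=0.02, cubic=0.1, seed=1)
    w = [0.0, 0.0]
    for _ in range(20):
        w, loss = train_step(w, [0.5, 0.25], 0.4, adc)
    print(w, loss)
```

Because the loss is computed on the distorted, quantized output, the weights adapt to the converter's behaviour instead of relying on a calibration circuit to correct it. In a real flow this model would sit inside a DL framework's layers, and the non-ideality parameters would come from circuit simulation or measurement.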

Critical Analysis

The paper presents a novel approach to a key challenge in IMC architectures: the high power consumption and area dominance of the ADCs required for reading out the MVM results. By incorporating the non-idealities of the ADCs into the training process, the authors aim to reduce the design overheads associated with dedicated calibration units.

One potential limitation of the approach is that it may require retraining or fine-tuning of the DL models for each specific IMC hardware implementation, as the model would be optimized for the particular ADC characteristics. This could reduce the flexibility and portability of the DL models across different IMC platforms.

Additionally, the paper does not provide a detailed analysis of the trade-offs between the accuracy of the DL models trained with the proposed peripheral-aware approach and those trained without considering the ADC imperfections. Further research may be needed to quantify the impact on model performance.

It would also be valuable to see a more comprehensive comparison of the proposed approach against alternative techniques for addressing the ADC challenges in IMC, such as advanced ADC designs or novel MVM computation schemes that minimize the reliance on ADCs.

Conclusion

The paper presents a novel peripheral-aware design approach for in-memory computing (IMC) architectures used in deep learning (DL) accelerators. By incorporating the non-idealities of the analog-to-digital converters (ADCs) into the training of the DL models, the proposed method can significantly simplify the design of the mixed-signal IMC units, reducing the need for dedicated calibration units.

This work addresses a key challenge in IMC architectures, where the power-hungry and area-dominating ADCs have been a significant bottleneck, undermining the efficiency benefits of the IMC approach. The peripheral-aware design presented in this paper offers a promising solution that could help unlock the full potential of IMC for energy-efficient DL acceleration.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Approximate ADCs for In-Memory Computing

Arkapravo Ghosh, Hemkar Reddy Sadana, Mukut Debnath, Panthadip Maji, Shubham Negi, Sumeet Gupta, Mrigank Sharad, Kaushik Roy

In-memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix-vector multiplication (MVM) operations, implemented directly in memory arrays. Such IMC designs have been explored based on CMOS as well as emerging non-volatile memory (NVM) technologies like RRAM. IMC architectures generally involve a large number of cores consisting of memory arrays, storing the trained weights of the DL model. Peripheral units like DACs and ADCs are also used for applying inputs and reading out the output values. Recently reported designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and also dominate the area, thereby eroding the benefits of the IMC scheme. Mitigation of imperfections in the ADCs, namely non-linearity and variations, incurs significant design overheads due to dedicated calibration units. In this work we present peripheral-aware design of IMC cores to mitigate such overheads. It involves incorporating the non-idealities of ADCs in the training of the DL models, along with those of the memory units. The proposed approach applies equally well to both current-mode and charge-mode MVM operations demonstrated in recent years, and can significantly simplify the design of mixed-signal IMC units.

8/14/2024

Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling

Jiacong Sun, Pouya Houshmand, Marian Verhelst

In-Memory Computing (IMC) has emerged as a promising paradigm for energy-efficient, throughput-efficient and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication technologies among published IMC realizations have made it difficult to grasp their relative strengths. Moreover, previous studies have primarily focused on exploring and benchmarking the peak performance of a single IMC macro rather than full system performance on real workloads. This paper aims to address the lack of a quantitative comparison of Analog In-Memory Computing (AIMC) and Digital In-Memory Computing (DIMC) processor architectures. We propose an analytical IMC performance model that is validated against published implementations and integrated into a system-level exploration framework for comprehensive performance assessments on different workloads with varying IMC configurations. Our experiments show that while DIMC generally has higher computational density than AIMC, AIMC with large macro sizes may have better energy efficiency than DIMC on convolutional-layers and pointwise-layers, which can exploit high spatial unrolling. On the other hand, DIMC with small macro size outperforms AIMC on depthwise-layers, which feature limited spatial unrolling opportunities inside a macro.

5/27/2024

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang

Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address such ADC bottleneck, here we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC, achieving significant improvement in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 22x, 30x, and 142x improvement in energy, latency, and area, respectively, compared to IMC with standard ADC. Our optimized design configuration using stochastic PS achieved 666x (111x) improvement in Energy-Delay-Product compared to IMC with full precision ADC (sparse low-bit ADC), while maintaining near-software accuracy at various benchmark classification tasks.

7/18/2024

Multi-Objective Neural Architecture Search for In-Memory Computing

Md Hasibul Amin, Mohammadreza Mohammadi, Ramtin Zand

In this work, we employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing (IMC) architectures. Initially, we design three fundamental components inspired by the convolutional layers found in VGG and ResNet models. Subsequently, we utilize Bayesian optimization to construct a convolutional neural network (CNN) model with adaptable depths, employing these components. Through the Bayesian search algorithm, we explore a vast search space comprising over 640 million network configurations to identify the optimal solution, considering various multi-objective cost functions like accuracy/latency and accuracy/energy. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets, demonstrating the effectiveness of our method in achieving a balanced solution characterized by high accuracy and reduced latency and energy consumption.

6/12/2024