Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

2402.18736

Published 4/23/2024 by Ismail Emir Yuksel, Yahya Can Tugrul, Ataberk Olgun, F. Nisa Bostanci, A. Giray Yaglikci, Geraldo F. Oliveira, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu

Abstract

Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to reduce or eliminate costly data movement between processing elements and main memory. Prior works experimentally demonstrate three-input MAJ (MAJ3) and two-input AND and OR operations in commercial off-the-shelf (COTS) DRAM chips. Yet, demonstrations on COTS DRAM chips do not provide a functionally complete set of operations. We experimentally demonstrate that COTS DRAM chips are capable of performing 1) functionally-complete Boolean operations: NOT, NAND, and NOR and 2) many-input (i.e., more than two-input) AND and OR operations. We present an extensive characterization of new bulk bitwise operations in 256 off-the-shelf modern DDR4 DRAM chips. We evaluate the reliability of these operations using a metric called success rate: the fraction of correctly performed bitwise operations. Among our 19 new observations, we highlight four major results. First, we can perform the NOT operation on COTS DRAM chips with a 98.37% success rate on average. Second, we can perform up to 16-input NAND, NOR, AND, and OR operations on COTS DRAM chips with high reliability (e.g., 16-input NAND, NOR, AND, and OR with an average success rate of 94.94%, 95.87%, 94.94%, and 95.85%, respectively). Third, data pattern only slightly affects bitwise operations. Our results show that executing NAND, NOR, AND, and OR operations with random data patterns decreases the success rate compared to all logic-1/logic-0 patterns by 1.39%, 1.97%, 1.43%, and 1.98%, respectively. Fourth, bitwise operations are highly resilient to temperature changes, with small success rate fluctuations of at most 1.66% when the temperature is increased from 50°C to 95°C. We open-source our infrastructure at https://github.com/CMU-SAFARI/FCDRAM

Overview

  • Introduces an emerging paradigm called "Processing-using-DRAM" (PuD) that leverages the analog properties of DRAM circuitry for massively parallel in-DRAM computation
  • Demonstrates that commercial off-the-shelf (COTS) DRAM chips can perform a functionally complete set of Boolean operations, including NOT, NAND, and NOR, as well as many-input AND and OR operations
  • Provides an extensive characterization of the reliability of these new bulk bitwise operations in 256 modern DDR4 DRAM chips

Plain English Explanation

The paper discusses an approach called "Processing-using-DRAM" (PuD) that exploits the analog behavior of DRAM circuitry (DRAM is the main memory in most computers) to perform computations directly within the memory, rather than moving data back and forth between the memory and a separate processor. This has the potential to significantly reduce the energy and time required for data-intensive tasks.

The researchers discovered that off-the-shelf DRAM chips are capable of performing a full set of basic logic operations, including NOT, NAND, and NOR, as well as multi-input AND and OR operations. They tested these capabilities across 256 modern DDR4 DRAM chips and measured the reliability, or "success rate," of these operations.
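To make the idea of a many-input bulk bitwise operation concrete, here is a minimal software sketch. It models only the logical result, not the analog DRAM mechanism. The row width `ROW_BITS`, the helper names, and the use of Python integers as stand-ins for DRAM rows are illustrative assumptions and are not taken from the paper's infrastructure.

```python
from functools import reduce
import random

ROW_BITS = 64                  # hypothetical row width; real DRAM rows are far wider
MASK = (1 << ROW_BITS) - 1     # keeps results within ROW_BITS bits

def many_input_and(rows):
    """Bitwise AND across all input rows (each row is an int used as a bit vector)."""
    return reduce(lambda a, b: a & b, rows, MASK)

def many_input_or(rows):
    """Bitwise OR across all input rows."""
    return reduce(lambda a, b: a | b, rows, 0)

def many_input_nand(rows):
    """Bitwise NAND: the complement of the many-input AND."""
    return ~many_input_and(rows) & MASK

def many_input_nor(rows):
    """Bitwise NOR: the complement of the many-input OR."""
    return ~many_input_or(rows) & MASK

# Example: a 16-input NAND over randomly filled rows (seeded for repeatability).
rng = random.Random(0)
rows = [rng.getrandbits(ROW_BITS) for _ in range(16)]
result = many_input_nand(rows)
```

Every bit position of `result` is computed from the same position in all 16 rows, which is what "bulk bitwise" means here: one operation acts on an entire row of bits at once.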

Some key findings include:

  • The NOT operation can be performed with 98.37% success on average
  • Up to 16-input NAND, NOR, AND, and OR operations can be executed with high reliability (average success rates between 94.94% and 95.87%)
  • The success rates are only slightly affected by the data patterns being used or changes in temperature

These results demonstrate that DRAM chips have untapped computational capabilities that could be leveraged to accelerate a wide range of data-intensive applications, from machine learning to scientific simulations, without the need for expensive specialized hardware. The researchers have made their experimental infrastructure publicly available to encourage further exploration in this area.

Technical Explanation

The paper experimentally demonstrates that commercial off-the-shelf (COTS) DRAM chips are capable of performing a functionally complete set of Boolean operations, including NOT, NAND, and NOR, as well as many-input (e.g., up to 16-input) AND and OR operations. This builds upon prior work that had only shown the feasibility of three-input MAJ (majority) and two-input AND and OR operations in COTS DRAM.
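Why NOT, NAND, and NOR matter can be checked with a few truth tables. The sketch below is a plain software illustration (all function names are made up for this example, and nothing here reflects how the DRAM chips implement the operations). It shows that NAND alone can build NOT, AND, OR, and XOR, while MAJ3, AND, and OR are monotone, so no combination of them and constants can produce NOT.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

# Everything below is built from nand() only.
def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def maj3(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def check_all():
    """Exhaustively verify the constructions on every 2-bit input."""
    for a, b in product((0, 1), repeat=2):
        assert not_(a) == 1 - a
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor_(a, b) == (a ^ b)
        assert maj3(a, b, 0) == (a & b)   # MAJ3 with a constant 0 acts as AND
        assert maj3(a, b, 1) == (a | b)   # MAJ3 with a constant 1 acts as OR
    return True

def is_monotone(f, arity):
    """True if raising any single input from 0 to 1 never lowers the output."""
    for x in product((0, 1), repeat=arity):
        for i in range(arity):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(*x) > f(*y):
                    return False
    return True
```

Calling `check_all()` confirms the NAND constructions, and `is_monotone(maj3, 3)` holds while `is_monotone(not_, 1)` does not. Because compositions of monotone functions stay monotone, the previously demonstrated MAJ3, AND, and OR operations cannot express inversion on their own.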

The researchers characterized the reliability of these new bitwise operations across 256 modern DDR4 DRAM chips, using a "success rate" metric to quantify the fraction of correctly performed operations. Key findings include:

  1. The NOT operation can be executed with an average success rate of 98.37%.
  2. Up to 16-input NAND, NOR, AND, and OR operations can be performed with high reliability, achieving average success rates of 94.94%, 95.87%, 94.94%, and 95.85%, respectively.
  3. The data pattern used in the operations has only a slight effect on the success rates. Compared to all logic-1 or all logic-0 patterns, random data patterns decrease the success rates of NAND, NOR, AND, and OR by 1.39%, 1.97%, 1.43%, and 1.98%, respectively.
  4. The bitwise operations are highly resilient to temperature changes, with success rate fluctuations of at most 1.66% when the temperature is increased from 50°C to 95°C.
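The success rate metric used in these findings, the fraction of correctly performed bitwise operations, can be sketched as a simple comparison between expected and observed output bits. In the sketch below, the "observed" bits come from a hypothetical random bit-flip model (`inject_flips` and its `flip_prob` parameter are assumptions for illustration, not measured behavior), and the paper's exact measurement procedure may differ in its details.

```python
import random

def success_rate(expected_bits, observed_bits):
    """Fraction of output bits that match the expected result."""
    if not expected_bits or len(expected_bits) != len(observed_bits):
        raise ValueError("bit lists must be non-empty and of equal length")
    correct = sum(e == o for e, o in zip(expected_bits, observed_bits))
    return correct / len(expected_bits)

def nand_bits(input_rows):
    """Reference many-input NAND, computed column by column across rows."""
    return [1 - int(all(column)) for column in zip(*input_rows)]

def inject_flips(bits, flip_prob, rng):
    """Hypothetical error model: flip each bit independently with flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

# Toy experiment: a 4-input NAND over 1024 columns of random data.
rng = random.Random(42)
inputs = [[rng.randint(0, 1) for _ in range(1024)] for _ in range(4)]
expected = nand_bits(inputs)
observed = inject_flips(expected, flip_prob=0.05, rng=rng)
rate = success_rate(expected, observed)
```

In a real experiment, `observed` would be read back from the DRAM chip after the operation, and `expected` would be computed in software from the same input data.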

The researchers have open-sourced their experimental infrastructure at https://github.com/CMU-SAFARI/FCDRAM to encourage further exploration and development of in-DRAM processing capabilities.

Critical Analysis

The paper provides a thorough and well-designed experimental evaluation of the computational capabilities of COTS DRAM chips, going beyond prior work to demonstrate a functionally complete set of Boolean operations. The high reliability of these operations, even with many-input configurations and under varying temperature conditions, is a significant finding that highlights the untapped potential of DRAM for in-memory processing.

However, the paper does not address several important practical considerations for deploying PuD in real-world systems. For example, it does not discuss the overhead and challenges of integrating these bulk bitwise operations into existing hardware and software architectures, or the potential impact on DRAM performance and energy efficiency. Additionally, the paper does not explore the scalability of these techniques to larger DRAM arrays or newer DRAM technologies.

Nonetheless, the results presented in this paper are a valuable contribution to the growing body of research on in-memory computing and processing-in-memory architectures. The open-sourcing of the experimental infrastructure should also help facilitate further advancements in this area and encourage more researchers to explore the computational potential of commodity DRAM chips.

Conclusion

This paper demonstrates that commercial off-the-shelf DRAM chips have untapped computational capabilities that go beyond the simple memory access operations they are typically used for. By leveraging the analog properties of DRAM circuitry, the researchers were able to implement a functionally complete set of Boolean operations, including NOT, NAND, and NOR, as well as many-input AND and OR operations, with high reliability.

These findings have significant implications for the field of in-memory computing, as they suggest that DRAM chips could be leveraged to perform a wide range of data-intensive computations directly within the memory, potentially leading to substantial improvements in energy efficiency and performance for applications like machine learning, scientific simulations, and distributed optimization algorithms.

The open-sourcing of the experimental infrastructure should encourage further research and development in this area, ultimately paving the way for more efficient and scalable processing-in-memory architectures.



Related Papers

A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface

Guodong Yin, Mufeng Zhou, Yiming Chen, Wenjun Tang, Zekun Yang, Mingyen Lee, Xirui Du, Jinshan Yue, Jiaxin Liu, Huazhong Yang, Yongpan Liu, Xueqing Li

Performing data-intensive tasks in the von Neumann architecture is challenging to achieve both high performance and power efficiency due to the memory wall bottleneck. Computing-in-memory (CiM) is a promising mitigation approach by enabling parallel in-situ multiply-accumulate (MAC) operations within the memory with support from the peripheral interface and datapath. SRAM-based charge-domain CiM (CD-CiM) has shown its potential of enhanced power efficiency and computing accuracy. However, existing SRAM-based CD-CiM faces scaling challenges to meet the throughput requirement of high-performance multi-bit-quantization applications. This paper presents an SRAM-based high-throughput ReLU-optimized CD-CiM macro. It is capable of completing MAC and ReLU of two signed 8b vectors in one CiM cycle with only one A/D conversion. Along with non-linearity compensation for the analog computing and A/D conversion interfaces, this work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.

4/3/2024

Complete Boolean Algebra for Memristive and Spintronic Asymmetric Basis Logic Functions

Vaibhav Vyas, Joseph S. Friedman

The increasing advancement of emerging device technologies that provide alternative basis logic sets necessitates the exploration of innovative logic design automation methodologies. Specifically, emerging computing architectures based on the memristor and the bilayer avalanche spin-diode offer non-commutative or 'asymmetric' operations, namely the inverted-input AND (IAND) and implication as basis logic gates. Existing logic design techniques inadequately leverage the unique characteristics of asymmetric logic functions resulting in insufficiently optimized logic circuits. This paper presents a complete Boolean algebraic framework specifically tailored to asymmetric logic functions, introducing fundamental identities, theorems and canonical normal forms that lay the groundwork for efficient synthesis and minimization of such logic circuits without relying on conventional Boolean algebra. Further, this paper establishes a logical relationship between implication and IAND operations. A previously proposed modified Karnaugh map method based on a subset of the presented algebraic principles demonstrated a 28% reduction in computational steps for an algorithmically designed memristive full adder; the presently-proposed algebraic framework lays the foundation for much greater future improvements.

4/29/2024

Experimental demonstration of magnetic tunnel junction-based computational random-access memory

Yang Lv, Brandon R. Zink, Robert P. Bloom, Husrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, Jian-Ping Wang

Conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called computational random-access memory (CRAM) has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there lacks an experimental demonstration and study of CRAM to evaluate its computation accuracy, which is a realistic and application-critical metrics for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations as well as 2-, 3-, and 5-input logic operations are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of modeling has been developed to characterize the accuracy of CRAM computation. Further analysis of scalar addition, multiplication, and matrix multiplication shows promising results. These results are then applied to a complete application: a neural network based handwritten digit classifier, as an example to show the connection between the application performance and further MTJ development. The classifier achieved almost-perfect classification accuracy, with reasonable projections of future MTJ development. With the confirmation of MTJ-based CRAM's accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.

4/8/2024

On Error Correction for Nonvolatile Processing-In-Memory

Husrev Cılasun, Salonik Resch, Zamshed I. Chowdhury, Masoud Zabihi, Yang Lv, Brandon Zink, Jian-Ping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu

Processing in memory (PiM) represents a promising computing paradigm to enhance performance of numerous data-intensive applications. Variants performing computing directly in emerging nonvolatile memories can deliver very high energy efficiency. PiM architectures directly inherit the vulnerabilities of the underlying memory substrates, but they also are subject to errors due to the computation in place. Numerous well-established error correcting codes (ECC) for memory exist, and are also considered in the PiM context, however, they typically ignore errors that occur throughout computation. In this paper we revisit the error correction design space for nonvolatile PiM, considering both storage/memory and computation-induced errors, surveying several self-checking and homomorphic approaches. We propose several solutions and analyze their complex performance-area-coverage trade-off, using three representative nonvolatile PiM technologies. All of these solutions guarantee single error correction for both, bulk bitwise computations and ordinary memory/storage errors.

4/30/2024