STT-RAM-based Hierarchical In-Memory Computing

Read original: arXiv:2407.19637 - Published 7/30/2024 by Dhruv Gajaria, Kevin Antony Gomez, Tosiron Adegbija


Overview

STT-RAM (Spin-Transfer Torque Magnetic RAM) is a type of non-volatile memory that can be used for in-memory computing.
The paper explores a hierarchical in-memory computing architecture in which different levels of the memory hierarchy are augmented with processing elements, using relaxed-retention STT-RAM to improve energy efficiency.
Key ideas include processing in cache (PiC) with volatile, relaxed-retention STT-RAM, processing in memory (PiM) with non-volatile STT-RAM, and caches that combine STT-RAM banks with different retention times.

Plain English Explanation

The paper explores a new way to use a type of computer memory called STT-RAM to perform computations more efficiently. STT-RAM is special because it can retain data without constantly using power, unlike traditional computer memory.

The researchers developed a hierarchical in-memory computing architecture that takes advantage of STT-RAM's unique properties. They build on a known trick: STT-RAM's retention time, meaning how long it reliably holds data, can be deliberately shortened. A shorter retention time makes writing data faster and cheaper, at the cost of the data eventually fading, which is acceptable for short-lived data in a cache. The paper uses this "relaxed" STT-RAM for computing inside caches and regular, non-volatile STT-RAM for computing in main memory, and compares the two.

The researchers also study in-cache computing: performing computations right inside the cache instead of shuttling data back and forth between memory and the processor. Because several levels of the memory hierarchy can each be given their own processing elements, a workload can run at whichever level suits it best, reducing the amount of data movement required.
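One common way to compute inside a memory array is bit-line computing, which the CHIME abstract below names explicitly: two rows are activated at once, and each bit line's combined read current is compared with reference levels to produce the AND and OR of the two rows. The paper's abstract does not specify its circuit, so treat the following as a minimal Python sketch under that assumption; the conductance values `G_P` and `G_AP`, the reference placement, and the function name `bitline_compute` are all hypothetical.

```python
# Hypothetical cell conductances (arbitrary units). An MTJ in the parallel
# (low-resistance) state is taken to store 1; anti-parallel stores 0.
G_P = 2.0   # conductance of a cell storing 1 (assumed)
G_AP = 1.0  # conductance of a cell storing 0 (assumed)

# With two rows active, a bit line carries one of three summed currents:
# 2*G_AP ("00"), G_P + G_AP ("01" or "10"), or 2*G_P ("11").
# Each reference level sits halfway between two adjacent cases.
REF_OR = (2 * G_AP + (G_P + G_AP)) / 2
REF_AND = ((G_P + G_AP) + 2 * G_P) / 2


def bitline_compute(row_a: list[int], row_b: list[int]) -> tuple[list[int], list[int]]:
    """Activate two rows at once and sense every bit line against two references.

    Returns (AND of the rows, OR of the rows), one bit per column.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same width")
    and_out, or_out = [], []
    for a, b in zip(row_a, row_b):
        # Summed read current on this bit line (unit read voltage assumed).
        current = (G_P if a else G_AP) + (G_P if b else G_AP)
        or_out.append(1 if current > REF_OR else 0)    # any cell storing 1
        and_out.append(1 if current > REF_AND else 0)  # both cells storing 1
    return and_out, or_out
```

For example, `bitline_compute([1, 0, 1, 0], [1, 1, 0, 0])` returns `([1, 0, 0, 0], [1, 1, 1, 0])`: a single sensing step yields both the AND and the OR of the two rows without moving either row to the processor.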

Overall, this work demonstrates how the special properties of emerging memory technologies like STT-RAM can be leveraged to build more energy-efficient computing systems, an important goal as computing becomes more pervasive.

Technical Explanation

The paper presents a hierarchical in-memory computing architecture that utilizes the unique properties of STT-RAM to improve energy efficiency. STT-RAM is a type of non-volatile memory that can retain data without continuous power, unlike traditional SRAM and DRAM.

The key idea is exploiting STT-RAM's relaxed retention time. STT-RAM's retention time, how long a cell reliably holds data, can be reduced at design time; doing so mitigates STT-RAM's write latency and energy overheads, at the cost of making the cell volatile. The paper therefore considers two flavors: non-volatile STT-RAM for processing in memory (PiM) and volatile, relaxed-retention STT-RAM for processing in cache (PiC).
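Relaxed retention is commonly modeled with the Néel-Arrhenius relation t_ret = t0 · exp(Δ), where Δ = E_b / (k_B T) is the thermal stability factor of the magnetic tunnel junction. The Python sketch below assumes t0 = 1 ns and uses Δ as a first-order proxy for write current, since in the standard macrospin model the critical switching current grows roughly linearly with Δ. It illustrates the general technique; it is not the paper's device model, and the helper names are hypothetical.

```python
import math

T0_NS = 1.0  # attempt period t0 in nanoseconds (assumed ~1 ns)
TEN_YEARS_S = 10 * 365.25 * 24 * 3600


def retention_time_s(delta: float, t0_ns: float = T0_NS) -> float:
    """Néel-Arrhenius estimate of retention time: t_ret = t0 * exp(delta)."""
    return t0_ns * 1e-9 * math.exp(delta)


def delta_for_retention(t_ret_s: float, t0_ns: float = T0_NS) -> float:
    """Thermal stability factor needed to reach a target retention time."""
    return math.log(t_ret_s / (t0_ns * 1e-9))


def relative_write_current(t_ret_s: float, baseline_s: float = TEN_YEARS_S) -> float:
    """First-order proxy (assumption): switching current scales roughly with delta."""
    return delta_for_retention(t_ret_s) / delta_for_retention(baseline_s)


if __name__ == "__main__":
    for label, target in [("10 years", TEN_YEARS_S), ("1 s", 1.0),
                          ("10 ms", 10e-3), ("100 us", 100e-6)]:
        print(f"{label:>8}: delta = {delta_for_retention(target):5.2f}, "
              f"relative write current ~ {relative_write_current(target):.2f}")
```

Under these assumptions, 10-year retention needs Δ ≈ 40.3, while 100 µs needs only Δ ≈ 11.5, less than a third of the write-current proxy. That gap is why relaxing retention for short-lived cache data is attractive.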

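Rather than placing compute at a single level, the architecture augments different levels of the memory hierarchy with processing elements so that each workload can run where it is most efficient. The central tradeoff is data movement versus writes: PiC must move data into the cache but benefits from cheaper relaxed-retention writes, while PiM avoids that movement but pays the write overheads of non-volatile STT-RAM. The authors examine how workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, shift this tradeoff.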
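The paper analyzes data-movement overheads for PiC against write overheads for PiM. A toy energy model makes that tradeoff concrete. In the Python sketch below, the class and function names and every per-unit energy are hypothetical placeholders rather than measured values, and compute energy is assumed equal for both options so that it cancels out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    bytes_moved: int  # operand bytes PiC must bring up to the cache level
    writes: int       # result/intermediate writes issued by the kernel


@dataclass(frozen=True)
class EnergyModel:
    # Hypothetical per-unit energies in picojoules; placeholders, not measurements.
    move_pj_per_byte: float = 10.0
    relaxed_write_pj: float = 1.0      # write to relaxed-retention STT-RAM cache
    nonvolatile_write_pj: float = 4.0  # write to non-volatile STT-RAM memory


def pic_energy_pj(w: Workload, m: EnergyModel) -> float:
    # PiC pays for data movement, but its writes go to cheaper relaxed-retention cells.
    return w.bytes_moved * m.move_pj_per_byte + w.writes * m.relaxed_write_pj


def pim_energy_pj(w: Workload, m: EnergyModel) -> float:
    # PiM computes where the data lives, but every write hits non-volatile STT-RAM.
    return w.writes * m.nonvolatile_write_pj


def preferred_site(w: Workload, m: EnergyModel) -> str:
    """Pick the lower-energy option under this toy model (ties go to PiM)."""
    return "PiC" if pic_energy_pj(w, m) < pim_energy_pj(w, m) else "PiM"


if __name__ == "__main__":
    model = EnergyModel()
    for w in (Workload("write-heavy", bytes_moved=1_000, writes=100_000),
              Workload("streaming", bytes_moved=100_000, writes=1_000)):
        print(w.name, preferred_site(w, model),
              pic_energy_pj(w, model), pim_energy_pj(w, model))
```

The model reduces to one inequality: PiC wins when `bytes_moved * move_pj_per_byte < writes * (nonvolatile_write_pj - relaxed_write_pj)`, that is, when a kernel issues many writes relative to the operand data it must move. With the default parameters, the write-heavy workload favors PiC (110,000 pJ versus 400,000 pJ) and the streaming workload favors PiM.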

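The authors evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Their experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads, with relaxed retention as a key enabler. The paper closes by describing open research problems in hierarchical in-memory computing.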
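The abstract does not describe how data is mapped onto a heterogeneous STT-RAM cache. One simple policy, assumed here purely for illustration, is to place data in the bank with the shortest retention time that still outlives it, since a shorter retention time means a lower Δ and cheaper writes. Bank names, retention times, data lifetimes, and the function names in this Python sketch are hypothetical.

```python
import math
from typing import Optional

T0_S = 1e-9  # attempt period for the Néel-Arrhenius model (assumption)

# Hypothetical banks of a heterogeneous STT-RAM cache and their retention times (s).
BANK_RETENTION_S = {"short": 100e-6, "medium": 10e-3, "long": 1.0}


def thermal_stability(retention_s: float) -> float:
    """Delta needed for a retention time; a first-order proxy for write cost."""
    return math.log(retention_s / T0_S)


def pick_bank(expected_lifetime_s: float,
              banks: dict[str, float] = BANK_RETENTION_S) -> Optional[str]:
    """Choose the bank with the shortest retention that still outlives the data."""
    fitting = [(ret, name) for name, ret in banks.items() if ret >= expected_lifetime_s]
    if not fitting:
        # No bank is long enough: the data would need refresh, write-back,
        # or a non-volatile level instead.
        return None
    return min(fitting)[1]


if __name__ == "__main__":
    for label, lifetime in [("PiC temporaries", 50e-6),
                            ("CPU working set", 5e-3),
                            ("long-lived data", 2.0)]:
        bank = pick_bank(lifetime)
        cost = thermal_stability(BANK_RETENTION_S[bank]) if bank else None
        print(label, bank, cost)
```

Returning `None` for data that outlives every bank makes the fallback explicit; under this policy, such data would have to be refreshed, written back, or kept in non-volatile STT-RAM.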

Critical Analysis

The paper presents a thorough study of using STT-RAM for energy-efficient in-memory computing. Its hierarchical view, which weighs PiC against PiM across cache and memory levels, and its use of relaxed-retention STT-RAM for in-cache processing are promising ways to leverage the unique properties of STT-RAM.

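One potential limitation is choosing the right retention time, which varies across applications and is hard to determine in practice: relaxed-retention cells are sensitive to process variation and temperature, and STT-RAM caches are not yet readily available for design-time exploration. Related work by some of the same authors (SCART, listed below) addresses this by predicting retention times with machine learning.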
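Temperature sensitivity follows from the standard retention model: because Δ = E_b / (k_B T), a hotter chip has a smaller Δ and an exponentially shorter retention time. The Python sketch below holds E_b fixed (a simplification that ignores any temperature dependence of E_b), assumes t0 = 1 ns, and uses illustrative temperatures; the helper names are hypothetical.

```python
import math

T0_S = 1e-9  # attempt period t0 (assumed ~1 ns)


def delta_at(temp_k: float, delta_ref: float, temp_ref_k: float = 300.0) -> float:
    """Rescale delta = E_b / (k_B * T) to a new temperature, holding E_b fixed."""
    return delta_ref * temp_ref_k / temp_k


def retention_s(delta: float) -> float:
    """Néel-Arrhenius retention estimate: t0 * exp(delta)."""
    return T0_S * math.exp(delta)


if __name__ == "__main__":
    delta_300k = math.log(10e-3 / T0_S)  # cell designed for ~10 ms at 300 K
    for temp in (300.0, 325.0, 350.0):
        d = delta_at(temp, delta_300k)
        print(f"{temp:.0f} K: delta = {d:.2f}, retention = {retention_s(d) * 1e3:.3f} ms")
```

With these assumptions, a cell sized for 10 ms retention at 300 K keeps data for only about 1 ms at 350 K, a tenfold drop from a 50 K rise, which is why a retention time chosen at design time can be too optimistic at high temperature.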

Additionally, the reported advantages of PiC over PiM hold for specific workloads. Further research is needed to fully characterize the tradeoffs between energy efficiency and performance across a broader range of real-world applications.

Conclusion

This paper presents an innovative approach to leveraging the unique properties of emerging memory technologies like STT-RAM for energy-efficient in-memory computing. By exploiting the relaxed retention time of STT-RAM and adding processing capabilities at multiple levels of the memory hierarchy, the proposed hierarchical architecture can reduce the energy consumption of computing systems for suitable workloads.

As the demand for energy-efficient computing continues to grow, especially in areas like edge devices and data centers, this research highlights the potential of novel memory technologies to enable more sustainable and high-performance computing solutions. Further advances in areas like retention time prediction and system-level optimization can help bring these ideas closer to real-world implementation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

STT-RAM-based Hierarchical In-Memory Computing

Dhruv Gajaria, Kevin Antony Gomez, Tosiron Adegbija

In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using Spin-Transfer Torque RAM (STT-RAM) for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores hierarchical in-memory computing, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.

7/30/2024

CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing

Dhruv Gajaria, Tosiron Adegbija, Kevin Gomez

Processing-in-cache (PiC) and Processing-in-memory (PiM) architectures, especially those utilizing bit-line computing, offer promising solutions to mitigate data movement bottlenecks within the memory hierarchy. While previous studies have explored the integration of compute units within individual memory levels, the complexity and potential overheads associated with these designs have often limited their capabilities. This paper introduces a novel PiC/PiM architecture, Concurrent Hierarchical In-Memory Processing (CHIME), which strategically incorporates heterogeneous compute units across multiple levels of the memory hierarchy. This design targets the efficient execution of diverse, domain-specific workloads by placing computations closest to the data where it optimizes performance, energy consumption, data movement costs, and area. CHIME employs STT-RAM due to its various advantages in PiC/PiM computing, such as high density, low leakage, and better resiliency to data corruption from activating multiple word lines. We demonstrate that CHIME enhances concurrency and improves compute unit utilization at each level of the memory hierarchy. We present strategies for exploring the design space, grouping, and placing the compute units across the memory hierarchy. Experiments reveal that, compared to the state-of-the-art bit-line computing approaches, CHIME achieves significant speedup and energy savings of 57.95% and 78.23% for various domain-specific workloads, while reducing the overheads associated with single-level compute designs.

7/30/2024

SCART: Predicting STT-RAM Cache Retention Times Using Machine Learning

Dhruv Gajaria, Kyle Kuan, Tosiron Adegbija

Prior studies have shown that the retention time of the non-volatile spin-transfer torque RAM (STT-RAM) can be relaxed in order to reduce STT-RAM's write energy and latency. However, since different applications may require different retention times, STT-RAM retention times must be critically explored to satisfy various applications' needs. This process can be challenging due to exploration overhead, and exacerbated by the fact that STT-RAM caches are emerging and are not readily available for design time exploration. This paper explores using known and easily obtainable statistics (e.g., SRAM statistics) to predict the appropriate STT-RAM retention times, in order to minimize exploration overhead. We propose an STT-RAM Cache Retention Time (SCART) model, which utilizes machine learning to enable design time or runtime prediction of right-provisioned STT-RAM retention times for latency or energy optimization. Experimental results show that, on average, SCART can reduce the latency and energy by 20.34% and 29.12%, respectively, compared to a homogeneous retention time while reducing the exploration overheads by 52.58% compared to prior work.

7/30/2024


On Error Correction for Nonvolatile Processing-In-Memory

Husrev Cılasun, Salonik Resch, Zamshed I. Chowdhury, Masoud Zabihi, Yang Lv, Brandon Zink, Jian-Ping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu

Processing in memory (PiM) represents a promising computing paradigm to enhance performance of numerous data-intensive applications. Variants performing computing directly in emerging nonvolatile memories can deliver very high energy efficiency. PiM architectures directly inherit the vulnerabilities of the underlying memory substrates, but they also are subject to errors due to the computation in place. Numerous well-established error correcting codes (ECC) for memory exist, and are also considered in the PiM context, however, they typically ignore errors that occur throughout computation. In this paper we revisit the error correction design space for nonvolatile PiM, considering both storage/memory and computation-induced errors, surveying several self-checking and homomorphic approaches. We propose several solutions and analyze their complex performance-area-coverage trade-off, using three representative nonvolatile PiM technologies. All of these solutions guarantee single error correction for both, bulk bitwise computations and ordinary memory/storage errors.

4/30/2024