ARC: DVFS-Aware Asymmetric-Retention STT-RAM Caches for Energy-Efficient Multicore Processors

Read original: arXiv:2407.19612 - Published 7/30/2024 by Dhruv Gajaria, Tosiron Adegbija

Overview

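The paper proposes ARC, an asymmetric-retention core design for energy-efficient multicore processors, in which STT-RAM retention times differ across cores.
ARC builds on relaxed-retention (volatile) Spin-Transfer Torque RAM (STT-RAM), which shortens the retention time to reduce STT-RAM's write energy and latency.
It accounts for DVFS (Dynamic Voltage and Frequency Scaling), because the clock frequency affects how much retention time an application needs, and it uses a runtime prediction model to choose the best core for each application.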

Plain English Explanation

Spin-Transfer Torque RAM (STT-RAM) is a type of computer memory that is normally non-volatile: it can retain information even when the power is turned off. That long retention comes at a price, because writing to STT-RAM costs extra energy and time. Designers can relax the retention time so that data is held only for a limited period, which makes writes cheaper and faster but means data can expire if it stays in the cache too long.

The researchers behind this paper observed that the right retention time depends not only on the application but also on the processor's clock frequency, which most STT-RAM studies ignore. They developed a multicore design called ARC (asymmetric-retention core) that takes advantage of this by making retention times differ from core to core.

Caches are small, fast memory units that store frequently accessed data, helping to speed up a processor's performance. In ARC, the STT-RAM caches are not all built with the same retention time. Retention times differ from core to core, so an application can run on the core whose cache retention best matches how long that application needs its data to stay valid.

DVFS (raising or lowering the processor's voltage and frequency) changes that match. The same number of clock cycles takes more wall-clock time at a lower frequency, so a retention time that suits an application at one frequency setting may not be the best choice at another.

By predicting at runtime which core suits each application, given its characteristics, its retention time requirements, and the available DVFS settings, ARC reduces energy consumption compared to a design in which every cache has the same retention time.
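The link between clock frequency and retention needs comes down to converting clock cycles into wall-clock time. The following is a minimal sketch of that arithmetic, not the paper's analysis: the function name, the 20,000-cycle block lifetime, and the frequency values are all hypothetical, and it assumes (as a simplification) that a block's lifetime measured in cycles does not change with frequency.

```python
def required_retention_us(block_lifetime_cycles: int, freq_mhz: float) -> float:
    """Wall-clock time, in microseconds, that a cache block must stay valid.

    A clock running at freq_mhz completes freq_mhz cycles per microsecond,
    so dividing a cycle count by freq_mhz yields microseconds.
    """
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return block_lifetime_cycles / freq_mhz


# Hypothetical block lifetime; assumed here not to change with frequency.
LIFETIME_CYCLES = 20_000

for freq_mhz in (2000.0, 1000.0, 500.0):
    need = required_retention_us(LIFETIME_CYCLES, freq_mhz)
    print(f"{freq_mhz:6.0f} MHz -> block must be retained for {need:.1f} us")
```

Under these assumptions, halving the clock frequency doubles the time a block must be retained, which is one way to see why a retention time chosen for one DVFS setting may not fit another.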

Technical Explanation

The key innovation in ARC is its asymmetric-retention core design. The baseline it is compared against is a homogeneous STT-RAM cache design, in which a single retention time is used throughout. ARC instead introduces retention time heterogeneity, so that STT-RAM retention times can be specialized to applications' needs.

The motivating analysis examines how DVFS (dynamic voltage and frequency scaling) affects a relaxed-retention STT-RAM level one (L1) cache. It shows that retention time needs vary across applications and can also depend significantly on the clock frequency, a factor that most STT-RAM studies ignore.

ARC uses a runtime prediction model to determine the best core on which to run an application. The model bases its decision on the application's characteristics, its retention time requirements, and the available DVFS settings.
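The paper's abstract describes a runtime prediction model that chooses the best core for an application from its characteristics, its retention time requirements, and the available DVFS settings. The sketch below is a simple rule-based stand-in for that idea, not the paper's model: the `Core` fields, the example retention times and write energies, and the rule "cheapest core whose retention covers the need" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    name: str
    retention_us: float     # retention time of this core's L1 cache (hypothetical)
    write_energy_nj: float  # energy per cache write (hypothetical)


def needed_retention_us(lifetime_cycles: int, freq_mhz: float) -> float:
    """Convert a block lifetime in cycles to microseconds at freq_mhz."""
    return lifetime_cycles / freq_mhz


def pick_core(cores: Sequence[Core], lifetime_cycles: int,
              freq_mhz: float) -> Optional[Core]:
    """Return the lowest-write-energy core whose retention covers the need.

    Returns None when no core retains data long enough.
    """
    need = needed_retention_us(lifetime_cycles, freq_mhz)
    candidates = [c for c in cores if c.retention_us >= need]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.write_energy_nj)


# Hypothetical cores: longer retention is assumed to cost more per write.
CORES = [
    Core("A", retention_us=10.0, write_energy_nj=0.2),
    Core("B", retention_us=100.0, write_energy_nj=0.3),
    Core("C", retention_us=1000.0, write_energy_nj=0.5),
]

for freq_mhz in (2000.0, 500.0):
    chosen = pick_core(CORES, lifetime_cycles=20_000, freq_mhz=freq_mhz)
    print(freq_mhz, chosen.name if chosen else "no core fits")
```

With these made-up numbers, the same application maps to a different core when the frequency drops, because its data must survive longer in wall-clock terms. A real predictor would estimate the retention need from measured application statistics rather than take it as an input.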

The researchers report that ARC reduces average cache energy by 20.19% and overall processor energy by 7.66% compared to a homogeneous STT-RAM cache design.

Critical Analysis

The reported results compare ARC against a homogeneous STT-RAM cache design. Because the approach relies on different applications needing different retention times, its benefits are likely to be workload-dependent, with some applications benefiting more than others.

Additionally, the implementation complexity of ARC, particularly the runtime prediction model used to assign applications to cores, could be a practical concern. Further research may be needed to quantify and reduce this overhead.

Another possible limitation concerns reliability. In a relaxed-retention cache, data that outlives the retention time must be refreshed, written back, or lost, so running an application on a core whose retention time is too short could hurt performance or energy. How robust the prediction is to such mismatches may deserve further study.

Conclusion

The ARC design presented in this paper demonstrates a promising approach to leveraging the adjustable retention time of STT-RAM to improve the energy efficiency of multicore processors. By specializing STT-RAM retention times to applications' needs and accounting for DVFS, ARC reduces average cache energy by 20.19% and overall processor energy by 7.66% compared to a homogeneous STT-RAM cache design.

While the paper highlights the potential of this technique, further research is needed to address the implementation challenges and explore the long-term reliability implications. Nonetheless, ARC represents an important step towards more energy-efficient computing systems that can better meet the growing demand for high-performance, low-power processing.

Related Papers

ARC: DVFS-Aware Asymmetric-Retention STT-RAM Caches for Energy-Efficient Multicore Processors

Dhruv Gajaria, Tosiron Adegbija

Relaxed retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed retention time STT-RAM level one (L1) cache, we analyze the impacts of dynamic voltage and frequency scaling (DVFS) -- a common optimization in modern processors -- on STT-RAM L1 cache design. Our analysis reveals that, apart from the fact that different applications may require different retention times, the clock frequency, which is typically ignored in most STT-RAM studies, may also significantly impact applications' retention time needs. Based on our findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model to determine the best core on which to run an application, based on the applications' characteristics, their retention time requirements, and available DVFS settings. Results reveal that the proposed approach can reduce the average cache energy by 20.19% and overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design.

7/30/2024

SCART: Predicting STT-RAM Cache Retention Times Using Machine Learning

Dhruv Gajaria, Kyle Kuan, Tosiron Adegbija

Prior studies have shown that the retention time of the non-volatile spin-transfer torque RAM (STT-RAM) can be relaxed in order to reduce STT-RAM's write energy and latency. However, since different applications may require different retention times, STT-RAM retention times must be critically explored to satisfy various applications' needs. This process can be challenging due to exploration overhead, and exacerbated by the fact that STT-RAM caches are emerging and are not readily available for design time exploration. This paper explores using known and easily obtainable statistics (e.g., SRAM statistics) to predict the appropriate STT-RAM retention times, in order to minimize exploration overhead. We propose an STT-RAM Cache Retention Time (SCART) model, which utilizes machine learning to enable design time or runtime prediction of right-provisioned STT-RAM retention times for latency or energy optimization. Experimental results show that, on average, SCART can reduce the latency and energy by 20.34% and 29.12%, respectively, compared to a homogeneous retention time while reducing the exploration overheads by 52.58% compared to prior work.

7/30/2024

STT-RAM-based Hierarchical In-Memory Computing

Dhruv Gajaria, Kevin Antony Gomez, Tosiron Adegbija

In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using Spin-Transfer Torque RAM (STT-RAM) for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores hierarchical in-memory computing, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.

7/30/2024

An Energy-efficient Capacitive-RRAM Content Addressable Memory

Yihan Pan, Adrian Wheeldon, Mohammed Mughal, Shady Agwa, Themis Prodromakis, Alexantrou Serb

Content addressable memory is popular in intelligent computing systems as it allows parallel content-searching in memory. Emerging CAMs show a promising increase in bitcell density and a decrease in power consumption than pure CMOS solutions. This article introduced an energy-efficient 3T1R1C TCAM cooperating with capacitor dividers and RRAM devices. The RRAM as a storage element also acts as a switch to the capacitor divider while searching for content. CAM cells benefit from working parallel in an array structure. We implemented a 64 x 64 array and digital controllers to perform with an internal built-in clock frequency of 875MHz. Both data searches and reads take three clock cycles. Its worst average energy for data match is reported to be 1.71fJ/bit-search and the worst average energy for data miss is found at 4.69fJ/bit-search. The prototype is simulated and fabricated in 0.18um technology with in-lab RRAM post-processing. Such memory explores the charge domain searching mechanism and can be applied to data centers that are power-hungry.

9/17/2024