SELCC: Coherent Caching over Compute-Limited Disaggregated Memory

Read original: arXiv:2409.02088 - Published 9/6/2024 by Ruihong Wang, Jianguo Wang, Walid G. Aref

Overview

The paper proposes SELCC (Shared-Exclusive Latch Cache Coherence), a protocol for keeping compute-side caches coherent in disaggregated memory systems.
SELCC targets compute-limited disaggregated memory: memory is physically separated from the compute nodes, and the memory servers have too little computing power to run a coherence protocol themselves.
The key idea is to maintain cache coherence entirely from the compute side, using one-sided shared-exclusive latches and invalidation messages among compute nodes, so that nodes can share data without imposing any computational burden on the remote memory.

Plain English Explanation

In a traditional computer system, the memory and the processors that use that memory are tightly integrated. However, in a disaggregated memory system, the memory resources are physically separated from the compute resources. This allows for more flexibility and scalability, but it also introduces new challenges, such as ensuring that the data stored in the memory is consistent and up-to-date across different components.

The SELCC technique aims to address this challenge by leveraging the concept of cache coherence. Cache coherence ensures that when one component of the system modifies a piece of data, all other components that use that data are aware of the change and can access the correct, up-to-date version.
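To make the idea of cache coherence concrete, here is a minimal sketch in Python. It is a toy write-invalidate scheme, not the SELCC protocol: the class names `SharedMemory` and `ComputeNode` are hypothetical, and the memory pool only stores data while the writing node does all the invalidation work.

```python
class SharedMemory:
    """Stands in for the remote memory pool: it stores data and does no work."""

    def __init__(self):
        self.data = {}
        self.nodes = []  # registry of compute nodes (an assumption of this toy)


class ComputeNode:
    """A compute node with a private cache in front of the shared memory."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.cache = {}
        memory.nodes.append(self)

    def read(self, key):
        # Cache miss: fetch the value from remote memory and keep a copy.
        if key not in self.cache:
            self.cache[key] = self.memory.data[key]
        return self.cache[key]

    def write(self, key, value):
        # Write through to remote memory and update the local copy.
        self.memory.data[key] = value
        self.cache[key] = value
        # Invalidate every other node's copy so no one reads a stale value.
        for node in self.memory.nodes:
            if node is not self:
                node.cache.pop(key, None)
```

After one node writes a key, the other nodes' cached copies are dropped, so their next read goes back to the shared memory and sees the new value.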

By implementing SELCC, the researchers hope to enable efficient and coherent caching over the compute-limited disaggregated memory, allowing different components in the system to collaborate and share data effectively, even though the memory and compute resources are physically separate.

Technical Explanation

The SELCC approach consists of several key elements:

Coherent Caching: SELCC keeps the caches on different compute nodes consistent. It builds on a one-sided shared-exclusive latch protocol and adds lazy latch release and invalidation messages among compute nodes, which together guarantee data access atomicity and cache coherence.
Compute-Limited Disaggregated Memory: The researchers focus on a setting where the disaggregated memory servers have limited computing power, so the protocol must not impose any computational burden on the remote memory side.
Cache Management: SELCC embeds the IDs of the current cache copy holders into RDMA latch words to minimize communication round-trips, and it prioritizes local concurrency control over global concurrency control. The resulting compute-side cache exposes main-memory-like APIs to applications.
Evaluation: The researchers implement a B-tree and three transaction concurrency control algorithms on SELCC's APIs. Micro-benchmarks show better performance than RPC-based cache-coherence protocols, and YCSB and TPC-C results indicate comparable or superior performance against competitors over disaggregated memory.
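The paper's abstract describes embedding the IDs of current cache copy holders into RDMA latch words. The Python sketch below illustrates that general idea under stated assumptions: the bit layout (top 8 bits for the exclusive holder, low 56 bits as a bitmap of shared holders), the `RemoteWord` class, and all function names are hypothetical, and a local lock stands in for the atomic compare-and-swap that a one-sided RDMA operation would provide. It is not the paper's actual layout or implementation.

```python
import threading

EXCLUSIVE_SHIFT = 56  # hypothetical layout: top 8 bits hold (writer ID + 1)
MAX_READERS = 56      # low 56 bits: one bit per node holding a shared copy


class RemoteWord:
    """Simulates one 64-bit latch word in remote memory with atomic CAS."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Like a one-sided CAS, always returns the value seen before the swap.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def holders(value):
    """Decode which node IDs currently hold the latch."""
    writer = value >> EXCLUSIVE_SHIFT
    if writer:
        return [writer - 1]
    return [i for i in range(MAX_READERS) if value & (1 << i)]


def try_acquire_shared(word, node_id):
    """Set this node's reader bit unless a writer holds the latch."""
    old = word.load()
    while not (old >> EXCLUSIVE_SHIFT):
        seen = word.compare_and_swap(old, old | (1 << node_id))
        if seen == old:
            return True
        old = seen  # someone else changed the word; retry with the new value
    return False


def try_acquire_exclusive(word, node_id):
    """One CAS either takes the latch or reveals who must be invalidated."""
    old = word.compare_and_swap(0, (node_id + 1) << EXCLUSIVE_SHIFT)
    return old == 0, holders(old)


def release_shared(word, node_id):
    """Clear this node's reader bit."""
    old = word.load()
    while old & (1 << node_id):
        seen = word.compare_and_swap(old, old & ~(1 << node_id))
        if seen == old:
            return
        old = seen


def release_exclusive(word, node_id):
    """Clear the latch if this node is the exclusive holder."""
    word.compare_and_swap((node_id + 1) << EXCLUSIVE_SHIFT, 0)
```

In this sketch, a failed exclusive acquisition costs one round trip and returns the list of holders, so the requester knows exactly which compute nodes to send invalidation messages to. Lazy release would correspond to a node leaving its bit set after it finishes with the data and calling `release_shared` or `release_exclusive` only when such a message arrives; the memory side never executes any logic.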

Critical Analysis

The SELCC approach addresses an important challenge in the design of disaggregated memory systems, namely the need for efficient and coherent caching. The researchers have proposed a well-designed solution that leverages the concept of cache coherence to enable effective collaboration and data sharing across the system components.

One potential limitation of the research is that it focuses on a specific scenario: memory servers that offer ample memory but very little computing power. While this is a relevant and common scenario, it would be interesting to see how SELCC performs in other configurations, such as when both memory capacity and compute resources are constrained.

Additionally, the researchers have not explored the implications of SELCC on the overall system architecture and the potential trade-offs it may introduce. For example, the additional overhead of maintaining cache coherence could impact the system's overall performance and efficiency.

Conclusion

The SELCC technique proposed in this paper represents a significant contribution to the field of disaggregated memory systems. By addressing the challenge of coherent caching in a compute-limited environment, the researchers have developed a solution that can enable more effective collaboration and data sharing across different components in the system.

The successful implementation of SELCC could have broader implications for the design of scalable and efficient data-intensive systems, where the separation of memory and compute resources is becoming increasingly important. As the field of disaggregated computing continues to evolve, techniques like SELCC will be crucial in ensuring that these systems can fully leverage the benefits of this architectural approach.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SELCC: Coherent Caching over Compute-Limited Disaggregated Memory

Ruihong Wang, Jianguo Wang, Walid G. Aref

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes to save on round-trip communication cost between the disaggregated memory and the compute nodes. However, the limited computing power on the disaggregated memory servers makes it challenging to maintain cache coherence among multiple compute-side caches over disaggregated shared memory. This paper introduces SELCC, a Shared-Exclusive Latch Cache Coherence protocol that maintains cache coherence without imposing any computational burden on the remote memory side. SELCC builds on a one-sided shared-exclusive latch protocol by introducing lazy latch release and invalidation messages among the compute nodes so that it can guarantee both data access atomicity and cache coherence. SELCC minimizes communication round-trips by embedding the current cache copy holder IDs into RDMA latch words and prioritizes local concurrency control over global concurrency control. We instantiate the SELCC protocol onto compute-side caches, forming an abstraction layer over disaggregated memory. This abstraction layer provides main-memory-like APIs to upper-level applications, thus enabling existing data structures and algorithms to function over disaggregated memory with minimal code change. To demonstrate the usability of SELCC, we implement a B-tree and three transaction concurrency control algorithms over SELCC's APIs. Micro-benchmark results show that the SELCC protocol achieves better performance compared to RPC-based cache-coherence protocols. Additionally, YCSB and TPC-C benchmarks indicate that applications over SELCC can achieve comparable or superior performance against competitors over disaggregated memory.

9/6/2024

A Programming Model for Disaggregated Memory over CXL

Gal Assa, Michal Friedman, Ori Lahav

CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed in the near future. It enables cache-coherent shared memory pools in a disaggregated fashion at unprecedented scales, allowing algorithms to interact with a variety of storage devices using simple loads and stores at cacheline granularity. Alongside unleashing unique opportunities for a wide range of applications, CXL introduces new challenges of data management and crash consistency. Alas, CXL lacks an adequate programming model, which makes reasoning about the correctness and expected behaviors of algorithms and systems on top of it nearly impossible. In this work, we present CXL0, the first programming model for concurrent programs running on top of CXL. We propose a high-level abstraction for CXL memory accesses and formally define operational semantics on top of that abstraction. We provide a set of general transformations that adapt concurrent algorithms to the new disruptive technology. Using these transformations, every linearizable algorithm can be easily transformed into its provably correct version in the face of a full-system or sub-system crash. We believe that this work will serve as the stepping stone for systems design and modelling on top of CXL, and support the development of future models as software and hardware evolve.

7/24/2024

ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao

Compute Express Link (CXL) emerges as a solution for the wide gap between computational speed and data communication rates among host and multiple devices. It fosters a unified and coherent memory space between host and CXL storage devices such as solid-state drives (SSDs) for memory expansion, with a corresponding DRAM implemented as the device cache. However, this introduces challenges such as substantial cache miss penalties, sub-optimal caching due to data access granularity mismatch between the DRAM cache and SSD memory, and inefficient hardware cache management. To address these issues, we propose a novel solution, named ICGMM, which optimizes caching and eviction directly on hardware, employing a Gaussian Mixture Model (GMM)-based approach. We prototype our solution on an FPGA board, which demonstrates a noteworthy improvement compared to the classic Least Recently Used (LRU) cache strategy. We observe a decrease in the cache miss rate ranging from 0.32% to 6.14%, leading to a substantial 16.23% to 39.14% reduction in the average SSD access latency. Furthermore, when compared to the state-of-the-art Long Short-Term Memory (LSTM)-based cache policies, our GMM algorithm on FPGA showcases an impressive latency reduction of over 10,000 times. Remarkably, this is achieved while demanding much fewer hardware resources.

8/13/2024

MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool

Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

Large language model (LLM) serving has transformed from stateless to stateful systems, utilizing techniques like context caching and disaggregated inference. These optimizations extend the lifespan and domain of the KV cache, necessitating a new architectural approach. We present MemServe, a unified system that integrates both inter-request and intra-request optimizations. MemServe introduces MemPool, an elastic memory pool managing distributed memory and KV caches across serving instances. Using MemPool APIs, MemServe combines context caching with disaggregated inference for the first time, supported by a global scheduler that enhances cache reuse through a global prompt tree-based locality-aware policy. Tests show that MemServe significantly improves job completion time and time-to-first-token.

6/27/2024