Position: CXL Shared Memory Programming: Barely Distributed and Almost Persistent

Read original: arXiv:2405.19626 - Published 7/18/2024 by Yi Xu, Suyash Mahar, Ziheng Liu, Mingyao Shen, Steven Swanson


Overview

This position paper defines a failure model for cache-coherent shared memory built on Compute Express Link (CXL).
It divides failures into two classes, data failures and process failures, and compares them with failures in distributed storage systems and in persistent memory (PMEM).
It proposes CXL-specific mitigations that draw on replication, erasure coding, and PMEM programming techniques.

Plain English Explanation

CXL is an interconnect standard that lets multiple machines (nodes) share a common pool of cache-coherent memory. Programs on different nodes can read and write the same data with ordinary loads and stores instead of copying it back and forth.

The paper looks at what can go wrong when software relies on this shared memory. Two kinds of failure are possible: a program can crash while the shared data it was working on stays alive, or the shared data can become unavailable or inconsistent while the program is still running. The paper asks how software can keep data usable in both cases.

The researchers also compare CXL shared memory with two older technologies: distributed storage systems and persistent memory. They argue that it resembles both, but that the techniques built for either one do not fully fit, so CXL needs solutions of its own.

Technical Explanation

The paper presents a failure model and programming techniques for CXL-based shared memory systems. Its title describes this memory as "barely distributed" and "almost persistent": it shares traits with distributed storage and with persistent memory (PMEM), yet the authors argue that neither body of existing techniques fully addresses CXL's volatile nature and low access latencies.
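The "almost persistent" idea can be pictured with an undo log, a common PMEM-style technique. The following is only a minimal sketch under stated assumptions, not the authors' method: a Python bytearray stands in for a shared-memory region that outlives the writing process, and `SharedRegion`, `tx_write`, `commit`, `recover`, and `transfer` are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: a bytearray stands in for a CXL shared-memory region
# that survives the crash of the process writing to it. Real hardware would
# also need cache flushes and store ordering, which this toy model ignores.


class SharedRegion:
    """Toy shared memory plus an undo log assumed to live in the same memory."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.undo_log = []  # entries of (offset, old_bytes)

    def tx_write(self, offset, new_bytes):
        # Save the old contents before overwriting them.
        old = bytes(self.data[offset:offset + len(new_bytes)])
        self.undo_log.append((offset, old))
        self.data[offset:offset + len(new_bytes)] = new_bytes

    def commit(self):
        # Every write of the update is in place, so the log can be dropped.
        self.undo_log.clear()

    def recover(self):
        # Run by a surviving process: undo uncommitted writes, newest first.
        while self.undo_log:
            offset, old = self.undo_log.pop()
            self.data[offset:offset + len(old)] = old


def transfer(region, crash_midway=False):
    """Move one unit from counter A (byte 0) to counter B (byte 1)."""
    a, b = region.data[0], region.data[1]
    region.tx_write(0, bytes([a - 1]))
    if crash_midway:
        return  # simulate the writer dying between the two stores
    region.tx_write(1, bytes([b + 1]))
    region.commit()
```

If the writer dies after the first store, the region holds a half-finished update. Because the log sits in memory that outlives the writer, another process can call `recover()` and restore the previous consistent state.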

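The proposed failure model distinguishes two classes of failure. Data failures render data inaccessible or inconsistent for a running application; process failures occur when a process dies before the data it was operating on. For data failures, the paper examines traditional mitigations such as replication and erasure coding and proposes new solutions; for process failures, it draws on PMEM programming techniques.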
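The paper's abstract names erasure coding as one traditional way to survive data loss. The simplest form is a single XOR parity block, sketched below. This is an illustrative assumption, not the scheme the authors propose: each block is imagined to sit on a separate memory device, and `xor_blocks`, `encode`, and `reconstruct` are hypothetical helper names.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Usage: three data blocks, each imagined to live on its own memory device.
stripe = encode([b"abcd", b"efgh", b"ijkl"])
recovered = reconstruct(stripe, 1)  # the device holding block 1 fails
```

A single parity block tolerates the loss of any one block at a storage cost of one extra block, whereas full replication doubles the footprint. The trade-off is that reconstruction must read every surviving block.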

The authors also describe the design and implementation of a CXL emulation framework that can be used to evaluate CXL-based systems. This framework allows developers to test their applications and study the performance and reliability of CXL-enabled architectures.

Critical Analysis

The paper presents a well-designed programming model and failure semantics for CXL-based shared memory systems. The authors have carefully considered the tradeoffs between simplicity, performance, and reliability, and have proposed an approach that tries to balance these factors.

However, the paper does not explore the implications of offloading compute tasks to CXL-connected devices, which could be an important use case for CXL technology.

Overall, the research provides a solid foundation for developing CXL-based shared memory systems, but there is still room for further exploration and refinement of the concepts presented.

Conclusion

This paper offers a compelling approach to leveraging the memory sharing capabilities of CXL technology in software systems. By introducing the "barely distributed" and "almost persistent" programming model, the authors have outlined a way to simplify the development of CXL-based applications while still providing important reliability and persistence guarantees.

The insights and tools presented in this research could pave the way for more widespread adoption of CXL technology in a variety of computing environments, from data centers to edge devices. As the industry continues to explore the potential of CXL, this work serves as an important stepping stone towards realizing the full benefits of this emerging hardware interface.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Position: CXL Shared Memory Programming: Barely Distributed and Almost Persistent

Yi Xu, Suyash Mahar, Ziheng Liu, Mingyao Shen, Steven Swanson

While Compute Express Link (CXL) enables support for cache-coherent shared memory among multiple nodes, it also introduces new types of failures--processes can fail before data does, or data might fail before a process does. The lack of a failure model for CXL-based shared memory makes it challenging to understand and mitigate these failures. To solve these challenges, in this paper, we describe a model categorizing and handling the CXL-based shared memory's failures: data and process failures. Data failures in CXL-based shared memory render data inaccessible or inconsistent for a currently running application. We argue that such failures are unlike data failures in distributed storage systems and require CXL-specific handling. To address this, we look into traditional data failure mitigation techniques like erasure coding and replication and propose new solutions to better handle data failures in CXL-based shared memory systems. Next, we look into process failures and compare the failures and potential solutions with PMEM's failure model and programming solutions. We argue that although PMEM shares some of CXL's characteristics, it does not fully address CXL's volatile nature and low access latencies. Finally, taking inspiration from PMEM programming solutions, we propose techniques to handle these new failures. Thus, this paper is the first work to define the CXL-based shared memory failure model and propose tailored solutions that address challenges specific to CXL-based systems.

7/18/2024

A Programming Model for Disaggregated Memory over CXL

Gal Assa, Michal Friedman, Ori Lahav

CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed in the near future. It enables cache-coherent shared memory pools in a disaggregated fashion at unprecedented scales, allowing algorithms to interact with a variety of storage devices using simple loads and stores in a cacheline granularity. Alongside with unleashing unique opportunities for a wide range of applications, CXL introduces new challenges of data management and crash consistency. Alas, CXL lacks an adequate programming model, which makes reasoning about the correctness and expected behaviors of algorithms and systems on top of it nearly impossible. In this work, we present CXL0, the first programming model for concurrent programs running on top of CXL. We propose a high-level abstraction for CXL memory accesses and formally define operational semantics on top of that abstraction. We provide a set of general transformations that adapt concurrent algorithms to the new disruptive technology. Using these transformations, every linearizable algorithm can be easily transformed into its provably correct version in the face of a full-system or sub-system crash. We believe that this work will serve as the stepping stone for systems design and modelling on top of CXL, and support the development of future models as software and hardware evolve.

7/24/2024

Memory Sharing with CXL: Hardware and Software Design Approaches

Sunita Jain, Nagaradhesh Yeleswarapu, Hasan Al Maruf, Rita Gupta

Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding unnecessary data movement. In this paper, we discuss multiple approaches to enable memory sharing with different generations of CXL protocol (i.e., CXL 2.0 and CXL 3.0) considering the challenges with each of the architectures from the device hardware and software viewpoint.

4/5/2024

Streamlining CXL Adoption for Hyperscale Efficiency

Angelos Arelakis, Nilesh Shah, Yiannis Nikolakopoulos, Dimitrios Palyvos-Giannas

In our exploration of Composable Memory systems utilizing CXL, we focus on overcoming adoption barriers at Hyperscale, underscored by economic models demonstrating Total Cost of Ownership (TCO). While CXL addresses the pressing memory capacity needs of emerging Hyperscale applications, the escalating demands from evolving use cases such as AI outpace the capabilities of current CXL solutions. Hyperscalers resort to software-based memory (de)compression technology, alleviating memory capacity, storage, and network constraints but incurring a notable Tax on Compute CPU cycles. As a pivotal guide to the CXL community, Hyperscalers have formulated the groundbreaking Open Compute Project (OCP) Hyperscale CXL Tiered Memory Expander specification. If implemented, this specification lowers TCO adoption barriers, enabling diverse CXL deployments at both Hyperscaler and Enterprise levels. We present a CXL integrated solution, aligning with the aforementioned specification, introducing an energy-efficient, scalable, hardware-accelerated, Lossless Compressed Memory CXL Tier. This solution, slated for mid-2024 production and open for integration with Memory Expander controller manufacturers, offers 2-3X CXL memory compression in nanoseconds, delivering a 20-25% reduction in TCO for end customers without requiring additional physical slots. In our discussion, we pinpoint areas for collaborative innovation within the CXL Community to expedite software/hardware advancements for CXL Tiered Memory Expansion. Furthermore, we delve into unresolved challenges in Pooled deployment and explore potential solutions, collectively aiming to make CXL adoption a No Brainer at Hyperscale.

4/5/2024