Reducing Memory Contention and I/O Congestion for Disk-based GNN Training

Read original: arXiv:2406.13984 - Published 6/21/2024 by Qisheng Jiang, Lei Jia, Chundong Wang

Overview

This paper presents techniques to improve the efficiency of training Graph Neural Networks (GNNs) on disk-based systems, addressing the key challenges of memory contention and I/O congestion.
The proposed methods include asynchronous I/O, tile-based data loading, and an adaptive batch size scheme to mitigate these issues and enable effective disk-based GNN training.
The research reports substantial speedups over existing disk-based GNN training systems, which makes training on large graphs more practical on machines with limited memory.

Plain English Explanation

Graph Neural Networks (GNNs) are a type of machine learning model that can learn from and make predictions on data represented as graphs, which are networks of interconnected nodes and edges. Training these models typically requires a lot of memory and computational resources, which can be challenging for devices with limited hardware capabilities, such as mobile phones or embedded systems.

To address this, the researchers in this paper developed new techniques to enable effective GNN training on disk-based systems, where the graph's data is kept on a storage device such as a solid-state drive (SSD) rather than entirely in main memory. This matters because storage is much slower to access than main memory, but it is also cheaper per gigabyte and much larger, so it lets an ordinary machine work with graphs that would not otherwise fit in memory.

The key challenges the researchers tackled were memory contention, where multiple processes compete for the same memory resources, and I/O congestion, where the disk becomes overwhelmed with too many simultaneous read and write operations. To overcome these issues, the researchers proposed several innovations:

Asynchronous I/O: By overlapping disk I/O operations with the model's computation, the system can continue processing while waiting for data to be loaded from the disk, reducing the impact of slow disk access.
Tile-based data loading: The researchers divided the GNN's input data into smaller "tiles" that can be loaded and processed independently, reducing memory requirements and I/O congestion.
Adaptive batch size: The system dynamically adjusts the size of the data batches being processed to match the available memory and disk bandwidth, optimizing performance.
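
The adaptive batch size idea can be illustrated with a small sketch. This is not the paper's code: the function name `choose_batch_size`, the `headroom` fraction, and the batch limits are all hypothetical, chosen only to show how a batch size might be derived from a memory budget.

```python
def choose_batch_size(free_bytes, bytes_per_sample,
                      min_batch=32, max_batch=4096, headroom=0.5):
    """Pick a batch size that fits in a fraction of the free memory.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    # Spend only part of the free memory so that other buffers
    # (for example, the graph topology) still have room.
    budget = int(free_bytes * headroom)
    fit = budget // bytes_per_sample
    # Clamp to a sane range. Note that min_batch wins even when the
    # budget is smaller, so a real system would need a fallback there.
    return max(min_batch, min(max_batch, fit))


print(choose_batch_size(1_000_000, 1_000))       # 500
print(choose_batch_size(10_000, 1_000))          # 32
print(choose_batch_size(10_000_000_000, 1_000))  # 4096
```

A real system would probably re-evaluate this value during training as memory pressure and disk throughput change, rather than computing it once.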

By implementing these techniques, the researchers were able to significantly improve the efficiency and performance of GNN training on disk-based systems, making these powerful models more accessible on a broader range of devices, including those with limited resources. This could have important implications for applications like edge computing, where GNNs could be used to analyze data directly on low-power devices, rather than relying on powerful servers.

Technical Explanation

The paper presents several novel techniques to address the challenges of memory contention and I/O congestion in disk-based GNN training:

Asynchronous I/O: The system overlaps disk I/O operations with the model's computation by using a producer-consumer design. While the model is processing a batch of data, the system asynchronously loads the next batch from the disk, reducing the impact of slow disk access.
Tile-based data loading: The researchers divide the GNN's input data into smaller "tiles" that can be loaded and processed independently. This reduces memory requirements and I/O congestion, as only the necessary data for each batch needs to be loaded.
Adaptive batch size: The system dynamically adjusts the size of the data batches being processed to match the available memory and disk bandwidth. This optimization helps to maximize performance by avoiding memory oversubscription or I/O bottlenecks.
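
The producer-consumer pattern and tile-wise loading described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the paper's implementation: the file layout (raw 32-bit floats), the names `tile_producer` and `train_on_tiles`, and the constants `TILE_ROWS`, `FEATURE_DIM`, and `prefetch_depth` are all hypothetical, and the "training step" is a stand-in that just sums the values.

```python
import os
import queue
import tempfile
import threading
from array import array

TILE_ROWS = 4      # rows per tile (hypothetical value)
FEATURE_DIM = 3    # values per row (hypothetical value)
ROW_BYTES = FEATURE_DIM * array("f").itemsize
_SENTINEL = None   # marks the end of the tile stream


def write_features(path, num_rows):
    """Write a toy feature matrix to disk as raw 32-bit floats."""
    values = array("f", (float(i) for i in range(num_rows * FEATURE_DIM)))
    with open(path, "wb") as f:
        values.tofile(f)


def tile_producer(path, num_rows, out_queue):
    """Producer: read one tile at a time and hand it to the consumer."""
    with open(path, "rb") as f:
        for start in range(0, num_rows, TILE_ROWS):
            rows = min(TILE_ROWS, num_rows - start)
            f.seek(start * ROW_BYTES)
            tile = array("f")
            tile.fromfile(f, rows * FEATURE_DIM)
            out_queue.put((start, tile))  # blocks while the queue is full
    out_queue.put(_SENTINEL)


def train_on_tiles(path, num_rows, prefetch_depth=2):
    """Consumer: process tiles while the producer loads the next ones."""
    # The bounded queue caps how many tiles sit in memory at once.
    tiles = queue.Queue(maxsize=prefetch_depth)
    worker = threading.Thread(
        target=tile_producer, args=(path, num_rows, tiles), daemon=True
    )
    worker.start()
    tiles_seen, total = 0, 0.0
    while True:
        item = tiles.get()
        if item is _SENTINEL:
            break
        _start, tile = item
        total += sum(tile)  # stand-in for the real training step
        tiles_seen += 1
    worker.join()
    return tiles_seen, total


def demo():
    """Write 10 toy rows to a temporary file and stream them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "features.bin")
        write_features(path, 10)
        return train_on_tiles(path, 10)
```

Calling `demo()` streams the 10 rows as three tiles (4, 4, and 2 rows). The bounded queue is the main design choice here: it lets loading run ahead of computation while keeping at most `prefetch_depth` tiles buffered, which limits both memory use and the number of outstanding reads.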

The paper's abstract names the resulting system GNNDrive and reports that it outperforms existing disk-based GNN training systems. For example, when training the GraphSAGE model on the Papers100M dataset, GNNDrive is 16.9x, 2.6x, and 2.7x faster than PyG+, Ginex, and MariusGNN, respectively. DiskGNN, SpanGNN, and Device-Training-Under-256KB-Memory are related works rather than benchmarks used in the evaluation.

Critical Analysis

The researchers have done a commendable job in addressing the practical challenges of running GNN models on resource-constrained devices with limited memory and disk bandwidth. Their techniques, such as asynchronous I/O and adaptive batch sizing, are well-designed and effective in mitigating memory contention and I/O congestion issues.

However, the paper does not discuss the potential limitations or caveats of their approach. For example, it's unclear how the performance of their techniques would scale with larger datasets or more complex GNN architectures. Additionally, the paper does not explore the energy efficiency or power consumption implications of their disk-based training approach, which could be an important consideration for mobile and embedded applications.

Further research could investigate the trade-offs between training time, model accuracy, and resource utilization (e.g., memory, disk bandwidth, energy consumption) to provide a more comprehensive understanding of the practical constraints and considerations for deploying GNNs on low-power devices. Comparisons to alternative approaches, such as ProTrain for memory-efficient training or Rethinking-Accelerating-Graph-Condensation for training-free graph condensation, could also provide valuable insights.

Conclusion

This paper presents a set of innovative techniques to enable efficient disk-based training of Graph Neural Networks, addressing the key challenges of memory contention and I/O congestion. By leveraging asynchronous I/O, tile-based data loading, and adaptive batch sizing, the researchers have demonstrated significant performance improvements over existing disk-based GNN training approaches.

The proposed methods have the potential to make GNN models more accessible on resource-constrained devices, such as mobile phones and embedded systems, which could lead to new applications in edge computing, IoT, and various other domains where powerful machine learning models need to be deployed on hardware with limited resources. As the field of GNNs continues to evolve, techniques like those presented in this paper will be crucial in bridging the gap between model accuracy and real-world deployment constraints.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Reducing Memory Contention and I/O Congestion for Disk-based GNN Training

Qisheng Jiang, Lei Jia, Chundong Wang

Graph neural networks (GNNs) gain wide popularity. Large graphs with high-dimensional features become common and training GNNs on them is non-trivial on an ordinary machine. Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during the training process. Leveraging a solid-state drive (SSD) or other storage devices to extend the memory space has been studied in training GNNs. Memory and I/Os are hence critical for effectual disk-based training. We find that state-of-the-art (SoTA) disk-based GNN training systems severely suffer from issues like the memory contention between a graph's topological and feature data, and severe I/O congestion upon loading data from SSD for training. We accordingly develop GNNDrive. GNNDrive 1) minimizes the memory footprint with holistic buffer management across sampling and extracting, and 2) avoids I/O congestion through a strategy of asynchronous feature extraction. It also avoids costly data preparation on the critical path and makes the most of software and hardware resources. Experiments show that GNNDrive achieves superior performance. For example, when training with the Papers100M dataset and GraphSAGE model, GNNDrive is faster than SoTA PyG+, Ginex, and MariusGNN by 16.9x, 2.6x, and 2.7x, respectively.

6/21/2024

DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

Renjie Liu, Yichuan Wang, Xiao Yan, Zhenkun Cai, Minjie Wang, Haitian Jiang, Bo Tang, Jinyang Li

Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications. To train GNNs on large graphs that exceed CPU memory, several systems store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when reading node features that are usually smaller than a disk page or degraded model accuracy by treating the graph as disconnected partitions. To close this gap, we build a system called DiskGNN, which achieves high I/O efficiency and thus fast training without hurting model accuracy. The key technique used by DiskGNN is offline sampling, which helps decouple graph sampling from model computation. In particular, by conducting graph sampling beforehand, DiskGNN acquires the node features that will be accessed by model computation, and such information is utilized to pack the target node features contiguously on disk to avoid read amplification. Besides, DiskGNN also adopts designs including a four-level feature store to fully utilize the memory hierarchy to cache node features and reduce disk access, batched packing to accelerate the feature packing process, and pipelined training to overlap disk access with other operations. We compare DiskGNN with Ginex and MariusGNN, which are state-of-the-art systems for out-of-core GNN training. The results show that DiskGNN can speed up the baselines by over 8x while matching their best model accuracy.

5/9/2024

LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Quresh, Scott Mahlke, Wen-mei Hwu

Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, this approach suffers from the high computational costs of graph partitioning algorithms and inefficient communication across GPUs. To address these overheads, we propose Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN), a storage-based approach to train GNN models that utilizes a novel communication layer enabling GPU software caches to function as a system-wide shared cache with low overheads. LSM-GNN incorporates a hybrid eviction policy that intelligently manages cache space by using both static and dynamic node information to significantly enhance cache performance. Furthermore, we introduce the Preemptive Victim-buffer Prefetcher (PVP), a mechanism for prefetching node feature data from a Victim Buffer located in CPU pinned-memory to further reduce the pressure on the storage devices. Experimental results show that despite the lower compute capabilities and memory capacities, LSM-GNN in a single node with two GPUs offers superior performance over two-node-four-GPU Dist-DGL baseline and provides up to 3.75x speed up on end-to-end epoch time while running large-scale GNN training.

7/23/2024

SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training

Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao

Graph Neural Networks (GNNs) have superior capability in learning graph data. Full-graph GNN training generally has high accuracy, however, it suffers from large peak memory usage and encounters the Out-of-Memory problem when handling large graphs. To address this memory problem, a popular solution is mini-batch GNN training. However, mini-batch GNN training increases the training variance and sacrifices the model accuracy. In this paper, we propose a new memory-efficient GNN training method using spanning subgraph, called SpanGNN. SpanGNN trains GNN models over a sequence of spanning subgraphs, which are constructed from empty structure. To overcome the excessive peak memory consumption problem, SpanGNN selects a set of edges from the original graph to incrementally update the spanning subgraph between every epoch. To ensure the model accuracy, we introduce two types of edge sampling strategies (i.e., variance-reduced and noise-reduced), and help SpanGNN select high-quality edges for the GNN learning. We conduct experiments with SpanGNN on widely used datasets, demonstrating SpanGNN's advantages in the model performance and low peak memory usage.

6/10/2024