Towards Graph Contrastive Learning: A Survey and Beyond

2405.11868

Published 5/21/2024 by Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long and 3 others

cs.LG cs.AI cs.CE cs.IR cs.SI


Abstract

In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to produce informative representations from unlabeled graph data, reducing the reliance on expensive labeled data. While SSL on graphs has witnessed widespread adoption, one critical component, Graph Contrastive Learning (GCL), has not been thoroughly investigated in the existing literature. Thus, this survey aims to fill this gap by offering a dedicated survey on GCL. We provide a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives. Furthermore, we explore the extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios. We also discuss practical applications spanning domains such as drug discovery, genomics analysis, recommender systems, and finally outline the challenges and potential future directions in this field.


Overview

Deep learning on graphs has been successful in various domains, but relies heavily on annotated graph data, which is costly and time-intensive to obtain.
To address this challenge, self-supervised learning (SSL) on graphs has gained attention, enabling machine learning models to produce informative representations from unlabeled graph data.
One critical component of SSL on graphs, Graph Contrastive Learning (GCL), has not yet been the subject of a dedicated, thorough survey in the existing literature.
This survey aims to fill this gap by providing a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives.
The survey also explores the extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios.
Practical applications spanning domains like drug discovery, genomics analysis, and recommender systems are discussed, and the challenges and potential future directions in this field are outlined.

Plain English Explanation

Deep learning, a powerful type of artificial intelligence, has been very successful at working with graph-structured data, which is data that can be represented as a set of interconnected nodes and edges (like a social network or transportation network). However, these deep learning models on graphs rely heavily on having a lot of labeled data, which means data that has been manually annotated or classified by human experts.

Obtaining this labeled graph data can be extremely expensive and time-consuming, as it requires a lot of human effort. To get around this problem, researchers have been exploring a technique called self-supervised learning on graphs. With self-supervised learning, the models can learn useful representations of the graph data without needing any human-provided labels. One specific approach to self-supervised learning on graphs is called Graph Contrastive Learning (GCL).

This survey paper provides a detailed overview of GCL and explains its key ideas. For example, it describes how models are trained to recognize that two slightly altered copies of the same part of a graph belong together, while telling apart copies that come from different parts. The paper also discusses how GCL can be applied to other graph learning settings, like weakly supervised learning and transfer learning, and highlights real-world applications in areas like drug discovery and recommender systems.

Overall, the goal of this survey is to give researchers and practitioners a comprehensive understanding of the current state of GCL and how it can be used to enable more efficient and effective machine learning on graph-structured data, even when labeled data is scarce.

Technical Explanation

The paper provides a thorough survey of Graph Contrastive Learning (GCL), a key component of self-supervised learning on graphs. The authors first explain the fundamental principles of GCL, including the strategies used for data augmentation, the different contrastive modes (e.g., node-level, graph-level), and the contrastive optimization objectives.
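The basic pipeline can be sketched in a few lines of NumPy: augment a graph twice, encode both views, and contrast the results. This is a minimal, illustrative sketch and not the survey's own method. The two augmentations (random edge dropping and feature masking), the one-layer mean-aggregation encoder, the node-level InfoNCE objective, and all function names are assumptions, chosen because they are common in the GCL literature. The sketch only computes the loss. A real implementation would use an autodiff framework to train the encoder weights.

```python
import numpy as np


def drop_edges(adj, p, rng):
    """Edge dropping: remove each undirected edge independently with probability p."""
    upper = np.triu(adj, k=1)                 # one copy of each edge (self-loops ignored)
    keep = rng.random(upper.shape) >= p       # Bernoulli keep-mask
    upper = upper * keep
    return upper + upper.T                    # restore symmetry


def mask_features(x, p, rng):
    """Feature masking: zero each feature column with probability p (same mask for all nodes)."""
    keep = rng.random(x.shape[1]) >= p
    return x * keep


def encode(adj, x, w):
    """Hypothetical one-layer encoder: average each node with its neighbours, then linear map + tanh."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1, keepdims=True)
    return np.tanh((a_hat / deg) @ x @ w)


def info_nce(z1, z2, tau=0.5):
    """Node-level InfoNCE: node i in view 1 is positive with node i in view 2;
    every other node in view 2 acts as a negative."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)  # cosine similarity
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                              # positives sit on the diagonal


# Usage on a toy 6-node ring graph with random features and weights.
rng = np.random.default_rng(0)
n, d, h = 6, 4, 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
x = rng.normal(size=(n, d))
w = rng.normal(size=(d, h))
z1 = encode(drop_edges(adj, 0.2, rng), mask_features(x, 0.2, rng), w)
z2 = encode(drop_edges(adj, 0.2, rng), mask_features(x, 0.2, rng), w)
loss = info_nce(z1, z2)
print(f"contrastive loss: {loss:.4f}")
```

In this node-level contrastive mode, the same node in the two views forms a positive pair, and every other node in the second view serves as a negative. A graph-level mode would instead pool each view into a single vector (for example, by averaging node embeddings) and contrast whole graphs within a batch.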

The authors then explore how GCL can be extended to other aspects of data-efficient graph learning. These include weakly supervised learning, where models are trained with limited or imperfect labels (for example, a small labeled set alongside abundant unlabeled data), and transfer learning, where models pretrained on one domain or task are adapted to new ones.
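In the weakly supervised setting, one common pattern is to add a contrastive term to the supervised loss. The few labeled nodes then guide the classifier, while every node contributes to the contrastive signal. This pattern is assumed here for illustration and is not taken from the survey. The weight `lam`, the function names, and the toy data are hypothetical.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; `labels` are integer class ids."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(labels)), labels])


def info_nce(z1, z2, tau=0.5):
    """Node-level InfoNCE between two augmented views (positives on the diagonal)."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def semi_supervised_loss(z1, z2, class_logits, labeled_idx, labels, lam=0.5):
    """Hypothetical weakly supervised objective: cross-entropy on the few labelled
    nodes plus `lam` times a contrastive term computed over all nodes."""
    supervised = cross_entropy(class_logits[labeled_idx], labels)
    contrastive = info_nce(z1, z2)
    return supervised + lam * contrastive


# Usage: 5 nodes, only nodes 0 and 2 carry labels.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(5, 3))
z2 = z1 + 0.01 * rng.normal(size=(5, 3))   # stand-in for a second augmented view
class_logits = rng.normal(size=(5, 2))
labeled_idx = np.array([0, 2])
labels = np.array([1, 0])
total = semi_supervised_loss(z1, z2, class_logits, labeled_idx, labels, lam=0.5)
print(f"total loss: {total:.4f}")
```

Transfer learning typically follows a different recipe. The encoder is pretrained with the contrastive loss alone, and then it is either fine-tuned on the target task or frozen while a classifier is trained on top of it.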

The paper also covers practical applications of GCL across various domains, including drug discovery, genomics analysis, and recommender systems, illustrating the range of real-world settings in which GCL has been applied.

Finally, the authors discuss the challenges and potential future directions in the field of data-efficient graph learning, highlighting areas for further research and development, such as the integration of GCL with large language models and the exploration of more advanced data augmentation techniques.

Critical Analysis

The survey provides a comprehensive and well-structured overview of Graph Contrastive Learning (GCL), a crucial component of self-supervised learning on graphs. The authors have done an excellent job of highlighting the fundamental principles of GCL, including the data augmentation strategies, contrastive modes, and optimization objectives.

One potential limitation discussed in the paper is the need for further investigation into the theoretical underpinnings of GCL, particularly in terms of understanding the connections between different contrastive objectives and their impact on downstream task performance. Additionally, the authors acknowledge that the practical applications of GCL covered in the survey, while diverse, may not be exhaustive, and there could be other domains where GCL could prove valuable.
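Two widely used objectives from the broader contrastive-learning literature show what "different contrastive objectives" means in practice. They are written below in generic notation, which may differ from the survey's own. InfoNCE scores each positive pair against $N$ candidates using a temperature $\tau$. A Jensen-Shannon (JSD) style objective instead trains a critic $D$ to separate positive pairs from negative pairs with a binary-classification loss.

```latex
\mathcal{L}_{\text{InfoNCE}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log \frac{\exp\big(s(\mathbf{z}_i, \mathbf{z}'_i)/\tau\big)}
            {\sum_{k=1}^{N} \exp\big(s(\mathbf{z}_i, \mathbf{z}'_k)/\tau\big)}

\mathcal{L}_{\text{JSD}} =
  \mathbb{E}_{(\mathbf{z}, \mathbf{z}') \sim P}\big[\operatorname{sp}\big(-D(\mathbf{z}, \mathbf{z}')\big)\big]
  + \mathbb{E}_{(\mathbf{z}, \tilde{\mathbf{z}}) \sim N}\big[\operatorname{sp}\big(D(\mathbf{z}, \tilde{\mathbf{z}})\big)\big],
  \qquad \operatorname{sp}(x) = \log\big(1 + e^{x}\big)
```

In these formulas, $s$ is a similarity such as cosine similarity, and $\mathbf{z}_i$ and $\mathbf{z}'_i$ are embeddings of the same node or graph under two augmentations. $P$ and $N$ denote the distributions of positive and negative pairs. A standard result links InfoNCE to mutual information via $I(\mathbf{z}; \mathbf{z}') \ge \log N - \mathcal{L}_{\text{InfoNCE}}$. How such bounds relate to downstream task performance is one of the open theoretical questions raised in this section.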

Another area for potential future research mentioned in the paper is the integration of GCL with large language models, which have shown remarkable performance in various tasks. Exploring how GCL can be combined with these powerful language models could lead to further advancements in data-efficient graph learning.

Overall, this survey offers a valuable contribution to the field of graph machine learning by providing a comprehensive analysis of Graph Contrastive Learning. By synthesizing the current state of the art and outlining open challenges and future research directions, it should be useful to both researchers and practitioners working in this area.

Conclusion

This survey paper provides a comprehensive overview of Graph Contrastive Learning (GCL), a crucial component of self-supervised learning on graphs. The authors explain the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and optimization objectives. The paper also explores how GCL can be extended to other aspects of data-efficient graph learning, such as weakly supervised learning and transfer learning, and highlights real-world applications in domains like drug discovery and recommender systems.

The survey also identifies key challenges and potential future directions in the field, such as the need for a deeper understanding of the theoretical underpinnings of GCL and the integration of GCL with large language models. Overall, this paper offers a valuable resource for researchers and practitioners working on graph-based machine learning, as it provides a thorough and insightful analysis of the current state of GCL and its implications for the future of data-efficient graph learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Dual-perspective Cross Contrastive Learning in Graph Transformers

Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollable augmentation strategies that potentially alter the semantic information. To address these challenges, this paper proposes an innovative framework termed dual-perspective cross graph contrastive learning (DC-GCL), which incorporates three modifications designed to enhance positive sample diversity and reliability: 1) We propose a dual-perspective augmentation strategy that provides the model with more diverse training data, enabling the model to effectively learn feature consistency across different views. 2) From the data perspective, we slightly perturb the original graphs using controllable data augmentation, effectively preserving their semantic information. 3) From the model perspective, we enhance the encoder by utilizing more powerful graph transformers instead of graph neural networks. Based on the model's architecture, we propose three pruning-based strategies to slightly perturb the encoder, providing more reliable positive samples. These modifications collectively form the DC-GCL's foundation and provide more diverse and reliable training inputs, offering significant improvements over traditional GCL methods. Extensive experiments on various benchmarks demonstrate that DC-GCL consistently outperforms different baselines on various datasets and tasks.

6/4/2024

cs.LG cs.AI


Mixed Supervised Graph Contrastive Learning for Recommendation

Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization directions from different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentations. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. MixSGCL integrates the training of the recommendation and unsupervised contrastive losses into a single supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead of unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance on both accuracy and efficiency. This validates the effectiveness of MixSGCL's coupled design for supervised graph contrastive learning.

4/29/2024

cs.IR cs.LG


Community-Invariant Graph Contrastive Learning

Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and identified its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the merits of our framework. Code is released on Github (https://github.com/ShiyinTan/CI-GCL.git).

5/3/2024

cs.LG cs.SI


A Survey of Data-Efficient Graph Learning

Wei Ju, Siyu Yi, Yifan Wang, Qingqing Long, Junyu Luo, Zhiping Xiao, Ming Zhang

Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We begin by highlighting the challenges of training models that require large amounts of labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. We also outline promising directions for future research, contributing to the evolution of graph machine learning.

6/21/2024

cs.LG cs.AI cs.SI