Towards Generalised Pre-Training of Graph Models

Read original: arXiv:2311.03976 - Published 5/15/2024 by Alex O. Davies, Riku W. Green, Nirav S. Ajmeri, Telmo M. Silva Filho

Overview

Unsupervised representation learning can help when data or labels are scarce
Existing graph representation learning approaches are domain-specific: they require consistent node and edge features across the pre-training and target datasets
This work presents a new method called Topology Only Pre-Training (ToP) that can transfer across domains

Plain English Explanation

Unsupervised representation learning is a powerful technique where a machine learning model is first trained on a large, unlabeled dataset. This pre-trained model can then be "fine-tuned" on a smaller, labeled dataset relevant to the task at hand. This is useful when you don't have much labeled data available.

Previous methods for learning representations of graph-structured data, like social networks or molecular structures, were limited to a specific domain. The node and edge features in the pre-training data had to be similar to the target data for the model to work well.

The new Topology Only Pre-Training (ToP) method gets around this limitation. It separates the process of learning the graph topology (the connections between nodes) from learning the node and edge features. By pre-training the model only on the graph topology, across multiple domains, it can then be applied to a wide variety of target datasets, even if the node and edge features are very different.

The results show positive transfer on evaluation datasets from multiple domains, including domains that were absent from the pre-training data, with ToP models significantly outperforming a supervised baseline on 75% of experiments. This suggests that the general patterns in graph topology can be more useful for transfer learning than domain-specific features.

Technical Explanation

The key innovation in this work is the Topology Only Pre-Training (ToP) method. Typically, graph representation learning approaches learn both the graph topology (connections between nodes) and the node/edge features in a single step. ToP separates this into two stages:

Pre-training on the graph topology alone, using contrastive learning to learn topological representations.
Fine-tuning on the target dataset, which may have very different node and edge features.
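The feature exclusion behind the first stage can be pictured with a minimal sketch. The `Graph` container, the `exclude_features` helper, the constant placeholder value `1.0`, and the two toy graphs are all assumptions made for illustration; the summary does not say how the authors represent graphs or what they substitute for the removed features.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Graph:
    """Hypothetical minimal graph container used only for this sketch."""
    num_nodes: int
    edges: List[Tuple[int, int]]
    node_features: Dict[int, List[float]] = field(default_factory=dict)
    edge_features: Dict[Tuple[int, int], List[float]] = field(default_factory=dict)


def exclude_features(graph: Graph) -> Graph:
    """Return a copy that keeps only the topology.

    Every node receives the same constant placeholder feature (an assumption)
    and edge features are dropped, so graphs from any domain share one format.
    """
    return Graph(
        num_nodes=graph.num_nodes,
        edges=list(graph.edges),
        node_features={i: [1.0] for i in range(graph.num_nodes)},
        edge_features={},
    )


# Two made-up graphs whose feature spaces are incompatible with each other.
molecule = Graph(
    num_nodes=3,
    edges=[(0, 1), (1, 2)],
    node_features={0: [6.0, 0.0], 1: [8.0, 1.0], 2: [1.0, 0.0]},
    edge_features={(0, 1): [2.0], (1, 2): [1.0]},
)
social = Graph(
    num_nodes=4,
    edges=[(0, 1), (0, 2), (0, 3)],
    node_features={i: [float(i)] * 5 for i in range(4)},
)

# After exclusion both graphs can sit in the same pre-training set.
pretraining_set = [exclude_features(g) for g in (molecule, social)]
```

Because only the edge lists differ after this step, a single encoder can consume graphs from unrelated domains; the original features remain available on the untouched input graphs for the fine-tuning stage.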

This allows the pre-trained model to transfer effectively to multiple domains, even when the target data has different characteristics than the pre-training data.
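To give a feel for the contrastive objective mentioned in the first stage, here is a generic InfoNCE-style loss written with the standard library only. This is a common contrastive formulation, not necessarily the loss used in the paper; the function names, the temperature of 0.5, and the toy embeddings are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a: List[Sequence[float]],
             view_b: List[Sequence[float]],
             temperature: float = 0.5) -> float:
    """Mean InfoNCE loss over a batch.

    view_a[i] and view_b[i] are embeddings of two augmented views of graph i
    (the positive pair); every view_b[j] with j != i acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: matching views give a low loss, mismatched views a high one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(anchors, matched)
loss_mismatched = info_nce(anchors, mismatched)
```

Minimising a loss of this shape pulls the two views of the same graph together and pushes different graphs apart, which is the sense in which an encoder can learn structure without labels.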

The experiments show that ToP models perform significantly better (P ≤ 0.01) than a supervised baseline on 75% of experiments. When node and edge features are used in evaluation, ToP models are significantly better than single-domain or non-pre-trained models on 85.7% of tasks.
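A percentage such as "significantly better on 75% of experiments" is a count of per-task significance tests. The sketch below shows one way such a figure could be computed. The summary does not name the statistical test, so the exact one-sided permutation test used here is an assumption, as are the function names, and the scores are made-up numbers rather than results from the paper.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def permutation_p_value(treated: Sequence[float], baseline: Sequence[float]) -> float:
    """Exact one-sided permutation test on the difference of means.

    Returns the share of relabellings whose mean difference is at least as
    large as the observed one (treated minus baseline).
    """
    pooled = list(treated) + list(baseline)
    k = len(treated)
    rest = len(pooled) - k
    total = sum(pooled)
    observed = sum(treated) / k - sum(baseline) / len(baseline)
    hits = 0
    count = 0
    for indices in combinations(range(len(pooled)), k):
        subset_sum = sum(pooled[i] for i in indices)
        diff = subset_sum / k - (total - subset_sum) / rest
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


def fraction_significant(results: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                         alpha: float = 0.01) -> float:
    """Fraction of tasks where the treated runs beat the baseline at level alpha."""
    wins = sum(
        1 for treated, baseline in results.values()
        if permutation_p_value(treated, baseline) <= alpha
    )
    return wins / len(results)


# Made-up scores from five repeated runs per model on two hypothetical tasks.
runs = {
    "task_a": ([0.81, 0.83, 0.82, 0.84, 0.85], [0.70, 0.72, 0.71, 0.69, 0.73]),
    "task_b": ([0.60, 0.62, 0.59, 0.61, 0.63], [0.60, 0.61, 0.62, 0.59, 0.63]),
}

share = fraction_significant(runs)
```

With five runs per model the smallest attainable p-value is 1/252, so five repeats is about the minimum at which a threshold of 0.01 can be met under this particular test.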

The results also indicate that pre-training on topologies from unrelated domains can be more beneficial than pre-training on topologies from the target domain. For example, pre-training on non-molecular graphs gave better transfer than pre-training on molecular graphs on 79% of molecular benchmarks.

Critical Analysis

The paper provides a compelling approach to enable cross-domain transfer learning for graph-structured data. However, there are a few potential limitations and areas for further research:

The experiments focus on relatively small, curated datasets. It's unclear how well the ToP method would scale to larger, noisier real-world graphs.
The paper does not explore the limits of how different the pre-training and target domains can be before performance degrades. More research is needed to understand the boundaries of effective cross-domain transfer.
The analysis of why out-of-domain topologies can sometimes be more useful is rather limited. Further investigation into the influence of graph topology on learning could provide deeper insights.

Overall, the ToP method represents an important step towards more flexible and generalizable graph representation learning. With further development, it could have significant impacts on a wide range of applications that rely on graph-structured data.

Conclusion

This paper introduces a novel pre-training approach for graph representation learning called Topology Only Pre-Training (ToP). By separating the learning of graph topology from node/edge features, ToP models can effectively transfer to target datasets across multiple domains, even when the features differ significantly from the pre-training data.

The results demonstrate the power of this approach, with ToP models outperforming supervised baselines and single-domain models in the majority of experiments. This suggests that the general patterns in graph topology can be more useful for transfer learning than domain-specific features.

While there are some limitations to explore, the ToP method is a step towards graph representation learning that is more flexible and applicable to a wider range of real-world problems, particularly where labelled data in the target domain is scarce.

Related Papers

Towards Generalised Pre-Training of Graph Models

Alex O. Davies, Riku W. Green, Nirav S. Ajmeri, Telmo M. Silva Filho

The principal benefit of unsupervised representation learning is that a pre-trained model can be fine-tuned where data or labels are scarce. Existing approaches for graph representation learning are domain specific, maintaining consistent node and edge features across the pre-training and target datasets. This has precluded transfer to multiple domains. In this work we present Topology Only Pre-Training, a graph pre-training method based on node and edge feature exclusion. Separating graph learning into two stages, topology and features, we use contrastive learning to pre-train models over multiple domains. These models show positive transfer on evaluation datasets from multiple domains, including domains not present in pre-training data. On 75% of experiments, ToP models perform significantly ($P \leq 0.01$) better than a supervised baseline. These results include when node and edge features are used in evaluation, where performance is significantly better on 85.7% of tasks compared to single-domain or non-pre-trained models. We further show that out-of-domain topologies can produce more useful pre-training than in-domain. We show better transfer from non-molecule pre-training, compared to molecule pre-training, on 79% of molecular benchmarks.

5/15/2024

There is more to graphs than meets the eye: Learning universal features with self-supervision

Laya Das, Sai Munikoti, Nrushad Joshi, Mahantesh Halappanavar

We study the problem of learning features through self-supervision that are generalisable to multiple graphs. State-of-the-art graph self-supervision restricts training to only one graph, resulting in graph-specific models that are incompatible with different but related graphs. We hypothesize that training with more than one graph that belong to the same family can improve the quality of the learnt representations. However, learning universal features from disparate node/edge features in different graphs is non-trivial. To address this challenge, we first homogenise the disparate features with graph-specific encoders that transform the features into a common space. A universal representation learning module then learns generalisable features on this common space. We show that compared to traditional self-supervision with one graph, our approach results in (1) better performance on downstream node classification, (2) learning features that can be re-used for unseen graphs of the same family, (3) more efficient training and (4) compact yet generalisable models. We also show ability of the proposed framework to deliver these benefits for relatively larger graphs. In this paper, we present a principled way to design foundation graph models that learn from more than one graph in an end-to-end manner, while bridging the gap between self-supervised and supervised performance.

7/31/2024

Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning

Wangyang Ying, Dongjie Wang, Xuanming Hu, Yuanchun Zhou, Charu C. Aggarwal, Yanjie Fu

Feature transformation is to derive a new feature set from original features to augment the AI power of data. In many science domains such as material performance screening, while feature transformation can model material formula interactions and compositions and discover performance drivers, supervised labels are collected from expensive and lengthy experiments. This issue motivates an Unsupervised Feature Transformation Learning (UFTL) problem. Prior literature, such as manual transformation, supervised feedback guided search, and PCA, either relies on domain knowledge or expensive supervised feedback, or suffers from large search space, or overlooks non-linear feature-feature interactions. UFTL imposes a major challenge on existing methods: how to design a new unsupervised paradigm that captures complex feature interactions and avoids large search space? To fill this gap, we connect graph, contrastive, and generative learning to develop a measurement-pretrain-finetune paradigm for UFTL. For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective and develop a mean discounted cumulative gain like unsupervised metric to evaluate feature set utility. For unsupervised feature set representation pretraining, we regard a feature set as a feature-feature interaction graph, and develop an unsupervised graph contrastive learning encoder to embed feature sets into vectors. For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation. We develop a deep generative feature transformation model that coordinates the pretrained feature set encoder and the gradient information extracted from a feature set utility evaluator to optimize a transformed feature generator.

5/28/2024

Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training on Industrial-Scale Data

Yufei He, Zhenyu Hou, Yukuo Cen, Feng He, Xu Cheng, Bryan Hooi

Graph pre-training has been concentrated on graph-level on small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Specifically, we design a flexible and scalable graph transformer as the backbone network. Meanwhile, based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other one for reconstructing local structures. Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy that utilizes the decoder for feature augmentation. We have deployed our framework on Tencent's online game data. Extensive experiments have demonstrated that our framework can perform pre-training on real-world web-scale graphs with over 540 million nodes and 12 billion edges and generalizes effectively to unseen new graphs with different downstream tasks. We further conduct experiments on the publicly available ogbn-papers100M dataset, which consists of 111 million nodes and 1.6 billion edges. Our framework achieves state-of-the-art performance on both industrial datasets and public datasets, while also enjoying scalability and efficiency.

9/16/2024