Label-based Graph Augmentation with Metapath for Graph Anomaly Detection

Read original: arXiv:2308.10918 - Published 4/15/2024 by Hwan Kim, Junghoon Kim, Byung Suk Lee, Sungsu Lim

Overview

Graph anomaly detection is an important problem with applications in areas like network security and finance.
Existing unsupervised methods can detect anomalies, but without prior knowledge of what to look for, the anomalies they flag may turn out to be uninteresting.
Using a few labeled anomalies as prior knowledge can help, but efficiently leveraging this limited information is challenging.

Plain English Explanation

Detecting anomalies, or unusual patterns, in graph data is important for many real-world applications. For example, graph anomaly detection can be used to identify suspicious activity in computer networks or detect financial fraud.

Most existing methods for graph anomaly detection are unsupervised, meaning they don't use any labeled examples of anomalies. While these methods can find unusual patterns, the results may not be particularly useful because the algorithm doesn't know what kind of anomalies it's looking for.

To address this, the researchers propose using a small number of labeled anomalies as a starting point. In many real-world scenarios, it's possible to get a few examples of anomalies, even if it's too expensive to label a large dataset. By leveraging these few labeled examples, the algorithm can focus on finding the types of anomalies that are actually interesting and relevant.

However, efficiently using this limited labeled information is challenging due to the inherent sparsity of anomalies in most graphs. The key innovation in this paper is a new approach called MGAD that uses "metapaths" to better capture the connectivity patterns between anomalous and normal nodes in the graph. MGAD then uses a graph neural network to propagate this contextual information and improve the detection of anomalies.
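
To make the metapath idea concrete, here is a minimal Python sketch. It assumes one particular pattern, labeled anomaly -> non-anomalous node -> labeled anomaly, which is an illustrative choice and not necessarily the paper's exact metapath definition. The function name `anomaly_metapath_instances` and the toy graph are hypothetical.

```python
from itertools import combinations


def anomaly_metapath_instances(edges, labeled_anomalies):
    """Find (a1, mid, a2) triples where two labeled anomalies share a
    non-anomalous neighbor. The pattern is an assumption for illustration."""
    # Build an undirected neighbor map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    anomalies = set(labeled_anomalies)
    instances = []
    for mid, nbrs in sorted(neighbors.items()):
        if mid in anomalies:
            continue  # the middle node must not be a labeled anomaly
        anomalous_nbrs = sorted(nbrs & anomalies)
        # Every pair of anomalous neighbors forms one metapath instance.
        for a1, a2 in combinations(anomalous_nbrs, 2):
            instances.append((a1, mid, a2))
    return instances


# Toy graph: nodes 0, 2, and 5 are the few labeled anomalies.
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
labeled_anomalies = [0, 2, 5]
instances = anomaly_metapath_instances(edges, labeled_anomalies)
print(instances)  # [(0, 1, 2)]
```

In this toy graph only node 1 sits between two labeled anomalies, so it is the only unlabeled node that picks up metapath context. That also illustrates the sparsity problem: with few labels, few such paths exist.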

Technical Explanation

The core of the MGAD approach is a graph autoencoder architecture that uses dual encoders to capture both global and local metapath-based context information. The first encoder takes the entire graph as input and learns a representation of the overall connectivity patterns, including those related to the labeled anomalies. The second encoder focuses on the local neighborhood around each node, again leveraging the metapath-based context.
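
The dual-encoder idea can be sketched with plain NumPy. This is a rough, untrained forward pass under stated assumptions, not the authors' implementation: each encoder is a single GCN layer, the local view keeps only edges that touch a labeled anomaly (a stand-in for the metapath-based subgraph), and the two embeddings are combined by concatenation. All names and sizes are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One GCN layer with ReLU: relu(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


rng = np.random.default_rng(0)
n, f, d = 6, 4, 3  # nodes, input features, embedding size (toy values)

adj = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]:
    adj[u, v] = adj[v, u] = 1.0
x = rng.normal(size=(n, f))  # random node attributes

# Assumed local view: keep edges with at least one labeled-anomaly endpoint.
labeled = np.array([0, 2, 5])
mask = np.zeros(n, dtype=bool)
mask[labeled] = True
adj_local = adj * (mask[:, None] | mask[None, :])

# Separate (random, untrained) weights for the global and local encoders.
w_global = rng.normal(size=(f, d))
w_local = rng.normal(size=(f, d))

z_global = gcn_layer(normalize_adjacency(adj), x, w_global)
z_local = gcn_layer(normalize_adjacency(adj_local), x, w_local)

# Concatenation is one simple way to combine the two views (an assumption).
z = np.concatenate([z_global, z_local], axis=1)
print(z.shape)  # (6, 6)
```

In a real model the weights would be learned by minimizing a reconstruction loss, and the encoders would likely stack more than one layer.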

The outputs of the dual encoders feed into a decoding stage, also built from GCN layers, that reconstructs the input graph. The intuition is that forcing the model to reconstruct the graph accurately makes it learn representations that capture the structural features related to anomalies, so nodes that are reconstructed poorly stand out as anomaly candidates. The researchers report that this approach outperforms state-of-the-art graph anomaly detection methods in experiments on seven real-world networks.
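
The link between reconstruction and detection can be illustrated with a small scoring sketch. It assumes the score is a weighted sum of per-node structure and attribute reconstruction errors, a convention common in graph-autoencoder anomaly detectors; the summary does not confirm that MGAD uses exactly this form. The function name, the weight `alpha`, and the toy data are hypothetical.

```python
import numpy as np


def reconstruction_scores(adj, x, adj_rec, x_rec, alpha=0.5):
    """Per-node anomaly score from reconstruction error.

    alpha weights structure error against attribute error (assumed form).
    """
    struct_err = np.linalg.norm(adj - adj_rec, axis=1)  # row-wise L2 norm
    attr_err = np.linalg.norm(x - x_rec, axis=1)
    return alpha * struct_err + (1.0 - alpha) * attr_err


# Toy example: structure is reconstructed perfectly, but the attributes
# of node 2 are reconstructed badly.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
adj_rec = adj.copy()
x_rec = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

scores = reconstruction_scores(adj, x, adj_rec, x_rec)
ranking = np.argsort(-scores)  # highest score first
print(ranking[0])  # 2
```

Nodes are then ranked by score, and the top of the ranking is reported as the most likely anomalies.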

Critical Analysis

A key strength of the MGAD approach is its ability to effectively leverage a small number of labeled anomalies as prior knowledge. This is an important practical consideration, as obtaining large labeled datasets for graph anomaly detection can be extremely costly and time-consuming.

That said, the paper does not provide a deep analysis of how the performance of MGAD scales as the number of labeled anomalies is varied. It would be interesting to understand the "sweet spot" in terms of the minimum number of labels required to achieve good results, as well as how robust the method is to noisy or incorrect labels.

Additionally, the paper focuses on static graph data, but many real-world graphs evolve over time. Extending MGAD to handle dynamic graphs could significantly broaden its applicability. Incorporating heterogeneous graph structure may also be a fruitful direction for future research.

Conclusion

This paper presents a graph anomaly detection method called MGAD that uses a small number of labeled anomalies as prior knowledge. By using metapaths to capture the connectivity patterns around anomalous nodes, MGAD outperforms state-of-the-art techniques in experiments on seven real-world networks.

The ability to incorporate limited labeled information is a key strength of the MGAD approach and makes it a promising solution for practical graph anomaly detection problems. Further research on sensitivity to the number and quality of labels, dynamic graphs, and heterogeneous data could broaden its applicability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Label-based Graph Augmentation with Metapath for Graph Anomaly Detection

Hwan Kim, Junghoon Kim, Byung Suk Lee, Sungsu Lim

Graph anomaly detection has attracted considerable attention from various domains, ranging from network security to finance, in recent years. Because labeling is very costly, existing methods are predominantly developed in an unsupervised manner. However, the detected anomalies may turn out to be uninteresting instances due to the absence of prior knowledge about the anomalies being sought. This issue may be solved by using a few labeled anomalies as prior knowledge. In real-world scenarios, we can easily obtain a few labeled anomalies. Efficiently leveraging labeled anomalies as prior knowledge is crucial for graph anomaly detection; however, this process remains challenging due to the inherently limited number of anomalies available. To address the problem, we propose a novel approach that leverages metapaths to embed actual connectivity patterns between anomalous and normal nodes. To further exploit context information from the metapath-based anomaly subgraph, we present a new framework, Metapath-based Graph Anomaly Detection (MGAD), incorporating GCN layers in both the dual encoders and decoders to efficiently propagate context information between abnormal and normal nodes. Specifically, MGAD employs a GNN-based graph autoencoder as its backbone network. Moreover, dual encoders capture the complex interactions and metapath-based context information between labeled and unlabeled nodes both globally and locally. Through a comprehensive set of experiments conducted on seven real-world networks, this paper demonstrates the superiority of the MGAD method compared to state-of-the-art techniques. The code is available at https://github.com/missinghwan/MGAD.

4/15/2024

MetaGAD: Meta Representation Adaptation for Few-Shot Graph Anomaly Detection

Xiongxiao Xu, Kaize Ding, Canyu Chen, Kai Shu

Graph anomaly detection has long been an important problem in various domains pertaining to information security such as financial fraud, social spam and network intrusion. The majority of existing methods are performed in an unsupervised manner, as labeled anomalies in a large scale are often too expensive to acquire. However, the identified anomalies may turn out to be uninteresting data instances due to the lack of prior knowledge. In real-world scenarios, it is often feasible to obtain limited labeled anomalies, which have great potential to advance graph anomaly detection. However, the work exploring limited labeled anomalies and a large amount of unlabeled nodes in graphs to detect anomalies is relatively limited. Therefore, in this paper, we study an important problem of few-shot graph anomaly detection. Nonetheless, it is challenging to fully leverage the information of few-shot anomalous nodes due to the irregularity of anomalies and the overfitting issue in the few-shot learning. To tackle the above challenges, we propose a novel meta-learning based framework, MetaGAD, that learns to adapt the knowledge from self-supervised learning to few-shot supervised learning for graph anomaly detection. In specific, we formulate the problem as a bi-level optimization, ensuring MetaGAD converging to minimizing the validation loss, thus enhancing the generalization capacity. The comprehensive experiments on six real-world datasets with synthetic anomalies and organic anomalies (available in the datasets) demonstrate the effectiveness of MetaGAD in detecting anomalies with few-shot anomalies. The code is available at https://github.com/XiongxiaoXu/MetaGAD.

8/27/2024

Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection

Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, Jiajun Bu

Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the graph tend to exhibit consistent behaviors with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, resulting in sub-optimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, how to address the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learn effective representations for anomaly detection are still under-explored. To bridge this gap, in this paper, we propose a simple yet effective framework for Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection (G3AD). Specifically, G3AD introduces two auxiliary networks along with correlation constraints to guard the GNNs from inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching module to guard the GNNs from solely reconstructing the observed data that contains anomalies. Extensive experiments demonstrate that our proposed G3AD can outperform seventeen state-of-the-art methods on both synthetic and real-world datasets.

4/26/2024

Generative Semi-supervised Graph Anomaly Detection

Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-Peng Lim, Guansong Pang

This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal, contrasting to the extensively explored unsupervised setting with a fully unlabeled graph. We reveal that having access to the normal nodes, even just a small percentage of normal nodes, helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, their utilization of these normal nodes is limited. In this paper, we propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate pseudo anomaly nodes, referred to as 'outlier nodes', for providing effective negative node samples in training a discriminative one-class classifier. The main challenge here lies in the lack of ground truth information about real anomaly nodes. To address this challenge, GGAD is designed to leverage two important priors about the anomaly nodes -- asymmetric local affinity and egocentric closeness -- to generate reliable outlier nodes that assimilate anomaly nodes in both graph structure and feature representations. Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. Code will be made available at https://github.com/mala-lab/GGAD.

5/29/2024