Generative Semi-supervised Graph Anomaly Detection

2402.11887

Published 5/29/2024 by Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-Peng Lim, Guansong Pang

Abstract

This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal, contrasting to the extensively explored unsupervised setting with a fully unlabeled graph. We reveal that having access to the normal nodes, even just a small percentage of normal nodes, helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, their utilization of these normal nodes is limited. In this paper, we propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate pseudo anomaly nodes, referred to as 'outlier nodes', for providing effective negative node samples in training a discriminative one-class classifier. The main challenge here lies in the lack of ground truth information about real anomaly nodes. To address this challenge, GGAD is designed to leverage two important priors about the anomaly nodes -- asymmetric local affinity and egocentric closeness -- to generate reliable outlier nodes that assimilate anomaly nodes in both graph structure and feature representations. Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. Code will be made available at https://github.com/mala-lab/GGAD.

Overview

  • This paper presents a novel approach called Generative GAD (GGAD) for semi-supervised graph anomaly detection.
  • Most existing graph anomaly detection methods are unsupervised, but this work considers a practical scenario where some normal nodes in the graph are labeled.
  • The key idea of GGAD is to generate outlier nodes that resemble anomaly nodes in both local structure and node representations, which can then be used as effective negative samples to train a one-class classifier.

Plain English Explanation

In this work, the researchers looked at the problem of detecting anomalies or unusual nodes in a graph-structured dataset, such as a social network or a biological network. Most existing methods for this task are unsupervised, meaning they try to find anomalies without any prior information about what normal nodes look like.

However, the researchers argue that in many real-world scenarios, we may have access to at least some normal nodes that we know are not anomalies. For example, in a social network, we might know that certain users are regular, everyday users rather than bots or bad actors. The researchers wanted to explore how we can leverage this additional information about normal nodes to improve the detection of anomalies.

Their key insight was to generate synthetic outlier nodes that look similar to real anomalies in terms of their connections to other nodes and their internal representations. By training a one-class classifier to distinguish these generated outliers from the known normal nodes, the researchers found they could achieve better anomaly detection performance compared to existing unsupervised and semi-supervised methods.

The motivation behind this approach is that having access to some normal nodes allows the model to learn what normal looks like, and then it can use that knowledge to better identify anomalies. The generated outliers act as useful negative examples to help the classifier draw a clear boundary between normal and anomalous nodes.

Technical Explanation

The researchers proposed a novel Generative GAD (GGAD) approach for the semi-supervised graph anomaly detection scenario. In this setting, a portion of the nodes in the graph are known to be normal, in contrast to the fully unsupervised setting assumed by most previous graph anomaly detection studies.

The key idea behind GGAD is to generate outlier nodes whose local structure and node representations resemble those of actual anomaly nodes. These generated outliers then serve as negative samples for training a discriminative one-class classifier that separates normal nodes from anomalies.
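To make this training setup concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes node embeddings are already available as small vectors, and it swaps in a simple logistic scorer for the paper's classifier. The function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_one_class(normal, outliers, epochs=200, lr=0.5):
    """Fit a logistic scorer with label 0 for known normal embeddings
    and label 1 for generated outlier embeddings (assumed inputs)."""
    dim = len(normal[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 0.0) for x in normal] + [(x, 1.0) for x in outliers]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def anomaly_score(x, w, b):
    """Higher score means the embedding looks more like a generated outlier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy embeddings: labeled normal nodes cluster near the origin,
# while the (hypothetical) generated outliers sit away from them.
normal = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05]]
outliers = [[1.0, 0.9], [0.9, 1.1]]
w, b = train_one_class(normal, outliers)
score_far = anomaly_score([1.0, 1.0], w, b)
score_near = anomaly_score([0.0, 0.0], w, b)
```

At test time, every unlabeled node would be scored the same way, and nodes with high scores would be flagged as likely anomalies.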

To achieve this, the GGAD model incorporates several novel components:

  1. A graph structure-aware generator that produces outlier nodes satisfying the asymmetric local affinity prior, so that their affinity to neighboring nodes is separable from that of normal nodes.
  2. An egocentric closeness constraint that keeps the generated outliers close to normal nodes in the node representation space.
  3. A one-class classifier trained on the normal nodes and the generated outliers to detect anomalies.
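The two priors in the list above can be illustrated with a short sketch. This is one plausible reading, not the paper's exact formulation: it assumes local affinity is the mean cosine similarity between a node and its neighbors, uses a hinge-style margin for the affinity term, and uses mean squared distance for the closeness term. All function names, the margin value, and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def local_affinity(h, neighbor_hs):
    """Assumed definition: mean cosine similarity of a node to its neighbors."""
    return sum(cosine(h, n) for n in neighbor_hs) / len(neighbor_hs)

def affinity_margin_loss(normal_affinities, outlier_affinities, margin=0.7):
    """Zero once normal nodes beat outliers in mean affinity by at least `margin`."""
    gap = (sum(normal_affinities) / len(normal_affinities)
           - sum(outlier_affinities) / len(outlier_affinities))
    return max(0.0, margin - gap)

def closeness_loss(outlier_hs, reference_hs):
    """Mean squared distance between each outlier and its reference normal node."""
    total = 0.0
    for o, r in zip(outlier_hs, reference_hs):
        total += sum((a - b) ** 2 for a, b in zip(o, r))
    return total / len(outlier_hs)

# Toy ego network: two neighbor embeddings, one normal node, one outlier.
neighbors = [[1.0, 0.0], [0.9, 0.1]]
normal_h = [1.0, 0.05]
outlier_h = [0.0, 1.0]

aff_normal = local_affinity(normal_h, neighbors)
aff_outlier = local_affinity(outlier_h, neighbors)
margin_loss = affinity_margin_loss([aff_normal], [aff_outlier])
close_loss = closeness_loss([outlier_h], [normal_h])
```

In this toy case the affinity term is already satisfied, while the closeness term is still large. Minimizing both together would pull the outlier toward the normal node in representation space without letting its affinity to the neighborhood match that of normal nodes.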

The researchers evaluated GGAD on six real-world graph datasets and found that it substantially outperformed existing unsupervised and semi-supervised graph anomaly detection methods across varying numbers of labeled normal nodes.

Critical Analysis

The researchers acknowledge several limitations and areas for future work in their paper. One key limitation is that the performance of GGAD depends on the quality of the generated outlier nodes, and the generator's ability to produce realistic-looking anomalies. If the generated outliers do not sufficiently capture the characteristics of real anomalies, the one-class classifier may not be as effective.

Additionally, the paper does not provide a detailed analysis of the computational complexity and training efficiency of the GGAD approach, which could be an important practical consideration, especially for large-scale graph datasets.

Another potential issue is that the paper only evaluates GGAD on static graph datasets, and it's unclear how the approach would perform on dynamic graphs where the structure and node attributes may change over time. Extending GGAD to handle temporal graph anomaly detection could be an interesting direction for future research.

Overall, the GGAD approach is a promising contribution to the field of semi-supervised graph anomaly detection, but further research is needed to address its limitations and explore its broader applicability.

Conclusion

This paper presents a novel Generative GAD (GGAD) approach for semi-supervised graph anomaly detection, where a portion of the nodes in the graph are known to be normal. GGAD leverages this additional information by generating outlier nodes that resemble actual anomalies, which can then be used as effective negative samples to train a one-class classifier.

The key innovation of GGAD is its ability to generate graph structure-aware outlier nodes that follow the asymmetric local affinity prior, so their affinity to neighbors is separable from that of normal nodes, while an egocentric closeness constraint keeps them close to normal nodes in the node representation space. This allows GGAD to outperform existing unsupervised and semi-supervised graph anomaly detection methods across varying numbers of labeled normal nodes.

The successful application of GGAD demonstrates the potential of generative approaches to enhance semi-supervised learning tasks on graph-structured data. This work could inspire further research into incorporating domain knowledge and leveraging limited supervision to tackle challenging problems in graph mining and analysis.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection

Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, Jiajun Bu

Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the graph tend to exhibit consistent behaviors with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, resulting in sub-optimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, how to address the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learn effective representations for anomaly detection are still under-explored. To bridge this gap, in this paper, we propose a simple yet effective framework for Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection (G3AD). Specifically, G3AD introduces two auxiliary networks along with correlation constraints to guard the GNNs from inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching module to guard the GNNs from solely reconstructing the observed data that contains anomalies. Extensive experiments demonstrate that our proposed G3AD can outperform seventeen state-of-the-art methods on both synthetic and real-world datasets.

4/26/2024

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

Qizhou Wang, Guansong Pang, Mahsa Salehi, Xiaokun Xia, Christopher Leckie

This paper considers an important Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to train a detection model using a small number of normal and anomaly nodes (referred to as seen anomalies) to detect both seen anomalies and unseen anomalies (i.e., anomalies that cannot be illustrated by the training anomalies). The availability of those labelled training data provides crucial prior knowledge about abnormalities for GAD models, enabling substantially reduced detection errors. However, current methods tend to over-emphasise fitting the seen anomalies, leading to a weak generalisation ability to detect the unseen anomalies. Further, they were introduced to handle Euclidean data, failing to effectively capture important information on graph structure and node attributes for GAD. In this work, we propose a novel open-set GAD approach, namely Normal Structure Regularisation (NSReg), to achieve generalised detection ability to unseen anomalies, while maintaining its effectiveness on detecting seen anomalies. The key idea in NSReg is to introduce a regularisation term that enforces the learning of compact, semantically-rich representations of normal nodes based on their structural relations to other nodes. When being optimised with supervised anomaly detection losses, the regularisation term helps incorporate strong normality into the modelling, and thus, it effectively avoids overfitting to the seen anomalies solely. In doing so, it helps learn a better normality decision boundary, reducing the errors of detecting unseen anomalies as normal. Extensive empirical results on seven real-world datasets show the superiority of NSReg for open-set GAD.

6/4/2024

Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image

Zerui Zhang, Zhichao Sun, Zelong Liu, Bo Du, Rui Yu, Zhou Zhao, Yongchao Xu

Medical anomaly detection is a critical research area aimed at recognizing abnormal images to aid in diagnosis. Most existing methods adopt synthetic anomalies and image restoration on normal samples to detect anomaly. The unlabeled data consisting of both normal and abnormal data is not well explored. We introduce a novel Spatial-aware Attention Generative Adversarial Network (SAGAN) for one-class semi-supervised generation of health images. Our core insight is the utilization of position encoding and attention to accurately focus on restoring abnormal regions and preserving normal regions. To fully utilize the unlabelled data, SAGAN relaxes the cyclic consistency requirement of the existing unpaired image-to-image conversion methods, and generates high-quality health images corresponding to unlabeled data, guided by the reconstruction of normal images and restoration of pseudo-anomaly images. Subsequently, the discrepancy between the generated healthy image and the original image is utilized as an anomaly score. Extensive experiments on three medical datasets demonstrate that the proposed SAGAN outperforms the state-of-the-art methods.

5/22/2024

SmoothGNN: Smoothing-based GNN for Unsupervised Node Anomaly Detection

Xiangyu Dong, Xingyi Zhang, Yanni Sun, Lei Chen, Mingxuan Yuan, Sibo Wang

The smoothing issue leads to indistinguishable node representations, which poses a significant challenge in the field of graph learning. However, this issue also presents an opportunity to reveal underlying properties behind different types of nodes, which have been overlooked in previous studies. Through empirical and theoretical analysis of real-world node anomaly detection (NAD) datasets, we observe that anomalous and normal nodes show different patterns in the smoothing process, which can be leveraged to enhance NAD tasks. Motivated by these findings, in this paper, we propose a novel unsupervised NAD framework. Specifically, according to our theoretical analysis, we design a Smoothing Learning Component. Subsequently, we introduce a Smoothing-aware Spectral Graph Neural Network, which establishes the connection between the spectral space of graphs and the smoothing process. Additionally, we demonstrate that the Dirichlet Energy, which reflects the smoothness of a graph, can serve as coefficients for node representations across different dimensions of the spectral space. Building upon these observations and analyses, we devise a novel anomaly measure for the NAD task. Extensive experiments on 9 real-world datasets show that SmoothGNN outperforms the best rival by an average of 14.66% in AUC and 7.28% in Precision, with 75x running time speed-up, which validates the effectiveness and efficiency of our framework.

5/29/2024