Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection

Read original: arXiv:2306.00006 - Published 4/5/2024 by Hezhe Qiao, Guansong Pang

❗

Overview

The paper reveals a one-class homophily phenomenon in real-world graph anomaly detection (GAD) datasets, where normal nodes tend to have strong connections with each other, while abnormal nodes have significantly weaker homophily.
Existing GAD methods typically focus on data reconstruction, ignoring this anomaly-discriminative property.
The authors propose a novel unsupervised anomaly scoring measure called "local node affinity" that assigns higher scores to nodes with weaker affinity to their neighbors.
They further introduce "Truncated Affinity Maximization" (TAM) to learn tailored node representations that maximize the local affinity of normal nodes while minimizing that of abnormal nodes.

Plain English Explanation

In the context of graph anomaly detection, the researchers found an interesting pattern in real-world datasets: normal nodes (non-anomalous nodes) tend to have strong connections with each other, while abnormal nodes (anomalous nodes) have much weaker connections to their neighbors.

However, existing methods for detecting anomalies in graphs don't take advantage of this property. Instead, they focus on trying to reconstruct the entire graph, which may not be the best approach for identifying unusual or problematic nodes.

To address this, the researchers developed a new way to measure node anomaly. Their "local node affinity" score looks at how similar a node is to its neighbors. Nodes with weaker connections to their neighbors are assigned a higher anomaly score.

The researchers also created a technique called "Truncated Affinity Maximization" (TAM) to learn representations of the nodes that amplify this anomaly-discriminative property. TAM optimizes the node representations to maximize the local affinity of normal nodes while minimizing it for abnormal nodes.

Technical Explanation

The key idea in this paper is to leverage a one-class homophily phenomenon observed in real-world graph anomaly detection (GAD) datasets. The authors found that normal nodes tend to have strong connections with each other (high homophily), while the homophily in abnormal nodes is significantly weaker.

To capture this anomaly-discriminative property, the authors introduce a novel unsupervised anomaly scoring measure called local node affinity. This score reflects how similar a node is to its neighbors, with lower affinity indicating a higher anomaly score.

To further exploit this property, the authors propose Truncated Affinity Maximization (TAM). TAM learns tailored node representations by maximizing the local affinity of normal nodes and minimizing it for abnormal nodes. Crucially, TAM optimizes on a truncated graph where non-homophily edges (i.e., edges connecting normal and abnormal nodes) are removed iteratively to mitigate the bias introduced by these edges.

The authors evaluate TAM on 10 real-world GAD datasets and show that it substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. This demonstrates the effectiveness of leveraging the one-class homophily property for [object Object] graph anomaly detection.

Critical Analysis

The paper makes a compelling case for exploiting the one-class homophily phenomenon observed in real-world GAD datasets. The proposed TAM approach effectively learns node representations that amplify this anomaly-discriminative property, leading to significant performance gains over existing methods.

That said, the paper does not extensively discuss the potential reasons or origins of this one-class homophily phenomenon. Understanding the underlying factors that give rise to this property could lead to further insights and improvements in graph anomaly detection.

Additionally, the authors mention that optimizing on the original graph structure can be biased by non-homophily edges. While the iterative truncation approach helps mitigate this issue, it would be interesting to explore alternative ways of handling such edges, perhaps through more sophisticated graph neural network architectures or attention mechanisms.

Finally, the authors do not provide a deep analysis of the limitations or failure cases of their TAM method. Understanding the scenarios where TAM may struggle or perform suboptimally could guide future research directions and the development of more robust anomaly detection techniques for graphs.

Conclusion

This paper makes an important contribution to the field of graph anomaly detection by revealing a previously overlooked one-class homophily phenomenon in real-world datasets and leveraging it to develop a novel anomaly scoring and representation learning approach.

The proposed Truncated Affinity Maximization (TAM) method significantly outperforms existing techniques, demonstrating the value of exploiting anomaly-discriminative properties of graphs. This research highlights the importance of carefully analyzing the underlying characteristics of real-world data and using these insights to design more effective anomaly detection algorithms.

The findings in this paper could have broader implications for other graph-based machine learning tasks, such as community detection, image segmentation, and multi-relational learning, where exploiting anomaly-discriminative properties of the graph structure could lead to significant performance improvements.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

❗

Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection

Hezhe Qiao, Guansong Pang

We reveal a one-class homophily phenomenon, which is one prevalent property we find empirically in real-world graph anomaly detection (GAD) datasets, i.e., normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker than normal nodes. However, this anomaly-discriminative property is ignored by existing GAD methods that are typically built using a conventional anomaly detection objective, such as data reconstruction. In this work, we explore this property to introduce a novel unsupervised anomaly scoring measure for GAD, local node affinity, that assigns a larger anomaly score to nodes that are less affiliated with their neighbors, with the affinity defined as similarity on node attributes/representations. We further propose Truncated Affinity Maximization (TAM) that learns tailored node representations for our anomaly measure by maximizing the local affinity of nodes to their neighbors. Optimizing on the original graph structure can be biased by nonhomophily edges (i.e., edges connecting normal and abnormal nodes). Thus, TAM is instead optimized on truncated graphs where non-homophily edges are removed iteratively to mitigate this bias. The learned representations result in significantly stronger local affinity for normal nodes than abnormal nodes. Extensive empirical results on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. Our code is available at https://github.com/mala-lab/TAM-master/.

4/5/2024

🔗

Restructuring Graph for Higher Homophily via Adaptive Spectral Clustering

Shouheng Li, Dongwoo Kim, Qing Wang

While a growing body of literature has been studying new Graph Neural Networks (GNNs) that work on both homophilic and heterophilic graphs, little has been done on adapting classical GNNs to less-homophilic graphs. Although the ability to handle less-homophilic graphs is restricted, classical GNNs still stand out in several nice properties such as efficiency, simplicity, and explainability. In this work, we propose a novel graph restructuring method that can be integrated into any type of GNNs, including classical GNNs, to leverage the benefits of existing GNNs while alleviating their limitations. Our contribution is threefold: a) learning the weight of pseudo-eigenvectors for an adaptive spectral clustering that aligns well with known node labels, b) proposing a new density-aware homophilic metric that is robust to label imbalance, and c) reconstructing the adjacency matrix based on the result of adaptive spectral clustering to maximize the homophilic scores. The experimental results show that our graph restructuring method can significantly boost the performance of six classical GNNs by an average of 25% on less-homophilic graphs. The boosted performance is comparable to state-of-the-art methods.

4/30/2024

What Is Missing In Homophily? Disentangling Graph Homophily For Graph Neural Networks

Yilun Zheng, Sitao Luan, Lihui Chen

Graph homophily refers to the phenomenon that connected nodes tend to share similar characteristics. Understanding this concept and its related metrics is crucial for designing effective Graph Neural Networks (GNNs). The most widely used homophily metrics, such as edge or node homophily, quantify such similarity as label consistency across the graph topology. These metrics are believed to be able to reflect the performance of GNNs, especially on node-level tasks. However, many recent studies have empirically demonstrated that the performance of GNNs does not always align with homophily metrics, and how homophily influences GNNs still remains unclear and controversial. Then, a crucial question arises: What is missing in our current understanding of homophily? To figure out the missing part, in this paper, we disentangle the graph homophily into $3$ aspects: label, structural, and feature homophily, providing a more comprehensive understanding of GNN performance. To investigate their synergy, we propose a Contextual Stochastic Block Model with $3$ types of Homophily (CSBM-3H), where the topology and feature generation are controlled by the $3$ metrics. Based on the theoretical analysis of CSBM-3H, we derive a new composite metric, named Tri-Hom, that considers all $3$ aspects and overcomes the limitations of conventional homophily metrics. The theoretical conclusions and the effectiveness of Tri-Hom have been verified through synthetic experiments on CSBM-3H. In addition, we conduct experiments on $31$ real-world benchmark datasets and calculate the correlations between homophily metrics and model performance. Tri-Hom has significantly higher correlation values than $17$ existing metrics that only focus on a single homophily aspect, demonstrating its superiority and the importance of homophily synergy. Our code is available at url{https://github.com/zylMozart/Disentangle_GraphHom}.

6/28/2024

Exploiting Global Graph Homophily for Generalized Defense in Graph Neural Networks

Duanyu Li, Huijun Wu, Min Xie, Xugang Wu, Zhenwei Wu, Wenzhe Zhang

Graph neural network (GNN) models play a pivotal role in numerous tasks involving graph-related data analysis. Despite their efficacy, similar to other deep learning models, GNNs are susceptible to adversarial attacks. Even minor perturbations in graph data can induce substantial alterations in model predictions. While existing research has explored various adversarial defense techniques for GNNs, the challenge of defending against adversarial attacks on real-world scale graph data remains largely unresolved. On one hand, methods reliant on graph purification and preprocessing tend to excessively emphasize local graph information, leading to sub-optimal defensive outcomes. On the other hand, approaches rooted in graph structure learning entail significant time overheads, rendering them impractical for large-scale graphs. In this paper, we propose a new defense method named Talos, which enhances the global, rather than local, homophily of graphs as a defense. Experiments show that the proposed approach notably outperforms state-of-the-art defense approaches, while imposing little computational overhead.

8/23/2024