OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection

arXiv:2312.01585 - Published 4/9/2024 by Haoyu Jiang, Haiyang Yu, Nan Li, Ping Yi


Overview

The paper introduces OCGEC, a one-class graph embedding classification approach for detecting backdoor attacks in deep neural networks (DNNs).
In a backdoor attack, an adversary manipulates training so that the model behaves normally on clean inputs but produces attacker-chosen outputs whenever a specific trigger is present.
OCGEC detects backdoored models by converting each model's structure and weights into a graph and applying one-class classification, learned from benign models only, to flag the ones that look anomalous.

Plain English Explanation

Artificial intelligence (AI) models, like deep neural networks (DNNs), are widely used in various applications, from image recognition to language processing. However, these models can be vulnerable to a type of attack called a "backdoor attack." In a backdoor attack, the attacker sneaks in a hidden trigger during the training process, which causes the model to behave strangely when that trigger is present, even though the model works fine otherwise.

The OCGEC approach tackles this problem by turning each DNN's structure and weights into a graph. The researchers then use a one-class classification technique to decide whether this graph looks "abnormal," which could indicate the presence of a backdoor. This is like inspecting the model's wiring for unusual patterns that could be a sign that something is wrong.

The key idea is that normal, uncompromised DNN models share certain patterns in their structure and weights, while a backdoored model deviates from them. By learning what normal models look like and flagging the ones that do not fit, OCGEC can detect a backdoor without knowing in advance what the trigger is or how the attack was carried out.

Technical Explanation

The OCGEC approach works by first converting each DNN into graph data using a model-to-graph method that captures both the model's structural details (how its units are connected) and its weight features. This graph-based representation lets the researchers capture the intricate structure and relationships within the DNN.
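The abstract says only that the model's "structural details and weight features" become graph data; it does not spell out the exact mapping. A minimal sketch, assuming a fully connected network where each neuron becomes a node (with its layer index and bias as features) and each weight becomes a weighted directed edge. `Graph` and `mlp_to_graph` are hypothetical names, not taken from the OCGEC code.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical container: per-node feature vectors plus weighted directed edges."""
    node_features: list[list[float]] = field(default_factory=list)
    edges: list[tuple[int, int, float]] = field(default_factory=list)  # (src, dst, weight)


def mlp_to_graph(weights: list[list[list[float]]], biases: list[list[float]]) -> Graph:
    """Assumed neuron-level mapping for a fully connected network (illustration only).

    weights[k][j][i] is the weight from neuron i of layer k to neuron j of layer k+1
    (same layout as PyTorch nn.Linear.weight: out_features x in_features).
    biases[k][j] is the bias of neuron j in layer k+1.
    """
    g = Graph()
    n_in = len(weights[0][0])
    # Input neurons get features [layer index 0, bias 0.0].
    for _ in range(n_in):
        g.node_features.append([0.0, 0.0])
    prev_offset = 0  # index of the first node of the previous layer
    for k, (w, b) in enumerate(zip(weights, biases)):
        cur_offset = len(g.node_features)  # index of the first node of this layer
        for j, row in enumerate(w):
            g.node_features.append([float(k + 1), b[j]])
            # One edge per weight, from every neuron of the previous layer.
            for i, value in enumerate(row):
                g.edges.append((prev_offset + i, cur_offset + j, value))
        prev_offset = cur_offset
    return g


# Toy 2-3-1 network with made-up weights, for illustration only.
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]  # 3 x 2
w2 = [[1.0, -1.0, 0.5]]                      # 1 x 3
g = mlp_to_graph([w1, w2], [[0.0, 0.1, -0.1], [0.2]])
print(len(g.node_features), len(g.edges))  # 6 nodes (2 + 3 + 1), 9 edges (6 + 3)
```

A neuron-level graph keeps every weight visible to a downstream graph neural network; a real implementation for convolutional models would need a coarser mapping (for example, channels as nodes) to keep graphs small.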

Next, OCGEC learns a numerical vector representation (an embedding) of each graph. To build its training data, the researchers train thousands of tiny models from a small number of clean datasets, so only a small amount of clean data is needed. They pre-train a generative, self-supervised graph autoencoder (GAE) on these benign models so that it learns their typical features, and then dynamically combine the GAE objective with a one-class classification objective. The result is a classification boundary around benign models; a model whose embedding falls outside it is flagged as potentially backdoored.
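The abstract does not give the loss functions, only that the GAE and one-class objectives are combined "dynamically." One plausible form, assuming a Deep SVDD-style hypersphere term (an assumption, not the paper's stated loss), uses an encoder $f_\theta$, the graph $G_i$ of the $i$-th of $N$ benign models, a reconstruction loss $\mathcal{L}_{\text{rec}}$, a center $c$, and a weight $\lambda_t$ that a dynamic schedule could adjust at training step $t$:

```latex
\begin{aligned}
z_i &= f_\theta(G_i), \qquad c = \frac{1}{N}\sum_{i=1}^{N} z_i \quad \text{(computed after pre-training, then fixed)} \\
\mathcal{L}_t(\theta) &= \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_{\text{rec}}(G_i;\theta)
  \;+\; \lambda_t \, \frac{1}{N}\sum_{i=1}^{N} \lVert z_i - c \rVert_2^2 \\
s(G) &= \lVert f_\theta(G) - c \rVert_2^2, \qquad \text{flag } G \text{ as backdoored if } s(G) > \tau
\end{aligned}
```

Under this reading, the reconstruction term keeps the embedding informative about benign models, the hypersphere term pulls benign embeddings toward $c$, and $\tau$ is a threshold chosen on held-out benign models.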

The key insight is that a backdoored model's graph, particularly its weight features, differs from those of benign models, and this difference carries over into its embedding. Because the classifier is trained only on benign models, OCGEC can flag these anomalies without knowing the attack strategy in advance.
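Once each model is an embedding vector, the one-class decision described above reduces to a distance test around the benign models. A minimal sketch, assuming the embeddings are already computed by some encoder and using a mean center with a quantile-based radius; `fit_center`, `fit_threshold`, and `is_suspicious` are hypothetical helpers, and the 0.95 quantile is an arbitrary choice rather than a value from the paper.

```python
import math


def fit_center(benign: list[list[float]]) -> list[float]:
    """Center of the hypersphere: the mean of the benign embeddings (assumption)."""
    dim = len(benign[0])
    return [sum(v[d] for v in benign) / len(benign) for d in range(dim)]


def distance(z: list[float], c: list[float]) -> float:
    """Euclidean distance between an embedding and the center."""
    return math.sqrt(sum((zi - ci) ** 2 for zi, ci in zip(z, c)))


def fit_threshold(benign: list[list[float]], c: list[float], quantile: float = 0.95) -> float:
    """Radius covering `quantile` of the benign training embeddings."""
    dists = sorted(distance(z, c) for z in benign)
    idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return dists[idx]


def is_suspicious(z: list[float], c: list[float], tau: float) -> bool:
    """Flag a model whose embedding lies outside the benign hypersphere."""
    return distance(z, c) > tau


# Toy 2-D embeddings of benign models (made up for illustration).
benign = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [0.0, -0.1]]
c = fit_center(benign)
tau = fit_threshold(benign, c)
print(is_suspicious([0.05, 0.0], c, tau))  # inside the benign region
print(is_suspicious([1.0, 1.0], c, tau))   # far from every benign model
```

Only benign embeddings are used to fit both the center and the radius, which is what lets a one-class detector work without any backdoored examples.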

The researchers evaluated OCGEC on a number of tasks and backdoor attack settings, reporting AUC scores above 98%. This far exceeds existing detection methods, even those that rely on large numbers of both benign and backdoored samples for training, which makes OCGEC a promising tool for securing DNN-based systems.
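AUC here measures how well the detector's anomaly scores rank backdoored models above benign ones: it is the probability that a randomly chosen backdoored model receives a higher score than a randomly chosen benign one. A small sketch of the standard pairwise (Mann-Whitney) computation; the scores are hypothetical and are not results from the paper.

```python
def auc(benign_scores: list[float], backdoor_scores: list[float]) -> float:
    """ROC AUC via pairwise comparison: ties count as half a win.

    O(n * m) for clarity; libraries use a sort-based method for large inputs.
    """
    wins = 0.0
    for b in backdoor_scores:
        for n in benign_scores:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(benign_scores) * len(backdoor_scores))


# Hypothetical anomaly scores (higher means more suspicious).
benign_scores = [0.1, 0.2, 0.3]
backdoor_scores = [0.25, 0.9]
print(auc(benign_scores, backdoor_scores))  # 5 of 6 pairs ranked correctly
```

An AUC above 0.98 therefore means that, across the evaluated tasks, almost every backdoored model scored as more anomalous than almost every benign model.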

Critical Analysis

The OCGEC paper presents a novel and promising approach for detecting backdoor attacks in DNN models. By combining a graph-based representation of each model with one-class classification, the method can identify anomalies without requiring knowledge of the specific backdoor trigger or the attack strategy.

However, the approach has limitations that leave room for further research. OCGEC may be less effective when the backdoor is very subtle or when an attacker deliberately crafts it so that the backdoored model closely resembles a normal one. In addition, although it needs only a small amount of clean data, the method relies on training a large number of benign models (thousands of tiny models, according to the abstract) to learn what "normal" looks like, which may be costly or impractical for large architectures in real-world deployments.

Further research could explore ways to enhance OCGEC's robustness, such as by incorporating additional features or developing more advanced one-class classification techniques. It would also be valuable to investigate the scalability and efficiency of the approach, particularly as the size and complexity of DNN models continue to grow.

Another potential area for improvement is the interpretability of the OCGEC system. While the graph-based representation provides some insight into the structural changes introduced by a backdoor, it may be beneficial to develop methods that can more clearly explain the reasons behind the anomaly detection, making it easier for users to understand and trust the system's decisions.

Overall, the OCGEC paper presents an interesting and valuable contribution to the field of DNN security, highlighting the potential of graph-based techniques for addressing the critical challenge of backdoor attacks. As the use of AI models continues to expand, developing robust and interpretable approaches for detecting and mitigating such threats will become increasingly important.

Conclusion

The OCGEC paper introduces a novel approach for detecting backdoor attacks in deep neural networks (DNNs) by converting each model's structure and weights into a graph and using one-class classification to identify anomalous models.

The key innovation of OCGEC is its ability to detect backdoors while training only on benign models, without requiring knowledge of the specific trigger or the attack strategy. By capturing the differences between the graph representations of benign and backdoored models, OCGEC reached AUC scores above 98% on the tasks the authors evaluated.

The paper's evaluation demonstrates the effectiveness of OCGEC across a number of tasks and backdoor attack scenarios, making it a promising tool for securing DNN-based systems. As the use of AI continues to expand, developing robust and interpretable techniques for detecting and mitigating such threats will be increasingly crucial to ensure the reliability and trustworthiness of these powerful technologies.

While the paper highlights several limitations and areas for further research, the OCGEC approach represents an important step forward in the ongoing efforts to protect AI systems from malicious attacks and ensure their safe and responsible deployment.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection

Haoyu Jiang, Haiyang Yu, Nan Li, Ping Yi

Deep neural networks (DNNs) have been found vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. There are various approaches to detect backdoor attacks; however, they all make certain assumptions about the target attack to be detected and require equal and huge numbers of clean and backdoor samples for training, which renders these detection methods quite limiting in real-world circumstances. This study proposes a novel one-class classification framework called One-class Graph Embedding Classification (OCGEC) that uses GNNs for model-level backdoor detection with only a small amount of clean data. First, we train thousands of tiny models as raw datasets from a small number of clean datasets. Following that, we design an ingenious model-to-graph method for converting the model's structural details and weight features into graph data. We then pre-train a generative self-supervised graph autoencoder (GAE) to better learn the features of benign models in order to detect backdoor models without knowing the attack strategy. After that, we dynamically combine the GAE and one-class classifier optimization goals to form classification boundaries that distinguish backdoor models from benign models. Our OCGEC combines the powerful representation capabilities of graph neural networks with the utility of one-class classification techniques in the field of anomaly detection. In comparison to other baselines, it achieves AUC scores of more than 98% on a number of tasks, which far exceeds existing methods for detection even when they rely on a huge number of positive and negative samples. Our pioneering application of graphic scenarios for generic backdoor detection can provide new insights that can be used to improve other backdoor defense tasks. Code is available at https://github.com/jhy549/OCGEC.

4/9/2024

A Clean-graph Backdoor Attack against Graph Convolutional Networks with Poisoned Label Only

Jiazhu Dai, Haoyu Sun

Graph Convolutional Networks (GCNs) have shown excellent performance in dealing with various graph structures such as node classification, graph classification and other tasks. However, recent studies have shown that GCNs are vulnerable to a novel threat known as backdoor attacks. Moreover, all existing backdoor attacks in the graph domain require modifying the training samples to accomplish the backdoor injection, which may not be practical in many realistic scenarios where adversaries have no access to modify the training samples and may lead to the backdoor attack being detected easily. In order to explore the backdoor vulnerability of GCNs and create a more practical and stealthy backdoor attack method, this paper proposes a clean-graph backdoor attack against GCNs (CBAG) in the node classification task, which only poisons the training labels without any modification to the training samples, revealing that GCNs have this security vulnerability. Specifically, CBAG designs a new trigger exploration method to find important feature dimensions as the trigger patterns to improve the attack performance. By poisoning the training labels, a hidden backdoor is injected into the GCNs model. Experimental results show that our clean graph backdoor can achieve 99% attack success rate while maintaining the functionality of the GCNs model on benign samples.

4/22/2024

OLGA: One-cLass Graph Autoencoder

M. P. S. Gôlo, J. G. B. M. Junior, D. F. Silva, R. M. Marcacini

One-class learning (OCL) comprises a set of techniques applied when real-world problems have a single class of interest. The usual procedure for OCL is learning a hypersphere that comprises instances of this class and, ideally, repels unseen instances from any other classes. Besides, several OCL algorithms for graphs have been proposed since graph representation learning has succeeded in various fields. These methods may use a two-step strategy, initially representing the graph and, in a second step, classifying its nodes. On the other hand, end-to-end methods learn the node representations while classifying the nodes in one learning process. We highlight three main gaps in the literature on OCL for graphs: (i) non-customized representations for OCL; (ii) the lack of constraints on hypersphere parameters learning; and (iii) the methods' lack of interpretability and visualization. We propose One-cLass Graph Autoencoder (OLGA). OLGA is end-to-end and learns the representations for the graph nodes while encapsulating the interest instances by combining two loss functions. We propose a new hypersphere loss function to encapsulate the interest instances. OLGA combines this new hypersphere loss with the graph autoencoder reconstruction loss to improve model learning. OLGA achieved state-of-the-art results and outperformed six other methods with a statistically significant difference from five methods. Moreover, OLGA learns low-dimensional representations maintaining the classification performance with an interpretable model representation learning and results.

8/27/2024

Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective

Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang

Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, a backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with the trigger as the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), which significantly differ from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on the poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in-distribution triggers that can bypass various defense strategies while maintaining a high attack success rate.

7/15/2024