Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders

Read original: arXiv:2405.01460 - Published 5/7/2024 by Yi Yu, Yufei Wang, Song Xia, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

Overview

This paper introduces a novel technique called Disentangle Variational Autoencoder (D-VAE) to defend against a type of attack called Unlearnable Examples (UEs).
UEs are subtle modifications to training data that can cause machine learning models to perform poorly on test data, even when the training data is correctly labeled.
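The paper groups existing defenses against UE attacks into two categories: training-time defenses such as adversarial training, which are computationally intensive, and pre-training purification, which is the approach the paper builds on.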
The key innovation in this paper is the D-VAE, which can effectively disentangle and remove the perturbations in UEs while preserving the original class information.

Plain English Explanation

The paper is about a technique to protect machine learning models from a particular type of attack called Unlearnable Examples (UEs). UEs are subtle changes made to training data that can trick the model into performing poorly on real test data, even though the original training data was correctly labeled.
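To make the idea of a "subtle change" concrete, here is a toy sketch of how a small, bounded perturbation might be attached to correctly labeled images. It is not one of the attacks evaluated in the paper: the function name `add_classwise_perturbation`, the random per-class noise patterns, and the `8 / 255` budget are all assumptions chosen for illustration.

```python
import numpy as np

def add_classwise_perturbation(images, labels, num_classes, epsilon=8 / 255, seed=0):
    """Add one fixed, bounded noise pattern per class to each image (toy example)."""
    rng = np.random.default_rng(seed)
    # One hypothetical pattern per class, every entry within [-epsilon, epsilon].
    patterns = rng.uniform(-epsilon, epsilon, size=(num_classes,) + images.shape[1:])
    # Each image receives the pattern of its own (correct) label.
    poisoned = images + patterns[labels]
    # Keep pixel values inside the valid [0, 1] range.
    return np.clip(poisoned, 0.0, 1.0), patterns

rng = np.random.default_rng(1)
images = rng.uniform(0.0, 1.0, size=(6, 32, 32, 3))  # stand-in for real images
labels = np.array([0, 1, 2, 0, 1, 2])
poisoned, patterns = add_classwise_perturbation(images, labels, num_classes=3)
max_change = np.abs(poisoned - images).max()  # never exceeds epsilon
```

The labels are untouched and no pixel moves by more than `epsilon`, yet the noise pattern is perfectly correlated with the label, which is the kind of shortcut a model could latch onto instead of the real image content.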

The researchers propose a new method called the Disentangle Variational Autoencoder (D-VAE) to defend against these UE attacks. The D-VAE works by separating or "disentangling" the original class information in the training data from the sneaky perturbations (changes) that make the data "unlearnable."

By isolating and removing just the perturbations, the D-VAE can clean up the training data and allow the model to learn the true underlying patterns, making it more robust against UE attacks. This is an improvement over previous defenses that either require a lot of extra computation during training or struggle to handle the variety of UEs that can be created.

The key insight is that rate-constrained variational autoencoders (VAEs) have a natural tendency to suppress these perturbations. The researchers build on this by creating the D-VAE, which can systematically disentangle the class information from the perturbations. This allows for an efficient two-stage purification process to clean up the training data.

Technical Explanation

The paper proposes a novel Disentangle Variational Autoencoder (D-VAE) architecture to defend against Unlearnable Examples (UEs): training examples carrying carefully crafted perturbations that can significantly degrade model performance on test data.

The authors first observe that certain rate-constrained variational autoencoders (VAEs) have a natural tendency to suppress UE perturbations. They provide a theoretical analysis to explain this phenomenon.
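For readers unfamiliar with the term, one common way to write a rate-constrained VAE objective is shown below. This is a generic formulation, not necessarily the exact one used in the paper; the symbols $q_\phi$ (encoder), $g_\theta$ (decoder), $p(z)$ (prior), and $R_{\text{target}}$ (rate budget) are assumed notation.

```latex
\min_{\theta,\phi}\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\lVert x - g_\theta(z) \rVert_2^2\right]}_{\text{distortion}}
\quad \text{s.t.} \quad
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)}_{\text{rate}}
\;\le\; R_{\text{target}}
```

Under a tight rate budget the latent code can only carry the information that reduces reconstruction error the most. A plausible intuition, offered here as a reading rather than as the paper's proof, is that small-magnitude perturbations contribute little to the distortion term and so tend to be dropped first.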

Building on this insight, the D-VAE is designed with learnable class-wise embeddings that can disentangle the perturbations from the original class information. This allows for a two-stage purification process:

The first stage roughly eliminates the perturbations.
The second stage produces refined, poison-free results.
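The two stages above can be sketched as a simple data flow. The summary only states the goal of each stage, so everything below is an assumption: the interface in which a model returns a `(reconstruction, estimated_perturbation)` pair, the choice to subtract the estimate in stage one, and the `toy_model` stand-in, which is not a VAE and is unrelated to the authors' released code.

```python
import numpy as np

def toy_model(images, keep=0.5):
    """Stand-in for a trained purifier: shrinks each image toward its own mean.

    Returns (reconstruction, estimated_perturbation). Purely hypothetical.
    """
    mean = images.mean(axis=(1, 2, 3), keepdims=True)
    reconstruction = mean + keep * (images - mean)
    return reconstruction, images - reconstruction

def two_stage_purify(images, stage1_model, stage2_model):
    # Stage 1 (coarse): subtract the estimated perturbation from the input.
    _, estimated_perturbation = stage1_model(images)
    coarse = np.clip(images - estimated_perturbation, 0.0, 1.0)
    # Stage 2 (refine): take the second model's reconstruction as the output.
    refined, _ = stage2_model(coarse)
    return np.clip(refined, 0.0, 1.0)

rng = np.random.default_rng(0)
poisoned_batch = rng.uniform(0.0, 1.0, size=(4, 32, 32, 3))  # stand-in data
purified_batch = two_stage_purify(poisoned_batch, toy_model, toy_model)
```

In a real pipeline the two callables would be separately trained models, and the purified images would then be used to train the downstream classifier in the usual way.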

The authors extensively evaluate the D-VAE approach on CIFAR-10, CIFAR-100, and a 100-class ImageNet subset, demonstrating its effectiveness and robustness against UE attacks compared to prior defenses. The source code is available at https://github.com/yuyi-sd/D-VAE.

Critical Analysis

The paper presents a novel and promising defense against an important class of attacks (UEs) that can significantly degrade machine learning model performance. The key innovation of the D-VAE is its ability to systematically disentangle the class information from the perturbations, enabling an efficient two-stage purification process.

However, the paper does not explore the generalization of the D-VAE to other types of poisoning attacks beyond UEs, such as backdoor attacks. The authors also do not discuss the potential computational overhead or training complexity of the D-VAE compared to simpler purification methods.

Additionally, the paper focuses on image classification tasks, and it is unclear how the D-VAE would perform on other modalities like text or speech. Further research is needed to explore the broader applicability and limitations of this approach.

Conclusion

This paper introduces a novel Disentangle Variational Autoencoder (D-VAE) architecture that can effectively defend against Unlearnable Examples (UEs), a type of poisoning attack that degrades machine learning model performance through subtle modifications to the training data.

The key innovation is the D-VAE's ability to disentangle the perturbations in UEs from the original class information, enabling an efficient two-stage purification process. Extensive experiments on CIFAR-10, CIFAR-100, and a 100-class ImageNet subset demonstrate the D-VAE's effectiveness and robustness on image classification tasks.

While the paper focuses on UEs, further research is needed to explore the D-VAE's generalization to other types of poisoning attacks and its applicability to a wider range of machine learning domains. Overall, this work represents an important step forward in building more secure and reliable machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders

Yi Yu, Yufei Wang, Song Xia, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Defenses against these poisoning attacks can be categorized based on whether specific interventions are adopted during training. The first approach is training-time defense, such as adversarial training, which can mitigate poisoning effects but is computationally intensive. The other approach is pre-training purification, e.g., image short squeezing, which consists of several simple compressions but often encounters challenges in dealing with various UEs. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method. Firstly, we uncover rate-constrained variational autoencoders (VAEs), demonstrating a clear tendency to suppress the perturbations in UEs. We subsequently conduct a theoretical analysis for this phenomenon. Building upon these insights, we introduce a disentangle variational autoencoder (D-VAE), capable of disentangling the perturbations with learnable class-wise embeddings. Based on this network, a two-stage purification approach is naturally developed. The first stage focuses on roughly eliminating perturbations, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios. Extensive experiments demonstrate the remarkable performance of our method across CIFAR-10, CIFAR-100, and a 100-class ImageNet-subset. Code is available at https://github.com/yuyi-sd/D-VAE.

5/7/2024

Unlearnable Examples Detection via Iterative Filtering

Yi Yu, Qichen Zheng, Siyuan Yang, Wenhan Yang, Jun Liu, Shijian Lu, Yap-Peng Tan, Kwok-Yan Lam, Alex Kot

Deep neural networks are proven to be vulnerable to data poisoning attacks. Recently, a specific type of data poisoning attack known as availability attacks has led to the failure of data utilization for model learning by adding imperceptible perturbations to images. Consequently, it is quite beneficial and challenging to detect poisoned samples, also known as Unlearnable Examples (UEs), from a mixed dataset. In response, we propose an Iterative Filtering approach for UEs identification. This method leverages the distinction between the inherent semantic mapping rules and shortcuts, without the need for any additional information. We verify that when training a classifier on a mixed dataset containing both UEs and clean data, the model tends to quickly adapt to the UEs compared to the clean data. Due to the accuracy gaps between training with clean/poisoned samples, we employ a model to misclassify clean samples while correctly identifying the poisoned ones. The incorporation of additional classes and iterative refinement enhances the model's ability to differentiate between clean and poisoned samples. Extensive experiments demonstrate the superiority of our method over state-of-the-art detection approaches across various attacks, datasets, and poison ratios, significantly reducing the Half Total Error Rate (HTER) compared to existing methods.

8/16/2024

Provably Unlearnable Examples

Derui Wang, Minhui Xue, Bo Li, Seyit Camtepe, Liming Zhu

The exploitation of publicly accessible data has led to escalating concerns regarding data privacy and intellectual property (IP) breaches in the age of artificial intelligence. As a strategy to safeguard both data privacy and IP-related domain knowledge, efforts have been undertaken to render shared data unlearnable for unauthorized models in the wild. Existing methods apply empirically optimized perturbations to the data in the hope of disrupting the correlation between the inputs and the corresponding labels such that the data samples are converted into Unlearnable Examples (UEs). Nevertheless, the absence of mechanisms that can verify how robust the UEs are against unknown unauthorized models and train-time techniques engenders several problems. First, the empirically optimized perturbations may suffer from the problem of cross-model generalization, which echoes the fact that the unauthorized models are usually unknown to the defender. Second, UEs can be mitigated by train-time techniques such as data augmentation and adversarial training. Furthermore, we find that a simple recovery attack can restore the clean-task performance of the classifiers trained on UEs by slightly perturbing the learned weights. To mitigate the aforementioned problems, in this paper, we propose a mechanism for certifying the so-called $(q, \eta)$-Learnability of an unlearnable dataset via parametric smoothing. A lower certified $(q, \eta)$-Learnability indicates a more robust protection over the dataset. Finally, we try to 1) improve the tightness of certified $(q, \eta)$-Learnability and 2) design Provably Unlearnable Examples (PUEs) which have reduced $(q, \eta)$-Learnability. According to experimental results, PUEs demonstrate both decreased certified $(q, \eta)$-Learnability and enhanced empirical robustness compared to existing UEs.

5/7/2024

Nonlinear Transformations Against Unlearnable Datasets

Thushari Hapuarachchi, Jing Lin, Kaiqi Xiong, Mohamed Rahouti, Gitte Ost

Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. Notable approaches include Deepconfuse, error-minimizing, error-maximizing (also known as adversarial poisoning), Neural Tangent Generalization Attack, synthetic, autoregressive, One-Pixel Shortcut, Self-Ensemble Protection, Entangled Features, Robust Error-Minimizing, Hypocritical, and TensorClog. The data generated by those approaches, called unlearnable examples, are prevented from being learned by deep learning models. In this research, we investigate and devise an effective nonlinear transformation framework and conduct extensive experiments to demonstrate that a deep neural network can effectively learn from the data/examples traditionally considered unlearnable produced by the above twelve approaches. The resulting approach improves the ability to break unlearnable data compared to the linear separable technique recently proposed by researchers. Specifically, our extensive experiments show that the improvement ranges from 0.34% to 249.59% for the unlearnable CIFAR10 datasets generated by those twelve data protection approaches, except for One-Pixel Shortcut. Moreover, the proposed framework achieves over 100% improvement of test accuracy for Autoregressive and REM approaches compared to the linear separable technique. Our findings suggest that these approaches are inadequate in preventing unauthorized uses of data in machine learning models. There is an urgent need to develop more robust protection mechanisms that effectively thwart an attacker from accessing data without proper authorization from the owners.

6/6/2024