PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Read original: arXiv:2405.18627 - Published 6/4/2024 by Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie

Overview

The paper introduces PureGen, a novel approach for defending against train-time poisoning attacks in machine learning models.
PureGen uses a generative model to purify the training data, removing the influence of poisoned samples and improving the model's robustness.
The method is claimed to be universal: it needs no attack-specific or classifier-specific information, so it can be applied without specialized architectures or training procedures.

Plain English Explanation

PureGen is a technique that aims to protect machine learning models from a type of attack called "train-time poisoning." In this attack, the attacker contaminates the training data with malicious samples, hoping to undermine the model's performance during deployment.

PureGen addresses this problem by using a special kind of machine learning model called a "generative model." This model learns to generate new, "purified" versions of the training data that are free from the influence of the poisoned samples. The purified data is then used to train the target model, making it more robust to the poisoning attack.

Unlike some previous approaches, PureGen is designed to be "universal," meaning it can be applied to a wide range of machine learning models and tasks without requiring significant modifications or specialized architectures. This makes it a potentially more practical and versatile solution for defending against train-time poisoning attacks.

Technical Explanation

The key idea behind PureGen is to use the sampling dynamics of a trained generative model to purify the training data and remove the influence of poisoned samples. The authors propose a two-stage process:

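Train a generative model, specifically an Energy-Based Model (EBM) or a Denoising Diffusion Probabilistic Model (DDPM), on the available training data, which may itself include poisoned samples.
Apply the trained generative model to each training sample as a stochastic transform $\Psi(x)$, realized via iterative Langevin dynamics, to produce "purified" versions of the training data, which are then used to train the target model.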

Because the generative model learns the underlying data distribution, running its dynamics from a training sample pulls that sample toward the learned distribution. The authors argue that this removes poison information with minimal impact on the features the classifier needs to generalize, leading to a more robust target model.
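To make the purification step concrete, here is a minimal sketch of Langevin dynamics on a toy problem. It uses the standard unadjusted Langevin update, $x_{k+1} = x_k - \eta \nabla_x E(x_k) + \sqrt{2\eta}\, z_k$ with $z_k \sim \mathcal{N}(0, I)$, which may differ from the authors' exact parameterization. Everything named here is an assumption for illustration: the quadratic energy stands in for a trained EBM, and `make_toy_energy_grad`, `langevin_purify`, `purify_dataset`, the step size, and the step count are hypothetical rather than taken from the paper.

```python
import math
import random


def make_toy_energy_grad(clean_point, sigma=0.1):
    """Gradient of the toy energy E(x) = ||x - clean_point||^2 / (2 * sigma^2).

    Hypothetical stand-in for the energy gradient of a trained EBM.
    """
    def energy_grad(x):
        return [(xi - ci) / sigma ** 2 for xi, ci in zip(x, clean_point)]
    return energy_grad


def langevin_purify(x, energy_grad, rng, steps=100, step_size=0.001):
    """Run `steps` Langevin updates starting from the (possibly poisoned) input x."""
    noise_scale = math.sqrt(2.0 * step_size)
    x = list(x)
    for _ in range(steps):
        grad = energy_grad(x)
        # Gradient step toward low energy plus Gaussian noise.
        x = [xi - step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, grad)]
    return x


def purify_dataset(samples, energy_grad, seed=0, **kwargs):
    """Purify every sample with a shared random generator; labels are not needed."""
    rng = random.Random(seed)
    return [langevin_purify(x, energy_grad, rng, **kwargs) for x in samples]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Toy data: a "clean" point and a copy shifted by an additive perturbation.
clean = [0.5] * 8
poisoned = [c + 0.5 for c in clean]

grad_fn = make_toy_energy_grad(clean)
purified = purify_dataset([poisoned], grad_fn)[0]

print("distance to clean before:", distance(poisoned, clean))
print("distance to clean after: ", distance(purified, clean))
```

In this toy setting the dynamics simply pull the sample back toward the single low-energy point. With real images the number of steps matters: running too few leaves the perturbation in place, while running too many can erase the features the classifier needs.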

The authors demonstrate the effectiveness of PureGen through image classification experiments on CIFAR-10, Tiny-ImageNet, and CINIC-10, against attacks including Narcissus, Bullseye Polytope, and Gradient Matching. They report state-of-the-art defense against these train-time poisoning attacks, with minimal impact on the classifier's generalization.

Critical Analysis

The authors of the paper make a strong case for the effectiveness of PureGen in defending against train-time poisoning attacks. However, it's important to note that the method is not a silver bullet and may have some limitations:

Generative Model Performance: The success of PureGen heavily depends on the ability of the generative model to accurately learn the underlying data distribution and separate the clean data from the poisoned samples. If the generative model fails to capture the relevant patterns, the purification process may not be effective.
Computational Complexity: Training a generative model can be computationally intensive, especially for large-scale datasets. This may limit the practical applicability of PureGen in some scenarios where computational resources are constrained.
Scalability to Real-World Scenarios: The paper focuses on controlled experimental settings, and further research may be needed to assess the performance of PureGen in more realistic, large-scale applications with diverse and complex data distributions.
Interpretability and Explainability: As with many machine learning techniques, the inner workings of PureGen may not be fully transparent, which could be a concern in applications where interpretability is crucial, such as in high-stakes decision-making scenarios.
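As a rough illustration of the computational complexity point above, the cost of Langevin-style purification grows linearly with both the dataset size and the number of steps, since each step needs one energy-gradient evaluation per sample. The sketch below is back-of-the-envelope arithmetic only: the function name and the step count are hypothetical and not taken from the paper, and it ignores the separate cost of training the generative model.

```python
def purification_gradient_evals(num_images, langevin_steps):
    """One energy-gradient evaluation per image per Langevin step."""
    return num_images * langevin_steps


CIFAR10_TRAIN_IMAGES = 50_000  # size of the standard CIFAR-10 training split
HYPOTHETICAL_STEPS = 150       # illustrative value only, not from the paper

total = purification_gradient_evals(CIFAR10_TRAIN_IMAGES, HYPOTHETICAL_STEPS)
print(total)  # 7500000
```

Because purification is a preprocessing step, this cost is paid once per dataset rather than once per classifier trained on it.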

Despite these potential limitations, the PureGen approach presented in the paper is a promising direction for defending against train-time poisoning attacks, and further research and development in this area could lead to more robust and practical solutions.

Conclusion

The PureGen method introduced in this paper offers a novel and potentially powerful approach for defending against train-time poisoning attacks in machine learning. By using the sampling dynamics of trained generative models, PureGen aims to purify the training data and improve the robustness of the target model, without requiring specialized architectures or training procedures.

While the method has some limitations, the authors' experimental results are encouraging and suggest that PureGen could be a valuable tool in the ongoing efforts to build more secure and reliable machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie

Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $\Psi(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.

6/4/2024

PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models

Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

Data poisoning attacks pose a significant threat to the integrity of machine learning models by leading to misclassification of target distribution data by injecting adversarial examples during training. Existing state-of-the-art (SoTA) defense methods suffer from limitations, such as significantly reduced generalization performance and significant overhead during training, making them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons by applying a universal stochastic preprocessing step $\Psi_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy Based Model (EBM) initialized with an image $x$. Mid-run dynamics of $\Psi_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network. We show that EBMs remain universal purifiers, even in the presence of poisoned EBM training data, and achieve SoTA defense on leading triggered and triggerless poisons. This work is a subset of a larger framework introduced in PureGen with a more detailed focus on EBM purification and poison defense.

6/4/2024

ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, built on unrealistic assumptions, or only effective against specific poison types, limiting their universal applicability. In this research, we propose a more universally effective, practical, and robust defense scheme called ECLIPSE. We first investigate the impact of Gaussian noise on the poisons and theoretically prove that any kind of poison will be largely assimilated when imposing sufficient random noise. In light of this, we assume the victim has access to an extremely limited number of clean images (a more practical scene) and subsequently enlarge this sparse set for training a denoising probabilistic model (a universal denoising tool). We then begin by introducing Gaussian noise to absorb the poisons and then apply the model for denoising, resulting in a roughly purified dataset. Finally, to address the trade-off of the inconsistency in the assimilation sensitivity of different poisons by Gaussian noise, we propose a lightweight corruption compensation module to effectively eliminate residual poisons, providing a more universal defense approach. Extensive experiments demonstrate that our defense approach outperforms 10 state-of-the-art defenses. We also propose an adaptive attack against ECLIPSE and verify the robustness of our defense scheme. Our code is available at https://github.com/CGCL-codes/ECLIPSE.

6/26/2024

Certified Robustness to Data Poisoning in Gradient-Based Training

Philip Sosnin, Mark N. Muller, Maximilian Baader, Calvin Tsay, Matthew Wicker

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. However, provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge and develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. In particular, our framework certifies robustness against untargeted and targeted poisoning as well as backdoor attacks for both input and label manipulations. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.

6/11/2024