ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Read original: arXiv:2406.15093 - Published 6/26/2024 by Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

Overview

This paper presents ECLIPSE, a defense against clean-label indiscriminate poisoning attacks on deep neural networks.
Clean-label indiscriminate poisoning attacks add imperceptible perturbations to correctly labeled training images so that a model trained on them generalizes poorly.
ECLIPSE purifies the training data with a denoising model trained from a sparse (very small) set of clean images: Gaussian noise is added to absorb the poison, the model denoises the result, and a lightweight corruption compensation module removes residual poison.

Plain English Explanation

The paper describes a method called ECLIPSE that helps protect deep learning models from a stealthy attack called clean-label indiscriminate poisoning. In this attack, the attacker leaves the labels correct and adds perturbations too small to see to the training images, so the data looks legitimate but a model trained on it performs poorly on real inputs.

ECLIPSE purifies the training data before the model is trained. It first adds random Gaussian noise, which drowns out the poison, and then uses a denoising model to recover a clean-looking image. The denoising model is trained from only a handful of clean images, which is where the "sparse" in the name comes from. This matters because existing defenses, according to the authors, are either vulnerable to adaptive attacks, built on unrealistic assumptions, or effective only against specific poison types.

By making training pipelines more robust to these attacks, ECLIPSE could help improve the reliability of AI systems in high-stakes applications such as medical diagnosis, autonomous driving, and financial fraud detection. The key insight is that enough random noise assimilates any kind of poison, so a noise-then-denoise step can strip the perturbation while keeping the image content the model needs.

Technical Explanation

The core of ECLIPSE is a purification pipeline that removes clean-label indiscriminate poisons from training images. The authors first study how Gaussian noise affects poisons and prove theoretically that any kind of poison is largely assimilated once sufficient random noise is imposed.

ECLIPSE assumes the victim has access to an extremely limited number of clean images. This sparse set is enlarged and used to train a denoising probabilistic model that serves as a universal denoising tool. Each poisoned training image is then perturbed with Gaussian noise to absorb the poison and passed through the model for denoising, which yields a roughly purified dataset. The victim model is then trained on the purified data rather than on the poisoned originals.
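The noise-then-denoise step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard DDPM forward process with a linear beta schedule, pixels scaled to [-1, 1], and a hypothetical `placeholder_denoiser` that merely rescales and clips where a trained denoising model would go. All function names and default values here are illustrative assumptions.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_gaussian_noise(pixels, alpha_bar_t, rng):
    """Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in pixels]


def placeholder_denoiser(noisy_pixels, alpha_bar_t):
    """Hypothetical stand-in for a trained denoiser: undo the scaling and clip.

    A real pipeline would run the reverse process of a trained model here.
    """
    signal = math.sqrt(alpha_bar_t)
    return [max(-1.0, min(1.0, p / signal)) for p in noisy_pixels]


def purify(pixels, denoise_fn, alpha_bar_t, rng):
    """Add noise to absorb the poison, then hand the result to the denoiser."""
    noisy = add_gaussian_noise(pixels, alpha_bar_t, rng)
    return denoise_fn(noisy, alpha_bar_t)


rng = random.Random(0)
alpha_bars = alpha_bar_schedule()
poisoned_image = [0.2, -0.5, 0.9, 0.0]  # toy flattened "image"
purified_image = purify(poisoned_image, placeholder_denoiser, alpha_bars[99], rng)
```

The step index passed to `purify` (here 99, chosen arbitrarily) controls how much noise is injected: a larger index swamps more of the poison but also destroys more image content that the denoiser must reconstruct.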

Different poisons are not equally sensitive to Gaussian noise, so a single noise level trades off poison removal against image quality. To address this, ECLIPSE adds a lightweight corruption compensation module that eliminates the poison left over after denoising, which makes the defense more universal across poison types.
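The idea that sufficient Gaussian noise assimilates a poison can be made concrete with a short calculation. The following is an illustrative derivation under the assumption of the standard DDPM forward process; the paper's own theorem may be stated differently.

```latex
% Assumption: standard DDPM forward process with cumulative schedule \bar{\alpha}_t.
% x is a clean image and \delta is the poison perturbation added to it.
\begin{align}
q(x_t \mid x) &= \mathcal{N}\big(\sqrt{\bar{\alpha}_t}\,x,\ (1-\bar{\alpha}_t) I\big), \\
q(x_t \mid x+\delta) &= \mathcal{N}\big(\sqrt{\bar{\alpha}_t}\,(x+\delta),\ (1-\bar{\alpha}_t) I\big), \\
D_{\mathrm{KL}}\big(q(x_t \mid x+\delta)\,\|\,q(x_t \mid x)\big)
  &= \frac{\bar{\alpha}_t\,\|\delta\|_2^2}{2\,(1-\bar{\alpha}_t)} .
\end{align}
```

Because the two Gaussians share a covariance, the divergence depends only on the shifted mean. As the step t grows, the cumulative coefficient shrinks toward 0 and the noised poisoned image becomes statistically indistinguishable from the noised clean image. The dependence on the perturbation norm also suggests why poisons with larger perturbations need more noise to be absorbed.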

The paper evaluates ECLIPSE in extensive experiments and reports that it outperforms 10 state-of-the-art defenses. The authors also design an adaptive attack against ECLIPSE and use it to verify the robustness of the defense. Code is available at https://github.com/CGCL-codes/ECLIPSE.

Critical Analysis

The ECLIPSE paper presents a promising defense against clean-label indiscriminate poisoning, a significant challenge in adversarial machine learning. The authors support it with a theoretical analysis of how noise affects poisons and with experimental comparisons against existing defenses.

One potential limitation is that ECLIPSE relies on the victim having a small set of clean images from which to train the denoising model, which may not be available in every setting. In addition, because poisons differ in how readily Gaussian noise assimilates them, performance may be sensitive to hyperparameters such as the noise level and may require tuning.

The authors do evaluate an adaptive attack designed against ECLIPSE and report that the defense remains robust. Further research could test stronger or differently constructed adaptive adversaries, as well as poisoning attacks outside the clean-label indiscriminate setting.

Overall, the ECLIPSE paper makes a valuable contribution to the field of adversarial machine learning by providing a novel and effective solution for mitigating clean-label poisoning attacks. The proposed approach has the potential to improve the reliability and trustworthiness of deep learning systems in a wide range of applications.

Conclusion

The ECLIPSE paper presents a defense for deep neural networks against clean-label indiscriminate poisoning, a stealthy attack in which imperceptibly perturbed but correctly labeled training images degrade the trained model's generalization. The key idea is to add Gaussian noise to absorb the poison, denoise with a model trained from a sparse set of clean images, and remove residual poison with a lightweight corruption compensation module.

By making deep learning models more robust to these types of attacks, ECLIPSE could help improve the reliability and trustworthiness of AI systems in high-stakes applications. The paper demonstrates the effectiveness of the ECLIPSE approach through extensive experiments, and highlights opportunities for further research to address potential limitations and explore advanced attack scenarios.

Overall, the ECLIPSE paper represents an important step forward in the field of adversarial machine learning, and its findings could have significant implications for the development of secure and reliable AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, built on unrealistic assumptions, or only effective against specific poison types, limiting their universal applicability. In this research, we propose a more universally effective, practical, and robust defense scheme called ECLIPSE. We first investigate the impact of Gaussian noise on the poisons and theoretically prove that any kind of poison will be largely assimilated when imposing sufficient random noise. In light of this, we assume the victim has access to an extremely limited number of clean images (a more practical scene) and subsequently enlarge this sparse set for training a denoising probabilistic model (a universal denoising tool). We then begin by introducing Gaussian noise to absorb the poisons and then apply the model for denoising, resulting in a roughly purified dataset. Finally, to address the trade-off of the inconsistency in the assimilation sensitivity of different poisons by Gaussian noise, we propose a lightweight corruption compensation module to effectively eliminate residual poisons, providing a more universal defense approach. Extensive experiments demonstrate that our defense approach outperforms 10 state-of-the-art defenses. We also propose an adaptive attack against ECLIPSE and verify the robustness of our defense scheme. Our code is available at https://github.com/CGCL-codes/ECLIPSE.

6/26/2024

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie

Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $\Psi(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.

6/4/2024

PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models

Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

Data poisoning attacks pose a significant threat to the integrity of machine learning models by leading to misclassification of target distribution data by injecting adversarial examples during training. Existing state-of-the-art (SoTA) defense methods suffer from limitations, such as significantly reduced generalization performance and significant overhead during training, making them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons by applying a universal stochastic preprocessing step $\Psi_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy Based Model (EBM) initialized with an image $x$. Mid-run dynamics of $\Psi_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network. We show that EBMs remain universal purifiers, even in the presence of poisoned EBM training data, and achieve SoTA defense on leading triggered and triggerless poisons. This work is a subset of a larger framework introduced in pgen with a more detailed focus on EBM purification and poison defense.

6/4/2024

Clean Label Attacks against SLU Systems

Henry Li Xinyuan, Sonal Joshi, Thomas Thebaud, Jesus Villalba, Najim Dehak, Sanjeev Khudanpur

Poisoning backdoor attacks involve an adversary manipulating the training data to induce certain behaviors in the victim model by inserting a trigger in the signal at inference time. We adapted clean label backdoor (CLBD)-data poisoning attacks, which do not modify the training labels, on state-of-the-art speech recognition models that support/perform a Spoken Language Understanding task, achieving 99.8% attack success rate by poisoning 10% of the training data. We analyzed how varying the signal-strength of the poison, percent of samples poisoned, and choice of trigger impact the attack. We also found that CLBD attacks are most successful when applied to training samples that are inherently hard for a proxy model. Using this strategy, we achieved an attack success rate of 99.3% by poisoning a meager 1.5% of the training data. Finally, we applied two previously developed defenses against gradient-based attacks, and found that they attain mixed success against poisoning.

9/16/2024