Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training

arXiv:2404.14309, published 4/23/2024 by Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin


Overview

Recent research has explored using diffusion-based purification (DBP) as a way to defend against adversarial attacks on machine learning models.
Previous studies have used questionable methods to evaluate the robustness of DBP models, and their explanations for DBP's robustness lack experimental support.
This paper re-examines the robustness of DBP models using precise gradient analysis, and explores the impact of stochasticity on DBP's robustness.
The paper proposes a new technique called Adversarial Denoising Diffusion Training (ADDT) to improve the robustness of DBP models.

Plain English Explanation

Diffusion-based purification (DBP) is a technique that has been explored as a way to make machine learning models more robust against adversarial attacks. In an adversarial attack, small, carefully crafted changes are made to an input to cause a model to make mistakes.

Previous research on DBP's robustness relied on questionable evaluation methods and did not offer well-supported explanations for why it works. This new paper takes a closer look at DBP using more rigorous analysis. It finds that DBP resists adversarial attacks mainly because of stochasticity (the random noise built into the purification process), which lets it dodge the most effective attack direction, rather than because it actually removes the adversarial perturbations.

To improve DBP's robustness further, the paper proposes a new training technique called Adversarial Denoising Diffusion Training (ADDT). ADDT uses a pre-trained classifier to generate adversarial perturbations and then converts those perturbations into noise that follows a standard normal (Gaussian) distribution, the kind of noise diffusion models are trained to remove. Experiments show that ADDT makes DBP models more robust to adversarial attacks.
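The idea of using a classifier to produce adversarial perturbations can be illustrated with a generic stand-in. The sketch below is not the paper's CGPO: it is ordinary projected sign-gradient ascent (the PGD recipe) against a hypothetical toy linear classifier, in plain Python. The function names, the {-1, +1} labels, and the `eps`, `alpha`, and `steps` values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def logistic_loss(x: Sequence[float], y: int, w: Sequence[float], b: float) -> float:
    """Loss log(1 + exp(-y * z)) of a toy linear classifier z = w.x + b, y in {-1, +1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log1p(math.exp(-y * z))


def _sign(v: float) -> float:
    """Return +1.0, -1.0, or 0.0 according to the sign of v."""
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def classifier_guided_perturbation(
    x: Sequence[float],
    y: int,
    w: Sequence[float],
    b: float,
    eps: float = 0.1,
    alpha: float = 0.02,
    steps: int = 10,
) -> List[float]:
    """Hypothetical stand-in for CGPO: PGD-style ascent on the toy classifier's loss.

    Returns a perturbation delta whose entries all lie in [-eps, eps].
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        adv = [xi + di for xi, di in zip(x, delta)]
        z = sum(wi * ai for wi, ai in zip(w, adv)) + b
        # Derivative of log(1 + exp(-y z)) with respect to z.
        dloss_dz = -y / (1.0 + math.exp(y * z))
        grad = [dloss_dz * wi for wi in w]
        # Signed ascent step, then projection back onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, di + alpha * _sign(g))) for di, g in zip(delta, grad)]
    return delta


x, y, w, b = [1.0, -0.5], 1, [2.0, -1.0], 0.0
delta = classifier_guided_perturbation(x, y, w, b)
adv = [xi + di for xi, di in zip(x, delta)]
# The perturbed input should be harder for the toy classifier than the clean one.
assert logistic_loss(adv, y, w, b) > logistic_loss(x, y, w, b)
```

The sign step and the clamp are the standard PGD choices for an L-infinity budget; the paper's CGPO instead draws its guidance from a pre-trained image classifier, which this toy replaces with a two-weight linear model.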

Technical Explanation

The paper re-examines the robustness of diffusion-based purification (DBP) models using precise gradient analysis. Previous work had used questionable methods to evaluate DBP's robustness, and their explanations lacked experimental support.

The researchers assess DBP robustness under a novel "Deterministic White-box" attack setting, which allows them to pinpoint stochasticity as the main factor contributing to DBP's robustness. They find that DBP models rely on stochasticity to evade the most effective attack direction, rather than directly countering adversarial perturbations.
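Where the stochasticity comes from can be sketched with the forward-noising step used by DiffPure-style DBP: the input is mixed with Gaussian noise, $x_t = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, before a reverse diffusion pass (omitted here, and able to inject further noise) denoises it. In this minimal sketch the noise comes from a seeded generator. Our reading of the Deterministic White-box setting is that the attacker effectively knows this randomness, so the defense becomes one fixed function; the function name and the `seed` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def diffuse(x: Sequence[float], alpha_bar: float, seed: Optional[int] = None) -> List[float]:
    """Forward-noise x: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I).

    seed=None draws fresh randomness on every call (the usual stochastic defense).
    A fixed seed is a hypothetical knob standing in for an attacker who knows the
    defense's noise: the output is then a deterministic function of x.
    """
    rng = random.Random(seed)
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * xi + noise * rng.gauss(0.0, 1.0) for xi in x]


x = [0.5, -0.25, 1.0]
# Known noise: the defense behaves as one fixed function the attacker can differentiate end to end.
assert diffuse(x, 0.9, seed=7) == diffuse(x, 0.9, seed=7)
# Unknown noise: each call sees a different noisy input, so attack gradients vary between calls.
first, second = diffuse(x, 0.9), diffuse(x, 0.9)
```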

To improve the robustness of DBP models, the paper proposes Adversarial Denoising Diffusion Training (ADDT). ADDT uses Classifier-Guided Perturbation Optimization (CGPO) to generate adversarial perturbations guided by a pre-trained classifier, and Rank-Based Gaussian Mapping (RBGM) to convert those perturbations into standard Gaussian noise.
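A rank-based mapping to Gaussian noise can be sketched as follows. This plain-Python illustration rests on our own assumption about the mechanism: each entry of the flattened perturbation keeps its rank, but its value is replaced by the equally ranked value from a sorted set of fresh standard-normal samples. The function name and `seed` argument are hypothetical, and the paper's RBGM may differ in detail.

```python
import random
from typing import List, Sequence


def rank_based_gaussian_mapping(delta: Sequence[float], seed: int = 0) -> List[float]:
    """Hypothetical sketch: map a flattened perturbation onto N(0, 1) values, keeping rank order.

    The i-th smallest entry of delta is replaced by the i-th smallest of
    len(delta) fresh standard-normal samples.
    """
    rng = random.Random(seed)
    n = len(delta)
    # Draw n standard-normal samples and sort them in ascending order.
    gaussian = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    # Indices of delta ordered from its smallest to its largest value.
    order = sorted(range(n), key=lambda i: delta[i])
    mapped = [0.0] * n
    for rank, idx in enumerate(order):
        mapped[idx] = gaussian[rank]
    return mapped


perturbation = [0.03, -0.01, 0.0, 0.02, -0.02]
noise_like = rank_based_gaussian_mapping(perturbation, seed=0)
# The largest entry of the perturbation is still the largest entry of the mapped noise.
assert max(range(5), key=lambda i: noise_like[i]) == 0
```

The output consists exactly of values drawn from a standard normal, so it looks like ordinary diffusion noise, while the ordering of the perturbation (which entries push up and which push down) is preserved.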

Empirical results show that ADDT improves the robustness of DBP models, and further experiments indicate that ADDT equips them with the ability to directly counter adversarial perturbations.

Critical Analysis

The paper provides a thorough and rigorous re-examination of the robustness of diffusion-based purification (DBP) models against adversarial attacks. Its use of precise gradient analysis and a novel "Deterministic White-box" attack setting helps shed light on the key factors contributing to DBP's robustness.

The finding that DBP's robustness comes mainly from stochasticity, rather than from directly countering adversarial perturbations, is an important insight. It challenges the explanations offered in previous studies and points to the need for a deeper understanding of the underlying mechanisms.

The proposed Adversarial Denoising Diffusion Training (ADDT) technique is a promising approach to further improving the robustness of DBP models. Its two components are complementary: Classifier-Guided Perturbation Optimization (CGPO) supplies adversarial perturbations, and Rank-Based Gaussian Mapping (RBGM) reshapes them into Gaussian noise that the diffusion model can be trained on.

However, the paper does not discuss the potential limitations or caveats of ADDT, such as its computational complexity, the need for a pre-trained classifier, or its generalization to other types of adversarial attacks. Further research and experimentation would be needed to fully understand the strengths and weaknesses of this approach.

Overall, this paper makes a valuable contribution to the understanding of diffusion-based purification and adversarial robustness, and provides a promising direction for improving the robustness of machine learning models.

Conclusion

This paper re-examines the robustness of diffusion-based purification (DBP) models against adversarial attacks. It finds that DBP's robustness comes mainly from stochasticity rather than from directly countering adversarial perturbations. To improve DBP's robustness, the paper proposes a new training technique called Adversarial Denoising Diffusion Training (ADDT), which uses a pre-trained classifier and Rank-Based Gaussian Mapping to generate adversarial perturbations and incorporate them into training. Empirical results show that ADDT improves the robustness of DBP models and equips them with the ability to directly counter adversarial perturbations.

This research highlights the importance of understanding the underlying mechanisms driving the robustness of machine learning models, and provides a promising approach for enhancing their reliability and security in the face of adversarial threats.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2404.14309) for authoritative details.


Related Papers


Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training

Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin

Recently, diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks. However, previous studies have used questionable methods to evaluate the robustness of DBP models, and their explanations of DBP robustness lack experimental support. We re-examine DBP robustness using precise gradients, and discuss the impact of stochasticity on DBP robustness. To better explain DBP robustness, we assess DBP robustness under a novel attack setting, Deterministic White-box, and pinpoint stochasticity as the main factor in DBP robustness. Our results suggest that DBP models rely on stochasticity to evade the most effective attack direction, rather than directly countering adversarial perturbations. To improve the robustness of DBP models, we propose Adversarial Denoising Diffusion Training (ADDT). This technique uses Classifier-Guided Perturbation Optimization (CGPO) to generate adversarial perturbations through guidance from a pre-trained classifier, and uses Rank-Based Gaussian Mapping (RBGM) to convert adversarial perturbations into a normal Gaussian distribution. Empirical results show that ADDT improves the robustness of DBP models. Further experiments confirm that ADDT equips DBP models with the ability to directly counter adversarial perturbations.

4/23/2024

Robust Diffusion Models for Adversarial Purification

Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

Adversarial purification (AP) based on diffusion models (DMs) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models are themselves not robust to adversarial attacks. Additionally, the diffusion process can easily destroy semantic information and, after the reverse process, generate a high-quality image that is totally different from the original input, leading to degraded standard accuracy. To overcome these issues, a natural idea is to use an adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, but this is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of the given pre-trained DMs and avoids retraining or fine-tuning them. This robust guidance not only ensures that purified examples retain more semantic content but also, for the first time, mitigates the accuracy-robustness trade-off of DMs, and it gives DM-based AP an efficient way to adapt to new attacks. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that our method achieves state-of-the-art results and generalizes across different attacks.

8/26/2024

ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu

Recently, Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples. However, we find that DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, is suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations for DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to its original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, ADBM has proven to be a superior and robust defense mechanism, offering significant promise for practical applications.

8/2/2024

Adversarially Robust Industrial Anomaly Detection Through Diffusion Model

Yuanpu Cao, Lu Lin, Jinghui Chen

Deep learning-based industrial anomaly detection models have achieved remarkably high accuracy on commonly used benchmark datasets. However, the robustness of those models may not be satisfactory due to the existence of adversarial examples, which pose significant threats to the practical deployment of deep anomaly detectors. Recently, it has been shown that diffusion models can be used to purify adversarial noise and thus build a classifier that is robust against adversarial attacks. Unfortunately, we found that naively applying this strategy in anomaly detection (i.e., placing a purifier before an anomaly detector) suffers from a high anomaly miss rate, since the purifying process can easily remove both the anomaly signal and the adversarial perturbations, causing the downstream anomaly detector to fail to detect anomalies. To tackle this issue, we explore the possibility of performing anomaly detection and adversarial purification simultaneously. We propose a simple yet effective adversarially robust anomaly detection method, AdvRAD, that allows the diffusion model to act both as an anomaly detector and as an adversarial purifier. We also extend our proposed method to certified robustness against $\ell_2$-norm-bounded perturbations. Through extensive experiments, we show that our proposed method exhibits outstanding (certified) adversarial robustness while also maintaining strong anomaly detection performance, on par with state-of-the-art methods on industrial anomaly detection benchmark datasets.

8/12/2024