Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

Read original: arXiv:2304.06430 - Published 7/9/2024 by Astha Verma, A V Subramanyam, Siddhesh Bangar, Naman Lal, Rajiv Ratn Shah, Shin'ichi Satoh

Overview

This research paper proposes a certified zeroth-order (ZO) preprocessing technique to remove adversarial perturbations from images in a black-box setting.
The authors introduce a robust UNet denoiser (RDUNet) to ensure the robustness of black-box models trained on high-dimensional datasets.
They present two defense mechanisms: ZO-RUDS, which prepends RDUNet to the black-box model, and ZO-AE-RUDS, which combines RDUNet with an autoencoder (AE) before the black-box model.
On the low-dimensional CIFAR-10 dataset, ZO-RUDS and ZO-AE-RUDS beat the prior state of the art by margins of 35% and 9%; on the high-dimensional STL-10 dataset, by 20.61% and 23.51%, respectively.

Plain English Explanation

Adversarial attacks are a type of cyber attack where small, carefully crafted changes are made to an image, causing an AI model to misclassify it. This can be a serious problem for real-world AI applications.

The researchers in this paper tackle this issue in a "black-box" setting, where the defender has no access to the AI model's inner workings and can only send it inputs and observe its outputs. They propose a way to preprocess the images to remove the adversarial perturbations before they reach the AI model.

The key idea is to use a special neural network called a "denoiser" to clean up the images. The researchers developed a robust version of the denoiser, called RDUNet, that is designed to work well even on larger, higher-dimensional images. They then tested two ways of using this denoiser to defend against attacks: one where the denoiser is placed directly in front of the AI model, and another where an autoencoder is added as an extra step between the denoiser and the model.

Both of these defense methods significantly outperformed previous approaches on both the small-image (CIFAR-10) and larger-image (STL-10) benchmarks. This suggests the technique is a meaningful step toward protecting AI systems from adversarial attacks in the real world.

Technical Explanation

The paper focuses on the black-box setting, where the defense cannot inspect the target model's architecture or parameters and can only query it for output predictions. Previous zeroth-order (ZO) defense methods for this setting suffer from high model variance and low performance on high-dimensional datasets, which the authors attribute to ineffective denoiser design and limited use of ZO techniques.
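To make "zeroth-order" concrete, the sketch below estimates a gradient purely from function evaluations, which is the kind of information a query-only setting provides. It is a generic coordinate-wise finite-difference estimator, not necessarily the estimator used in the paper; `cge_gradient`, `toy_loss`, and the step size `mu` are illustrative names and values assumed here.

```python
import numpy as np


def cge_gradient(f, x, mu=1e-3):
    """Coordinate-wise zeroth-order gradient estimate of f at x.

    Uses central differences, so it only ever calls f (no backpropagation).
    Costs 2 * x.size queries to f.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros(x.shape)
    flat_grad = grad.reshape(-1)  # view onto grad, written coordinate by coordinate
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = mu
        step = step.reshape(x.shape)
        flat_grad[i] = (f(x + step) - f(x - step)) / (2.0 * mu)
    return grad


def toy_loss(x):
    # Hypothetical stand-in for a loss computed from black-box model outputs.
    return float(np.sum((x - 1.0) ** 2))


x0 = np.array([0.0, 2.0, -1.0])
g = cge_gradient(toy_loss, x0)  # analytic gradient here is 2 * (x0 - 1)
```

A coordinate-wise estimate like this needs two queries per input dimension: 6,144 queries for a 32×32×3 CIFAR-10 image and 55,296 for a 96×96×3 STL-10 image. That scaling is one reason zeroth-order methods become harder to apply as dimensionality grows.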

To address these limitations, the authors propose ZO-RUDS, a certified ZO preprocessing defense based on denoised smoothing (DS). It prepends a robust UNet-based denoiser (RDUNet) to the black-box model so that adversarial perturbations are removed from the input image before classification. RDUNet is designed to denoise high-dimensional data more effectively than the denoisers used in previous approaches.
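The general shape of a denoiser-prepended smoothing defense can be sketched as follows: add Gaussian noise to the input several times, denoise each noisy copy, query the black-box model, and take a majority vote. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_denoiser` (a 3×3 mean filter) stands in for RDUNet, `toy_black_box` stands in for the protected classifier, and `sigma` and `n_samples` are arbitrary values.

```python
import numpy as np


def toy_denoiser(x):
    """Hypothetical stand-in for RDUNet: a 3x3 mean filter on a 2D image."""
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros(x.shape)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return np.clip(out / 9.0, 0.0, 1.0)


def toy_black_box(x):
    """Hypothetical stand-in for the black-box model: returns a label only."""
    return int(x.mean() > 0.5)


def smoothed_predict(x, denoiser, black_box, sigma=0.25,
                     n_samples=100, num_classes=2, seed=0):
    """Majority vote of black_box(denoiser(x + noise)) over Gaussian noise draws."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        counts[black_box(denoiser(noisy))] += 1
    return int(np.argmax(counts)), counts


bright = np.full((8, 8), 0.9)
dark = np.full((8, 8), 0.1)
label_bright, counts_bright = smoothed_predict(bright, toy_denoiser, toy_black_box)
label_dark, counts_dark = smoothed_predict(dark, toy_denoiser, toy_black_box)
```

Only the label returned by `black_box` is used, which matches the query-only constraint. A real certificate would additionally turn the vote counts into a statistical lower bound on the top-class probability; that step is omitted here.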

They also introduce ZO-AE-RUDS, which places an autoencoder (AE) after RDUNet, so that the input passes through RDUNet, then the AE, and only then reaches the black-box model. The AE is intended to capture the underlying structure of the clean images, allowing more effective removal of the adversarial noise.
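The difference between the two defenses is only in what sits in front of the black-box model, which the following sketch shows as plain function composition. All three components are hypothetical stubs assumed for illustration (`rdunet_stub`, `autoencoder_stub`, `black_box_stub`); the actual networks and how they are trained are not described in this summary.

```python
from typing import Callable, Sequence

import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]


def prepend(stages: Sequence[Stage],
            black_box: Callable[[np.ndarray], int]) -> Callable[[np.ndarray], int]:
    """Return a defended model that runs the stages in order, then the black box."""
    def defended(x: np.ndarray) -> int:
        for stage in stages:
            x = stage(x)
        return black_box(x)
    return defended


def rdunet_stub(x: np.ndarray) -> np.ndarray:
    # Placeholder for the RDUNet denoiser: just clips to the valid pixel range.
    return np.clip(x, 0.0, 1.0)


def autoencoder_stub(x: np.ndarray) -> np.ndarray:
    # Placeholder AE: encode by 2x2 average pooling, decode by upsampling.
    h, w = x.shape
    code = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(code, 2, axis=0), 2, axis=1)


def black_box_stub(x: np.ndarray) -> int:
    # Placeholder for the query-only model: returns a class label.
    return int(x.mean() > 0.5)


zo_ruds = prepend([rdunet_stub], black_box_stub)                       # RDUNet -> model
zo_ae_ruds = prepend([rdunet_stub, autoencoder_stub], black_box_stub)  # RDUNet -> AE -> model

image = np.full((8, 8), 0.9)
pred_ruds = zo_ruds(image)
pred_ae_ruds = zo_ae_ruds(image)
```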

The authors run experiments on four image classification datasets (CIFAR-10, CIFAR-100, Tiny ImageNet, STL-10) and on MNIST for image reconstruction. ZO-RUDS and ZO-AE-RUDS beat prior state-of-the-art zeroth-order defenses by margins of 35% and 9% on the low-dimensional CIFAR-10 and by 20.61% and 23.51% on the high-dimensional STL-10, respectively.

Critical Analysis

The paper presents a well-designed and thorough approach to defending against adversarial attacks in the black-box setting. The use of a robust denoiser, RDUNet, and the combination with an autoencoder are innovative ideas that demonstrate the potential of preprocessing techniques for adversarial defense.

However, the paper does not explore the computational overhead or latency introduced by the additional preprocessing steps. This could be an important factor, especially for real-time applications. Additionally, the paper does not discuss the robustness of the proposed methods against adaptive attackers who might try to bypass the denoising process.

Further research could investigate the transferability of the trained RDUNet and AE models to different black-box models, as well as ways to make the defense more efficient and scalable. It would also be valuable to explore the generalization of the approach to other data modalities, such as text or audio, where adversarial attacks are also a concern.

Conclusion

This research paper presents a promising approach to defending against adversarial attacks in the black-box setting. By introducing a robust denoiser (RDUNet) and combining it with an autoencoder, the authors have developed two effective defense mechanisms, ZO-RUDS and ZO-AE-RUDS, that significantly outperform state-of-the-art methods.

The key contribution of this work is the development of a certified ZO preprocessing technique that can reliably remove adversarial perturbations, even on high-dimensional datasets. This is an important step forward in enhancing the robustness of AI systems and promoting their safe and trustworthy deployment in real-world applications.

Related Papers

Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

Astha Verma, A V Subramanyam, Siddhesh Bangar, Naman Lal, Rajiv Ratn Shah, Shin'ichi Satoh

Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO preprocessing technique for removing adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, by prepending our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS in which RDUNet followed by autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-100, Tiny ImageNet, STL-10, and the MNIST dataset for image reconstruction tasks. Our proposed defense methods ZO-RUDS and ZO-AE-RUDS beat SOTA with a huge margin of 35% and 9% for low-dimensional (CIFAR-10) and with a margin of 20.61% and 23.51% for high-dimensional (STL-10) datasets, respectively.

7/9/2024

Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models

Weikang Yu, Yonghao Xu, Pedram Ghamisi

Deep neural networks (DNNs) have risen to prominence as key solutions in numerous AI applications for earth observation (AI4EO). However, their susceptibility to adversarial examples poses a critical challenge, compromising the reliability of AI4EO algorithms. This paper presents a novel Universal Adversarial Defense approach in Remote Sensing Imagery (UAD-RS), leveraging pre-trained diffusion models to protect DNNs against universal adversarial examples exhibiting heterogeneous patterns. Specifically, a universal adversarial purification framework is developed utilizing pre-trained diffusion models to mitigate adversarial perturbations through the introduction of Gaussian noise and subsequent purification of the perturbations from adversarial examples. Additionally, an Adaptive Noise Level Selection (ANLS) mechanism is introduced to determine the optimal noise level for the purification framework with a task-guided Frechet Inception Distance (FID) ranking strategy, thereby enhancing purification performance. Consequently, only a single pre-trained diffusion model is required for purifying universal adversarial samples with heterogeneous patterns across each dataset, significantly reducing training efforts for multiple attack settings while maintaining high performance without prior knowledge of adversarial perturbations. Experimental results on four heterogeneous RS datasets, focusing on scene classification and semantic segmentation, demonstrate that UAD-RS outperforms state-of-the-art adversarial purification approaches, providing universal defense against seven commonly encountered adversarial perturbations. Codes and the pre-trained models are available online (https://github.com/EricYu97/UAD-RS).

5/28/2024

Privacy-preserving Universal Adversarial Defense for Black-box Models

Qiao Li, Cong Wu, Jing Chen, Zijun Zhang, Kun He, Ruiying Du, Xinxin Wang, Qingchuang Zhao, Yang Liu

Deep neural networks (DNNs) are increasingly used in critical applications such as identity authentication and autonomous driving, where robustness against adversarial attacks is crucial. These attacks can exploit minor perturbations to cause significant prediction errors, making it essential to enhance the resilience of DNNs. Traditional defense methods often rely on access to detailed model information, which raises privacy concerns, as model owners may be reluctant to share such data. In contrast, existing black-box defense methods fail to offer a universal defense against various types of adversarial attacks. To address these challenges, we introduce DUCD, a universal black-box defense method that does not require access to the target model's parameters or architecture. Our approach involves distilling the target model by querying it with data, creating a white-box surrogate while preserving data privacy. We further enhance this surrogate model using a certified defense based on randomized smoothing and optimized noise selection, enabling robust defense against a broad range of adversarial attacks. Comparative evaluations between the certified defenses of the surrogate and target models demonstrate the effectiveness of our approach. Experiments on multiple image classification datasets show that DUCD not only outperforms existing black-box defenses but also matches the accuracy of white-box defenses, all while enhancing data privacy and reducing the success rate of membership inference attacks.

8/21/2024

DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization

Pucheng Dang, Xing Hu, Dong Li, Rui Zhang, Qi Guo, Kaidi Xu

Current text-to-image (T2I) synthesis diffusion models raise misuse concerns, particularly in creating prohibited or not-safe-for-work (NSFW) images. To address this, various safety mechanisms and red teaming attack methods are proposed to enhance or expose the T2I model's capability to generate unsuitable content. However, many red teaming attack methods assume knowledge of the text encoders, limiting their practical usage. In this work, we rethink the case of purely black-box attacks without prior knowledge of the T2I model. To overcome the unavailability of gradients and the inability to optimize attacks within a discrete prompt space, we propose DiffZOO which applies Zeroth Order Optimization to procure gradient approximations and harnesses both C-PRV and D-PRV to enhance attack prompts within the discrete prompt domain. We evaluated our method across multiple safety mechanisms of the T2I diffusion model and online servers. Experiments on multiple state-of-the-art safety mechanisms show that DiffZOO attains an 8.5% higher average attack success rate than previous works, hence its promise as a practical red teaming tool for T2I models.

8/22/2024