SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

Read original: arXiv:2407.11073 - Published 7/17/2024 by Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu


Overview

This paper introduces a new black-box adversarial attack called SemiAdv that is query-efficient and can generate adversarial examples without access to labeled data.
SemiAdv leverages a semi-supervised approach to learn a generative model of adversarial perturbations from unlabeled images, allowing it to craft adversarial examples with far fewer queries to the target model.
The authors demonstrate the effectiveness of SemiAdv on multiple benchmark datasets and target models, showing that it outperforms previous state-of-the-art black-box attacks in terms of attack success rate and query efficiency.

Plain English Explanation

In the world of machine learning, adversarial attacks are a significant concern. These are small, intentional changes to an input that can trick a model into making incorrect predictions. This paper presents a new black-box adversarial attack, SemiAdv, that is designed to succeed while sending only a small number of queries to the target model.

The key insight behind SemiAdv is that it doesn't require access to labeled data to craft adversarial examples. Instead, it learns a generative model of adversarial perturbations from unlabeled images. This allows SemiAdv to generate high-quality adversarial examples with far fewer queries to the target model, making it much more efficient than previous black-box attacks.

The authors demonstrate that SemiAdv outperforms state-of-the-art black-box attacks on multiple benchmark datasets and target models, achieving higher attack success rates while making fewer queries. This is an important advance, as query efficiency is critical for real-world applications where access to the target model may be limited or expensive.

Technical Explanation

The core idea of SemiAdv is to leverage a semi-supervised approach to learn a generative model of adversarial perturbations from unlabeled images. This allows the attack to craft adversarial examples without needing access to labeled data, which is often a limiting factor for previous black-box attacks.

The authors first train a variational autoencoder (VAE) on a large set of unlabeled images. They then fine-tune the VAE to generate adversarial perturbations that, when added to clean images, cause the target model to misclassify the resulting adversarial examples. This fine-tuning process uses a small number of queries to the target model to guide the perturbation generation.
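To make the idea of query-guided perturbation generation concrete, here is a minimal sketch of a query-counted black-box search. It is not the authors' method: it swaps the VAE fine-tuning for a plain random search, and `toy_target`, `QueryCounter`, and `random_search_attack` are hypothetical names introduced only for illustration. What it shows is the bookkeeping any such attack needs: every call to the target counts against a query budget, and the perturbation stays inside an L-infinity bound `eps`.

```python
import random


class QueryCounter:
    """Wraps a black-box model and counts how many times it is queried."""

    def __init__(self, model):
        self.model = model
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.model(x)


def toy_target(x):
    """Hypothetical target model: a fixed linear score; score > 0 means class 1."""
    return 0.8 * x[0] - 0.5 * x[1] + 0.1


def clip(value, low, high):
    return max(low, min(high, value))


def random_search_attack(target, x, eps=0.5, max_queries=200, seed=0):
    """Search for delta with |delta_i| <= eps that flips the predicted class,
    using only score queries to the target (assumption: scores are visible)."""
    rng = random.Random(seed)
    delta = [0.0] * len(x)
    best = target(x)                      # one query for the clean input
    original_label = best > 0
    while target.queries < max_queries:
        if (best > 0) != original_label:  # label flipped: attack succeeded
            break
        # Propose a nearby perturbation, clipped to the L-infinity budget.
        cand = [clip(d + rng.uniform(-eps, eps), -eps, eps) for d in delta]
        score = target([xi + ci for xi, ci in zip(x, cand)])
        # Keep the candidate only if it pushes the score toward the other class.
        improved = score < best if original_label else score > best
        if improved:
            delta, best = cand, score
    success = (best > 0) != original_label
    return delta, success, target.queries
```

A typical call wraps the model once, for example `target = QueryCounter(toy_target)`, and then reads the number of queries used from the third return value. A learned generator such as the one described above would replace the random proposal step, with the aim of needing fewer queries per successful example.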

Once the SemiAdv model is trained, it can efficiently generate adversarial examples for new inputs by passing them through the VAE and adding the generated perturbations. The authors show that this approach achieves higher attack success rates than previous black-box attacks, such as those in "Certifiable Black-Box Attacks with Randomized Adversarial Examples" and "BRUScale Attack: Query-Efficient Score-Based Black-Box Attacks", while making significantly fewer queries to the target model.
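The two quantities compared here, attack success rate and query cost, can be computed from per-image attack records. The sketch below is one possible convention, not necessarily the one used in the paper: it assumes each record is a `(success, queries)` pair and averages queries over successful attacks only. The function names are hypothetical.

```python
from statistics import mean


def summarize_attack(records):
    """records: list of (success: bool, queries: int), one entry per attacked image.

    Returns (attack success rate, mean queries over successful attacks).
    Averaging over successes only is an assumption; some papers average over
    all attempts instead.
    """
    if not records:
        raise ValueError("no attack records")
    success_rate = sum(1 for ok, _ in records if ok) / len(records)
    successful = [q for ok, q in records if ok]
    avg_queries = mean(successful) if successful else float("nan")
    return success_rate, avg_queries


def query_saving_factor(baseline_avg_queries, attack_avg_queries):
    """How many times fewer queries one attack needs relative to a baseline."""
    return baseline_avg_queries / attack_avg_queries
```

With made-up records such as `[(True, 100), (True, 300), (False, 500), (True, 200)]`, `summarize_attack` reports a success rate of 0.75 and an average of 200 queries per successful attack.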

The authors also investigate the transferability of the adversarial examples generated by SemiAdv, demonstrating that they can be effective against multiple target models, even when the models have different architectures. This suggests that the learned perturbations capture general properties of adversarial examples rather than being specific to a particular model.
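Transferability is usually reported as the fraction of adversarial examples, crafted against one model, that also fool a different model. The sketch below illustrates that measurement under simple assumptions: models are plain callables that return a label, and an example counts as "transferred" when the second model's prediction differs from the true label. The names `transfer_rate`, `model_a`, and `model_b` are hypothetical, and the toy threshold models stand in for real networks.

```python
def transfer_rate(adv_examples, true_labels, other_model):
    """Fraction of adversarial examples that are misclassified by other_model."""
    if not adv_examples:
        raise ValueError("no adversarial examples")
    if len(adv_examples) != len(true_labels):
        raise ValueError("examples and labels must have the same length")
    fooled = sum(
        1 for x, y in zip(adv_examples, true_labels) if other_model(x) != y
    )
    return fooled / len(adv_examples)


# Toy stand-ins for two models with slightly different decision thresholds.
def model_a(x):
    return 1 if x > 0.5 else 0


def model_b(x):
    return 1 if x > 0.6 else 0


# Inputs whose true label is 1, pushed just below model_a's threshold.
adv_examples = [0.45, 0.49, 0.30, 0.58]
true_labels = [1, 1, 1, 1]
rate_on_b = transfer_rate(adv_examples, true_labels, model_b)
```

In this toy setup every example also falls below `model_b`'s threshold, so all of them transfer; with real models the rate typically depends on how similar the two decision boundaries are.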

Critical Analysis

One potential limitation of SemiAdv is that it relies on a semi-supervised approach to learn the generative model of adversarial perturbations. While this allows the attack to be query-efficient, it may be less flexible than fully supervised approaches that can directly optimize for the target model's specific vulnerabilities. Additionally, the authors do not explore the impact of the quality or quantity of the unlabeled data used to train the VAE, which could be an important factor in the attack's performance.

Another area for further research would be to investigate the broader implications of SemiAdv and other semi-supervised adversarial attacks. As "Self-Supervised Representation Learning for Adversarial Attack Detection" and "From Attack to Defense: Insights into Deep Learning-based Adversarial Example Detection" have shown, adversarial attacks can provide valuable insights into the weaknesses of machine learning models, which can then be used to improve their robustness. The semi-supervised nature of SemiAdv may also offer new avenues for developing more adversarially robust models, in line with theoretical work on semi-supervised adversarially robust PAC learnability.

Conclusion

The SemiAdv attack presented in this paper represents an important advancement in the field of black-box adversarial attacks. By leveraging a semi-supervised approach to learn a generative model of adversarial perturbations, SemiAdv can craft highly effective adversarial examples while making far fewer queries to the target model. This improved query efficiency is a crucial capability for real-world applications, where access to the target model may be limited or expensive.

The authors have demonstrated the effectiveness of SemiAdv on multiple benchmark datasets and target models, and have also shown that the generated adversarial examples exhibit strong transferability. These findings suggest that SemiAdv could be a valuable tool for assessing the robustness of machine learning models and driving the development of more secure and reliable systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following breakthroughs compared with previous works. First, by introducing the semi-supervised learning technique into the adversarial attack, SemiAdv substantially decreases the number of queries required for generating adversarial samples. On average, SemiAdv only needs to query a few hundred times to launch an effective attack with more than 90% success rate. Second, many existing black-box adversarial attacks require massive labeled data to mitigate the difference between the local substitute model and the remote target model for a good attack performance. While SemiAdv relaxes this limitation and is capable of utilizing unlabeled raw data to launch an effective attack. Finally, our experiments show that SemiAdv saves up to 12x query accesses for generating adversarial samples while maintaining a competitive attack success rate compared with state-of-the-art attacks.

7/17/2024

Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling

Zongyao Lyu, William J. Beksi

Active learning aims to alleviate the amount of labor involved in data labeling by automating the selection of unlabeled samples via an acquisition function. For example, variational adversarial active learning (VAAL) leverages an adversarial network to discriminate unlabeled samples from labeled ones using latent space information. However, VAAL has the following shortcomings: (i) it does not exploit target task information, and (ii) unlabeled data is only used for sample selection rather than model training. To address these limitations, we introduce novel techniques that significantly improve the use of abundant unlabeled data during training and take into account the task information. Concretely, we propose an improved pseudo-labeling algorithm that leverages information from all unlabeled data in a semi-supervised manner, thus allowing a model to explore a richer data space. In addition, we develop a ranking-based loss prediction module that converts predicted relative ranking information into a differentiable ranking loss. This loss can be embedded as a rank variable into the latent space of a variational autoencoder and then trained with a discriminator in an adversarial fashion for sample selection. We demonstrate the superior performance of our approach over the state of the art on various image classification and segmentation benchmark datasets.

8/26/2024


A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability

Idan Attias, Steve Hanneke, Yishay Mansour

We study the problem of learning an adversarially robust predictor to test time attacks in the semi-supervised PAC model. We address the question of how many labeled and unlabeled examples are required to ensure learning. We show that having enough unlabeled data (the size of a labeled sample that a fully-supervised method would require), the labeled sample complexity can be arbitrarily smaller compared to previous works, and is sharply characterized by a different complexity measure. We prove nearly matching upper and lower bounds on this sample complexity. This shows that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model, and establishes a gap between the supervised and semi-supervised label complexities which is known not to hold in standard non-robust PAC learning.

5/7/2024

Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis

Kira Maag, Roman Resner, Asja Fischer

Deep neural networks have demonstrated remarkable effectiveness across a wide range of tasks such as semantic segmentation. Nevertheless, these networks are vulnerable to adversarial attacks that add imperceptible perturbations to the input image, leading to false predictions. This vulnerability is particularly dangerous in safety-critical applications like automated driving. While adversarial examples and defense strategies are well-researched in the context of image classification, there is comparatively less research focused on semantic segmentation. Recently, we have proposed an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation. We observed that uncertainty, as measured by the entropy of the output distribution, behaves differently on clean versus adversely perturbed images, and we utilize this property to differentiate between the two. In this extended version of our work, we conduct a detailed analysis of uncertainty-based detection of adversarial attacks including a diverse set of adversarial attacks and various state-of-the-art neural networks. Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method, which is lightweight and operates as a post-processing step, i.e., no model modifications or knowledge of the adversarial example generation process are required.

8/20/2024