EvolBA: Evolutionary Boundary Attack under Hard-label Black Box condition

Read original: arXiv:2407.02248 - Published 7/10/2024 by Ayane Tajima, Satoshi Ono

Overview

Proposes a new evolutionary algorithm called EvolBA for conducting hard-label black box adversarial attacks on deep neural networks
Shows EvolBA can effectively generate adversarial examples that fool classifiers without access to model gradients or internal parameters
Reports that EvolBA finds adversarial examples with smaller perturbations than previous methods on images where those methods have difficulty

Plain English Explanation

The paper introduces a technique called EvolBA that uses an evolutionary algorithm to generate adversarial examples that trick deep neural network classifiers. EvolBA operates in a "hard-label black box" setting: the attacker sees only the class label the model predicts, with no access to its internal parameters, loss gradients, or confidence scores.

The key idea behind EvolBA is to view the process of finding adversarial examples as an optimization problem. The algorithm starts with a benign input and then iteratively modifies it using an evolutionary strategy, with the goal of finding a slightly perturbed version that the classifier will misclassify. The modifications are guided by a fitness function that measures how close the perturbed input is to the decision boundary of the classifier.
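The optimization view described above is commonly written as a constrained minimization. The block below is the standard formulation for untargeted hard-label attacks, offered as an illustrative assumption and not as the paper's exact objective. The symbols x (original input), δ (perturbation), f (classifier that returns only a label), y = f(x) (original label), and d (input dimension) are introduced here and do not appear in the summary.

```latex
\min_{\delta} \; \|\delta\|_2
\quad \text{subject to} \quad
f(x + \delta) \neq y,
\qquad x + \delta \in [0, 1]^d
```

Because f exposes only a label, the constraint can be checked with a query but gives no gradient, which is why derivative-free search methods such as evolution strategies are a natural fit.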

According to the paper's abstract, EvolBA finds adversarial examples with smaller perturbations than previous methods on images where those methods have difficulty. This illustrates the risk that deep learning systems remain vulnerable to carefully crafted adversarial inputs, even when the attacker can observe nothing but the predicted label.

Technical Explanation

The paper proposes a new evolutionary algorithm called EvolBA for conducting hard-label black box adversarial attacks on deep neural networks. EvolBA starts with a benign input and iteratively modifies it using a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to find a perturbed input that fools the target classifier.

The key components of EvolBA are:

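Fitness Function: EvolBA scores each candidate with a fitness function that measures how close the perturbed input is to the decision boundary of the classifier.
Initialization and Jump Operators: Inspired by formula-driven supervised learning, EvolBA introduces domain-independent operators for the initialization process and a jump that enhances search exploration.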
CMA-ES Optimization: EvolBA leverages the CMA-ES algorithm to efficiently explore the space of possible perturbations and find adversarial examples.
Termination Criterion: EvolBA stops the optimization process when it finds a perturbed input that is misclassified by the target model.
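The loop these components describe can be sketched in a few lines of Python. This is a deliberately simplified (1+λ) evolution strategy with a single adaptive step size. It is not CMA-ES (there is no covariance adaptation) and it is not the authors' implementation. The names `toy_classifier` and `es_hard_label_attack`, the linear toy model, the contraction factor of 0.1, and the choice to start from an already-misclassified point are all assumptions made for illustration.

```python
import math
import random


def toy_classifier(x):
    """Hypothetical target model: returns only a hard label (0 or 1)."""
    return 1 if sum(x) > 1.0 else 0


def l2_distance(a, b):
    """Euclidean distance, used here as the perturbation size."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def es_hard_label_attack(model, x_orig, x_start, generations=300,
                         offspring=10, sigma=0.5, seed=0):
    """Shrink the distance between a misclassified point and x_orig.

    Assumption: x_start is already misclassified. Only model(x) labels
    are used; no gradients or confidence scores are read.
    """
    rng = random.Random(seed)
    y_orig = model(x_orig)
    if model(x_start) == y_orig:
        raise ValueError("x_start must already be misclassified")
    queries = 2  # the two label checks above
    best, best_dist = list(x_start), l2_distance(x_start, x_orig)

    for _ in range(generations):
        improved = False
        parent = best  # all offspring of a generation share one parent
        for _ in range(offspring):
            # Mutation: Gaussian noise plus a small pull toward x_orig.
            cand = [p + sigma * rng.gauss(0.0, 1.0) + 0.1 * (o - p)
                    for p, o in zip(parent, x_orig)]
            dist = l2_distance(cand, x_orig)
            if dist >= best_dist:
                continue  # cannot improve, so do not spend a query
            queries += 1
            if model(cand) != y_orig:  # still adversarial and closer
                best, best_dist, improved = cand, dist, True
        # Crude step-size control: grow after success, shrink after failure.
        sigma *= 1.2 if improved else 0.8
    return best, best_dist, queries


x_orig = [0.0, 0.0, 0.0, 0.0]   # labeled 0 by the toy model
x_start = [2.0, 2.0, 2.0, 2.0]  # labeled 1, so already "adversarial"
adv, dist, queries = es_hard_label_attack(toy_classifier, x_orig, x_start)
print(toy_classifier(adv), round(dist, 3), queries)
```

A candidate is only sent to the model when it is closer to the original than the current best, since a farther candidate could never replace it. A full CMA-ES would additionally adapt a covariance matrix so that mutations follow the local shape of the decision boundary.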

The experiments reported in the paper indicate that EvolBA finds adversarial examples with smaller perturbations than previous methods on images where those methods have difficulty.
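Comparisons between hard-label attacks are typically summarized with a few simple statistics: the fraction of inputs attacked successfully within a query budget, the number of queries used, and the size of the perturbation found. The sketch below shows one way to compute them. The function name `summarize_attacks`, the record fields, and the sample numbers are hypothetical and are not results from the paper.

```python
import statistics


def summarize_attacks(results, max_queries):
    """Summarize per-image attack records.

    results: list of dicts with keys 'success' (bool), 'queries' (int),
    and 'l2' (float, perturbation norm). A run counts as a win only if
    it succeeded within the query budget.
    """
    wins = [r for r in results if r["success"] and r["queries"] <= max_queries]
    return {
        "success_rate": len(wins) / len(results) if results else 0.0,
        "median_queries": statistics.median(r["queries"] for r in wins) if wins else None,
        "median_l2": statistics.median(r["l2"] for r in wins) if wins else None,
    }


# Made-up records for illustration only.
records = [
    {"success": True, "queries": 100, "l2": 1.0},
    {"success": True, "queries": 300, "l2": 3.0},
    {"success": False, "queries": 1000, "l2": 0.0},
    {"success": True, "queries": 5000, "l2": 0.5},  # over budget
]
print(summarize_attacks(records, max_queries=1000))
```

Medians are used here instead of means because query counts and perturbation norms tend to be skewed by a few hard images.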

Critical Analysis

The paper provides a thorough evaluation of EvolBA's performance, including comparisons to other state-of-the-art black box attack methods. However, the authors acknowledge a few limitations of their approach:

EvolBA may require a large number of queries to the target model, which could be impractical in some real-world scenarios.
The performance of EvolBA may degrade as the dimensionality of the input space increases.
The paper only considers image classification tasks, and it's unclear how well EvolBA would perform on other types of machine learning models or data domains.

Additionally, the paper does not discuss the broader implications of adversarial attacks or potential defenses against them. Further research is needed to better understand the security risks posed by hard-label black box attacks and develop effective countermeasures.

Conclusion

In summary, the EvolBA algorithm proposed in this paper demonstrates the potential vulnerability of deep neural network classifiers to carefully crafted adversarial examples, even when the attacker has limited information about the model's internal structure and parameters. The paper's technical contributions and thorough evaluation provide valuable insights for the machine learning security community, while also highlighting the need for continued research on robust and secure deep learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EvolBA: Evolutionary Boundary Attack under Hard-label Black Box condition

Ayane Tajima, Satoshi Ono

Research has shown that deep neural networks (DNNs) have vulnerabilities that can lead to the misrecognition of Adversarial Examples (AEs) with specifically designed perturbations. Various adversarial attack methods have been proposed to detect vulnerabilities under hard-label black box (HL-BB) conditions in the absence of loss gradients and confidence scores. However, these methods fall into local solutions because they search only local regions of the search space. Therefore, this study proposes an adversarial attack method named EvolBA to generate AEs using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) under the HL-BB condition, where only a class label predicted by the target DNN model is available. Inspired by formula-driven supervised learning, the proposed method introduces domain-independent operators for the initialization process and a jump that enhances search exploration. Experimental results confirmed that the proposed method could determine AEs with smaller perturbations than previous methods in images where the previous methods have difficulty.

7/10/2024

Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!

Shashank Kotyan, Po-Yuan Mao, Pin-Yu Chen, Danilo Vasconcellos Vargas

Deep neural networks can be exploited using natural adversarial samples, which do not impact human perception. Current approaches often rely on deep neural networks' white-box nature to generate these adversarial samples or synthetically alter the distribution of adversarial samples compared to the training distribution. In contrast, we propose EvoSeed, a novel evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples. Our EvoSeed framework uses auxiliary Conditional Diffusion and Classifier models to operate in a black-box setting. We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in the natural adversarial sample misclassified by the Classifier Model. Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers. Our research opens new avenues to understanding the limitations of current safety mechanisms and the risk of plausible attacks against classifier systems using image generation. Project Website can be accessed at: https://shashankkotyan.github.io/EvoSeed.

5/24/2024

Evaluating the Robustness of Deep-Learning Algorithm-Selection Models by Evolving Adversarial Instances

Emma Hart, Quentin Renau, Kevin Sim, Mohamad Alissa

Deep neural networks (DNN) are increasingly being used to perform algorithm-selection in combinatorial optimisation domains, particularly as they accommodate input representations which avoid designing and calculating features. Mounting evidence from domains that use images as input shows that deep convolutional networks are vulnerable to adversarial samples, in which a small perturbation of an instance can cause the DNN to misclassify. However, it remains unknown as to whether deep recurrent networks (DRN) which have recently shown promise as algorithm-selectors in the bin-packing domain are equally vulnerable. We use an evolutionary algorithm (EA) to find perturbations of instances from two existing benchmarks for online bin packing that cause trained DRNs to misclassify: adversarial samples are successfully generated from up to 56% of the original instances depending on the dataset. Analysis of the new misclassified instances sheds light on the 'fragility' of some training instances, i.e. instances where it is trivial to find a small perturbation that results in a misclassification and the factors that influence this. Finally, the method generates a large number of new instances misclassified with a wide variation in confidence, providing a rich new source of training data to create more robust models.

6/26/2024

Post-train Black-box Defense via Bayesian Boundary Correction

He Wang, Yunfeng Diao

Classifiers based on deep neural networks are susceptible to adversarial attack, where the widely existing vulnerability has invoked the research in defending them from potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions/training regimes. While the model/data/training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments on the clean data, the adversarial examples and the classifier, for maximizing their joint probability. It is further equipped with a new post-train strategy which keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, for both static and dynamic data. Exhaustive evaluation shows that BBC has superior robustness and can enhance robustness without severely hurting the clean accuracy, compared with existing defense methods.

6/12/2024