BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Read original: arXiv:2404.05311 - Published 6/4/2024 by Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Overview

Proposes a query-efficient score-based black-box sparse adversarial attack called BruSLeAttack
Aims to generate sparse adversarial perturbations that can fool machine learning models with a small number of queries
Leverages gradient estimation and a sparse optimization approach to achieve efficient and effective attacks

Plain English Explanation

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack is a new technique for tricking machine learning models into making incorrect predictions. The method works by finding small changes to the input data that can cause the model to misclassify it, even though the changes are barely noticeable to a human.

The key innovation is that BruSLeAttack can achieve these attacks using a small number of queries to the model, making it more efficient than previous approaches. It does this by estimating the gradient of the model's output with respect to the input, and then using a sparse optimization technique to find the smallest possible changes that will fool the model.

This is important because it means adversarial attacks can be launched against real-world machine learning systems with limited access, such as via an API. The sparse nature of the perturbations also makes them less likely to be detected by human observers.

Technical Explanation

The paper introduces BruSLeAttack, a query-efficient score-based black-box sparse adversarial attack. The key components are:

Gradient Estimation: Since the target model is a black-box, the method estimates the gradient of the model's output with respect to the input using finite differences. This allows the attack to be conducted without direct access to the model's parameters.
Sparse Optimization: BruSLeAttack formulates the adversarial perturbation problem as a sparse optimization task, aiming to find the smallest possible changes to the input that will cause misclassification. It uses an iterative shrinkage-thresholding algorithm to efficiently solve this optimization problem.
Query Efficiency: By carefully designing the gradient estimation and sparse optimization components, BruSLeAttack can generate adversarial examples with far fewer queries to the target model compared to previous black-box attack methods. This makes it practical for real-world deployment.

The paper evaluates BruSLeAttack on several benchmark datasets and models, demonstrating its effectiveness in generating sparse adversarial perturbations that successfully fool the target models. The attack outperforms previous state-of-the-art black-box attack methods in terms of both attack success rate and query efficiency.

Critical Analysis

The paper provides a thorough evaluation of BruSLeAttack and discusses its limitations and potential future research directions. Some key points:

The attack assumes the target model's output scores (logits) are accessible, which may not always be the case in real-world scenarios. Extensions to other black-box settings, such as only having access to the final classification decision, would be valuable.
The paper does not analyze the robustness of the generated adversarial examples, i.e., whether they would still be effective under common input transformations or defenses. Further research is needed to understand the practical limitations of the approach.
While the query efficiency is a significant improvement over prior methods, the absolute number of queries required may still be prohibitive for some applications. Exploring ways to further reduce the query complexity would be an interesting direction.
The paper focuses on image classification tasks, and it would be valuable to see how well BruSLeAttack generalizes to other domains, such as text classification or cybersecurity.

Overall, the BruSLeAttack method represents an important advance in the field of adversarial attacks, and the paper provides a solid technical foundation for further research in this area.

Conclusion

The BruSLeAttack method proposed in this paper demonstrates a significant advancement in the field of query-efficient black-box adversarial attacks. By combining gradient estimation and sparse optimization techniques, the approach can generate effective adversarial perturbations with far fewer queries to the target model compared to previous methods.

This improved efficiency makes BruSLeAttack a promising candidate for real-world applications, where access to machine learning models may be limited. The sparse nature of the generated perturbations also makes them less likely to be detected by human observers.

While the paper provides a thorough evaluation and discussion of the method's limitations, further research is needed to fully understand the practical implications and generalizability of BruSLeAttack. Exploring its applicability to a wider range of domains and robustness to common defenses would be valuable next steps.

Overall, this work represents an important contribution to the ongoing efforts to better understand and mitigate the security vulnerabilities of machine learning systems in the face of adversarial attacks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

We study the unique, less-well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number-the l0 bounded-perturbations to model inputs to craft adversarial examples and misguide model decisions. But, in contrast to query-based dense attack counterparts against black-box models, constructing sparse adversarial perturbations, even when models serve confidence score information to queries in a score-based setting, is non-trivial. Because, such an attack leads to i) an NP-hard problem; and ii) a non-differentiable search space. We develop the BruSLeAttack-a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations including an attack demonstration against a Machine Learning as a Service (MLaaS) offering exemplified by Google Cloud Vision and robustness testing of adversarial training regimes and a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.

6/4/2024

✅

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence

Hanbin Hong, Xinyu Zhang, Binghui Wang, Zhongjie Ba, Yuan Hong

Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or leveraging transferability from a local surrogate model. Recently, such attacks can be effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection via the pattern of sequential queries, or injecting noise into the model. To our best knowledge, we take the first step to study a new paradigm of black-box attacks with provable guarantees -- certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying over the target model. This new black-box attack unveils significant vulnerabilities of machine learning models, compared to traditional empirical black-box attacks, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinite) adversarial examples with high ASP, and the ASP of the generated adversarial examples is theoretically guaranteed without verification/queries over the target model. Specifically, we establish a novel theoretical foundation for ensuring the ASP of the black-box attack with randomized adversarial examples (AEs). Then, we propose several novel techniques to craft the randomized AEs while reducing the perturbation size for better imperceptibility. Finally, we have comprehensively evaluated the certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech datasets, while benchmarking with 16 SOTA black-box attacks, against various SOTA defenses in the domains of computer vision and speech recognition. Both theoretical and experimental results have validated the significance of the proposed attack. The code and all the benchmarks are available at url{https://github.com/datasec-lab/CertifiedAttack}.

9/9/2024

From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings

Firuz Juraev, Mohammed Abuhamad, Eric Chan-Tin, George K. Thiruvathukal, Tamer Abuhmed

Deep Learning (DL) is rapidly maturing to the point that it can be used in safety- and security-crucial applications. However, adversarial samples, which are undetectable to the human eye, pose a serious threat that can cause the model to misbehave and compromise the performance of such applications. Addressing the robustness of DL models has become crucial to understanding and defending against adversarial attacks. In this study, we perform comprehensive experiments to examine the effect of adversarial attacks and defenses on various model architectures across well-known datasets. Our research focuses on black-box attacks such as SimBA, HopSkipJump, MGAAttack, and boundary attacks, as well as preprocessor-based defensive mechanisms, including bits squeezing, median smoothing, and JPEG filter. Experimenting with various models, our results demonstrate that the level of noise needed for the attack increases as the number of layers increases. Moreover, the attack success rate decreases as the number of layers increases. This indicates that model complexity and robustness have a significant relationship. Investigating the diversity and robustness relationship, our experiments with diverse models show that having a large number of parameters does not imply higher robustness. Our experiments extend to show the effects of the training dataset on model robustness. Using various datasets such as ImageNet-1000, CIFAR-100, and CIFAR-10 are used to evaluate the black-box attacks. Considering the multiple dimensions of our analysis, e.g., model complexity and training dataset, we examined the behavior of black-box attacks when models apply defenses. Our results show that applying defense strategies can significantly reduce attack effectiveness. This research provides in-depth analysis and insight into the robustness of DL models against various attacks, and defenses.

5/6/2024

SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following breakthroughs compared with previous works. First, by introducing the semi-supervised learning technique into the adversarial attack, SemiAdv substantially decreases the number of queries required for generating adversarial samples. On average, SemiAdv only needs to query a few hundred times to launch an effective attack with more than 90% success rate. Second, many existing black-box adversarial attacks require massive labeled data to mitigate the difference between the local substitute model and the remote target model for a good attack performance. While SemiAdv relaxes this limitation and is capable of utilizing unlabeled raw data to launch an effective attack. Finally, our experiments show that SemiAdv saves up to 12x query accesses for generating adversarial samples while maintaining a competitive attack success rate compared with state-of-the-art attacks.

7/17/2024