Transferability Ranking of Adversarial Examples

2208.10878

Published 4/19/2024 by Mosh Levy, Guy Amit, Yuval Elovici, Yisroel Mirsky


Abstract

Adversarial transferability in black-box scenarios presents a unique challenge: while attackers can employ surrogate models to craft adversarial examples, they lack assurance on whether these examples will successfully compromise the target model. Until now, the prevalent method to ascertain success has been trial and error: testing crafted samples directly on the victim model. This approach, however, risks detection with every attempt, forcing attackers to either perfect their first try or face exposure. Our paper introduces a ranking strategy that refines the transfer attack process, enabling the attacker to estimate the likelihood of success without repeated trials on the victim's system. By leveraging a set of diverse surrogate models, our method can predict transferability of adversarial examples. This strategy can be used to either select the best sample to use in an attack or the best perturbation to apply to a specific sample. Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20% (akin to random selection) up to near upper-bound levels, with some scenarios even witnessing a 100% success rate. This substantial improvement not only sheds light on the shared susceptibilities across diverse architectures but also demonstrates that attackers can forgo the detectable trial-and-error tactics, increasing the threat of surrogate-based attacks.


Overview

Attackers can use surrogate models to craft adversarial examples, but lack assurance that these examples will successfully compromise the target model.
The traditional approach of directly testing crafted samples on the victim model risks detection with each attempt.
This paper introduces a ranking strategy that enables attackers to estimate the likelihood of success without repeated trials on the victim's system.

Plain English Explanation

Adversarial attacks are a type of cyber attack where attackers try to fool machine learning models by making small, imperceptible changes to input data. In a black-box scenario, attackers don't have direct access to the target model, so they use surrogate models to craft adversarial examples. However, they can't be sure these examples will actually work against the real target model.

Traditionally, attackers would just try their crafted examples directly on the victim model and see if they worked. But this approach is risky, as each failed attempt could be detected by the target system. It's like trying to pick a lock - you don't want to get caught fiddling with the door too many times.

This paper introduces a new ranking strategy that helps attackers estimate the likelihood of success without repeatedly testing on the victim's system. By leveraging a diverse set of surrogate models, the method can predict how "transferable" the adversarial examples will be - that is, how likely they are to work on the real target model.

This allows attackers to either select the best sample to use in an attack, or the best perturbation (the small changes they make to the input) to apply to a specific sample. Using this strategy, the researchers raised the transferability of adversarial examples from around 20% success (comparable to selecting at random) to near the achievable upper bound, with 100% success in some scenarios.

This is significant because it shows that attackers may be able to launch successful adversarial attacks without repeatedly trying their examples on the victim model, reducing the risk of detection. It also suggests that diverse machine learning models may share common vulnerabilities that attackers can exploit.

Technical Explanation

The key innovation in this paper is a ranking strategy that allows attackers to predict the transferability of adversarial examples across different target models, without needing to directly test them on the victim system.

The researchers start by generating a set of surrogate models with diverse architectures. They then craft adversarial examples using these surrogate models and measure how well the examples transfer to each other. This allows them to build a transferability matrix that captures the relationships between the surrogate models.
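To make the idea of a transferability matrix concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the threshold "models", the hand-picked adversarial inputs, and the function names `make_threshold_model` and `transferability_matrix` are all assumptions made for illustration. Entry `[i][j]` records the fraction of examples crafted against surrogate `i` that also fool surrogate `j`.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a model maps an input vector to a predicted label.
Model = Callable[[Sequence[float]], int]


def make_threshold_model(threshold: float) -> Model:
    """Toy stand-in for a surrogate: predicts 1 when x[0] exceeds the threshold."""
    def predict(x: Sequence[float]) -> int:
        return 1 if x[0] > threshold else 0
    return predict


def transferability_matrix(
    models: List[Model],
    adv_sets: List[List[Sequence[float]]],
    labels: List[int],
) -> List[List[float]]:
    """Entry [i][j] is the fraction of examples crafted on model i that fool model j."""
    n = len(models)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            fooled = sum(
                1 for x, y in zip(adv_sets[i], labels) if models[j](x) != y
            )
            matrix[i][j] = fooled / len(labels)
    return matrix


# Three toy surrogates with different decision thresholds (illustrative values).
surrogates = [make_threshold_model(t) for t in (0.4, 0.5, 0.6)]
labels = [1, 1]  # true label of each clean sample
adv_sets = [
    [[0.35], [0.30]],  # examples crafted against surrogate 0
    [[0.45], [0.42]],  # examples crafted against surrogate 1
    [[0.55], [0.52]],  # examples crafted against surrogate 2
]
matrix = transferability_matrix(surrogates, adv_sets, labels)
for row in matrix:
    print(row)
```

In this toy setup, examples crafted against the strictest surrogate fool every model, while examples crafted against the most permissive one fool only that model, so the matrix is upper triangular. Real surrogates would be trained networks and the entries would typically be fractional.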

Using this transferability matrix, the researchers can then rank adversarial examples based on their predicted likelihood of success against the real target model. This allows the attacker to either:

Select the best sample: Choose the adversarial example that is most likely to succeed against the target.
Select the best perturbation: Identify the specific perturbation (input modification) that is most likely to result in a transferable adversarial example.
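Both selection modes above reduce to scoring candidates with surrogates and sorting. The sketch below illustrates that under stated assumptions: the scoring rule (average probability mass that surrogates move away from the true label) is one plausible choice rather than the paper's confirmed formula, and `make_surrogate`, `expected_transferability`, and `rank_candidates` are hypothetical names. The toy surrogates take a one-dimensional input standing in for perturbation strength.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: returns the model's confidence in the TRUE label for x.
Model = Callable[[Sequence[float]], float]


def make_surrogate(tolerance: float) -> Model:
    """Toy surrogate whose confidence in the true label decays linearly with |x[0]|."""
    def confidence(x: Sequence[float]) -> float:
        return max(0.0, 1.0 - abs(x[0]) / tolerance)
    return confidence


def expected_transferability(x_adv: Sequence[float], surrogates: List[Model]) -> float:
    """Assumed score: mean probability mass moved off the true label across surrogates."""
    return sum(1.0 - m(x_adv) for m in surrogates) / len(surrogates)


def rank_candidates(
    candidates: List[Sequence[float]], surrogates: List[Model]
) -> List[int]:
    """Return candidate indices, highest predicted transferability first."""
    scored = [
        (expected_transferability(x, surrogates), idx)
        for idx, x in enumerate(candidates)
    ]
    # Sort by descending score; break ties by original index for determinism.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]


surrogates = [make_surrogate(t) for t in (0.5, 1.0, 2.0)]

# The candidates may be adversarial versions of different samples ("best sample")
# or different perturbations of one sample ("best perturbation"); the ranking
# step is the same in both cases.
candidates = [[0.2], [0.8], [0.5]]
ranking = rank_candidates(candidates, surrogates)
best = ranking[0]
print(ranking, best)
```

The attacker would then submit only the top-ranked candidate to the victim, which is what removes the need for repeated trials.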

In their experiments, the researchers demonstrate that this ranking strategy raises the transferability of adversarial examples from around 20% success (comparable to random selection) to near upper-bound levels, reaching 100% success in some scenarios. This suggests that there may be shared vulnerabilities across diverse model architectures that attackers can exploit.
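One way to read the comparison between random selection, the ranked result, and the upper bound is as a top-k success rate. The sketch below uses invented toy scores and outcomes, not figures from the paper, and the function names `success_at_k`, `random_baseline`, and `upper_bound` are assumptions. It compares the success rate of the top-k ranked candidates against the base rate (what random selection achieves on average) and against the best rate any ranking could reach.

```python
from typing import List


def success_at_k(scores: List[float], outcomes: List[int], k: int) -> float:
    """Fraction of the k highest-scored candidates that actually fool the victim."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sum(outcomes[i] for i in order[:k]) / k


def random_baseline(outcomes: List[int]) -> float:
    """Expected success rate when candidates are picked uniformly at random."""
    return sum(outcomes) / len(outcomes)


def upper_bound(outcomes: List[int], k: int) -> float:
    """Best achievable success rate at k, i.e. a ranking that puts all successes first."""
    return min(sum(outcomes), k) / k


# Toy data: 10 candidates, 2 of which transfer to the victim (a 20% base rate).
outcomes = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical surrogate-based scores for the same 10 candidates.
scores = [0.1, 0.9, 0.3, 0.2, 0.4, 0.1, 0.7, 0.8, 0.2, 0.3]

print(random_baseline(outcomes))          # base rate of the pool
print(success_at_k(scores, outcomes, 1))  # ranked, top-1
print(success_at_k(scores, outcomes, 2))  # ranked, top-2
print(upper_bound(outcomes, 2))           # oracle ranking, top-2
```

With these toy numbers the top-ranked candidate succeeds even though only one in five candidates transfers, while the top-2 rate falls short of the oracle because one high-scoring candidate fails on the victim.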

Critical Analysis

The paper provides a valuable contribution by introducing a principled approach to assessing the transferability of adversarial examples in black-box scenarios. However, there are a few potential issues and limitations to consider:

Reliance on Surrogate Models: The effectiveness of the ranking strategy depends on the surrogate models used. If the surrogate models do not adequately capture the vulnerabilities of the target model, the transferability predictions may be inaccurate. More research is needed to understand how to select or generate suitable surrogate models.
Potential for Detection: While the ranking strategy reduces the need for repeated trials on the victim model, it does not eliminate the risk of detection entirely. Attackers may still need to perform some initial probing or reconnaissance on the target system to gather the necessary information for the ranking process.
Real-World Applicability: The experiments in the paper were conducted in a controlled, academic setting. More research is needed to understand how well the ranking strategy would perform in real-world, dynamic environments where target models and their vulnerabilities may change over time.
Broader Implications: The success of this attack strategy highlights the need for continued research into adversarial robustness and bias mitigation in machine learning models. As attackers become more sophisticated, the potential for concept drift and other evolving threats will likely increase.

Overall, this paper makes an important contribution to the field of adversarial machine learning, but there is still much work to be done to fully understand and address the challenges posed by black-box adversarial attacks.

Conclusion

This paper introduces a novel ranking strategy that enables attackers to estimate the likelihood of success for adversarial examples in black-box scenarios, without needing to repeatedly test their samples on the victim model. By leveraging a diverse set of surrogate models, the method can predict the transferability of adversarial examples, allowing attackers to select the best sample or perturbation to use in an attack.

The researchers were able to substantially improve the transferability of adversarial examples using this strategy, demonstrating that attackers may be able to launch successful black-box attacks without risking detection through repeated trials. This suggests that diverse machine learning models may share common vulnerabilities that can be exploited by sophisticated adversaries.

While the paper makes an important contribution, there are still some limitations and areas for further research, such as the reliance on suitable surrogate models and the potential for detection even with the ranking strategy. Nonetheless, this work highlights the need for continued advancements in adversarial robustness and bias mitigation to stay ahead of evolving cyber threats.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models into making erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is the transferability of adversarial examples, where perturbations crafted for one model can deceive another, often with a different architecture. This intriguing property enables black-box attacks, which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of the transferability of adversarial examples. We categorize existing methodologies to enhance adversarial transferability and discuss the fundamental principles guiding each approach. While the predominant body of research primarily concentrates on image classification, we also extend our discussion to encompass other vision tasks and beyond. Challenges and opportunities are discussed, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.

5/3/2024

cs.CV

Adversarial Example Soups: Improving Transferability and Stealthiness for Free

Bo Yang, Hengwei Zhang, Jindong Wang, Yulong Yang, Chenhao Lin, Chao Shen, Zhengyu Zhao

Transferable adversarial examples cause practical security risks since they can mislead a target model without knowing its internal knowledge. A conventional recipe for maximizing transferability is to keep only the optimal adversarial example from all those obtained in the optimization pipeline. In this paper, for the first time, we question this convention and demonstrate that those discarded, sub-optimal adversarial examples can be reused to boost transferability. Specifically, we propose "Adversarial Example Soups" (AES), with AES-tune for averaging discarded adversarial examples in hyperparameter tuning and AES-rand for stability testing. In addition, our AES is inspired by "model soups", which averages weights of multiple fine-tuned models for improved accuracy without increasing inference time. Extensive experiments validate the global effectiveness of our AES, boosting 10 state-of-the-art transfer attacks and their combinations by up to 13% against 10 diverse (defensive) target models. We also show the possibility of generalizing AES to other types, e.g., directly averaging multiple in-the-wild adversarial examples that yield comparable success. A promising byproduct of AES is the improved stealthiness of adversarial examples since the perturbation variances are naturally reduced.

5/1/2024

cs.CV

Transferable Availability Poisoning Attacks

Yiyong Liu, Michael Backes, Xiao Zhang

We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it can achieve some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks if the victim employs an alternative learning algorithm. To enhance the attack transferability, we propose Transferable Poisoning, which first leverages the intrinsic characteristics of alignment and uniformity to enable better unlearnability within contrastive learning, and then iteratively utilizes the gradient information from supervised and unsupervised contrastive learning paradigms to generate the poisoning perturbations. Through extensive experiments on image benchmarks, we show that our transferable poisoning attack can produce poisoned samples with significantly improved transferability, not only applicable to the two learners used to devise the attack but also to learning algorithms and even paradigms beyond.

6/7/2024

cs.CR cs.LG

Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class instead of low sample density regions. Therefore, in the target setting, adding perturbations towards HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for HSDR location. Based on the above facts, we verified that adding perturbations to easy samples in the target class improves targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time compared to the current SOTA, as ESMA attacks all classes with only one model instead of separate models for each class. Our code is available at https://github.com/gjq100/ESMA.

6/11/2024

cs.LG cs.AI cs.CR