A Survey on Transferability of Adversarial Examples across Deep Neural Networks

2310.17626

Published 5/3/2024 by Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqian Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li and 2 others

cs.CV


Abstract

The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models into making erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is the transferability of adversarial examples, where perturbations crafted for one model can deceive another, often with a different architecture. This property enables black-box attacks, which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of the adversarial transferability of adversarial examples. We categorize existing methodologies to enhance adversarial transferability and discuss the fundamental principles guiding each approach. While the predominant body of research primarily concentrates on image classification, we also extend our discussion to encompass other vision tasks and beyond. Challenges and opportunities are discussed, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.


Overview

The emergence of Deep Neural Networks (DNNs) has revolutionized various domains, but also introduced a concerning vulnerability: adversarial examples.
Adversarial examples are crafted inputs that can manipulate machine learning models into making erroneous predictions, posing risks for safety-critical applications.
An intriguing property of adversarial examples is their transferability, where perturbations crafted for one model can deceive another, often with a different architecture.
This survey reviews research on the transferability of adversarial examples, categorizing existing methods for enhancing it and discussing the fundamental principles guiding each approach.

Plain English Explanation

Machine learning models, particularly Deep Neural Networks (DNNs), have made remarkable progress in tackling complex tasks like image recognition, natural language processing, and scientific problem-solving. However, these models have also revealed a concerning vulnerability: adversarial examples.

Adversarial examples are specially crafted inputs that appear normal to humans but can trick machine learning models into making incorrect predictions. This is a significant concern for safety-critical applications, where errors could have serious consequences.

An interesting property of adversarial examples is their transferability. This means that perturbations (small changes to an input) crafted to fool one model can often deceive another model as well, even one with a different architecture. This transferability enables black-box attacks, where attackers can manipulate a model without needing detailed knowledge of its inner workings: they craft the adversarial example on a model they control and then use it against the target.

This paper surveys the research on the transferability of adversarial examples, categorizing the different approaches and discussing the underlying principles. While much of the focus has been on image classification tasks, the paper also explores adversarial examples in other vision tasks and beyond.

The paper highlights the importance of fortifying DNNs against these adversarial vulnerabilities as the field continues to evolve. Understanding and addressing these issues is crucial for ensuring the safe and reliable deployment of machine learning systems.

Technical Explanation

The paper explores the phenomenon of adversarial transferability, where perturbations crafted to fool one machine learning model can also deceive other models, often with different architectures.
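The transfer setting can be illustrated with a minimal toy sketch. It assumes two hypothetical linear binary classifiers, a surrogate and a target, whose weights are similar but not identical (standing in for two models trained on similar data). The perturbation is one-step FGSM computed only from the surrogate's weights, and it is then evaluated on the target alongside random noise of the same size. The names `fgsm_linear`, `flip_rate`, and the chosen dimensions and budget are illustrative assumptions, not a method from the survey.

```python
import random

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def predict(w, b, x):
    """Linear binary classifier: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fgsm_linear(w, x, y, eps):
    """One-step FGSM against a linear surrogate with logistic loss.

    The input gradient of the loss is (sigmoid(w.x + b) - y) * w, so its
    sign is -sign(w) when y == 1 and +sign(w) when y == 0.
    """
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for wi, xi in zip(w, x)]

noise_rng = random.Random(1)

def random_sign_noise(w, x, y, eps):
    """Baseline: random +/-eps noise of the same L-infinity size (ignores w and y)."""
    return [xi + eps * noise_rng.choice((-1, 1)) for xi in x]

def flip_rate(w_src, w_tgt, b, xs, eps, perturb):
    """Fraction of inputs whose target prediction changes after perturbation.

    The perturbation is computed from w_src only; the target's clean
    prediction is treated as the label.
    """
    flips = 0
    for x in xs:
        y = predict(w_tgt, b, x)
        x_adv = perturb(w_src, x, y, eps)
        flips += predict(w_tgt, b, x_adv) != y
    return flips / len(xs)

rng = random.Random(0)
dim = 50
w_surrogate = [rng.gauss(0, 1) for _ in range(dim)]
# Hypothetical target: similar but not identical weights.
w_target = [wi + 0.3 * rng.gauss(0, 1) for wi in w_surrogate]
inputs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

adv_rate = flip_rate(w_surrogate, w_target, 0.0, inputs, 0.5, fgsm_linear)
rnd_rate = flip_rate(w_surrogate, w_target, 0.0, inputs, 0.5, random_sign_noise)
print(f"FGSM transfer rate: {adv_rate:.2f}, random-noise flip rate: {rnd_rate:.2f}")
```

In this toy setup the surrogate's gradient direction is closely aligned with the target's, so the FGSM perturbation should flip far more target predictions than random noise of the same budget. Real DNNs are nonlinear and transfer is much less reliable, which is what the methods surveyed here try to improve.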

The authors categorize the existing methodologies for enhancing adversarial transferability and discuss the fundamental principles guiding each approach. Broadly, these fall into two families:

Optimization-based methods: Techniques that craft perturbations by optimizing against a surrogate model, improving transferability through input data augmentation, modified optimization procedures such as momentum-based updates, alternative loss objectives, or changes to the surrogate model's components.
Generation-based methods: Approaches that train a generative model to produce adversarial perturbations, which can then be applied to new inputs without per-input optimization.
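Many transfer attacks improve transferability by changing how the perturbation is optimized on the surrogate. One widely used example is momentum iterative FGSM (MI-FGSM, introduced in "Boosting Adversarial Attacks with Momentum", Dong et al., CVPR 2018), which accumulates a momentum term over L1-normalized gradients to stabilize the update direction. The sketch below is a minimal version under stated assumptions: the surrogate is a hypothetical linear model with logistic loss so the gradient is analytic, and the helper names, weights, and budget are illustrative. It is not code from the survey.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_input_grad(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x of a linear model."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    coeff = sigmoid(s) - y
    return [coeff * wi for wi in w]

def mi_fgsm(grad_fn, x, eps, steps=10, mu=1.0):
    """Momentum iterative FGSM under an L-infinity budget eps.

    g     <- mu * g + grad / ||grad||_1
    x_adv <- clip to [x - eps, x + eps] of (x_adv + alpha * sign(g)),
    with step size alpha = eps / steps.
    """
    alpha = eps / steps
    x_adv = list(x)
    g = [0.0] * len(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1 = sum(abs(v) for v in grad) or 1.0  # guard against a zero gradient
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical surrogate weights and a clean input labeled 1.
w, b = [0.8, -1.2, 0.5], 0.1
x, y = [0.2, -0.4, 1.0], 1
x_adv = mi_fgsm(lambda v: logistic_input_grad(w, b, v, y), x, eps=0.6)
score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print([round(v, 3) for v in x_adv], round(score, 3))
```

Here the clean score is positive (class 1) and the final score is negative, so the surrogate's prediction flips while every coordinate stays within the 0.6 budget. On a linear surrogate the gradient direction never changes, so momentum has no visible effect; its benefit for transferability appears with nonlinear surrogates whose gradients fluctuate between steps.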

The paper also discusses the extension of adversarial transferability beyond image classification tasks, exploring its implications in other vision tasks and beyond.

Critical Analysis

The paper provides a comprehensive survey of the research on adversarial transferability, highlighting the significant progress made in understanding and enhancing this phenomenon. However, it also acknowledges several challenges and limitations that warrant further investigation.

One key limitation is the predominant focus on image classification, which reflects the existing literature that the survey draws on. While the survey extends its discussion to other vision tasks and domains beyond vision, a more in-depth exploration of adversarial transferability in these areas would be valuable. Additionally, the paper does not delve into the potential societal implications and ethical concerns surrounding the misuse of adversarial attacks, which is an important consideration as these techniques become more advanced.

Furthermore, the survey centers on methods that enhance transferability and does not address how well defenses against adversarial examples will hold up in the long term. As machine learning models continue to evolve, it is essential to explore more robust and adaptable approaches to ensure the ongoing security and reliability of these systems.

Despite these limitations, the paper provides a solid foundation for understanding the current state of adversarial transferability research and highlights the need for continued efforts to address this critical vulnerability in machine learning systems.

Conclusion

This survey paper explores the intriguing phenomenon of adversarial transferability, where adversarial examples crafted for one machine learning model can often deceive other models with different architectures. The authors categorize the existing methodologies for enhancing adversarial transferability and discuss the fundamental principles guiding each approach.

While the paper primarily focuses on image classification tasks, it also examines the implications of adversarial transferability in other vision tasks and beyond. The survey highlights the importance of fortifying Deep Neural Networks (DNNs) against these adversarial vulnerabilities as the field of machine learning continues to evolve.

As the use of machine learning models becomes more widespread, understanding and addressing the risks posed by adversarial examples is crucial for ensuring the safe and reliable deployment of these systems in safety-critical applications. This survey serves as a valuable resource for researchers and practitioners in the field, providing a comprehensive overview of the current landscape and paving the way for future advancements in this important area of study.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Transferability Ranking of Adversarial Examples

Mosh Levy, Guy Amit, Yuval Elovici, Yisroel Mirsky

Adversarial transferability in black-box scenarios presents a unique challenge: while attackers can employ surrogate models to craft adversarial examples, they lack assurance on whether these examples will successfully compromise the target model. Until now, the prevalent method to ascertain success has been trial and error: testing crafted samples directly on the victim model. This approach, however, risks detection with every attempt, forcing attackers to either perfect their first try or face exposure. Our paper introduces a ranking strategy that refines the transfer attack process, enabling the attacker to estimate the likelihood of success without repeated trials on the victim's system. By leveraging a set of diverse surrogate models, our method can predict transferability of adversarial examples. This strategy can be used to either select the best sample to use in an attack or the best perturbation to apply to a specific sample. Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20% (akin to random selection) up to near upper-bound levels, with some scenarios even witnessing a 100% success rate. This substantial improvement not only sheds light on the shared susceptibilities across diverse architectures but also demonstrates that attackers can forgo the detectable trial-and-error tactics, increasing the threat of surrogate-based attacks.
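The ranking idea can be sketched as scoring each candidate adversarial example by how many diverse surrogate models it fools, then sending only the top-ranked candidate to the victim. This is a simplified, hypothetical stand-in: the fraction-of-surrogates-fooled score, the linear surrogates, and the function names are assumptions for illustration, not the authors' exact ranking method.

```python
def predict(w, x):
    """Linear binary classifier without bias: 1 if w.x > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def surrogate_fool_rate(x_adv, y, surrogates):
    """Fraction of surrogate models that misclassify x_adv.

    Used here as a simple proxy for how likely x_adv is to transfer.
    """
    return sum(predict(w, x_adv) != y for w in surrogates) / len(surrogates)

def rank_candidates(candidates, y, surrogates):
    """Order candidates from most to least likely to transfer (stable for ties)."""
    return sorted(candidates,
                  key=lambda c: surrogate_fool_rate(c, y, surrogates),
                  reverse=True)

# Hypothetical diverse surrogates; all classify the clean input [1.0, 1.0] as 1.
surrogates = [[1.0, 0.5], [0.8, 1.0], [1.2, 0.2]]
candidates = [[-0.2, 1.0], [-0.5, 0.2], [0.5, -0.6]]
ranked = rank_candidates(candidates, 1, surrogates)
best = ranked[0]
print(best, surrogate_fool_rate(best, 1, surrogates))  # prints: [-0.5, 0.2] 1.0
```

Only `best` would be submitted to the victim, so the victim sees a single query instead of a sequence of trial-and-error attempts.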

4/19/2024

cs.LG cs.CR

Properties that allow or prohibit transferability of adversarial attacks among quantized networks

Abhishek Shrestha, Jurgen Großmann

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Further, these adversarial examples are found to be transferable from the source network in which they are crafted to a black-box target network. As the trend of using deep learning on embedded devices grows, it becomes relevant to study the transferability properties of adversarial examples among compressed networks. In this paper, we consider quantization as a network compression technique and evaluate the performance of transfer-based attacks when the source and target networks are quantized at different bitwidths. We explore how algorithm-specific properties affect transferability by considering various adversarial example generation algorithms. Furthermore, we examine transferability in a more realistic scenario where the source and target networks may differ in bitwidth and other model-related properties like capacity and architecture. We find that although quantization reduces transferability, certain attack types demonstrate an ability to enhance it. Additionally, the average transferability of adversarial examples among quantized versions of a network can be used to estimate the transferability to quantized target networks with varying capacity and architecture.
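A toy sketch of one way quantization can weaken transfer: under symmetric uniform quantization at a low bitwidth, small weights round to zero, so an FGSM step computed from the quantized surrogate leaves those input dimensions untouched and moves a full-precision target's score less. The quantizer, the linear model, the weights, and the function names are illustrative assumptions, not the paper's experimental setup.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization to signed integer levels with `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_step(w, y):
    """Sign of the logistic-loss input gradient for a linear model."""
    direction = -1 if y == 1 else 1
    return [direction * sign(wi) for wi in w]

def score_shift(w_target, step, eps):
    """Change in the target's score w.x caused by adding eps * step to x."""
    return eps * sum(wt * s for wt, s in zip(w_target, step))

w_full = [0.9, -0.2, 0.45, -1.0]   # hypothetical full-precision target weights
y = 0                               # the attack pushes the score upward
shifts = {}
for bits in (2, 4, 8):
    step = fgsm_step(quantize(w_full, bits), y)   # surrogate = quantized copy
    shifts[bits] = score_shift(w_full, step, eps=0.2)
    print(bits, step, round(shifts[bits], 3))
```

With 2 bits the two smaller weights quantize to zero, so the step from that surrogate only uses two input dimensions and shifts the full-precision score by about 0.38 instead of about 0.51 for the 4-bit and 8-bit surrogates.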

5/17/2024

cs.LG cs.AI

Adversarial Example Soups: Improving Transferability and Stealthiness for Free

Bo Yang, Hengwei Zhang, Jindong Wang, Yulong Yang, Chenhao Lin, Chao Shen, Zhengyu Zhao

Transferable adversarial examples cause practical security risks since they can mislead a target model without knowing its internal knowledge. A conventional recipe for maximizing transferability is to keep only the optimal adversarial example from all those obtained in the optimization pipeline. In this paper, for the first time, we question this convention and demonstrate that those discarded, sub-optimal adversarial examples can be reused to boost transferability. Specifically, we propose "Adversarial Example Soups" (AES), with AES-tune for averaging discarded adversarial examples in hyperparameter tuning and AES-rand for stability testing. In addition, our AES is inspired by "model soups", which averages weights of multiple fine-tuned models for improved accuracy without increasing inference time. Extensive experiments validate the global effectiveness of our AES, boosting 10 state-of-the-art transfer attacks and their combinations by up to 13% against 10 diverse (defensive) target models. We also show the possibility of generalizing AES to other types, e.g., directly averaging multiple in-the-wild adversarial examples that yield comparable success. A promising byproduct of AES is the improved stealthiness of adversarial examples since the perturbation variances are naturally reduced.
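The core averaging step of AES can be sketched as follows: several adversarial versions of the same clean input (for example, produced with different attack hyperparameters) are averaged into one example. Because every perturbation lies in the same convex L-infinity ball, the average stays within the budget, and components on which the perturbations disagree partially cancel, lowering perturbation energy. The function names and numbers are hypothetical, and this sketch does not reproduce the paper's transferability gains.

```python
def adversarial_example_soup(x, adv_examples):
    """Average several adversarial versions of the same clean input x.

    Each adversarial example is x + delta_k with ||delta_k||_inf <= eps.
    The mean of points inside the same L-infinity ball is still inside it,
    so no extra projection step is needed.
    """
    k = len(adv_examples)
    deltas = [[a - xi for a, xi in zip(adv, x)] for adv in adv_examples]
    mean_delta = [sum(col) / k for col in zip(*deltas)]
    return [xi + d for xi, d in zip(x, mean_delta)]

def perturbation_energy(x, x_adv):
    """Mean squared perturbation, a simple proxy for how visible it is."""
    return sum((a - xi) ** 2 for a, xi in zip(x_adv, x)) / len(x)

x = [0.5, 0.5, 0.5, 0.5]
# Hypothetical adversarial examples (eps = 0.1) from three attack configurations.
advs = [[0.6, 0.4, 0.6, 0.6],
        [0.6, 0.6, 0.4, 0.6],
        [0.6, 0.4, 0.4, 0.6]]
soup = adversarial_example_soup(x, advs)
print([round(v, 3) for v in soup])
print(round(perturbation_energy(x, advs[0]), 4), round(perturbation_energy(x, soup), 4))
```

Coordinates where all three perturbations agree keep their full size, while disagreeing coordinates shrink toward the clean value, which is the variance reduction the abstract links to improved stealthiness.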

5/1/2024

cs.CV

Towards Transferable Targeted 3D Adversarial Attack in the Physical World

Yao Huang, Yinpeng Dong, Shouwei Ruan, Xiao Yang, Hang Su, Xingxing Wei

Compared with transferable untargeted attacks, transferable targeted adversarial attacks could specify the misclassification categories of adversarial samples, posing a greater threat to security-critical tasks. Meanwhile, 3D adversarial samples, due to their potential for multi-view robustness, can more comprehensively identify weaknesses in existing deep learning systems, possessing great application value. However, the field of transferable targeted 3D adversarial attacks remains vacant. The goal of this work is to develop a more effective technique that could generate transferable targeted 3D adversarial examples, filling the gap in this field. To achieve this goal, we design a novel framework named TT3D that could rapidly reconstruct Transferable Targeted 3D textured meshes from a few multi-view images. While existing mesh-based texture optimization methods compute gradients in the high-dimensional mesh space and easily fall into local optima, leading to unsatisfactory transferability and distinct distortions, TT3D innovatively performs dual optimization towards both feature grid and Multi-layer Perceptron (MLP) parameters in the grid-based NeRF space, which significantly enhances black-box transferability while enjoying naturalness. Experimental results show that TT3D not only exhibits superior cross-model transferability but also maintains considerable adaptability across different renderers and vision tasks. More importantly, we produce 3D adversarial examples with 3D printing techniques in the real world and verify their robust performance under various scenarios.

6/11/2024

cs.CV