Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

Read original: arXiv:2405.20584 - Published 7/29/2024 by Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang

Overview

• This paper proposes DisDiff (Disrupting Diffusion), an adversarial attack that disrupts diffusion-based customization methods such as DreamBooth by erasing the cross-attention associated with the subject-identifier token.

• The attack is proactive and protective: adversarial examples are crafted so that a customized model produces distorted outputs, which blocks misuse such as generating fake images of a person. Averaged across two facial benchmarks and two commonly used prompts, DisDiff outperforms state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores.

• The paper also analyzes how the diffusion sampling process affects Projected Gradient Descent (PGD) attacks and introduces a Merit Sampling Scheduler that adapts the size of each perturbation update in a step-aware manner.

Plain English Explanation

Diffusion models are generative models that produce images by gradually transforming random noise into a realistic-looking picture. Customization methods such as DreamBooth go one step further: they let individuals train a model so that it can generate personalized images of a particular subject.

That convenience has a downside. Malicious users have misused these techniques to create fake images of people, which raises serious privacy concerns. The defense studied in this paper is a proactive adversarial attack: adversarial examples are crafted so that a customization model trained on them produces distorted outputs, which blocks the misuse.

The authors' method, DisDiff, targets the link between the text prompt and the image. In a customized model, a subject-identifier token in the prompt stands for the specific subject, and the authors find empirically that this token plays an important role in guiding image generation. DisDiff "erases" the attention the model pays to that token, so the prompt no longer steers generation toward the protected subject.

The research shows that customized diffusion models are vulnerable to targeted adversarial examples, and it puts that vulnerability to protective use: individuals gain a way to guard their images against unauthorized customization. The same finding also matters for anyone studying the robustness of diffusion models more generally.

Technical Explanation

The key contribution of this paper is DisDiff, an adversarial attack against diffusion-based customization that erases attention at the level of a single prompt token. It has two parts: a Cross-Attention Erasure module and a Merit Sampling Scheduler.

The Cross-Attention Erasure module starts from the image-text relationships captured by cross-attention, where each prompt token has attention maps indicating how strongly it influences the image. The authors find empirically that the subject-identifier token plays an important role in guiding image generation, so the module explicitly erases the attention maps indicated by that token in order to disrupt the text guidance.
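To make the idea of a per-token attention map concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the prompt "a photo of sks person", the two-dimensional toy vectors, the hand-picked "perturbed" image features, and the function names are all illustrative assumptions. It only shows what it means to measure, and reduce, the attention that image locations pay to one identifier token.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys):
    """Return attn[i][j]: attention of image location i on prompt token j.

    queries: one vector per image location (derived from image features).
    keys:    one vector per prompt token (derived from text embeddings).
    """
    scale = math.sqrt(len(keys[0]))
    attn = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn.append(softmax(scores))
    return attn

def identifier_attention_mass(attn, token_index):
    """Mean attention that image locations pay to one prompt token.

    An erasure-style objective would try to drive this value toward zero.
    """
    return sum(row[token_index] for row in attn) / len(attn)

# Assumed toy prompt "a photo of sks person": 5 tokens, identifier at index 3.
keys = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [2.0, 2.0], [0.5, 0.5]]
IDENTIFIER = 3

clean_queries = [[1.0, 1.0], [0.8, 1.2], [1.1, 0.9]]
# Hypothetical features after an adversarial perturbation (hand-picked here;
# a real attack would obtain them by optimizing the image perturbation).
perturbed_queries = [[-1.0, -1.0], [-0.8, -1.2], [-1.1, -0.9]]

clean_attn = cross_attention(clean_queries, keys)
perturbed_attn = cross_attention(perturbed_queries, keys)

clean_mass = identifier_attention_mass(clean_attn, IDENTIFIER)
erased_mass = identifier_attention_mass(perturbed_attn, IDENTIFIER)
print(f"identifier attention: clean={clean_mass:.3f}, perturbed={erased_mass:.3f}")
```

In this toy setup the identifier token dominates the clean attention maps and receives almost no attention after the perturbation. A real attack would have to reach that state by optimizing a bounded perturbation on the image, with the attention maps taken from the diffusion model's own cross-attention layers.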

The paper also introduces a Merit Sampling Scheduler. The authors analyze how the sampling process of the diffusion model influences a Projected Gradient Descent (PGD) attack, and use the scheduler to adaptively modulate the amplitude of each perturbation update in a step-aware manner.
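The paper's abstract describes the perturbation as being optimized with Projected Gradient Descent (PGD), with an update amplitude that is modulated step by step. The sketch below shows generic L-infinity PGD with a pluggable per-step multiplier. The toy loss, the `decaying_scale` schedule, and all parameter values are assumptions made for illustration; the actual rule used by the Merit Sampling Scheduler is not given in the abstract and is not reproduced here.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def pgd_attack(x, grad_fn, eps, base_step, step_scale, n_steps):
    """Maximize a loss with L-infinity PGD and a per-step update amplitude.

    x:          clean input as a list of floats.
    grad_fn:    returns d(loss)/d(input) for a perturbed input.
    eps:        L-infinity budget for the perturbation.
    step_scale: hypothetical schedule mapping step index -> multiplier.
    """
    delta = [0.0] * len(x)
    for t in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = grad_fn(x_adv)
        step = base_step * step_scale(t)
        # Signed-gradient ascent, then projection back onto the eps-ball.
        delta = [max(-eps, min(eps, di + step * sign(gi)))
                 for di, gi in zip(delta, grad)]
    return delta

# Toy stand-in for the attack objective: squared distance from a target.
target = [0.0, 1.0, -1.0]

def toy_loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def toy_grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

def decaying_scale(t):
    # Placeholder schedule: larger updates early, smaller updates later.
    return 1.0 / (1 + t)

x_clean = [0.5, 0.5, 0.5]
eps = 0.1
delta = pgd_attack(x_clean, toy_grad, eps=eps, base_step=0.05,
                   step_scale=decaying_scale, n_steps=10)
x_adv = [xi + di for xi, di in zip(x_clean, delta)]
print("loss before:", toy_loss(x_clean), "loss after:", toy_loss(x_adv))
```

Passing the schedule as a function keeps the PGD loop unchanged while different step-aware rules are tried. In a diffusion setting the multiplier would more plausibly depend on the sampled diffusion timestep than on the iteration index used here.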

In the reported experiments, DisDiff outperforms state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores on average, across two facial benchmarks and two commonly used prompts.

Overall, this research shows that cross-attention, and the subject-identifier token in particular, is an effective target for adversarial examples that protect images against diffusion-based customization, and that the perturbation update schedule matters when applying PGD to diffusion models.

Critical Analysis

The token-level attention erasure attack presented in this paper is a compelling contribution to adversarial machine learning. The researchers show that erasing the attention tied to a single subject-identifier token is enough to disrupt the outputs of a customized diffusion model, and they report gains over existing protection methods.

One potential limitation is the scope of the evaluation. The headline results cover two facial benchmarks and two commonly used prompts, so it would be valuable to see the attack evaluated on a broader range of customization methods, subjects, and prompts to better understand how well it generalizes. The abstract also does not discuss potential mitigations or defenses against the attack, which would be an important area for further research.

Another area for further investigation is the mechanism behind the attack. The finding that the subject-identifier token plays an important role is empirical, and a more detailed analysis of how cross-attention can be exploited could deepen the understanding of the attack and inform the development of more robust models.

Overall, this research advances the understanding of diffusion model security. As diffusion-based customization becomes more widely available, techniques like this one are likely to matter more for protecting individuals against misuse of their images.

Conclusion

This paper introduces DisDiff, a token-level attention erasure attack that disrupts diffusion-based customization. Its Cross-Attention Erasure module erases the attention maps of the subject-identifier token, and its Merit Sampling Scheduler adapts the PGD perturbation updates in a step-aware manner. Together they outperform state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores on average across two facial benchmarks and two commonly used prompts.

The work addresses a concrete privacy problem: customization methods such as DreamBooth can be misused to create fake images, and proactive adversarial examples offer a way to block that misuse.

The insights presented here could also inform broader research into the security and robustness of diffusion models, particularly the role that cross-attention plays in text guidance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang

With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train the models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, well-known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly erase the indicated attention maps and disrupt the text guidance. Besides, we analyze the influence of the sampling process of the diffusion model on Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% of FDFR scores and 7.25% of ISM scores across two facial benchmarks and two commonly used prompts on average.

7/29/2024

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Jingyao Xu, Yuetong Lu, Yandong Li, Siyang Lu, Dongdong Wang, Xiang Wei

Diffusion models (DMs) usher in a new era of generative modeling and offer more opportunities for efficiently generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attackers on DMs to understand their vulnerability. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing for leveraging subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods in a more effective (more noise) and efficient (twice as fast as Anti-DreamBooth and Mist) manner.

6/17/2024

Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective

Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong

Advanced text-to-image diffusion models raise safety concerns regarding identity privacy violation, copyright infringement, and Not Safe For Work content generation. Towards this, unlearning methods have been developed to erase these involved concepts from diffusion models. However, these unlearning methods only shift the text-to-image mapping and preserve the visual content within the generative space of diffusion models, leaving a fatal flaw for restoring these erased concepts. This erasure trustworthiness problem needs probe, but previous methods are sub-optimal from two perspectives: (1) Lack of transferability: Some methods operate within a white-box setting, requiring access to the unlearned model. And the learned adversarial input often fails to transfer to other unlearned models for concept restoration; (2) Limited attack: The prompt-level methods struggle to restore narrow concepts from unlearned models, such as celebrity identity. Therefore, this paper aims to leverage the transferability of the adversarial attack to probe the unlearning robustness under a black-box setting. This challenging scenario assumes that the unlearning method is unknown and the unlearned model is inaccessible for optimization, requiring the attack to be capable of transferring across different unlearned models. Specifically, we employ an adversarial search strategy to search for the adversarial embedding which can transfer across different unlearned models. This strategy adopts the original Stable Diffusion model as a surrogate model to iteratively erase and search for embeddings, enabling it to find the embedding that can restore the target concept for different unlearning methods. Extensive experiments demonstrate the transferability of the searched adversarial embedding across several state-of-the-art unlearning methods and its effectiveness for different levels of concepts.

5/1/2024

Disrupting Diffusion-based Inpainters with Semantic Digression

Geonho Son, Juhun Lee, Simon S. Woo

The fabrication of visual misinformation on the web and social media has increased exponentially with the advent of foundational text-to-image diffusion models. Namely, Stable Diffusion inpainters allow the synthesis of maliciously inpainted images of personal and private figures, and copyrighted contents, also known as deepfakes. To combat such generations, a disruption framework, namely Photoguard, has been proposed, where it adds adversarial noise to the context image to disrupt their inpainting synthesis. While their framework suggested a diffusion-friendly approach, the disruption is not sufficiently strong and it requires a significant amount of GPU and time to immunize the context image. In our work, we re-examine both the minimal and favorable conditions for a successful inpainting disruption, proposing DDD, a Digression guided Diffusion Disruption framework. First, we identify the most adversarially vulnerable diffusion timestep range with respect to the hidden space. Within this scope of noised manifold, we pose the problem as a semantic digression optimization. We maximize the distance between the inpainting instance's hidden states and a semantic-aware hidden state centroid, calibrated both by Monte Carlo sampling of hidden states and a discretely projected optimization in the token space. Effectively, our approach achieves stronger disruption and a higher success rate than Photoguard while lowering the GPU memory requirement, and speeding the optimization up to three times faster.

7/16/2024