Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Read original: arXiv:2405.07145 - Published 5/14/2024 by Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong


Overview

The paper presents a new attack that removes the watermark from a diffusion model protected by Stable Signature, a watermarking framework proposed by Meta.
Watermarking is widely deployed in industry to detect AI-generated images.
Stable Signature roots the watermark in the parameters of a diffusion model's decoder, so that every image the model generates is inherently watermarked.
The attack works by fine-tuning the model, and it removes the watermark while maintaining the visual quality of the generated images.

Plain English Explanation

Watermarking is a technique used by many companies to identify if an image was created by an AI system. Stable Signature is a new watermarking method developed by Meta that tries to "embed" the watermark directly into the AI model itself, so that any images generated by the model will automatically have the watermark.

The paper shows that the Stable Signature watermark can be removed by "fine-tuning" the AI model. Fine-tuning is a technique where you take an existing AI model and train it a bit more on some new data. The researchers found that by fine-tuning a diffusion model that carries the Stable Signature watermark, they could effectively remove the watermark, so that newly generated images are no longer watermarked, while still keeping the quality of those images high.

This suggests that the Stable Signature watermark may not be as "stable" or difficult to remove as claimed. The researchers have demonstrated a way to bypass this watermarking technique, which could be concerning for companies relying on it to identify AI-generated content.

Technical Explanation

The paper proposes a new attack to remove the watermark from a diffusion model that uses the Stable Signature watermarking technique. Diffusion models are a type of AI system that can generate realistic images.

The Stable Signature watermarking approach embeds the watermark directly into the parameters of the diffusion model's decoder, rather than just adding it as a post-processing step. This is claimed to make the watermark more robust against removal attacks.

However, the researchers in this paper show that they can effectively remove the Stable Signature watermark by fine-tuning the diffusion model. Fine-tuning involves further training the model on some new data, which the researchers found could remove the watermark while still maintaining the visual quality of the generated images.
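To make the idea concrete, here is a toy sketch in Python. It is not the paper's implementation: this summary does not spell out the attack's loss function or training data, so the sketch assumes the attacker holds pairs of latent vectors and non-watermarked images and minimizes mean squared reconstruction error. The decoder, the extractor, the key length `N_BITS`, and the offset strength `EPS` are all hypothetical stand-ins chosen for illustration.

```python
import random

N_BITS = 48  # illustrative key length (assumption, not taken from this summary)
EPS = 0.1    # strength of the toy watermark offset (assumption)


def decode(latent, bias):
    """Toy decoder: copy each latent value into two pixels, then add a learned bias."""
    image = []
    for i, z in enumerate(latent):
        image.append(z + bias[2 * i])
        image.append(z + bias[2 * i + 1])
    return image


def extract_bits(image):
    """Toy extractor: bit i is 1 when the left pixel of pair i is brighter than the right."""
    return [1 if image[2 * i] > image[2 * i + 1] else 0 for i in range(N_BITS)]


def bit_accuracy(bits, key):
    """Fraction of extracted bits that agree with the key."""
    return sum(b == k for b, k in zip(bits, key)) / len(key)


def watermarked_bias(key):
    """Hide the key in the decoder's parameters as a +/- EPS offset per pixel pair."""
    bias = []
    for k in key:
        sign = 1.0 if k else -1.0
        bias += [EPS * sign, -EPS * sign]
    return bias


def fine_tune(bias, latents, clean_images, lr=0.25, steps=60):
    """Full-batch gradient descent on the mean squared error to clean images."""
    bias = list(bias)
    n = len(latents)
    for _ in range(steps):
        grad = [0.0] * len(bias)
        for latent, target in zip(latents, clean_images):
            output = decode(latent, bias)
            for j in range(len(bias)):
                grad[j] += 2.0 * (output[j] - target[j]) / n
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias


rng = random.Random(0)
key = [rng.randint(0, 1) for _ in range(N_BITS)]
wm_bias = watermarked_bias(key)

# Hypothetical attacker data: latents paired with non-watermarked images.
# The small Gaussian term stands in for natural image texture.
latents = [[rng.uniform(-1, 1) for _ in range(N_BITS)] for _ in range(32)]
clean_images = [
    [z + rng.gauss(0, 0.05) for z in latent for _ in range(2)]
    for latent in latents
]

test_latent = [rng.uniform(-1, 1) for _ in range(N_BITS)]
acc_before = bit_accuracy(extract_bits(decode(test_latent, wm_bias)), key)
tuned_bias = fine_tune(wm_bias, latents, clean_images)
acc_after = bit_accuracy(extract_bits(decode(test_latent, tuned_bias)), key)

print("bit accuracy before fine-tuning:", acc_before)
print("bit accuracy after fine-tuning: ", acc_after)
```

In this toy, the watermark lives only in the decoder's bias, so pulling the decoder toward clean reconstructions erases it and the extracted bits stop tracking the key. A real decoder is a deep network, and how well the same idea carries over is what the paper's experiments address.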

The paper provides experimental results demonstrating the effectiveness of this fine-tuning attack against the Stable Signature watermarking technique. This suggests that the Stable Signature watermark may not be as stable or difficult to remove as previously thought.
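One common way to judge whether a multi-bit watermark has been removed is to count how many decoded bits still match the secret key and compare that count to a threshold. The sketch below illustrates that check; it is an assumption about how such an evaluation could look, not a description of the paper's protocol, and the function names, the 8-bit key, and the threshold are hypothetical. The false positive rate assumes each bit of a non-watermarked image matches the key independently with probability 1/2.

```python
from math import comb


def matching_bits(decoded, key):
    """Count positions where the decoded bits agree with the key."""
    if len(decoded) != len(key):
        raise ValueError("decoded bits and key must have the same length")
    return sum(d == k for d, k in zip(decoded, key))


def false_positive_rate(n_bits, threshold):
    """Chance that a non-watermarked image reaches `threshold` matching bits,
    assuming every bit matches independently with probability 1/2."""
    favourable = sum(comb(n_bits, m) for m in range(threshold, n_bits + 1))
    return favourable / 2 ** n_bits


def is_watermarked(decoded, key, threshold):
    """Flag an image as watermarked when enough decoded bits match the key."""
    return matching_bits(decoded, key) >= threshold


# Hypothetical 8-bit example; real keys would be longer.
key = [1, 0, 1, 1, 0, 0, 1, 0]
intact = list(key)                    # bits decoded from a watermarked image
scrubbed = [1, 1, 0, 1, 0, 1, 0, 0]   # bits decoded after a removal attack
threshold = 7

print("intact flagged:  ", is_watermarked(intact, key, threshold))
print("scrubbed flagged:", is_watermarked(scrubbed, key, threshold))
print("false positive rate:", false_positive_rate(len(key), threshold))
```

Under this kind of check, a removal attack succeeds when the match count for generated images falls below the threshold, which happens once the decoded bits agree with the key no more often than chance.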

Critical Analysis

The paper provides a convincing demonstration that the Stable Signature watermarking technique is vulnerable to a fine-tuning attack. This is an important finding, as Stable Signature was claimed to be a robust watermarking approach for diffusion models.

However, the paper does not explore the full scope of potential attacks or countermeasures. It is possible that there could be ways to make the Stable Signature watermark more resistant to fine-tuning or other removal techniques. The paper also does not discuss the broader implications of this vulnerability for the use of watermarking to detect AI-generated content.

Additionally, the paper focuses only on diffusion models, and it is unclear whether the fine-tuning attack would be equally effective against other generative models that root a watermark in their parameters. Further research would be needed to understand the generalizability of this attack.

Overall, this paper makes a valuable contribution by highlighting a significant weakness in the Stable Signature watermarking approach. However, more work is likely needed to fully understand the implications and potential countermeasures for this type of attack.

Conclusion

This paper demonstrates that the Stable Signature watermark, proposed by Meta as a robust way to detect AI-generated images, can be effectively removed through a fine-tuning attack. The researchers show that they can remove the watermark from a diffusion model while still maintaining the visual quality of the generated images.

This finding challenges the claim that Stable Signature is a stable and difficult-to-remove watermarking approach. It suggests that companies and researchers relying on Stable Signature to identify AI-generated content may need to reevaluate its effectiveness and explore alternative watermarking or detection methods.

The paper highlights the ongoing arms race between those developing techniques to detect AI-generated content and those seeking to bypass such detection methods. As AI systems become more advanced, the need for robust and reliable watermarking and verification techniques will only continue to grow. This research contributes to that broader dialogue and underscores the importance of critical analysis and continued innovation in this space.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called Stable Signature (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by open-source diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.

5/14/2024

A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion

Guokai Zhang, Lanjun Wang, Yuting Su, An-An Liu

Nowadays, the family of Stable Diffusion (SD) models has gained prominence for its high quality outputs and scalability. This has also raised security concerns on social media, as malicious users can create and disseminate harmful content. Existing approaches involve training components or entire SDs to embed a watermark in generated images for traceability and responsibility attribution. However, in the era of AI-generated content (AIGC), the rapid iteration of SDs renders retraining with watermark models costly. To address this, we propose a training-free plug-and-play watermark framework for SDs. Without modifying any components of SDs, we embed diverse watermarks in the latent space, adapting to the denoising process. Our experimental findings reveal that our method effectively harmonizes image quality and watermark invisibility. Furthermore, it performs robustly under various attacks. We also have validated that our method is generalized to multiple versions of SDs, even without retraining the watermark model.

4/9/2024

Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space

Zheling Meng, Bo Peng, Jing Dong

Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image quality. This dilemma stems from the traditional paradigm where watermarks are injected and detected in pixel space, relying on pixel perturbation for watermark detection and resilience against attacks. In this paper, we highlight that an effective solution to the problem is to both inject and detect watermarks in the latent diffusion space, and propose Latent Watermark with a progressive training strategy. It weakens the direct connection between quality and robustness and thus alleviates their contradiction. We conduct evaluations on two datasets and against 10 watermark attacks. 6 metrics measure the image quality and watermark robustness. Results show that compared to the recently proposed methods such as StegaStamp, StableSignature, RoSteALS, and TreeRing, LW not only surpasses them in terms of robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.

7/15/2024

Certifiably Robust Image Watermark

Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong

Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against removal and forgery attacks. Our method leverages randomized smoothing, a popular technique to build certifiably robust classifiers and regression models. Our major technical contributions include extending randomized smoothing to watermarking by considering its unique characteristics, deriving the certified robustness guarantees, and designing algorithms to estimate them. Moreover, we extensively evaluate our image watermarks in terms of both certified and empirical robustness. Our code is available at https://github.com/zhengyuan-jiang/Watermark-Library.

7/8/2024