Stealing Image-to-Image Translation Models With a Single Query

2406.00828

Published 6/4/2024 by Nurit Spingarn-Eliezer, Tomer Michaeli

Stealing Image-to-Image Translation Models With a Single Query

Abstract

Training deep neural networks requires significant computational resources and large datasets that are often confidential or expensive to collect. As a result, owners tend to protect their models by allowing access only via an API. Many works demonstrated the possibility of stealing such protected models by repeatedly querying the API. However, to date, research has predominantly focused on stealing classification models, for which a very large number of queries has been found necessary. In this paper, we study the possibility of stealing image-to-image models. Surprisingly, we find that many such models can be stolen with as little as a single, small-sized, query image using simple distillation. We study this phenomenon on a wide variety of model architectures, datasets, and tasks, including denoising, deblurring, deraining, super-resolution, and biological image-to-image translation. Remarkably, we find that the vulnerability to stealing attacks is shared by CNNs and by models with attention mechanisms, and that stealing is commonly possible even without knowing the architecture of the target model.

Create account to get full access

Overview

• This paper explores a novel attack called "Stealing Image-to-Image Translation Models With a Single Query" that can extract the underlying model of an image-to-image translation service with just a single query.

• The attack leverages the inherent structure and weaknesses of image-to-image translation models to effectively "steal" the model by observing its behavior on a single input.

• This research builds on previous work on model inversion attacks and dataset stealing attacks, demonstrating the vulnerability of these types of AI systems.

Plain English Explanation

Image-to-image translation models are a type of artificial intelligence that can transform one image into another, like turning a sketch into a photo-realistic painting. These models are often used in creative applications and are valuable intellectual property.

The researchers behind this paper discovered a way to effectively "steal" the inner workings of an image-to-image translation model by just sending it a single image and observing how it responds. This allows a bad actor to replicate the model without the owner's permission.

The key insight is that these translation models have an inherent structure that leaks information about their inner workings, even when only a single input is provided. The researchers found ways to exploit this weakness to extract the model's parameters and architecture.

This is a significant finding, as it shows that current image-to-image translation models may not be as secure as previously thought. The researchers hope this work will inspire the development of more robust and secure AI systems in the future.

Technical Explanation

The paper proposes a novel attack called "Stealing Image-to-Image Translation Models With a Single Query" that can extract the underlying model of an image-to-image translation service with just a single query.

The attack leverages the fact that image-to-image translation models have an inherent structure that leaks information about their inner workings, even when only a single input is provided. Specifically, the researchers found that the input-output mapping of these models exhibits a certain pattern that can be used to infer the model's parameters and architecture.

The attack works by first sending a carefully crafted input image to the target model and observing its output. The researchers then analyze the output to extract features that are characteristic of the model's structure. This information is then used to construct a surrogate model that closely approximates the target model.

The researchers evaluated their attack on several state-of-the-art image-to-image translation models, including pix2pix, CycleGAN, and UNIT. They found that their attack was able to successfully steal the target models with high fidelity, even when the models were trained on different datasets.

This work builds on previous research on model inversion attacks and dataset stealing attacks, which have shown the vulnerability of various AI systems to unauthorized access and replication.

Critical Analysis

The researchers provide a thorough analysis of their attack and its implications. They acknowledge that their attack relies on certain assumptions about the target models, such as their underlying architecture and training procedures. If these assumptions are violated, the attack may not be as effective.

Additionally, the researchers note that their attack may be mitigated by techniques like model watermarking or adversarial training. These approaches could make it more difficult for an attacker to extract the model's parameters and architecture.

It's also worth considering the broader ethical and legal implications of this research. While the researchers aim to raise awareness about the security vulnerabilities of image-to-image translation models, their work could potentially be misused by bad actors to steal and misuse valuable intellectual property.

Conclusion

This paper presents a novel attack that can effectively "steal" image-to-image translation models with just a single query. The researchers have demonstrated the vulnerability of these types of AI systems and hope that their work will inspire the development of more secure and robust AI models in the future.

The findings of this research have significant implications for the AI community, as they highlight the need for continued research into model security and the development of effective countermeasures against unauthorized model extraction and replication.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Ofir Shifman, Yair Weiss

Deep neural networks that achieve remarkable performance in image classification have previously been shown to be easily fooled by tiny transformations such as a one pixel translation of the input image. In order to address this problem, two approaches have been proposed in recent years. The first approach suggests using huge datasets together with data augmentation in the hope that a highly varied training set will teach the network to learn to be invariant. The second approach suggests using architectural modifications based on sampling theory to deal explicitly with image translations. In this paper, we show that these approaches still fall short in robustly handling 'natural' image translations that simulate a subtle change in camera orientation. Our findings reveal that a mere one-pixel translation can result in a significant change in the predicted image representation for approximately 40% of the test images in state-of-the-art models (e.g. open-CLIP trained on LAION-2B or DINO-v2) , while models that are explicitly constructed to be robust to cyclic translations can still be fooled with 1 pixel realistic (non-cyclic) translations 11% of the time. We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency, although with a modest tradeoff with the model's accuracy. Importantly, we demonstrate how employing this method reduces the ability to fool state-of-the-art models with a 1 pixel translation to less than 5% while suffering from only a 1% drop in classification accuracy. Additionally, we show that our method can be easy adjusted to deal with circular shifts as well. In such case we achieve 100% robustness to integer shifts with state-of-the-art accuracy, and with no need for any further training.

4/11/2024

cs.CV

🏋️

Transpose Attack: Stealing Datasets with Bidirectional Training

Guy Amit, Mosh Levy, Yisroel Mirsky

Deep neural networks are normally executed in the forward direction. However, in this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate models. In addition, in this work we show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method in which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with high fidelity, high enough to compromise data privacy and even train new models. Moreover, to mitigate this threat we propose a novel approach for detecting infected models.

5/20/2024

cs.LG cs.CR

Beyond Labeling Oracles: What does it mean to steal ML models?

Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot

Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. Machine Learning (ML) models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We thoroughly evaluate this assumption and find that the attacker often does not. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly research factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e., access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API. Our findings urge the community to redefine the adversarial goals of ME attacks as current evaluation methods misinterpret the ME performance.

6/14/2024

cs.LG cs.CR

📊

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized data usage during the training or fine-tuning process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission and giving credit to the artist. To address this issue, we propose a method for detecting such unauthorized data usage by planting the injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected images by adding unique contents on these images using stealthy image warping functions that are nearly imperceptible to humans but can be captured and memorized by diffusion models. By analyzing whether the model has memorized the injected content (i.e., whether the generated images are processed by the injected post-processing function), we can detect models that had illegally utilized the unauthorized data. Experiments on Stable Diffusion and VQ Diffusion with different model training or fine-tuning methods (i.e, LoRA, DreamBooth, and standard training) demonstrate the effectiveness of our proposed method in detecting unauthorized data usages. Code: https://github.com/ZhentingWang/DIAGNOSIS.

4/10/2024

cs.CV cs.CR cs.LG