On the Impact of Dataset Properties on Membership Privacy of Deep Learning

2402.06674

Published 6/13/2024 by Marlon Tobaben, Joonas Jalko, Gauri Pradhan, Yuan He, Antti Honkela

Abstract

We apply a state-of-the-art membership inference attack (MIA) to systematically test the practical privacy vulnerability of fine-tuning large image classification models. We focus on understanding the properties of data sets and samples that make them vulnerable to membership inference. In terms of data set properties, we find a strong power law dependence between the number of examples per class in the data and the MIA vulnerability, as measured by true positive rate of the attack at a low false positive rate. We train a linear model to predict true positive rate based on data set properties and observe good fit for MIA vulnerability on unseen data. To analyse the phenomenon theoretically, we reproduce the result on a simplified model of membership inference that behaves similarly to our experimental data. We prove that in this model, the logarithm of the difference of true and false positive rates depends linearly on the logarithm of the number of examples per class. For an individual sample, the gradient norm is predictive of its vulnerability.
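
The log-linear relationship stated near the end of the abstract can be written out explicitly. This is only a restatement under assumed notation: the symbol S for the number of examples per class and the coefficients \beta_0 and \beta_1 are chosen here for illustration and are not taken from the paper.

```latex
\log\left(\mathrm{TPR} - \mathrm{FPR}\right) = \beta_0 + \beta_1 \log S
\quad\Longleftrightarrow\quad
\mathrm{TPR} - \mathrm{FPR} = e^{\beta_0}\, S^{\beta_1}
```

Exponentiating both sides turns the linear relation between logarithms into a power law in S, which matches the form of the power-law dependence the abstract reports empirically.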

Overview

This paper studies how vulnerable fine-tuned image classification models are to membership inference attacks (MIAs), which try to determine whether a given example was part of the training data.
The researchers apply a state-of-the-art MIA and examine which properties of data sets and of individual samples make them vulnerable.
They also fit a linear model that predicts vulnerability from data set properties and analyze a simplified theoretical model that reproduces the observed behavior.

Plain English Explanation

Machine learning models, like those used for image recognition or language processing, are trained on large datasets. These datasets often contain sensitive information about the individuals or entities they describe. The paper explores the risk that an outsider could tell, from the trained model alone, whether a particular record was part of the training data, even without access to the original dataset.

The researchers looked at how well an attacker can determine whether a specific data point was used to train a model. This is known as a "membership inference attack." They ran a state-of-the-art attack against fine-tuned image classification models and measured how often it correctly identifies training examples while rarely raising false alarms. They found that vulnerability follows a power law in the number of examples per class in the training data, and that the gradient norm of an individual example is predictive of how exposed that example is.

By understanding the strengths and weaknesses of membership privacy in deep learning, the researchers hope to help develop more secure and privacy-preserving machine learning systems.

Technical Explanation

The paper concerns membership inference attacks, which aim to determine whether a given data point was part of the training set used to create a machine learning model. The researchers apply a state-of-the-art MIA to fine-tuned large image classification models and measure vulnerability as the attack's true positive rate (TPR) at a low false positive rate (FPR).
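
The abstract measures vulnerability as the attack's true positive rate at a low false positive rate. The following is a minimal sketch of how such a metric could be computed for a generic score-threshold attack. The function name `tpr_at_fpr`, the 1% FPR budget, and the synthetic Gaussian scores are all assumptions made for illustration; this is not the attack or the evaluation code used in the paper.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Return (tpr, fpr, threshold) for a score-threshold attack.

    A sample is flagged as a member when its score exceeds the threshold.
    The threshold is chosen so that at most a target_fpr fraction of the
    known non-members is flagged.
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    # Largest number of non-members that may be flagged within the FPR budget.
    k = min(int(np.floor(target_fpr * len(nonmembers))), len(nonmembers) - 1)
    # Using the (k+1)-th largest non-member score as a strict threshold
    # leaves at most k non-members above it.
    threshold = nonmembers[k]
    tpr = float(np.mean(members > threshold))
    fpr = float(np.mean(nonmembers > threshold))
    return tpr, fpr, threshold


# Synthetic attack scores (assumption): members score higher on average.
rng = np.random.default_rng(0)
member_scores = rng.normal(loc=1.0, scale=1.0, size=10_000)
nonmember_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)

tpr, fpr, threshold = tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01)
print(f"TPR = {tpr:.3f} at FPR = {fpr:.4f} (threshold = {threshold:.3f})")
```

Reporting TPR at a fixed low FPR, rather than average accuracy, focuses the metric on how many training examples the attacker can identify with few false alarms.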

At the level of data sets, the abstract reports the following findings:

A strong power-law dependence between the number of examples per class and MIA vulnerability.
A linear model, trained to predict TPR from data set properties, that fits MIA vulnerability well on unseen data.
A simplified model of membership inference that behaves similarly to the experimental data, in which the logarithm of TPR minus FPR is proven to depend linearly on the logarithm of the number of examples per class.
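
The abstract describes a power-law dependence between the number of examples per class and MIA vulnerability, and a linear model fitted to predict it. A power law becomes a straight line in log-log space, so one way to sketch such a fit is ordinary least squares on the logarithms. The data below are synthetic, and the slope, intercept, and noise level are assumptions chosen for illustration (including the negative sign of the slope); none of the numbers are results from the paper, and the paper's actual predictor may use other data set properties.

```python
import numpy as np


def fit_power_law(examples_per_class, advantage):
    """Least-squares fit of log(advantage) = intercept + slope * log(S)."""
    log_s = np.log(np.asarray(examples_per_class, dtype=float))
    log_adv = np.log(np.asarray(advantage, dtype=float))
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(log_s, log_adv, deg=1)
    return float(slope), float(intercept)


def predict_advantage(examples_per_class, slope, intercept):
    """Map the fitted line back to the original scale: exp(intercept) * S**slope."""
    s = np.asarray(examples_per_class, dtype=float)
    return np.exp(intercept) * s ** slope


# Synthetic data (assumption): advantage = TPR - FPR follows 0.5 * S**(-1)
# with small multiplicative noise.
rng = np.random.default_rng(0)
shots = np.array([1, 2, 4, 8, 16, 32, 64, 128], dtype=float)
assumed_slope, assumed_intercept = -1.0, np.log(0.5)
noise = rng.normal(loc=0.0, scale=0.05, size=shots.shape)
advantage = np.exp(assumed_intercept + assumed_slope * np.log(shots) + noise)

slope, intercept = fit_power_law(shots, advantage)
print(f"fitted slope = {slope:.3f}, fitted intercept = {intercept:.3f}")
print("predicted advantage at S = 256:", predict_advantage(256, slope, intercept))
```

Fitting in log space keeps the model linear, which is what makes a simple linear regressor sufficient for a relationship that is strongly non-linear on the original scale.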

At the level of individual samples, the researchers find that the gradient norm of a sample is predictive of its vulnerability to the attack.
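
The abstract notes that, for an individual sample, the gradient norm is predictive of its vulnerability. As a rough illustration of what a per-sample gradient norm is, the sketch below computes it in closed form for a linear softmax classifier trained with cross-entropy loss. The linear model is a stand-in chosen for brevity (the paper studies deep image classifiers), and which parameters the gradient is taken over is an assumption here; the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def per_sample_grad_norms(W, X, y):
    """Norm of d(cross-entropy)/dW for each sample, with logits = X @ W.

    The per-sample gradient is outer(x, p - onehot(y)), so its Frobenius
    norm factorizes as ||x|| * ||p - onehot(y)||.
    """
    residual = softmax(X @ W)
    residual[np.arange(len(y)), y] -= 1.0  # p - onehot(y)
    return np.linalg.norm(X, axis=1) * np.linalg.norm(residual, axis=1)


# Tiny random problem (assumption): 5 samples, 3 features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 4))
y = rng.integers(0, 4, size=5)

norms = per_sample_grad_norms(W, X, y)
print("per-sample gradient norms:", np.round(norms, 3))
print("sample with the largest gradient norm:", int(np.argmax(norms)))
```

For deep networks the same quantity is usually obtained with per-sample automatic differentiation rather than a closed form, but the idea is the same: each training example gets one scalar that can be compared against its measured attack vulnerability.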

Critical Analysis

The paper provides a systematic analysis of membership privacy in fine-tuned deep learning models. It combines experiments using a state-of-the-art attack with a simplified theoretical model that reproduces the observed trend, which is important for understanding the practical implications.

However, the analysis focuses on fine-tuning image classification models. It's unclear how the results would generalize to other deep learning applications or more complex models, such as large language models. Further research would be needed to establish this.

Additionally, the paper focuses on membership privacy but does not address other privacy concerns in machine learning, such as the potential for models to leak sensitive information about individuals in their outputs. Exploring these broader privacy challenges could be an area for future work.

Conclusion

This paper contributes to our understanding of membership privacy in deep learning. By applying a state-of-the-art attack systematically and relating its success to properties of the data, the researchers have shed light on the practical risks to training data privacy.

The findings suggest that membership privacy is a serious concern that needs to be addressed as machine learning systems become more widely deployed. The reported relationship between data set properties and vulnerability could help practitioners anticipate these risks, but continued research is needed to ensure the security and privacy of deep learning applications, particularly for sensitive domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Eric Aubinais, Elisabeth Gassiat, Pablo Piantanida

Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. We then theoretically prove that in a non-linear regression setting with overfitting algorithms, attacks may have a high probability of success. Finally, we investigate several situations for which we provide bounds on this quantity of interest. Interestingly, our findings indicate that discretizing the data might enhance the algorithm's security. Specifically, it is demonstrated to be limited by a constant, which quantifies the diversity of the underlying data distribution. We illustrate those results through two simple simulations.

6/12/2024

stat.ML cs.AI cs.LG

Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

Florent Guépin (Department of Computing, Imperial College London, United Kingdom), Nataša Krčo (Department of Computing, Imperial College London, United Kingdom), Matthieu Meeus (Department of Computing, Imperial College London, United Kingdom), Yves-Alexandre de Montjoye (Department of Computing, Imperial College London, United Kingdom)

Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during training, which are sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and is thus evaluated across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk heavily depends on its specific dataset. For example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using SOTA MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary leveraging information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation is averaging the risk across datasets leading to inaccurate risk estimates, and the risk posed by attacks leveraging information about the target dataset to be potentially underestimated.

5/27/2024

cs.LG cs.CR

Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks

Haonan Shi, Tu Ouyang, An Wang

Machine learning models, in particular deep neural networks, are currently an integral part of various applications, from healthcare to finance. However, using sensitive data to train these models raises concerns about privacy and security. One method that has emerged to verify if the trained models are privacy-preserving is Membership Inference Attacks (MIA), which allows adversaries to determine whether a specific data point was part of a model's training dataset. While a series of MIAs have been proposed in the literature, only a few can achieve high True Positive Rates (TPR) in the low False Positive Rate (FPR) region (0.01%~1%). This is a crucial factor to consider for an MIA to be practically useful in real-world settings. In this paper, we present a novel approach to MIA that is aimed at significantly improving TPR at low FPRs. Our method, named learning-based difficulty calibration for MIA (LDC-MIA), characterizes data records by their hardness levels using a neural network classifier to determine membership. The experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs. It also has the highest Area Under ROC curve (AUC) across all datasets. Our method's cost is comparable with most of the existing MIAs, but is orders of magnitude more efficient than one of the state-of-the-art methods, LiRA, while achieving similar performance.

5/9/2024

cs.CR cs.AI cs.LG

LLM Dataset Inference: Did you train on my dataset?

Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic

The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph. While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.

6/11/2024

cs.LG cs.CL cs.CR