Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

2405.15423

Published 5/27/2024 by Florent Guépin (Department of Computing, Imperial College London, United Kingdom), Nataša Krčo (Department of Computing, Imperial College London, United Kingdom), Matthieu Meeus (Department of Computing, Imperial College London, United Kingdom), Yves-Alexandre de Montjoye (Department of Computing and 2 others

cs.LG cs.CR

Abstract

Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during training, which are sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and is thus evaluated across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk heavily depends on its specific dataset. For example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using SOTA MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary leveraging information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation is averaging the risk across datasets leading to inaccurate risk estimates, and the risk posed by attacks leveraging information about the target dataset to be potentially underestimated.

Overview

This paper introduces a new specific setup for evaluating membership inference attacks against machine learning models.
Membership inference attacks aim to determine whether a given data sample was used to train a machine learning model.
The authors argue that existing evaluation setups average a record's risk across many sampled datasets, which can make at-risk records look low risk, and propose a new approach to address this.

Plain English Explanation

The paper focuses on a type of attack called a "membership inference attack" against machine learning models. These attacks try to figure out whether a specific data sample was used to train a particular machine learning model.
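
To make the idea of a membership inference attack concrete, here is a minimal sketch of one simple attack style: guess "member" whenever the model's loss on a record is below a threshold. This is not one of the state-of-the-art attacks used in the paper; the function names, the threshold, and the loss values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Guess 'member' when the model's loss on a record is below the threshold."""
    return np.asarray(losses) < threshold

def tpr_fpr(predictions, is_member):
    """True positive rate and false positive rate of the membership guesses."""
    predictions = np.asarray(predictions, dtype=bool)
    is_member = np.asarray(is_member, dtype=bool)
    tpr = predictions[is_member].mean()    # members correctly flagged
    fpr = predictions[~is_member].mean()   # non-members wrongly flagged
    return float(tpr), float(fpr)

# Hypothetical per-record losses (made up, not results from the paper).
losses = np.array([0.05, 0.10, 0.90, 1.20, 0.08, 0.70])
is_member = np.array([True, True, False, False, True, False])

predictions = loss_threshold_attack(losses, threshold=0.5)
tpr, fpr = tpr_fpr(predictions, is_member)
print(tpr, fpr)
```

In this toy example the members happen to have much lower loss than the non-members, so the attack separates them cleanly. Real attacks face far more overlap, which is why how they are evaluated matters.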

The authors explain that current ways of evaluating these attacks may be missing important details. They propose a new way to set up these evaluations that they believe provides a more accurate and comprehensive assessment.

The key idea is to measure the risk of membership inference attacks for one specific dataset, rather than averaging it across many different sampled datasets. This allows them to uncover records that are at high risk in a particular dataset but appear low risk on average.

For example, the related paper "Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks" characterizes individual data records by their hardness levels, which illustrates that membership inference behaves differently from record to record. The new evaluation approach in this paper aims to capture those kinds of details more effectively.

Technical Explanation

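The core contribution of this paper is a new evaluation setup for membership inference attacks against machine learning models. Existing setups evaluate an attack across models trained on many datasets sampled from a larger pool, $D_{eval}$, so the resulting risk estimate is effectively averaged over those datasets. The authors argue that this hides how much a record's risk depends on the specific dataset it belongs to: an outlier in one dataset may not be an outlier in another.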
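
The abstract's claim that the current setup reports an average of dataset-specific risks can be written compactly. The notation below is an assumption made for this summary, not the paper's: $R_{\text{current}}(x)$ is the risk assigned to record $x$ by the current setup, $R_{\text{specific}}(x, D)$ is the risk of $x$ when the model is trained on one particular dataset $D$, and the expectation is taken over datasets $D$ sampled from $D_{eval}$.

```latex
\[
R_{\text{current}}(x) \;\approx\; \mathbb{E}_{D}\!\left[ R_{\text{specific}}(x, D) \right]
\]
```

Read this way, a record can have a high $R_{\text{specific}}(x, D)$ for the dataset that is actually released while its average over all sampled datasets stays low.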

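Their new approach, the "specific setup", fixes the target dataset instead of resampling it. In the current setup, each evaluated model is trained on a different dataset drawn from a larger pool, $D_{eval}$, so the reported risk reflects the whole distribution of sampled datasets. In the specific setup, the only thing that changes between models is the random weight initialization, so the reported risk is tied to the one dataset whose release is being assessed.

The key elements of their specific setup include:

Fixing the target dataset on which the released model is trained.
Using weight initialization as the sole source of randomness across the models used for evaluation.
Estimating each record's membership inference risk for that specific dataset, rather than across datasets sampled from $D_{eval}$.

Using state-of-the-art MIAs, the authors show empirically that risk estimates from the current setup lead to many records being misclassified as low risk. They also derive theoretical results which, combined with the empirical evidence, suggest that the risk measured in the current setup is an average of the risks specific to each sampled dataset. Finally, they consider a stronger adversary that leverages information about the target dataset to infer membership.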
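
The difference in sources of randomness described in the abstract can be sketched as follows. This is an illustrative outline only, not the authors' code: it enumerates which training set and which weight-initialization seed each evaluated model would receive, without training anything. All function names, the pool size, and the dataset size are hypothetical.

```python
import numpy as np

def current_setup_runs(pool_size, dataset_size, n_models, rng):
    """Current setup: every model gets a freshly sampled dataset and a new init seed."""
    runs = []
    for init_seed in range(n_models):
        # The training set itself is a source of randomness here.
        indices = np.sort(rng.choice(pool_size, size=dataset_size, replace=False))
        runs.append({"indices": tuple(indices.tolist()), "init_seed": init_seed})
    return runs

def specific_setup_runs(target_indices, n_models):
    """Specific setup: the dataset is fixed; only the init seed changes."""
    fixed = tuple(sorted(int(i) for i in target_indices))
    return [{"indices": fixed, "init_seed": seed} for seed in range(n_models)]

def count_distinct_datasets(runs):
    """Number of different training sets used across the runs."""
    return len({run["indices"] for run in runs})

rng = np.random.default_rng(0)
pool_size, dataset_size, n_models = 1000, 100, 8  # hypothetical sizes

current = current_setup_runs(pool_size, dataset_size, n_models, rng)
target = rng.choice(pool_size, size=dataset_size, replace=False)
specific = specific_setup_runs(target, n_models)

print(count_distinct_datasets(current))   # several different training sets
print(count_distinct_datasets(specific))  # 1: a single fixed dataset
```

An attack evaluated over the `current` runs mixes dataset variation with initialization variation, whereas one evaluated over the `specific` runs isolates the risk for the single dataset in `target`.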

Critical Analysis

The authors make a compelling case that existing evaluation setups for membership inference attacks may overlook important nuances. Their specific setup provides a more detailed and contextual assessment of attack performance.

However, one potential limitation is that the specific setup may be more labor-intensive. Because the risk estimate is tied to a single dataset, practitioners may need to repeat the evaluation for each dataset on which they plan to train and release a model.

Additionally, while the authors show the benefits of their approach, there could be further work to understand how the specific setup relates to real-world attack scenarios. For example, the paper on "Towards a Game-Theoretic Understanding of Explanation-Based Membership Inference" explores membership inference in the context of model explanations, which could be an interesting angle to consider within the specific setup framework.

Overall, this paper makes a valuable contribution by highlighting the limitations of average-based evaluation metrics and proposing a more targeted approach. Further research could build on this foundation to develop robust and comprehensive methodologies for assessing membership inference risks.

Conclusion

This paper introduces a new specific setup for evaluating membership inference attacks against machine learning models. The authors argue that existing evaluation approaches average risk across sampled datasets and therefore give inaccurate estimates of individual privacy risk.

Their specific setup fixes the target dataset and uses weight initialization as the only source of randomness, allowing for a more granular, per-record analysis of the attack's performance. This provides insights that could be important for understanding and defending against these types of attacks in real-world applications.

While the specific setup may require additional effort, it represents a valuable step forward in developing more comprehensive and meaningful evaluation methodologies for membership inference. As machine learning models become increasingly prevalent, tools like this will be crucial for ensuring their privacy and security.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Eric Aubinais, Elisabeth Gassiat, Pablo Piantanida

Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. We then theoretically prove that in a non-linear regression setting with overfitting algorithms, attacks may have a high probability of success. Finally, we investigate several situations for which we provide bounds on this quantity of interest. Interestingly, our findings indicate that discretizing the data might enhance the algorithm's security. Specifically, it is demonstrated to be limited by a constant, which quantifies the diversity of the underlying data distribution. We illustrate those results through two simple simulations.

6/12/2024

stat.ML cs.AI cs.LG

On the Impact of Dataset Properties on Membership Privacy of Deep Learning

Marlon Tobaben, Joonas Jalko, Gauri Pradhan, Yuan He, Antti Honkela

We apply a state-of-the-art membership inference attack (MIA) to systematically test the practical privacy vulnerability of fine-tuning large image classification models. We focus on understanding the properties of data sets and samples that make them vulnerable to membership inference. In terms of data set properties, we find a strong power law dependence between the number of examples per class in the data and the MIA vulnerability, as measured by true positive rate of the attack at a low false positive rate. We train a linear model to predict true positive rate based on data set properties and observe good fit for MIA vulnerability on unseen data. To analyse the phenomenon theoretically, we reproduce the result on a simplified model of membership inference that behaves similarly to our experimental data. We prove that in this model, the logarithm of the difference of true and false positive rates depends linearly on the logarithm of the number of examples per class. For an individual sample, the gradient norm is predictive of its vulnerability.

6/13/2024

cs.CR cs.LG

Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks

Haonan Shi, Tu Ouyang, An Wang

Machine learning models, in particular deep neural networks, are currently an integral part of various applications, from healthcare to finance. However, using sensitive data to train these models raises concerns about privacy and security. One method that has emerged to verify if the trained models are privacy-preserving is Membership Inference Attacks (MIA), which allows adversaries to determine whether a specific data point was part of a model's training dataset. While a series of MIAs have been proposed in the literature, only a few can achieve high True Positive Rates (TPR) in the low False Positive Rate (FPR) region (0.01%~1%). This is a crucial factor to consider for an MIA to be practically useful in real-world settings. In this paper, we present a novel approach to MIA that is aimed at significantly improving TPR at low FPRs. Our method, named learning-based difficulty calibration for MIA (LDC-MIA), characterizes data records by their hardness levels using a neural network classifier to determine membership. The experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs. It also has the highest Area Under ROC curve (AUC) across all datasets. Our method's cost is comparable with most of the existing MIAs, but is orders of magnitude more efficient than one of the state-of-the-art methods, LiRA, while achieving similar performance.

5/9/2024

cs.CR cs.AI cs.LG

Low-Cost High-Power Membership Inference Attacks

Sajjad Zarifzadeh, Philippe Liu, Reza Shokri

Membership inference attacks aim to detect if a particular data point was used in training a model. We design a novel statistical test to perform robust membership inference attacks (RMIA) with low computational overhead. We achieve this by a fine-grained modeling of the null hypothesis in our likelihood ratio tests, and effectively leveraging both reference models and reference population data samples. RMIA has superior test power compared with prior methods, throughout the TPR-FPR curve (even at extremely low FPR, as low as 0). Under computational constraints, where only a limited number of pre-trained reference models (as few as 1) are available, and also when we vary other elements of the attack (e.g., data distribution), our method performs exceptionally well, unlike prior attacks that approach random guessing. RMIA lays the groundwork for practical yet accurate data privacy risk assessment in machine learning.

6/13/2024

stat.ML cs.CR cs.LG