Better Membership Inference Privacy Measurement through Discrepancy

Read original: arXiv:2405.15140 - Published 5/27/2024 by Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

Better Membership Inference Privacy Measurement through Discrepancy

Overview

This paper proposes a new method for measuring the privacy of machine learning models against membership inference attacks.
The key innovation is the use of "discrepancy" - the difference between a model's predictions on a member and non-member sample - as a privacy metric.
The authors show that this discrepancy-based metric provides better privacy evaluation than existing methods, which can be misleading.

Plain English Explanation

The paper is about how to measure the privacy of machine learning models. Machine learning models are often trained on private datasets, and there's a risk that an attacker could figure out if a specific data point was used to train the model. This is called a "membership inference attack."

The researchers introduce a new way to measure how well a model protects against these attacks. Instead of looking at the model's overall accuracy, they focus on the "discrepancy" - the difference between the model's predictions for a member of the training data versus a non-member.

The insight is that if there's a big discrepancy, it's easier for an attacker to tell if a data point was used in training. So by measuring this discrepancy, the researchers can get a better sense of the model's privacy protections.

This new discrepancy-based metric provides a more accurate way to evaluate privacy compared to existing methods, which can sometimes give misleading results.

Technical Explanation

The key technical contribution of the paper is a new metric called "discrepancy" for evaluating the privacy of machine learning models against membership inference attacks.

Membership inference attacks try to determine whether a specific data point was used to train a machine learning model. Prior work has used the model's overall prediction accuracy as a proxy for privacy, with the intuition that more accurate models leak more information.

However, the authors show that this accuracy-based approach can be misleading. Instead, they propose focusing on the "discrepancy" between the model's predictions on member versus non-member samples. If this discrepancy is large, it becomes easier for an attacker to infer membership.

The authors demonstrate that their discrepancy-based metric provides a more reliable and sensitive way to evaluate privacy defenses, compared to prior accuracy-based methods. They validate this claim through extensive experiments on a variety of machine learning models and datasets.

Critical Analysis

The paper makes a convincing case that discrepancy is a more accurate privacy metric than prior accuracy-based approaches. The experimental results show clear advantages of the discrepancy-based evaluation, and the authors carefully discuss the limitations and potential issues.

One area that could be explored further is the relationship between model architecture, training, and the resulting discrepancy. The authors note that different model configurations can lead to varied discrepancy levels, but a deeper analysis of these connections could provide additional insights.

Additionally, while the discrepancy metric is an improvement over accuracy-based approaches, it still relies on a specific adversarial setup to evaluate privacy. Exploring alternative methods that make fewer assumptions about the attacker's capabilities could further strengthen the research.

Overall, this paper makes an important contribution to the field of machine learning privacy by introducing a more robust evaluation technique. The insights and methodology could have significant implications for how we measure and improve the privacy protections of AI systems.

Conclusion

This paper presents a novel approach for measuring the privacy of machine learning models against membership inference attacks. By focusing on the "discrepancy" between a model's predictions on member versus non-member samples, the researchers have developed a more reliable and sensitive privacy metric than prior accuracy-based methods.

The experimental results demonstrate the advantages of the discrepancy-based evaluation, and the authors discuss the limitations and potential areas for future research. This work represents an important step forward in developing robust techniques for assessing and improving the privacy of AI systems, which will be crucial as machine learning becomes more pervasive in our lives.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Better Membership Inference Privacy Measurement through Discrepancy

Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

Membership Inference Attacks have emerged as a dominant method for empirically measuring privacy leakage from machine learning models. Here, privacy is measured by the {em{advantage}} or gap between a score or a function computed on the training and the test data. A major barrier to the practical deployment of these attacks is that they do not scale to large well-generalized models -- either the advantage is relatively low, or the attack involves training multiple models which is highly compute-intensive. In this work, inspired by discrepancy theory, we propose a new empirical privacy metric that is an upper bound on the advantage of a family of membership inference attacks. We show that this metric does not involve training multiple models, can be applied to large Imagenet classification models in-the-wild, and has higher advantage than existing metrics on models trained with more recent and sophisticated training recipes. Motivated by our empirical results, we also propose new membership inference attacks tailored to these training losses.

5/27/2024

🤯

When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou

Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used during training by analyzing the model's prediction scores. However, our investigations reveal that these score-based MIAs are ineffective when targeting fairness-enhanced models in binary classifications. The attack models trained to launch the MIAs degrade into simplistic threshold models, resulting in lower attack performance. Meanwhile, we observe that fairness methods often lead to prediction performance degradation for the majority subgroups of the training data. This raises the barrier to successful attacks and widens the prediction gaps between member and non-member data. Building upon these insights, we propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results (FD-MIA). It leverages the difference in the predictions from both the original and fairness-enhanced models and exploits the observed prediction gaps as attack clues. We also explore potential strategies for mitigating privacy leakages. Extensive experiments validate our findings and demonstrate the efficacy of the proposed method.

8/28/2024

🤯

Improved Membership Inference Attacks Against Language Classification Models

Shlomit Shachor, Natalia Razinkov, Abigail Goldsteen

Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.

7/19/2024

Auditing Privacy Mechanisms via Label Inference Attacks

R'obert Istv'an Busa-Fekete, Travis Dick, Claudio Gentile, Andr'es Mu~noz Medina, Adam Smith, Marika Swanberg

We propose reconstruction advantage measures to audit label privatization mechanisms. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., aggregate of labels from different users or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors, but may have prior knowledge of the correlation between features and labels. We consider two such auditing measures: one additive, and one multiplicative. These incorporate previous approaches taken in the literature on empirical auditing and differential privacy. The measures allow us to place a variety of proposed privatization schemes -- some differentially private, some not -- on the same footing. We analyze these measures theoretically under a distributional model which encapsulates reasonable adversarial settings. We also quantify their behavior empirically on real and simulated prediction tasks. Across a range of experimental settings, we find that differentially private schemes dominate or match the privacy-utility tradeoff of more heuristic approaches.

6/6/2024