Dissecting Distribution Inference

2212.07591

Published 4/9/2024 by Anshuman Suri, Yifu Lu, Yanjin Chen, David Evans

Abstract

A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference

Overview

  • This paper explores a type of attack called a "distribution inference attack," which aims to infer statistical properties of data used to train machine learning models.
  • The authors develop a new black-box attack that outperforms the best known white-box attack in most settings.
  • They evaluate distribution inference risk while relaxing various assumptions about the adversary's knowledge under black-box access.
  • The paper also evaluates the effectiveness of previously proposed defenses and introduces new defenses.

Plain English Explanation

Machine learning models are trained on data, and the characteristics of that data can impact the model's performance. A distribution inference attack is a way for an attacker to try to figure out the statistical properties of the data used to train a machine learning model, even if they don't have full access to that data.

The authors of this paper created a new way to do these kinds of attacks, even when the attacker doesn't have a lot of information about the model or the training process (a "black-box" scenario). Their new attack method was able to outperform the best known attacks that require more information about the model (a "white-box" scenario).

Using this new attack, the researchers looked at how much risk there is of an attacker being able to infer information about the training data, even when the attacker has limited knowledge. They also tested different ways to defend against these kinds of attacks, and found that a simple technique of re-sampling the data can be very effective at preventing the attacker from learning about the original training data.

"Robust Federated Learning Mitigates Client-Side Training" and "Meta-Invariance Defense Towards Generalizable Robustness to" are other papers that explore ways to make machine learning models more secure and robust against different types of attacks.

Technical Explanation

The paper develops a new black-box attack for distribution inference, which outperforms the best known white-box attack in most settings. The authors evaluate distribution inference risk while relaxing various assumptions about the adversary's knowledge, such as knowing the model architecture, and under restricted access, such as label-only access, where the adversary sees only predicted labels rather than confidence scores.
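To make the label-only setting concrete, here is a minimal sketch, assuming a binary classifier and an adversary who has trained local "shadow" models on each of two candidate distributions. The function names (`to_labels`, `agreement_rate`, `infer_from_labels`) and the agreement-based decision rule are illustrative assumptions, not the procedure from the paper.

```python
import numpy as np

def to_labels(probs, threshold=0.5):
    """Reduce confidence scores to hard labels, as a label-only API would."""
    return (np.asarray(probs) >= threshold).astype(int)

def agreement_rate(labels_a, labels_b):
    """Fraction of query points on which two models predict the same label."""
    return float(np.mean(np.asarray(labels_a) == np.asarray(labels_b)))

def infer_from_labels(victim_labels, shadow_labels_0, shadow_labels_1):
    """Guess the candidate distribution (0 or 1) whose shadow models
    agree more often with the victim's predicted labels.

    shadow_labels_k: iterable of label arrays, one per shadow model trained
    on data from candidate distribution k (hypothetical setup).
    """
    a0 = np.mean([agreement_rate(victim_labels, s) for s in shadow_labels_0])
    a1 = np.mean([agreement_rate(victim_labels, s) for s in shadow_labels_1])
    return 0 if a0 > a1 else 1

# Toy usage with hand-written labels on six query points.
victim = to_labels([0.9, 0.8, 0.1, 0.7, 0.2, 0.6])
shadows_0 = [np.array([0, 0, 0, 1, 0, 0])]
shadows_1 = [np.array([1, 1, 0, 1, 0, 0])]
guess = infer_from_labels(victim, shadows_0, shadows_1)
print("guessed distribution:", guess)
```

Hard labels carry less information than confidence scores, so a rule like this would be expected to need more queries to reach the same accuracy.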

The core of the new attack, which the paper calls the KL Divergence Attack, needs only the model's output probabilities. The adversary trains local models on data drawn from each candidate distribution, queries both these models and the victim model on the same inputs, and uses the KL divergence between their predictions to decide which candidate distribution the victim's training data most likely came from.
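As a rough illustration of how an output-only attack can work, the sketch below compares a victim model's predicted probabilities with those of locally trained shadow models using KL divergence, and picks the candidate distribution whose shadow models are closer. It assumes a binary classifier and two candidate distributions; the function names, the direction of the divergence, and the simple averaging rule are assumptions made for this example and may differ from the authors' released code.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over query points, treating each output as a
    Bernoulli distribution over the two classes."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    per_point = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(np.mean(per_point))

def infer_distribution(victim_probs, shadow_probs_0, shadow_probs_1):
    """Guess which candidate distribution (0 or 1) the victim was trained on.

    victim_probs:   shape (n_queries,), victim's positive-class probabilities.
    shadow_probs_k: shape (n_models, n_queries), outputs of shadow models
                    trained on data from candidate distribution k.
    """
    d0 = np.mean([kl_divergence(s, victim_probs) for s in shadow_probs_0])
    d1 = np.mean([kl_divergence(s, victim_probs) for s in shadow_probs_1])
    return 0 if d0 < d1 else 1

# Toy usage with synthetic outputs standing in for real model queries.
rng = np.random.default_rng(0)
base0 = rng.uniform(0.2, 0.4, size=50)  # typical outputs under distribution 0
base1 = rng.uniform(0.6, 0.8, size=50)  # typical outputs under distribution 1
shadow0 = np.clip(base0 + rng.normal(0, 0.02, size=(5, 50)), 0.01, 0.99)
shadow1 = np.clip(base1 + rng.normal(0, 0.02, size=(5, 50)), 0.01, 0.99)
victim = np.clip(base1 + rng.normal(0, 0.02, size=50), 0.01, 0.99)
guess = infer_distribution(victim, shadow0, shadow1)
print("guessed distribution:", guess)
```

In a real attack the synthetic arrays would be replaced by predictions from trained models on a shared set of query points.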

The paper also evaluates the effectiveness of previously proposed noise-based defenses and introduces new ones. The authors find that while noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective at preventing distribution inference.
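The idea behind a re-sampling defense can be sketched as follows: before training, the model owner under-samples the data so that the sensitive property always appears at a fixed ratio, which leaves the trained model with less signal about the original ratio. This is a minimal sketch, assuming a binary property that the defender already knows needs protecting; `resample_to_ratio` and its parameters are hypothetical names, not the paper's implementation.

```python
import numpy as np

def resample_to_ratio(X, y, prop, target_ratio=0.5, seed=0):
    """Under-sample so the fraction of records with prop == 1 is target_ratio.

    X, y: features and labels; prop: binary array marking the sensitive
    property for each record (assumed known to the defender).
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be strictly between 0 and 1")
    rng = np.random.default_rng(seed)
    idx1 = np.flatnonzero(prop == 1)
    idx0 = np.flatnonzero(prop == 0)
    # Largest dataset size that both groups can support at the target ratio.
    n = min(len(idx1) / target_ratio, len(idx0) / (1.0 - target_ratio))
    n1 = min(int(round(n * target_ratio)), len(idx1))
    n0 = min(int(round(n * (1.0 - target_ratio))), len(idx0))
    keep = np.concatenate([
        rng.choice(idx1, size=n1, replace=False),
        rng.choice(idx0, size=n0, replace=False),
    ])
    rng.shuffle(keep)
    return X[keep], y[keep], prop[keep]

# Toy usage: 30% of 1000 records have the property before re-sampling.
prop = (np.arange(1000) < 300).astype(int)
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = np.arange(1000) % 2
X_r, y_r, prop_r = resample_to_ratio(X, y, prop, target_ratio=0.5)
print(len(prop_r), prop_r.mean())
```

Under-sampling discards data, so the defense trades some training-set size for a training distribution that no longer reflects the original ratio.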

"Increasing Fairness in Classification on Out-of-Distribution Data for Facial" and "Hidden in Plain Sight: Undetectable Adversarial Bias Attacks" are other papers that explore challenges and defenses related to the security and fairness of machine learning models.

Critical Analysis

The paper makes a significant contribution by developing a new, powerful black-box attack for distribution inference and using it to systematically evaluate the factors that affect this type of risk. However, some limitations are worth noting:

  • The attack is evaluated on a limited set of benchmark datasets, so its effectiveness in other real-world deployments is unclear.
  • The proposed re-sampling defense may be difficult to apply in practice, as the defender must know in advance which property of the data needs protecting.
  • The paper does not explore the broader implications of distribution inference attacks, such as potential misuse by bad actors.

"Probabilistic Dataset Reconstruction from Interpretable Models" is another paper that examines the risks of inferring training data properties, highlighting the need for further research in this area.

Conclusion

This paper advances the understanding of distribution inference attacks, a type of threat to the security and privacy of machine learning systems. By developing a powerful new black-box attack and using it to evaluate a range of defense strategies, the authors provide valuable insights that can help researchers and practitioners better protect their models against these types of attacks. The findings suggest that simple re-sampling techniques may be an effective defense, but further work is needed to address the broader implications and deploy practical defenses in real-world settings.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation

Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets

While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks. In particular, we propose GLiRA, a distillation-guided approach to membership inference attack on the black-box neural network. We observe that the knowledge distillation significantly improves the efficiency of likelihood ratio of membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks when guided by the knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.

5/14/2024

Inference Attacks in Machine Learning as a Service: A Taxonomy, Review, and Promising Directions

Feng Wu, Lei Cui, Shaowen Yao, Shui Yu

The prosperity of machine learning has also brought people's concerns about data privacy. Among them, inference attacks can implement privacy breaches in various MLaaS scenarios and model training/prediction phases. Specifically, inference attacks can perform privacy inference on undisclosed target training sets based on outputs of the target model, including but not limited to statistics, membership, semantics, data representation, etc. For instance, infer whether the target data has the characteristics of AIDS. In addition, the rapid development of the machine learning community in recent years, especially the surge of model types and application scenarios, has further stimulated the inference attacks' research. Thus, studying inference attacks and analyzing them in depth is urgent and significant. However, there is still a gap in the systematic discussion of inference attacks from taxonomy, global perspective, attack, and defense perspectives. This survey provides an in-depth and comprehensive inference of attacks and corresponding countermeasures in ML-as-a-service based on taxonomy and the latest researches. Without compromising researchers' intuition, we first propose the 3MP taxonomy based on the community research status, trying to normalize the confusing naming system of inference attacks. Also, we analyze the pros and cons of each type of inference attack, their workflow, countermeasure, and how they interact with other attacks. In the end, we point out several promising directions for researchers from a more comprehensive and novel perspective.

6/28/2024

Data Reconstruction Attacks and Defenses: A Systematic Evaluation

Sheng Liu, Zihan Wang, Yuxiao Chen, Qi Lei

Reconstruction attacks and defenses are essential in understanding the data leakage problem in machine learning. However, prior work has centered around empirical observations of gradient inversion attacks, lacks theoretical justifications, and cannot disentangle the usefulness of defending methods from the computational limitation of attacking methods. In this work, we propose to view the problem as an inverse problem, enabling us to theoretically, quantitatively, and systematically evaluate the data reconstruction problem. On various defense methods, we derived the algorithmic upper bound and the matching (in feature dimension and model width) information-theoretical lower bound on the reconstruction error for two-layer neural networks. To complement the theoretical results and investigate the utility-privacy trade-off, we defined a natural evaluation metric of the defense methods with similar utility loss among the strongest attacks. We further propose a strong reconstruction attack that helps update some previous understanding of the strength of defense methods under our proposed evaluation metric.

6/28/2024

Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning

Huan Bao, Kaimin Wei, Yongdong Wu, Jin Qian, Robert H. Deng

A Model Inversion (MI) attack based on Generative Adversarial Networks (GAN) aims to recover the private training data from complex deep learning models by searching codes in the latent space. However, they merely search a deterministic latent space such that the found latent code is usually suboptimal. In addition, the existing distributional MI schemes assume that an attacker can access the structures and parameters of the target model, which is not always viable in practice. To overcome the above shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack by constructing the probabilistic latent space for searching the target privacy data. Specifically, DBB-MI does not need the target model parameters or specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. Then, it randomly chooses latent codes from the latent probability distribution for recovering the private data. As the latent probability distribution closely aligns with the target privacy data in latent space, the recovered data will leak the privacy of training samples of the target model significantly. Abundant experiments conducted on diverse datasets and networks show that the present DBB-MI has better performance than state-of-the-art in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.

4/23/2024