UIFV: Data Reconstruction Attack in Vertical Federated Learning

2406.12588

Published 6/19/2024 by Jirui Yang, Peng Chen, Zhihui Lu, Qiang Duan, Yubing Bao

Abstract

Vertical Federated Learning (VFL) facilitates collaborative machine learning without the need for participants to share raw private data. However, recent studies have revealed privacy risks where adversaries might reconstruct sensitive features through data leakage during the learning process. Although data reconstruction methods based on gradient or model information are somewhat effective, they reveal limitations in VFL application scenarios. This is because these traditional methods heavily rely on specific model structures and/or have strict limitations on application scenarios. To address this, our study introduces the Unified InverNet Framework into VFL, which yields a novel and flexible approach (dubbed UIFV) that leverages intermediate feature data to reconstruct original data, instead of relying on gradients or model details. The intermediate feature data is the feature exchanged by different participants during the inference phase of VFL. Experiments on four datasets demonstrate that our methods significantly outperform state-of-the-art techniques in attack precision. Our work exposes severe privacy vulnerabilities within VFL systems that pose real threats to practical VFL applications and thus confirms the necessity of further enhancing privacy protection in the VFL architecture.

Overview

This paper examines a data reconstruction attack on vertical federated learning (VFL), in which multiple parties hold different features of the same samples and collaborate by exchanging intermediate feature data rather than raw data.
The authors introduce UIFV, an attack built on the Unified InverNet Framework that reconstructs a participant's original data from the intermediate features exchanged during inference, without relying on gradients or model details.
The findings matter for the deployment of VFL systems, which are being explored for applications such as healthcare and finance where data privacy is paramount.

Plain English Explanation

In vertical federated learning, multiple organizations train a machine learning model together without directly sharing their private data. Each party processes its own features locally and passes along only the resulting intermediate features, which is meant to preserve privacy better than sharing the raw data.

However, this paper shows that even with this approach, an attacker could use those intermediate features to reconstruct sensitive information about a participant's data. This is a significant security and privacy risk that needs to be addressed before vertical federated learning can be widely adopted, especially in areas like healthcare and finance where data privacy is critical.

The authors demonstrate that the features exchanged during inference reveal enough about the underlying data to undo the privacy protection that vertical federated learning is meant to provide. This finding highlights the need for additional safeguards in collaborative machine learning systems of this kind.

Technical Explanation

The paper presents a data reconstruction attack on vertical federated learning, where multiple parties collaborate by exchanging intermediate feature data rather than raw data. The authors bring the Unified InverNet Framework into VFL, producing an attack (UIFV) that reconstructs a participant's original data from those intermediate features.

The attack exploits the fact that intermediate features encode information about the inputs that produced them. According to the abstract, UIFV uses the features exchanged during the inference phase instead of gradients or model details. Earlier reconstruction methods depend on gradient or model information, which ties them to specific model structures or narrow application scenarios.

The authors evaluate the attack on four datasets and report that it significantly outperforms state-of-the-art techniques in attack precision. The abstract does not describe specific countermeasures; it concludes that privacy protection in the VFL architecture needs to be strengthened.
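
The general idea of inverting intermediate features can be illustrated with a toy sketch. This is not the authors' UIFV implementation, whose details are not given on this page. It assumes a hypothetical one-layer `bottom_model` for the target party, an attacker who holds some auxiliary samples and can observe the features they produce, and a simple linear inverse map trained by gradient descent. All names, sizes, and hyperparameters are illustrative.

```python
import math
import random

random.seed(0)

D_IN, D_FEAT = 2, 3  # toy sizes, chosen only for illustration

# Hypothetical bottom model held by the target party: one tanh layer.
# The attacker never reads W; it only sees the features the model emits.
W = [[random.gauss(0, 0.7) for _ in range(D_FEAT)] for _ in range(D_IN)]

def bottom_model(x):
    """Map a raw input to the intermediate feature shared in VFL."""
    return [math.tanh(sum(x[i] * W[i][j] for i in range(D_IN)))
            for j in range(D_FEAT)]

def sample(n):
    """Draw n synthetic inputs (a stand-in for real private features)."""
    return [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(n)]

def train_inverse(features, inputs, epochs=40, lr=0.05):
    """Fit a linear inverse map (with bias) by per-sample gradient descent."""
    V = [[0.0] * D_IN for _ in range(D_FEAT + 1)]
    for _ in range(epochs):
        for h, x in zip(features, inputs):
            h1 = h + [1.0]  # append a constant 1 for the bias term
            for k in range(D_IN):
                err = sum(h1[j] * V[j][k] for j in range(D_FEAT + 1)) - x[k]
                for j in range(D_FEAT + 1):
                    V[j][k] -= lr * err * h1[j]
    return V

def reconstruct(h, V):
    """Apply the learned inverse map to one intermediate feature vector."""
    h1 = h + [1.0]
    return [sum(h1[j] * V[j][k] for j in range(D_FEAT + 1))
            for k in range(D_IN)]

def mse(a, b):
    """Mean squared error between two lists of vectors."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return total / (len(a) * D_IN)

# Attacker side: auxiliary inputs paired with the features they produce.
x_aux = sample(300)
V = train_inverse([bottom_model(x) for x in x_aux], x_aux)

# Victim side: private inputs, of which the attacker sees only the features.
x_private = sample(100)
x_hat = [reconstruct(bottom_model(x), V) for x in x_private]

# Compare against an attacker who just guesses the auxiliary mean.
aux_mean = [sum(x[k] for x in x_aux) / len(x_aux) for k in range(D_IN)]
attack_mse = mse(x_hat, x_private)
baseline_mse = mse([aux_mean] * len(x_private), x_private)
print(f"attack MSE: {attack_mse:.3f}, mean-guess baseline MSE: {baseline_mse:.3f}")
```

The comparison against a mean-guess baseline is one simple way to check that the features leak information: any reconstruction error below that baseline comes from the features alone, since the sketch never touches gradients or the weights `W`.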

Critical Analysis

The paper provides a comprehensive analysis of the data reconstruction attack in vertical federated learning, highlighting a significant security and privacy concern that needs to be addressed. The authors demonstrate the feasibility of the attack and its potential impact, which is an important contribution to the field.

However, the limitations of the attack are not fully explored. For example, it is not clear how the attack would perform in more complex, real-world scenarios with larger datasets and more sophisticated privacy-preserving techniques. Additionally, how well possible defenses would hold up against the attack, and what they would cost in practical deployments of vertical federated learning, remain open questions.

Further research is needed to better understand the trade-offs between the benefits of vertical federated learning and the security and privacy risks identified in this paper. Exploring more advanced defense mechanisms, such as differential privacy or secure multi-party computation, could be a fruitful direction for future work.

Conclusion

This paper uncovers a critical security and privacy vulnerability in vertical federated learning: an attacker can potentially reconstruct sensitive data from the intermediate features exchanged during inference. The findings have important implications for the adoption and deployment of these collaborative machine learning systems, particularly in domains where data privacy is of utmost importance.

The authors demonstrate the feasibility of the attack, but further research is needed to fully address the security and privacy challenges in vertical federated learning. As the field evolves, researchers and practitioners will need to prioritize robust privacy-preserving techniques so that these collaborative machine learning approaches can be adopted safely.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm where different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the corresponding research is scattered and lacks organization. To advance VFL research, this survey offers a systematic overview of recent developments. First, we provide a history and background introduction, along with a summary of the general training protocol of VFL. We then revisit the taxonomy in recent reviews and analyze limitations in-depth. For a comprehensive and structured discussion, we synthesize recent research from three fundamental perspectives: effectiveness, security, and applicability. Finally, we discuss several critical future research directions in VFL, which will facilitate the developments in this field. We provide a collection of research lists and periodically update them at https://github.com/shentt67/VFL_Survey.

6/5/2024

cs.LG cs.CR

VFLAIR: A Research Library and Benchmark for Vertical Federated Learning

Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Ya-Qin Zhang

Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such as in defending various kinds of data inference and backdoor attacks. Moreover, most of existing VFL projects are industry-facing and not easily used for keeping track of the current research progress. To address this need, we present an extensible and lightweight VFL framework VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which supports VFL training with a variety of models, datasets and protocols, along with standardized modules for comprehensive evaluations of attacks and defense strategies. We also benchmark 11 attacks and 8 defenses performance under different communication and model partition settings and draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenarios.

4/17/2024

cs.LG

Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

Conor Hassan, Matthew Sutton, Antonietta Mira, Kerrie Mengersen

Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.

5/8/2024

cs.LG stat.ML

Vertical Federated Learning Hybrid Local Pre-training

Wenguo Li, Xinling Guo, Xu Jiao, Tiancheng Huang, Xiaoran Yan, Yao Yang

Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to boost their model prediction skills. VFL addresses this demand and concurrently secures individual parties from exposing their raw data. However, conventional VFL encounters a bottleneck as it only leverages aligned samples, whose size shrinks with more parties involved, resulting in data scarcity and the waste of unaligned data. To address this problem, we propose a novel VFL Hybrid Local Pre-training (VFLHLP) approach. VFLHLP first pre-trains local networks on the local data of participating parties. Then it utilizes these pre-trained networks to adjust the sub-model for the labeled party or enhance representation learning for other parties during downstream federated learning on aligned data, boosting the performance of federated models. The experimental results on real-world advertising datasets, demonstrate that our approach achieves the best performance over baseline methods by large margins. The ablation study further illustrates the contribution of each technique in VFLHLP to its overall performance.

5/22/2024

cs.LG cs.DC