VFLAIR: A Research Library and Benchmark for Vertical Federated Learning

arXiv:2310.09827

Published 4/17/2024 by Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Ya-Qin Zhang


Abstract

Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such as defending against various kinds of data inference and backdoor attacks. Moreover, most existing VFL projects are industry-facing and not easily used for keeping track of the current research progress. To address this need, we present an extensible and lightweight VFL framework VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which supports VFL training with a variety of models, datasets and protocols, along with standardized modules for comprehensive evaluations of attacks and defense strategies. We also benchmark the performance of 11 attacks and 8 defenses under different communication and model partition settings and draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenarios.


Overview

Vertical Federated Learning (VFL) is a collaborative training approach in which parties that hold different features of the same set of users train a model together without exposing their raw data or model parameters.
VFL has gained significant attention for its research potential and real-world applications, but it still struggles to defend against attacks such as data inference and backdoor attacks.
Most existing VFL projects are industry-focused and not easily used for tracking current research progress.
To address this, the authors present an extensible and lightweight VFL framework called VFLAIR, which supports VFL training with various models, datasets, and protocols, and includes modules for evaluating attacks and defenses.

Plain English Explanation

Vertical Federated Learning (VFL) is a way for organizations that hold different information about the same people (for example, a bank and an online shop with many customers in common) to work together on a machine learning model without sharing their raw data or the details of their model. This is useful because it allows companies or institutions to collaborate without revealing sensitive information.
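In VFL, each party holds different features (columns) about an overlapping set of users (rows). Joint training can use only the users that all parties share. The short Python sketch below shows that layout. The bank and shop tables, their field names, and the `align_samples` helper are hypothetical and were made up for illustration. They are not taken from VFLAIR.

```python
# Hypothetical data: two organizations hold different columns (features)
# about partly overlapping sets of customers, keyed by a shared user ID.
bank = {  # party A: financial features only
    "u1": {"income": 52_000, "credit_lines": 3},
    "u2": {"income": 71_000, "credit_lines": 5},
    "u4": {"income": 38_000, "credit_lines": 1},
}
shop = {  # party B: purchase features plus the training label
    "u1": {"monthly_spend": 420, "churned": 0},
    "u3": {"monthly_spend": 130, "churned": 1},
    "u4": {"monthly_spend": 75, "churned": 1},
}


def align_samples(ids_a, ids_b):
    """Return the user IDs both parties hold, in a fixed order.

    Plain set intersection is used only for illustration: whoever computes
    it sees both parties' full ID lists. Deployed VFL systems typically use
    a private set intersection (PSI) protocol for this step instead.
    """
    return sorted(set(ids_a) & set(ids_b))


aligned = align_samples(bank, shop)
# Each party builds a feature matrix over the same aligned rows, but its
# columns never leave the party that owns them.
x_bank = [[bank[u]["income"], bank[u]["credit_lines"]] for u in aligned]
x_shop = [[shop[u]["monthly_spend"]] for u in aligned]
labels = [shop[u]["churned"] for u in aligned]
print(aligned)  # ['u1', 'u4']
```

Users `u2` and `u3` each appear at only one party, so they drop out of joint training. This is one reason the usable training set shrinks as more parties join.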

VFL has become an important area of research in recent years, but it still has some challenges, like defending against different types of attacks that try to steal data or sabotage the model. Additionally, most existing VFL projects are built for industry use and aren't designed to help researchers keep track of the latest progress in the field.

To help address these issues, the researchers have created a flexible VFL framework called VFLAIR. This tool allows users to experiment with different machine learning models, datasets, and communication protocols for VFL. It also includes standard ways to test how well the system can defend against various attacks.

Technical Explanation

The paper presents VFLAIR, an extensible and lightweight VFL framework that supports training with a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluations of attacks and defense strategies.
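The basic split-training protocol that VFL frameworks orchestrate is easiest to see in the simplest case: two-party logistic regression. A passive party holds some feature columns, and an active party holds the remaining columns plus the labels. The sketch below is a minimal plaintext illustration under those assumptions. `PassiveParty`, `ActiveParty`, `train`, and `predict` are hypothetical names, not VFLAIR's API. Real deployments add protections such as encryption or noise, which are omitted here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class PassiveParty:
    """Holds some feature columns but no labels (hypothetical class)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features

    def forward(self, x):
        # Only this scalar leaves the party; x and w stay local.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def backward(self, x, grad_z, lr):
        # Chain rule: dL/dw_i = dL/dz_a * x_i.
        self.w = [wi - lr * grad_z * xi for wi, xi in zip(self.w, x)]


class ActiveParty:
    """Holds the remaining feature columns and the labels (hypothetical class)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def logit(self, x, z_passive):
        return z_passive + self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def step(self, x, y, z_passive, lr):
        p = sigmoid(self.logit(x, z_passive))
        grad_z = p - y  # dL/dz for binary cross-entropy on a sigmoid output
        self.w = [wi - lr * grad_z * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad_z
        # z = z_passive + ..., so dL/dz_passive equals dL/dz.
        return grad_z


def train(xa, xb, ys, epochs=50, lr=0.1):
    passive, active = PassiveParty(len(xa[0])), ActiveParty(len(xb[0]))
    for _ in range(epochs):
        for x_a, x_b, y in zip(xa, xb, ys):
            z_a = passive.forward(x_a)        # message: passive -> active
            g = active.step(x_b, y, z_a, lr)  # loss and gradient at the active party
            passive.backward(x_a, g, lr)      # message: active -> passive
    return passive, active


def predict(passive, active, x_a, x_b):
    return int(sigmoid(active.logit(x_b, passive.forward(x_a))) >= 0.5)


# Synthetic data: the label depends on BOTH parties' features.
random.seed(0)
xa = [[random.uniform(-1, 1)] for _ in range(200)]
xb = [[random.uniform(-1, 1)] for _ in range(200)]
ys = [int(a[0] + b[0] > 0) for a, b in zip(xa, xb)]
passive, active = train(xa, xb, ys)
acc = sum(predict(passive, active, a, b) == y for a, b, y in zip(xa, xb, ys)) / len(ys)
print(f"training accuracy: {acc:.2f}")
```

Only the scalar `z_a` and the gradient `g` cross the party boundary. Each party's raw features and weights stay local. That returned gradient is also what gradient-based inference attacks exploit.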

The researchers benchmark 11 different attacks and 8 defenses under different communication and model partition settings using VFLAIR. This allows them to draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenarios.

In comparing the benchmark results, the authors assess each defense both by how much it weakens attacks and by how much it costs in main-task performance. From these comparisons they derive recommendations on which defenses suit which deployment scenarios, along with guidance on how to configure a VFL system to limit the impact of attacks.
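To show what an attack/defense benchmark measures, the sketch below runs one textbook label-inference attack against a toy two-party logistic regression, with and without a simple defense. The attack reads each label off the sign of the gradient returned to the passive party. The defense adds Gaussian noise to that gradient, loosely in the spirit of differential-privacy-style gradient perturbation but not calibrated to any formal privacy guarantee. The model, attack, defense, and the `run` helper are all hypothetical, chosen for illustration. They do not reproduce VFLAIR's 11 attacks or 8 defenses.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def run(sigma, epochs=30, lr=0.1, n=400, seed=0):
    """Train a toy two-party VFL logistic regression and attack it.

    The passive party owns feature a; the active party owns feature b and
    the labels. Each step the active party returns g = dL/dz_a. With
    sigma > 0 it first adds Gaussian noise to g (the illustrative defense).
    Returns (main-task accuracy, label-inference attack accuracy).
    """
    rng = random.Random(seed)
    data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    ys = [int(a + b > 0) for a, b in data]
    wa = wb = bias = 0.0
    hits = total = 0
    for _ in range(epochs):
        for (a, b), y in zip(data, ys):
            z_a = wa * a                        # passive party's message
            p = sigmoid(z_a + wb * b + bias)
            g = p - y                           # true gradient w.r.t. z_a
            wb -= lr * g * b                    # active party's own update
            bias -= lr * g
            g_sent = g + rng.gauss(0.0, sigma)  # what the passive party receives
            wa -= lr * g_sent * a               # passive update uses the sent gradient
            # Attack by the passive party: since 0 < p < 1, g is negative
            # when y == 1 and positive when y == 0 ("<= 0" also covers p
            # rounding to exactly 1.0 in floating point).
            hits += (int(g_sent <= 0) == y)
            total += 1
    main_acc = sum(int(sigmoid(wa * a + wb * b + bias) >= 0.5) == y
                   for (a, b), y in zip(data, ys)) / n
    return main_acc, hits / total


main_clean, attack_clean = run(sigma=0.0)
main_noisy, attack_noisy = run(sigma=2.0)
print(f"no defense:    main acc {main_clean:.2f}, attack acc {attack_clean:.2f}")
print(f"noise sigma=2: main acc {main_noisy:.2f}, attack acc {attack_noisy:.2f}")
```

With no defense, the sign of the returned gradient reveals every label. With noise, the attack weakens, but the same noise also perturbs the passive party's updates and can lower main-task accuracy. A benchmark makes exactly this comparison systematic across many attacks, defenses, and settings.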

Critical Analysis

The paper provides a valuable contribution by introducing a flexible VFL framework that can be used to advance research in this area. By including standardized attack and defense evaluation modules, the authors make it easier for other researchers to build upon their work and compare different approaches.

However, the paper does not delve deeply into the theoretical foundations or limitations of VFL. For example, although the benchmark measures attacks and defenses empirically under different communication settings, it does not offer a theoretical treatment of the inherent trade-offs between model performance, privacy, and communication efficiency in VFL. Researchers interested in these more fundamental questions may need to supplement this paper with additional literature, such as work on PFL, FLEX, or communication-efficient hybrid federated learning.

Additionally, the paper's focus on attack and defense evaluation, while valuable, may limit its broader appeal to researchers interested in other aspects of VFL, such as novel model architectures or distributed optimization techniques.

Conclusion

The VFLAIR framework presented in this paper is a helpful tool for advancing research in Vertical Federated Learning. By providing a standardized platform for evaluating attacks and defenses, the authors have made it easier for the research community to build upon previous work and develop more robust and secure VFL systems.

While the paper does not cover all aspects of VFL research, it serves as a valuable resource for researchers and practitioners interested in understanding the current state of the art and identifying promising directions for future work in this rapidly evolving field.

This summary was produced with help from an AI and may contain inaccuracies. Consult the original paper (arXiv:2310.09827) to verify its claims.
