Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning

2404.09265

Published 4/16/2024 by Tanveer Khan, Mindaugas Budzys, Antonis Michalas

Abstract

The popularity of Machine Learning (ML) makes the privacy of sensitive data more imperative than ever. Collaborative learning techniques like Split Learning (SL) aim to protect client data while enhancing ML processes. Though promising, SL has been proved to be vulnerable to a plethora of attacks, thus raising concerns about its effectiveness on data privacy. In this work, we introduce a hybrid approach combining SL and Function Secret Sharing (FSS) to ensure client data privacy. The client adds a random mask to the activation map before sending it to the servers. The servers cannot access the original function but instead work with shares generated using FSS. Consequently, during both forward and backward propagation, the servers cannot reconstruct the client's raw data from the activation map. Furthermore, through visual invertibility, we demonstrate that the server is incapable of reconstructing the raw image data from the activation map when using FSS. It enhances privacy by reducing privacy leakage compared to other SL-based approaches where the server can access client input information. Our approach also ensures security against feature space hijacking attack, protecting sensitive information from potential manipulation. Our protocols yield promising results, reducing communication overhead by over 2x and training time by over 7x compared to the same model with FSS, without any SL. Also, we show that our approach achieves >96% accuracy and remains equivalent to the plaintext models.

Overview

Machine Learning (ML) is growing in popularity, making the privacy of sensitive data more important than ever.
Collaborative learning techniques like Split Learning (SL) aim to protect client data while enhancing ML processes.
However, SL has been shown to be vulnerable to various attacks, raising concerns about its effectiveness in ensuring data privacy.
This work introduces a hybrid approach that combines SL and Function Secret Sharing (FSS) to enhance client data privacy.

Plain English Explanation

The paper proposes a new way to protect sensitive data used to train machine learning (ML) models. As ML becomes more widely used, protecting people's private information matters more than ever.

One approach to this problem is called "Split Learning" (SL). SL tries to protect client data while still improving the ML process. However, researchers have found that SL can still be vulnerable to various attacks, so its ability to truly protect privacy is questionable.

To address this, the researchers developed a hybrid approach that combines SL with another technique called "Function Secret Sharing" (FSS). The key idea is that the client adds a random "mask" to the activation map (the intermediate output of its part of the model) before sending it to the servers. The servers then work with shares generated using FSS rather than with the unmasked values. This means the servers can't reconstruct the original client data, even during the training process.

The researchers also show that this approach can prevent an attack called "feature space hijacking," in which a malicious server steers the training of the client's part of the model so that the client's private inputs can be reconstructed from what the client sends. Overall, this new hybrid approach aims to provide better privacy protection compared to existing SL-based methods.

Technical Explanation

The proposed approach combines Split Learning (SL) and Function Secret Sharing (FSS) to enhance client data privacy.
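For readers unfamiliar with FSS, the standard two-party definition from the general FSS literature can be stated as follows. The notation (Gen, Eval, the keys k_0 and k_1, the mask α) is chosen here for illustration and is not taken from the paper itself.

```latex
% Two-party function secret sharing for a function f : G_in -> G_out
\begin{align*}
  (k_0, k_1) &\leftarrow \mathsf{Gen}(1^{\lambda}, f)
    && \text{split } f \text{ into two keys} \\
  \mathsf{Eval}(0, k_0, x) + \mathsf{Eval}(1, k_1, x) &= f(x)
    && \text{correctness, for every input } x \\
  k_b \text{ alone} &\text{ reveals nothing about } f
    && \text{security, for each } b \in \{0, 1\}
\end{align*}

% Common masked-input pattern in FSS-based secure computation:
% the parties see only \hat{x} = x + \alpha for a random mask \alpha,
% and hold keys for the shifted function f_{\alpha}(y) = f(y - \alpha), so that
\[
  \mathsf{Eval}(0, k_0, \hat{x}) + \mathsf{Eval}(1, k_1, \hat{x})
    = f_{\alpha}(x + \alpha) = f(x).
\]
```

Each server holds one key and therefore only ever obtains an additive share of the output, never the output or the unmasked input.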

In this method, the client adds a random mask to the activation map before sending it to the servers. The servers then work with shares of the data generated using FSS, rather than the original client data. This ensures that the servers cannot reconstruct the client's raw data during both the forward and backward propagation steps.
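The masking idea can be illustrated with plain additive secret sharing, which is a much simpler primitive than FSS. The sketch below is not the paper's protocol: the ring size, the fixed-point scale, and every function name are assumptions made for illustration, and FSS key generation (needed for non-linear layers) is omitted entirely. It only shows that two servers can apply a public linear layer to their shares without either one seeing the activation map.

```python
import numpy as np

RING = 2**32   # assumed ring Z_{2^32}; not specified by the summary
SCALE = 2**8   # assumed fixed-point scale for encoding real values


def encode(x):
    """Fixed-point encode real values as elements of Z_RING."""
    return np.round(np.asarray(x, dtype=np.float64) * SCALE).astype(np.int64) % RING


def decode(v, scale=SCALE):
    """Map ring elements back to reals, reading the upper half as negatives."""
    v = np.asarray(v, dtype=np.int64) % RING
    signed = np.where(v >= RING // 2, v - RING, v)
    return signed / scale


def share(activation, rng):
    """Client side: split the encoded activation into two additive shares."""
    mask = rng.integers(0, RING, size=np.shape(activation), dtype=np.int64)
    share0 = mask                                  # uniformly random on its own
    share1 = (encode(activation) - mask) % RING    # also uniformly random on its own
    return share0, share1


def server_linear(weights_fp, x_share):
    """Server side: apply a public fixed-point linear layer to one share."""
    return (weights_fp @ x_share) % RING


# Hypothetical toy values, exactly representable at the chosen scale.
rng = np.random.default_rng(0)
activation = np.array([0.5, -1.25, 2.0])
weights = np.array([[1.0, -2.0, 0.5], [0.25, 0.0, -1.0]])
weights_fp = np.round(weights * SCALE).astype(np.int64)  # kept as small signed ints

s0, s1 = share(activation, rng)
y0 = server_linear(weights_fp, s0)   # computed by server 0
y1 = server_linear(weights_fp, s1)   # computed by server 1

# Only the combination of both output shares reveals the result.
# The product of two fixed-point values carries the scale twice.
result = decode((y0 + y1) % RING, scale=SCALE * SCALE)
print(result, weights @ activation)
```

Linear layers commute with additive sharing, which is why no interaction is needed here. Non-linear operations such as ReLU comparisons do not, and that is the part FSS is typically used for in this line of work.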

The paper also demonstrates that the server is unable to visually reconstruct the raw image data from the activation map when using FSS. This improves privacy by reducing the leakage of client input information compared to other SL-based approaches.

Additionally, the proposed approach is shown to be secure against the feature space hijacking attack, protecting sensitive information from potential manipulation.

The protocols developed in this work yield promising results, reducing communication overhead by over 2x and training time by over 7x compared to the same model with FSS but without SL. The approach also maintains >96% accuracy, equivalent to plaintext models.

Critical Analysis

The paper addresses an important problem in the field of federated learning and privacy-preserving decentralized learning, where protecting client data privacy is crucial as ML models become more widely adopted.

The proposed hybrid approach of combining SL and FSS appears to be a promising solution, as it mitigates several vulnerabilities identified in previous SL-based methods. The ability to prevent the feature space hijacking attack is a particularly valuable contribution.

However, the paper does not discuss potential limitations or areas for further research. For example, it would be interesting to understand the computational and memory overhead introduced by the additional FSS-related computations, especially for larger-scale models and datasets.

Additionally, the paper could have provided more context on the landscape of privacy-preserving techniques in federated and decentralized learning, allowing readers to better appreciate the novelty and trade-offs of the proposed approach.

Conclusion

This work introduces a novel hybrid approach that combines Split Learning (SL) and Function Secret Sharing (FSS) to enhance client data privacy in machine learning (ML) applications. By adding a random mask to the activation map and using FSS to generate shares of the data, the proposed method effectively prevents the servers from reconstructing the client's raw data during both forward and backward propagation.

The authors demonstrate the effectiveness of this approach in terms of improved privacy, reduced communication overhead, and minimal impact on model accuracy. This research represents an important step forward in addressing the growing need for privacy-preserving techniques as ML becomes more ubiquitous in various domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A deep cut into Split Federated Self-supervised Learning

Marcin Przewięźlikowski, Marcin Osial, Bartosz Zieliński, Marek Śmieja

Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for network division at the initial layers, which decreases the protection of the client data and increases communication overhead. In this paper, we demonstrate that splitting depth is crucial for maintaining privacy and communication efficiency in distributed training. We also show that MocoSFL suffers from a catastrophic quality deterioration for the minimal communication overhead. As a remedy, we introduce Momentum-Aligned contrastive Split Federated Learning (MonAcoSFL), which aligns online and momentum client models during training procedure. Consequently, we achieve state-of-the-art accuracy while significantly reducing the communication overhead, making MonAcoSFL more practical in real-world scenarios.

6/13/2024

cs.LG cs.AI cs.DC

Exploring the Privacy-Energy Consumption Tradeoff for Split Federated Learning

Joohyung Lee, Mohamed Seif, Jungchan Cho, H. Vincent Poor

Split Federated Learning (SFL) has recently emerged as a promising distributed learning technology, leveraging the strengths of both federated and split learning. It emphasizes the advantages of rapid convergence while addressing privacy concerns. As a result, this innovation has received significant attention from both industry and academia. However, since the model is split at a specific layer, known as a cut layer, into both client-side and server-side models for the SFL, the choice of the cut layer in SFL can have a substantial impact on the energy consumption of clients and their privacy, as it influences the training burden and the output of the client-side models. In this article, we provide a comprehensive overview of the SFL process and thoroughly analyze energy consumption and privacy. This analysis considers the influence of various system parameters on the cut layer selection strategy. Additionally, we provide an illustrative example of the cut layer selection, aiming to minimize clients' risk of reconstructing the raw data at the server while sustaining energy consumption within the required energy budget, which involves trade-offs. Finally, we address open challenges in this field. These directions represent promising avenues for future research and development.

5/6/2024

cs.LG cs.AI cs.CR

Protecting Split Learning by Potential Energy Loss

Fei Zheng, Chaochao Chen, Lingjuan Lyu, Xinyi Fu, Xing Fu, Weiqiang Wang, Xiaolin Zheng, Jianwei Yin

As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent such kind of privacy leakage, we propose the potential energy loss to make the forward embeddings become more 'complicated', by pushing embeddings of the same class towards the decision boundary. Therefore, it is hard for the attacker to learn from the forward embeddings. Experiment results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.

5/30/2024

cs.CR cs.AI cs.DC cs.LG

Heterogeneous Federated Learning with Splited Language Model

Yifan Shi, Yuhui Zhang, Ziyue Huang, Xiaofeng Yang, Li Shen, Wei Chen, Xueqian Wang

Federated Split Learning (FSL) is a promising distributed learning paradigm in practice, which gathers the strengths of both Federated Learning (FL) and Split Learning (SL) paradigms, to ensure model privacy while diminishing the resource overhead of each client, especially on large transformer models in a resource-constrained environment, e.g., Internet of Things (IoT). However, almost all works merely investigate the performance with simple neural network models in FSL. Despite the minor efforts focusing on incorporating Vision Transformers (ViT) as model architectures, they train ViT from scratch, thereby leading to enormous training overhead in each device with limited resources. Therefore, in this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness. Furthermore, we propose FedVZ to hinder the gradient inversion attack, especially having the capability compatible with black-box scenarios, where the gradient information is unavailable. Concretely, FedVZ approximates the server gradient by utilizing a zeroth-order (ZO) optimization, which replaces the backward propagation with just one forward process. Empirically, we are the first to provide a systematic evaluation of FSL methods with PITs in real-world datasets, different partial device participations, and heterogeneous data splits. Our experiments verify the effectiveness of our algorithms.

4/22/2024

cs.CV