SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy

Read original: arXiv:2404.16232 - Published 4/26/2024 by Shuangyi Chen, Ashish Khisti

SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy

Overview

This paper proposes a secure inference system called SECO that splits a machine learning model across a hierarchy of servers to protect the model and user privacy.
SECO divides the model into multiple components, each stored and executed on a different server, preventing any single server from accessing the full model or user data.
The system uses cryptographic techniques to enable secure communication and computation between the servers, ensuring the integrity and confidentiality of the inference process.

Plain English Explanation

The paper introduces a new system called SECO that aims to keep machine learning models and user data secure. Typically, a machine learning model is stored and used on a single server, which poses a privacy risk if that server is compromised. SECO solves this by splitting the model into different parts, each stored and run on a separate server.

This way, no single server has access to the full model or the user's data. The servers communicate with each other using encryption techniques to perform the machine learning task without revealing sensitive information. This multi-server approach helps protect the model and the user's privacy, even if one of the servers is attacked.

The key idea is to divide the complex machine learning model into smaller, more manageable components that can be distributed across a hierarchy of servers. This prevents any individual server from having complete access to the model or the user's data, making it much harder for attackers to steal or misuse the information.

Technical Explanation

The paper proposes a secure inference system called SECO that splits a machine learning model across a multi-server hierarchy. SECO divides the model into multiple components, each stored and executed on a different server. This prevents any single server from accessing the full model or user data.

The system uses cryptographic techniques, such as secure multi-party computation and homomorphic encryption, to enable secure communication and computation between the servers. This ensures the integrity and confidentiality of the inference process, even if some of the servers are compromised.

SECO's architecture consists of a hierarchy of servers, with a top-level server coordinating the inference task and lower-level servers performing specific computations on the model components. The paper describes protocols for secure model splitting, secure model update, and secure inference, all designed to protect the model and user privacy.

The authors evaluate SECO's performance and security properties through theoretical analysis and experiments. They compare SECO to centralized and federated learning approaches, demonstrating its advantages in terms of model protection, data privacy, and computational efficiency.

Critical Analysis

The paper presents a promising approach to secure machine learning inference, but it also acknowledges several limitations and areas for future research. One potential concern is the reliance on a trusted top-level server to coordinate the inference process. If this server is compromised, it could still pose a risk to the entire system.

Additionally, the paper does not address the challenges of client-side training in a federated learning setting, where individual clients may attempt to maliciously influence the model. Further research is needed to explore how SECO could be extended to handle such scenarios.

Overall, the SECO system represents an important step towards more secure and privacy-preserving machine learning, but there are still opportunities for improvement and further exploration of the trade-offs involved.

Conclusion

The SECO system presents a novel approach to secure machine learning inference by splitting a model across a hierarchy of servers. This multi-server architecture prevents any single entity from accessing the full model or user data, enhancing privacy and security.

The use of cryptographic techniques, such as secure multi-party computation and homomorphic encryption, ensures the integrity and confidentiality of the inference process, even in the face of potential server compromises. While the paper identifies some limitations, SECO's secure model splitting and inference protocols demonstrate promising advancements in the field of privacy-preserving machine learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy

Shuangyi Chen, Ashish Khisti

In the context of prediction-as-a-service, concerns about the privacy of the data and the model have been brought up and tackled via secure inference protocols. These protocols are built up by using single or multiple cryptographic tools designed under a variety of different security assumptions. In this paper, we introduce SECO, a secure inference protocol that enables a user holding an input data vector and multiple server nodes deployed with a split neural network model to collaboratively compute the prediction, without compromising either party's data privacy. We extend prior work on secure inference that requires the entire neural network model to be located on a single server node, to a multi-server hierarchy, where the user communicates to a gateway server node, which in turn communicates to remote server nodes. The inference task is split across the server nodes and must be performed over an encrypted copy of the data vector. We adopt multiparty homomorphic encryption and multiparty garbled circuit schemes, making the system secure against dishonest majority of semi-honest servers as well as protecting the partial model structure from the user. We evaluate SECO on multiple models, achieving the reduction of computation and communication cost for the user, making the protocol applicable to user's devices with limited resources.

4/26/2024

Complete Security and Privacy for AI Inference in Decentralized Systems

Hongyang Zhang, Yue Zhao, Claudio Angione, Harry Yang, James Buban, Ahmad Farhan, Fielding Johnston, Patrick Colangelo

The need for data security and model integrity has been accentuated by the rapid adoption of AI and ML in data-driven domains including healthcare, finance, and security. Large models are crucial for tasks like diagnosing diseases and forecasting finances but tend to be delicate and not very scalable. Decentralized systems solve this issue by distributing the workload and reducing central points of failure. Yet, data and processes spread across different nodes can be at risk of unauthorized access, especially when they involve sensitive information. Nesa solves these challenges with a comprehensive framework using multiple techniques to protect data and model outputs. This includes zero-knowledge proofs for secure model verification. The framework also introduces consensus-based verification checks for consistent outputs across nodes and confirms model integrity. Split Learning divides models into segments processed by different nodes for data privacy by preventing full data access at any single point. For hardware-based security, trusted execution environments are used to protect data and computations within secure zones. Nesa's state-of-the-art proofs and principles demonstrate the framework's effectiveness, making it a promising approach for securely democratizing artificial intelligence.

7/30/2024

Privacy-Preserving Model-Distributed Inference at the Edge

Fatemeh Jafarian Dehkordi, Yasaman Keshtkarjahromi, Hulya Seferoglu

This paper focuses on designing a privacy-preserving Machine Learning (ML) inference protocol for a hierarchical setup, where clients own/generate data, model owners (cloud servers) have a pre-trained ML model, and edge servers perform ML inference on clients' data using the cloud server's ML model. Our goal is to speed up ML inference while providing privacy to both data and the ML model. Our approach (i) uses model-distributed inference (model parallelization) at the edge servers and (ii) reduces the amount of communication to/from the cloud server. Our privacy-preserving hierarchical model-distributed inference, privateMDI design uses additive secret sharing and linearly homomorphic encryption to handle linear calculations in the ML inference, and garbled circuit and a novel three-party oblivious transfer are used to handle non-linear functions. privateMDI consists of offline and online phases. We designed these phases in a way that most of the data exchange is done in the offline phase while the communication overhead of the online phase is reduced. In particular, there is no communication to/from the cloud server in the online phase, and the amount of communication between the client and edge servers is minimized. The experimental results demonstrate that privateMDI significantly reduces the ML inference time as compared to the baselines.

9/17/2024

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices

Han Zhang, Zifan Wang, Mihir Dhamankar, Matt Fredrikson, Yuvraj Agarwal

Many Internet-of-Things (IoT) devices rely on cloud computation resources to perform machine learning inferences. This is expensive and may raise privacy concerns for users. Consumers of these devices often have hardware such as gaming consoles and PCs with graphics accelerators that are capable of performing these computations, which may be left idle for significant periods of time. While this presents a compelling potential alternative to cloud offloading, concerns about the integrity of inferences, the confidentiality of model parameters, and the privacy of users' data mean that device vendors may be hesitant to offload their inferences to a platform managed by another manufacturer. We propose VeriSplit, a framework for offloading machine learning inferences to locally-available devices that address these concerns. We introduce masking techniques to protect data privacy and model confidentiality, and a commitment-based verification protocol to address integrity. Unlike much prior work aimed at addressing these issues, our approach does not rely on computation over finite field elements, which may interfere with floating-point computation supports on hardware accelerators and require modification to existing models. We implemented a prototype of VeriSplit and our evaluation results show that, compared to performing computation locally, our secure and private offloading solution can reduce inference latency by 28%--83%.

6/4/2024