ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach

2311.16136

Published 6/19/2024 by Yuke Hu, Jian Lou, Jiaqi Liu, Wangze Ni, Feng Lin, Zhan Qin, Kui Ren

🤯

Abstract

Over the past years, Machine Learning-as-a-Service (MLaaS) has received a surging demand for supporting Machine Learning-driven services to offer revolutionized user experience across diverse application areas. MLaaS provides inference service with low inference latency based on an ML model trained using a dataset collected from numerous individual data owners. Recently, for the sake of data owners' privacy and to comply with the right to be forgotten (RTBF) as enacted by data protection legislation, many machine unlearning methods have been proposed to remove data owners' data from trained models upon their unlearning requests. However, despite their promising efficiency, almost all existing machine unlearning methods handle unlearning requests independently from inference requests, which unfortunately introduces a new security issue of inference service obsolescence and a privacy vulnerability of undesirable exposure for machine unlearning in MLaaS. In this paper, we propose the ERASER framework for machinE unleaRning in MLaAS via an inferencE seRving-aware approach. ERASER strategically choose appropriate unlearning execution timing to address the inference service obsolescence issue. A novel inference consistency certification mechanism is proposed to avoid the violation of RTBF principle caused by postponed unlearning executions, thereby mitigating the undesirable exposure vulnerability. ERASER offers three groups of design choices to allow for tailor-made variants that best suit the specific environments and preferences of various MLaaS systems. Extensive empirical evaluations across various settings confirm ERASER's effectiveness, e.g., it can effectively save up to 99% of inference latency and 31% of computation overhead over the inference-oblivion baseline.

Create account to get full access

Overview

In this paper, the researchers propose a framework called ERASER to address security and privacy issues in machine unlearning for Machine Learning-as-a-Service (MLaaS) platforms.
Machine unlearning is the process of removing an individual's data from a trained machine learning model, which is important for complying with data privacy regulations like the right to be forgotten (RTBF).
However, existing machine unlearning methods handle unlearning requests independently from inference requests, leading to two problems: inference service obsolescence and undesirable exposure of the unlearning process.

Plain English Explanation

The paper focuses on a problem that arises when machine learning models are used in services that provide predictions or recommendations to users (known as Machine Learning-as-a-Service or MLaaS). In these systems, individuals may request that their personal data be removed from the underlying machine learning model, a process called machine unlearning.

While existing machine unlearning methods are effective at removing data, they handle these unlearning requests separately from the actual process of providing predictions or recommendations to users. This can lead to two issues:

Inference service obsolescence: If an unlearning request is processed, the machine learning model used for making predictions may become outdated or "obsolete", leading to a degradation in the service quality.
Undesirable exposure: The process of removing an individual's data from the model can inadvertently reveal information about that person's data, which would violate their privacy and the RTBF principle.

To address these problems, the researchers propose a new framework called ERASER. ERASER aims to strategically time the execution of unlearning requests to minimize the impact on the prediction service, and also includes a mechanism to certify that the predictions made are consistent with the RTBF principle, even when unlearning requests are delayed.

Technical Explanation

The ERASER framework consists of several key components:

Unlearning execution timing: ERASER strategically chooses when to execute unlearning requests to minimize the impact on the inference service. This is done by monitoring the inference load and scheduling unlearning tasks accordingly to avoid service obsolescence.
Inference consistency certification: ERASER introduces a novel mechanism to ensure that the predictions made by the model are consistent with the RTBF principle, even when unlearning requests are delayed. This helps mitigate the undesirable exposure issue.
Tailored design choices: ERASER offers three groups of design choices that allow MLaaS providers to customize the framework to best suit their specific needs and preferences.

The researchers evaluate ERASER extensively across various settings and find that it can effectively save up to 99% of inference latency and 31% of computation overhead compared to a baseline that ignores the relationship between unlearning and inference requests.

Critical Analysis

The ERASER framework presents a promising approach to address the security and privacy issues in machine unlearning for MLaaS systems. However, the researchers acknowledge several limitations and areas for further research:

The framework assumes that unlearning requests can be identified and associated with specific individuals, which may not always be the case in real-world scenarios.
The inference consistency certification mechanism relies on certain assumptions about the underlying machine learning model and its behavior, which may not hold true in all cases.
The evaluation is primarily based on simulated experiments, and the authors call for further validation of the framework's effectiveness in real-world MLaaS deployments.

Additionally, one could question the extent to which ERASER can truly protect against undesirable exposure of the unlearning process, as the framework still requires some degree of interaction between unlearning and inference requests. Further research may be needed to explore more robust privacy-preserving mechanisms in this context.

Conclusion

The ERASER framework proposed in this paper represents an important step towards addressing the security and privacy challenges in machine unlearning for MLaaS platforms. By strategically managing the execution of unlearning requests and introducing an inference consistency certification mechanism, ERASER aims to mitigate the issues of inference service obsolescence and undesirable exposure of the unlearning process.

While the framework shows promising results in the authors' experiments, further research and real-world validation are needed to fully assess its effectiveness and identify potential areas for improvement. As machine learning models become increasingly ubiquitous in online services, developing robust and privacy-preserving unlearning mechanisms will only grow in importance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

⛏️

Machine Unlearning: A Comprehensive Survey

Weiqi Wang, Zhiyi Tian, Shui Yu

As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning is to make a trained model to remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning and discuss their differences, connections and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we use two parts to introduce: firstly, we classify centralized unlearning into exact unlearning and approximate unlearning; secondly, we offer a detailed introduction to the techniques of these methods. Besides the centralized unlearning, we notice some studies about distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies about unlearning verification. Moreover, we consider the privacy and security issues essential in machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.

5/14/2024

cs.CR cs.AI

New!Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning

Nexhi Sula, Abhinav Kumar, Jie Hou, Han Wang, Reza Tourani

With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. In compliance with data privacy regulations, such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of their contributed data used for model training but also facilitates the elimination of sensitive data fingerprints within machine learning models to mitigate potential attack - a process referred to as machine unlearning. In this study, we present a novel unlearning mechanism designed to effectively remove the impact of specific data samples from a neural network while considering the performance of the unlearned model on the primary task. In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model by combining target classification loss and membership inference loss. Our adaptable framework can easily incorporate various privacy leakage approximation mechanisms to guide the unlearning process. We provide empirical evidence of the effectiveness of our unlearning approach with a theoretical upper-bound analysis through a membership inference mechanism as a proof of concept. Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task, across four datasets and four deep learning architectures.

7/2/2024

cs.LG

Adversarial Machine Unlearning

Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu

This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models. Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat to determine whether a data instance was used for training. However, the two strands are intimately connected: one can view machine unlearning through the lens of MIA success with respect to removed data. Recognizing this connection, we propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms. Specifically, we model the unlearning problem as a Stackelberg game in which an unlearner strives to unlearn specific training data from a model, while an auditor employs MIAs to detect the traces of the ostensibly removed data. Adopting this adversarial perspective allows the utilization of new attack advancements, facilitating the design of unlearning algorithms. Our framework stands out in two ways. First, it takes an adversarial approach and proactively incorporates the attacks into the design of unlearning algorithms. Secondly, it uses implicit differentiation to obtain the gradients that limit the attacker's success, thus benefiting the process of unlearning. We present empirical results to demonstrate the effectiveness of the proposed approach for machine unlearning.

6/13/2024

cs.LG cs.CR

Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction

Yuyang Xue, Jingshuai Liu, Steven McDonagh, Sotirios A. Tsaftaris

Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image recon-struction, has not been thoroughly investigated. This paper shows that machine unlearning is possible in MRI tasks and has the potential to benefit for bias removal. We set up a protocol to study how much shared knowledge exists between datasets of different organs, allowing us to effectively quantify the effect of unlearning. Our study reveals that combining training data can lead to hallucinations and reduced image quality in the reconstructed data. We use unlearning to remove hallucinations as a proxy exemplar of undesired data removal. Indeed, we show that machine unlearning is possible without full retraining. Furthermore, our observations indicate that maintaining high performance is feasible even when using only a subset of retain data. We have made our code publicly accessible.

6/19/2024

eess.IV cs.CV cs.LG