Recovering Labels from Local Updates in Federated Learning

Read original: arXiv:2405.00955 - Published 5/3/2024 by Huancheng Chen, Haris Vikalo

Overview

The paper presents a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which can accurately recover the labels of data used by clients in federated learning (FL) settings.
RLU achieves high performance even in realistic FL settings where clients run multiple local epochs, train on heterogeneous data, and use various optimizers.
The proposed method consistently outperforms existing baselines and helps improve the quality of reconstructed images in gradient inversion (GI) attacks on FL systems.

Plain English Explanation

Federated learning (FL) is a way for multiple devices or clients to collaborate on training a machine learning model without sharing their private data. However, gradient inversion (GI) attacks can try to reconstruct the clients' data from the updates they send to the central server.

One approach to speed up this data reconstruction is to first try to recover the labels of the data samples used by the clients for local training. But existing methods for extracting these labels often make assumptions that don't hold up in realistic FL settings.

The paper introduces a new technique called Recovering Labels from Local Updates (RLU) that can accurately recover the labels even when the clients run multiple local training epochs, train on heterogeneous data, and use different optimization methods. RLU works by analyzing the correlation between the labels of the data points used in a training round and the resulting update to the output layer of the model.

The authors show that RLU consistently outperforms other existing methods, and that it can also help improve the quality of the reconstructed images in gradient inversion attacks on FL systems.

Technical Explanation

The paper proposes a novel label recovery scheme called Recovering Labels from Local Updates (RLU) that can accurately recover the labels of data used by clients in federated learning (FL) settings, even in realistic scenarios where clients run multiple local epochs, train on heterogeneous data, and use various optimizers.

The key insight behind RLU is that there is a correlation between the labels of the data points used in a training round and the resulting update to the output layer of the model. RLU estimates the labels by solving a least-squares problem that exploits this correlation.
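As a rough illustration of that correlation, the sketch below assumes a single SGD step on a softmax cross-entropy loss with a zero-initialized output layer, so every prediction is uniform. Under those assumptions the output-layer bias gradient equals the mean predicted probability minus the label frequency, and the label counts follow from a small least-squares system. The function names and the stacked system (identity rows plus one row forcing the counts to sum to the batch size) are illustrative assumptions, not the paper's formulation, which targets multiple local epochs and other optimizers.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def output_bias_gradient(features, labels, W, b):
    # Gradient of the mean cross-entropy loss w.r.t. the output-layer bias:
    # mean over the batch of (softmax probabilities - one-hot label).
    probs = softmax(features @ W + b)
    onehot = np.eye(b.shape[0])[labels]
    return (probs - onehot).mean(axis=0)

def recover_label_counts(bias_grad, batch_size, mean_probs):
    # grad_b = mean_probs - counts / B, so counts = B * (mean_probs - grad_b).
    # Hypothetical stacked system: identity rows plus a sum-to-B constraint,
    # solved in the least-squares sense.
    num_classes = bias_grad.shape[0]
    A = np.vstack([np.eye(num_classes), np.ones((1, num_classes))])
    rhs = np.concatenate([batch_size * (mean_probs - bias_grad), [batch_size]])
    solution, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.rint(solution).astype(int)

# Toy client batch (synthetic data, for illustration only).
rng = np.random.default_rng(0)
num_classes, dim, batch_size = 5, 8, 32
labels = rng.integers(0, num_classes, size=batch_size)
features = rng.normal(size=(batch_size, dim))

# Assumption: zero-initialized output layer, so softmax outputs are uniform.
W = np.zeros((dim, num_classes))
b = np.zeros(num_classes)

grad_b = output_bias_gradient(features, labels, W, b)
uniform = np.full(num_classes, 1.0 / num_classes)
recovered = recover_label_counts(grad_b, batch_size, uniform)
true_counts = np.bincount(labels, minlength=num_classes)
```

In this toy case the system is consistent, so `recovered` matches `true_counts`. With a partially trained model the mean probabilities would have to be estimated, and the solution would only be approximate.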

The authors evaluate RLU on several datasets, architectures, and data heterogeneity scenarios, and demonstrate that it consistently outperforms existing baselines. They also show that using RLU to recover labels can help improve the quality of reconstructed images in gradient inversion (GI) attacks on FL systems, in terms of both PSNR and LPIPS metrics.
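PSNR, one of the two image-quality metrics named above, has a simple closed form: 10 * log10(MAX^2 / MSE), where higher is better. The following is a minimal NumPy sketch, assuming images are float arrays scaled to [0, max_value]. The function name and the toy arrays are illustrative and not taken from the paper. LPIPS is omitted because it depends on a pretrained network.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    # Peak signal-to-noise ratio in decibels between two same-shaped images.
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: every pixel is off by 0.1, so MSE = 0.01 and PSNR = 20 dB.
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)
score = psnr(reference, reconstruction)
```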

Critical Analysis

The paper makes a compelling case for the effectiveness of the proposed RLU method in recovering labels from local model updates in realistic FL settings. However, it's worth noting that the authors' evaluation is focused on specific datasets, architectures, and attack scenarios.

It would be valuable to see how RLU performs on a broader range of FL settings, including larger-scale systems, more diverse data distributions, and different types of gradient inversion attacks. Additionally, the paper does not address potential defenses or mitigation strategies that clients or the central server could employ to protect against such label recovery attacks.

Further research could also explore the trade-offs between the accuracy of label recovery and the level of data privacy provided by different federated learning approaches, as well as investigate the implications of label recovery attacks for other federated learning security and privacy challenges, such as data poisoning and robustness.

Conclusion

The paper presents a novel label recovery scheme, Recovering Labels from Local Updates (RLU), that can accurately reconstruct the labels of data used by clients in federated learning (FL) settings, even in realistic scenarios with multiple local epochs, heterogeneous data, and diverse optimization methods. The authors demonstrate that RLU consistently outperforms existing baselines and can help improve the quality of reconstructed images in gradient inversion attacks on FL systems.

This research highlights the importance of addressing privacy and security challenges in federated learning, as techniques like RLU can potentially undermine the privacy guarantees that FL aims to provide. Further work is needed to understand the broader implications of label recovery attacks and develop robust defenses to protect client data in real-world FL deployments.

Related Papers

Recovering Labels from Local Updates in Federated Learning

Huancheng Chen, Haris Vikalo

Gradient inversion (GI) attacks present a threat to the privacy of clients in federated learning (FL) by aiming to enable reconstruction of the clients' data from communicated model updates. A number of such techniques attempt to accelerate data recovery by first reconstructing labels of the samples used in local training. However, existing label extraction methods make strong assumptions that typically do not hold in realistic FL settings. In this paper we present a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which provides near-perfect accuracy when attacking untrained (most vulnerable) models. More significantly, RLU achieves high performance even in realistic settings where the clients in an FL system run multiple local epochs, train on heterogeneous data, and deploy various optimizers to minimize different objective functions. Specifically, RLU estimates labels by solving a least-squares problem that emerges from the analysis of the correlation between labels of the data points used in a training round and the resulting update of the output layer. The experimental results on several datasets, architectures, and data heterogeneity scenarios demonstrate that the proposed method consistently outperforms existing baselines, and helps improve the quality of the reconstructed images in GI attacks in terms of both PSNR and LPIPS.

5/3/2024

SoK: Gradient Leakage in Federated Learning

Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual efficacy against practical FL systems remains under-explored. To address this gap, we conduct a comprehensive study on GIAs in this work. We start with a survey of GIAs that establishes a milestone to trace their evolution and develops a systematization to uncover their inherent threats. Specifically, we categorize the auxiliary assumptions used by existing GIAs based on their practical accessibility to potential adversaries. To facilitate deeper analysis, we highlight the challenges that GIAs face in practical FL systems from three perspectives: local training, model, and post-processing. We then perform extensive theoretical and empirical evaluations of state-of-the-art GIAs across diverse settings, utilizing eight datasets and thirteen models. Our findings indicate that GIAs have inherent limitations when reconstructing data under practical local training settings. Furthermore, their efficacy is sensitive to the trained model, and even simple post-processing measures applied to gradients can be effective defenses. Overall, our work provides crucial insights into the limited effectiveness of GIAs in practical FL systems. By rectifying prior misconceptions, we hope to inspire more accurate and realistic investigations on this topic.

4/9/2024

Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks

Yanbo Wang, Jian Liang, Ran He

Gradient inversion attacks aim to reconstruct local training data from intermediate gradients exposed in the federated learning framework. Despite successful attacks, all previous methods, starting from reconstructing a single data point and then relaxing the single-image limit to batch level, are only tested under hard label constraints. Even for single-image reconstruction, we still lack an analysis-based algorithm to recover augmented soft labels. In this work, we change the focus from enlarging batch size to investigating the hard label constraints, considering a more realistic circumstance where label smoothing and mixup techniques are used in the training process. In particular, we are the first to initiate a novel algorithm to simultaneously recover the ground-truth augmented label and the input feature of the last fully-connected layer from single-input gradients, and provide a necessary condition for any analysis-based label recovery methods. Extensive experiments testify to the label recovery accuracy, as well as the benefits to the following image reconstruction. We believe soft labels in classification tasks are worth further attention in gradient inversion attacks.

4/16/2024

Local Model Reconstruction Attacks in Federated Learning and their Uses

Ilias Driouich, Chuan Xu, Giovanni Neglia, Frederic Giroire, Eoin Thomas

In this paper, we initiate the study of local model reconstruction attacks for federated learning, where an honest-but-curious adversary eavesdrops on the messages exchanged between a targeted client and the server, and then reconstructs the local/personalized model of the victim. The local model reconstruction attack allows the adversary to trigger other classical attacks in a more effective way, since the local model only depends on the client's data and can leak more private information than the global model learned by the server. Additionally, we propose a novel model-based attribute inference attack in federated learning leveraging the local model reconstruction attack. We provide an analytical lower bound for this attribute inference attack. Empirical results using real-world datasets confirm that our local reconstruction attack works well for both regression and classification tasks. Moreover, we benchmark our novel attribute inference attack against the state-of-the-art attacks in federated learning. Our attack results in higher reconstruction accuracy, especially when the clients' datasets are heterogeneous. Our work provides a new angle for designing powerful and explainable attacks to effectively quantify the privacy risk in FL.

5/28/2024