DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

arXiv:2405.06368

Published 5/29/2024 by Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay

Abstract

Federated learning (FL) allows clients in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern IoT systems. In this work, we empirically evaluate the practicality of fine-tuning large scale on-device transformer-based models with differential privacy in a federated learning system. We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV) and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. An even more promising approach, DyLoRA, which makes the low rank variable, when naively combined with FL would straightforwardly break differential privacy. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7% respectively with 1 million clients and a stringent privacy budget of ε=2.

Overview

Federated learning (FL) allows devices in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server.
Differential privacy (DP) can provide formal privacy guarantees for the clients' contributions in FL, but the randomness its mechanisms add can degrade the performance of the large transformer-based models common in modern IoT systems.
This paper empirically evaluates the practicality of fine-tuning large-scale on-device transformer-based models with differential privacy in a federated learning system.

Plain English Explanation

Federated learning is a way for devices in an Internet of Things (IoT) system to work together to train a shared machine learning model, without each device having to share its private data with a central server. This is useful for protecting people's privacy.

However, even with federated learning, the information that devices send to the server can potentially leak sensitive details about the users. Differential privacy is a technique that can be used to add random noise to the information, making it much harder to extract sensitive details.
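As a rough illustration of the noise-adding step, the sketch below clips each device's update to a fixed L2 norm and adds Gaussian noise to the sum before averaging. This is a common Gaussian-mechanism pattern and is an assumption here, not necessarily the exact mechanism used in the paper. The names `clip_norm` and `noise_multiplier` are hypothetical, and no privacy accounting (computing the privacy budget) is shown.

```python
import numpy as np


def clip_update(update, clip_norm):
    """Rescale an update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return update * scale


def dp_average(updates, clip_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm.
    """
    clipped = np.stack([clip_update(u, clip_norm) for u in updates])
    total = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1000 simulated devices, each with a 4-dimensional update (toy sizes).
    updates = [rng.normal(size=4) for _ in range(1000)]
    print(dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single device can influence the result, which is what lets a fixed amount of noise mask each individual contribution.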

The problem is that this added randomness can seriously degrade the performance of large, powerful machine learning models, like the transformer models commonly used in modern IoT systems. This paper looks at whether it's practical to use these large models with differential privacy in a federated learning system, and how to minimize the performance impact.

Technical Explanation

The researchers conducted comprehensive experiments on various tasks spanning speech recognition, computer vision, and natural language understanding. They found that fully fine-tuning large transformer-based models with differentially private federated learning (DP-FL) generally leads to significant performance degradation.
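One way to see why full fine-tuning suffers is to compare the size of the added noise with the size of the signal as the number of trained parameters grows. The sketch below is a back-of-the-envelope estimate under assumptions not taken from the paper: a Gaussian mechanism with clipped updates, and illustrative values for the layer size, rank, and client count. The function name `noise_to_signal` is hypothetical.

```python
import math


def noise_to_signal(num_params, noise_multiplier, num_clients):
    """Approximate ratio of noise norm to the largest possible signal norm.

    Assumptions: each update is clipped to L2 norm C, Gaussian noise with
    standard deviation noise_multiplier * C is added to the sum, and the
    result is divided by num_clients. The noise norm is then roughly
    noise_multiplier * C * sqrt(num_params) / num_clients, while the averaged
    signal has norm at most C, so C cancels in the ratio.
    """
    return noise_multiplier * math.sqrt(num_params) / num_clients


if __name__ == "__main__":
    # Illustrative sizes only: one 768x768 weight matrix versus a rank-8
    # low-rank update for the same matrix.
    sizes = [("full matrix", 768 * 768), ("rank-8 update", 8 * (768 + 768))]
    for name, d in sizes:
        print(name, d, round(noise_to_signal(d, 1.0, 1000), 3))
```

Under these assumptions the ratio grows with the square root of the parameter count and shrinks with the number of clients, which is consistent with the finding that lower-dimensional updates tolerate DP noise better.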

To address this, the researchers explored using "parameter-efficient fine-tuning" (PEFT) methods, which reduce the dimensionality of the model updates sent to the server. Of the PEFT methods tested, DP-Low-Rank Adaptation (DP-LoRA) consistently outperformed the others.
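To make the dimensionality reduction concrete, here is a minimal sketch of a LoRA-style layer, assuming the usual formulation in which a frozen weight matrix `W` is augmented by a trainable low-rank product `B @ A`. The layer size and rank are illustrative and not taken from the paper, and the helper names are hypothetical.

```python
import numpy as np


def lora_forward(x, W, A, B):
    """Apply a frozen weight W plus a trainable low-rank update B @ A.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    return x @ W.T + (x @ A.T) @ B.T


def trainable_params(d_out, d_in, rank):
    """Count parameters sent per layer for full fine-tuning versus LoRA."""
    return {"full": d_out * d_in, "lora": rank * (d_in + d_out)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, r = 768, 768, 8  # illustrative sizes, not from the paper
    W = rng.normal(size=(d_out, d_in))
    A = rng.normal(size=(r, d_in))
    B = np.zeros((d_out, r))  # B starts at zero, so the update starts at zero
    x = rng.normal(size=(2, d_in))
    assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
    print(trainable_params(d_out, d_in, r))
```

Only `A` and `B` would be trained and sent to the server, so each client's contribution for this layer has `rank * (d_in + d_out)` entries instead of `d_out * d_in`.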

The researchers also examined an even more promising approach, DyLoRA, which makes the rank of the low-rank adaptation variable. Combined naively with federated learning, DyLoRA would break differential privacy, so the researchers propose an adapted method, DP-DyLoRA, that can be combined with differential privacy.
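The variable-rank idea can be sketched as slicing the low-rank factors down to a sampled rank `b` before using them. This is only an illustration of what "making the rank variable" can mean, assuming factors `A` and `B` with a maximum rank `r_max`. Who samples the rank, and how the paper's adaptation keeps this compatible with differential privacy, is not covered by this summary, so the sketch leaves it out. All names and sizes are hypothetical.

```python
import numpy as np


def truncate_lora(A, B, b):
    """Keep only the first b rank components of the LoRA factors.

    Shapes: A (r_max, d_in), B (d_out, r_max); requires 1 <= b <= r_max.
    """
    if not 1 <= b <= A.shape[0]:
        raise ValueError("b must be between 1 and the maximum rank")
    return A[:b, :], B[:, :b]


def variable_rank_forward(x, W, A, B, b):
    """Forward pass using the frozen weight W and a rank-b slice of B @ A."""
    A_b, B_b = truncate_lora(A, B, b)
    return x @ W.T + (x @ A_b.T) @ B_b.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, r_max = 16, 16, 8  # illustrative sizes
    W = rng.normal(size=(d_out, d_in))
    A = rng.normal(size=(r_max, d_in))
    B = rng.normal(size=(d_out, r_max))
    x = rng.normal(size=(2, d_in))
    b = int(rng.integers(1, r_max + 1))  # sample a rank in [1, r_max]
    print(b, variable_rank_forward(x, W, A, B, b).shape)
```

Training across a range of ranks in this way aims to produce factors that remain usable at several ranks, rather than at one rank fixed in advance.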

Overall, the researchers were able to reduce the accuracy degradation and the word error rate (WER) increase due to differential privacy to less than 2% and 7% respectively, with 1 million clients and a stringent privacy budget of ε=2.

Critical Analysis

The paper provides a valuable empirical evaluation of the practical challenges in deploying large transformer-based models with differential privacy in a federated learning setting. The researchers acknowledge that their results may be specific to the particular tasks and models they evaluated, and that further research is needed to generalize the findings.

One potential limitation is that the paper does not deeply explore the theoretical underpinnings of the DP-DyLoRA approach. While the results are promising, more analysis is needed to understand the technique's wider applicability and potential drawbacks.

Additionally, the paper does not address the computational and communication overhead introduced by the differential privacy mechanisms, which could be a significant practical concern, especially for resource-constrained IoT devices. Further research is needed to quantify these tradeoffs.

Overall, this paper makes an important contribution by highlighting the practical challenges in this domain and proposing promising technical solutions. However, there are still many open questions and areas for further research to fully understand the feasibility and implications of deploying large-scale differentially private federated learning systems.

Conclusion

This paper empirically evaluates the practicality of fine-tuning large-scale transformer-based models with differential privacy in a federated learning system. The researchers found that differential privacy can significantly degrade model performance, but that techniques like DP-LoRA and DP-DyLoRA can help mitigate this issue. Overall, the results suggest that it is possible to deploy large-scale differentially private federated learning systems, but there are still important challenges and tradeoffs to be explored.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Matt White, Meikang Qiu

The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

6/4/2024

cs.LG cs.CR

Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

Mahtab Talaei, Iman Izadi

Federated learning (FL), a novel branch of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, it is still possible to access the model updates (gradient updates of deep neural networks) transferred between clients and servers, potentially revealing sensitive local information to adversaries using model inversion attacks. Differential privacy (DP) offers a promising approach to addressing this issue by adding noise to the parameters. On the other hand, heterogeneities in data structure, storage, communication, and computational capabilities of devices can cause convergence problems and delays in developing the global model. A personalized weighted averaging of local parameters based on the resources of each device can yield a better aggregated model in each round. In this paper, to efficiently preserve privacy, we propose a personalized DP framework that injects noise based on clients' relative impact factors and aggregates parameters while considering heterogeneities and adjusting properties. To fulfill the DP requirements, we first analyze the convergence boundary of the FL algorithm when impact factors are personalized and fixed throughout the learning process. We then further study the convergence property considering time-varying (adaptive) impact factors.

6/27/2024

cs.LG cs.CR cs.DC

FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning

Jiaxing QI, Zhongzhi Luan, Shaohan Huang, Carol Fung, Hailong Yang, Depei Qian

Large language models (LLMs) have emerged as important components across various fields, yet their training requires substantial computation resources and abundant labeled data. It poses a challenge to robustly training LLMs for individual users (clients). To tackle this challenge, the intuitive idea is to introduce federated learning (FL), which can collaboratively train models on distributed private data. However, existing methods suffer from the challenges of data heterogeneity, system heterogeneity, and model size, resulting in suboptimal performance and high costs. In this work, we proposed a variant of personalized federated learning (PFL) framework, namely FDLoRA, which allows the client to be a single device or a cluster and adopts low-rank adaptation (LoRA) tuning. FDLoRA sets dual LoRA modules on each client to capture personalized and global knowledge, respectively, and only the global LoRA module uploads parameters to the central server to aggregate cross-client knowledge. Finally, an adaptive fusion approach is employed to combine the parameters of the dual LoRAs. This enables FDLoRA to make effective use of private data distributed across different clients, thereby improving performance on the client without incurring high communication and computing costs. We conducted extensive experiments in two practice scenarios. The results demonstrate that FDLoRA outperforms six baselines in terms of performance, stability, robustness, computation cost, and communication cost.

6/13/2024

cs.DC

Can Public Large Language Models Help Private Cross-device Federated Learning?

Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H. Brendan McMahan, Sewoong Oh, Zheng Xu, Manzil Zaheer

We study (differentially) private federated learning (FL) of language models. The language models in cross-device FL are relatively small, which can be trained with meaningful formal user-level differential privacy (DP) guarantees when massive parallelism in training is enabled by the participation of a moderate size of users. Recently, public data has been used to improve privacy-utility trade-offs for both large and small language models. In this work, we provide a systematic study of using large-scale public data and LLMs to help differentially private training of on-device FL models, and further improve the privacy-utility tradeoff by techniques of distillation. Moreover, we propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution, which significantly improves the sample efficiency of (pre-)training on public data. The proposed method is efficient and effective for training private models by taking advantage of public data, especially for customized on-device architectures that do not have ready-to-use pre-trained models.

4/16/2024

cs.LG