Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

Read original: arXiv:2405.18890 - Published 5/30/2024 by Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang

Overview

This paper proposes estimating global perturbations locally, abbreviated LEGP in this summary (the paper names its algorithm FedLESAM), to improve sharpness-aware minimization (SAM) in federated learning.
The key idea is that each client estimates the global perturbation itself instead of computing a perturbation from its own data. Under heterogeneous client data, the local loss landscape may not reflect the flatness of the global loss landscape.
The authors report that LEGP outperforms local-perturbation methods in both performance and efficiency on four federated benchmark datasets under three partition strategies.

Plain English Explanation

In machine learning, training models often involves a step called optimization, where the model parameters are adjusted to minimize the error on the training data. One common optimization technique is called sharpness-aware minimization (SAM), which looks for parameters whose loss stays low even when the parameters themselves are slightly perturbed. Such "flat" minima tend to generalize better than sharp ones.
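As a rough illustration of the SAM idea described above, the following sketch runs SAM-style steps on a toy quadratic loss. The loss function, the learning rate `lr`, and the radius `rho` are assumptions chosen for illustration and do not come from the paper. Note that each step evaluates the gradient twice: once to find the perturbation and once at the perturbed weights.

```python
import math

def loss(w):
    # Toy loss (an assumption for illustration): an elongated quadratic bowl.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.1):
    # First gradient evaluation: the ascent direction at the current weights.
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    # Perturb the weights (not the input data) by a step of length rho.
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Second gradient evaluation: taken at the perturbed weights.
    g_perturbed = grad(w_perturbed)
    # Descend from the original weights using the perturbed gradient.
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
```

The two calls to `grad` per step are the reason plain SAM costs roughly twice as much as ordinary gradient descent.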

Federated learning is a way of training machine learning models where the data is distributed across many different devices, like phones or computers, and the model is trained by aggregating updates from all the devices. This can be more privacy-preserving than centralized approaches.
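The aggregation step can be sketched as a weighted average of client models, in the style of Federated Averaging (FedAvg). This is a minimal sketch under assumptions: models are flat lists of floats, and each client is weighted by its number of training examples. The function name `fedavg` and the sample values are hypothetical.

```python
def fedavg(client_weights, client_sizes):
    # Weighted average of client models; weights are the clients' dataset sizes.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n * w[i] for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients: the second holds three times as much data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_model = fedavg(clients, sizes)
```

Only model parameters are exchanged here; the raw training data never leaves the clients.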

This paper proposes a new way of doing SAM in federated learning, called Locally Estimated Global Perturbations (LEGP). The key idea is that each device estimates the global perturbation (the small change to the model weights that SAM applies before computing its update) on its own, rather than computing a perturbation from its local data alone. Because every device's data looks different, a locally computed perturbation may say little about how sharp the shared model's loss is overall.

The authors report that LEGP outperforms existing federated SAM techniques that use local perturbations, and that it is also faster because it needs only one backpropagation per iteration.

Technical Explanation

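The paper introduces a new approach called Locally Estimated Global Perturbations (LEGP) for federated sharpness-aware minimization. In existing federated SAM approaches, each client performs local updates using perturbations computed on its own data. The authors argue that in heterogeneous environments the local loss landscape may not accurately reflect the flatness of the global loss landscape, so minimizing local sharpness may not deliver the benefit that SAM provides in centralized training.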

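Instead, LEGP estimates the direction of the global perturbation (a small change to the model weights) locally on each client, as the difference between the global models the client received in its previous active round and in the current round. The client then uses this estimate in place of a locally computed perturbation. Because the perturbation no longer requires its own gradient computation, each iteration performs backpropagation only once.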
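The paper's abstract describes the estimate as the difference between the global models received in the previous active round and the current round. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the radius `rho`, the learning rate `lr`, and in particular the sign convention (previous minus current, on the assumption that the server moved along a descent direction) are all assumptions made for illustration.

```python
import math

def estimated_global_perturbation(w_prev_global, w_curr_global, rho=0.1):
    # Assumed sign: the server moved from w_prev to w_curr along a descent
    # direction, so (w_prev - w_curr) approximates the global ascent direction.
    diff = [p - c for p, c in zip(w_prev_global, w_curr_global)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        # No movement between rounds: fall back to no perturbation.
        return [0.0] * len(diff)
    # Scale the direction to length rho, as SAM does with its perturbation.
    return [rho * d / norm for d in diff]

def local_step(w, grad_fn, eps, lr=0.1):
    # A single gradient evaluation, taken at the perturbed weights.
    w_perturbed = [wi + ei for wi, ei in zip(w, eps)]
    g = grad_fn(w_perturbed)
    return [wi - lr * gi for wi, gi in zip(w, g)]

# Hypothetical global models from the previous active and current rounds.
eps = estimated_global_perturbation([3.0, 4.0], [0.0, 0.0], rho=0.5)

# Toy local loss 0.5 * ||w||^2, whose gradient is w itself.
w_next = local_step([0.0, 0.0], lambda w: list(w), eps, lr=0.1)
```

Unlike a standard SAM step, `local_step` calls the gradient function once, since the perturbation comes from models the client already holds rather than from an extra gradient computation.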

On the theoretical side, the authors prove a slightly tighter bound than that of the original FedSAM, which they attribute to ensuring a consistent perturbation.

The paper evaluates LEGP on four federated benchmark datasets under three partition strategies and compares it with existing federated SAM methods that use local perturbations. The authors report superior performance and efficiency.

Critical Analysis

The paper presents a novel and well-motivated approach to federated learning, and the experimental results are quite compelling. However, a few potential limitations and areas for future research are worth considering:

Computational and Memory Cost: The authors state that the method needs only one backpropagation per iteration, so it should be cheaper per step than SAM-based local training. Clients presumably have to keep the previously received global model, however, and the memory cost of doing so deserves a closer look.
Communication Costs: Federated learning typically aims to reduce communication between clients and the server. The estimate is built from global models that clients already receive, so it does not obviously require extra communication, but this is worth confirming against the full paper.
Heterogeneous Clients: Each client's estimate depends on when that client was last active, so clients with different participation histories may work with different estimates. In realistic federated settings, client data and computational capabilities are also heterogeneous, which could affect the efficacy of LEGP.
Scalability: The experiments are conducted on benchmark datasets. It would be valuable to see how LEGP performs on larger, more realistic federated learning problems with many more clients.
Practical Deployment: While the theoretical and experimental results are promising, the authors do not discuss the practical challenges of deploying LEGP in real-world federated learning systems, such as device failures, data drift, and privacy considerations.

Overall, the paper presents an innovative approach to federated learning that merits further investigation. Future research could explore ways to address the potential limitations and expand the applicability of LEGP to more realistic federated learning scenarios.

Conclusion

This paper introduces a new technique called Locally Estimated Global Perturbations (LEGP) for federated sharpness-aware minimization, which aims to improve upon existing federated learning methods that use local perturbations. The key idea is to estimate the global perturbations locally on each client, rather than relying solely on local perturbations.

The authors report that LEGP outperforms local perturbation-based methods in both performance and efficiency on four federated benchmark datasets. This suggests that LEGP could be a promising approach for building more generalizable machine learning models in federated learning settings.

While the paper presents compelling theoretical and experimental results, there are still open questions and areas for future research, such as resource costs on client devices, client heterogeneity, and scalability to larger federated learning problems. Addressing these could further enhance the practical applicability of LEGP in real-world federated learning deployments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang

In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degrading the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of the global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on the client side as the difference between global models received in the previous active and current rounds. Besides the improved quality, FedLESAM also speeds up federated SAM-based approaches since it performs backpropagation only once in each iteration. Theoretically, we prove a slightly tighter bound than its original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.

5/30/2024

Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techniques to enhance consistency between global and local generalization and optimization objectives. Nonetheless, the estimation of global SAM introduces additional computational and memory overhead, while dynamic regularization suffers from bias in the local and global dual variables due to training isolation. In this paper, we propose a novel FL algorithm, FedTOGA, designed to consider optimization and generalization objectives while maintaining minimal uplink communication overhead. By linking local perturbations to global updates, global generalization consistency is improved. Additionally, global updates are used to correct local dynamic regularizers, reducing dual variables bias and enhancing optimization consistency. Global updates are passively received by clients, reducing overhead. We also propose neighborhood perturbation to approximate local perturbation, analyzing its strengths and limitations. Theoretical analysis shows FedTOGA achieves faster convergence $O(1/T)$ under non-convex functions. Empirical studies demonstrate that FedTOGA outperforms state-of-the-art algorithms, with a 1% accuracy increase and 30% faster convergence, achieving state-of-the-art.

8/30/2024

Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius

Xuehao Wang, Weisen Jiang, Shuai Fu, Yu Zhang

Sharpness-aware minimization (SAM) aims to improve model generalization by searching for flat minima in the loss landscape. The SAM update consists of one step for computing the perturbation and another for computing the update gradient. Within the two steps, the choice of the perturbation radius is crucial to the performance of SAM, but finding an appropriate perturbation radius is challenging. In this paper, we propose a bilevel optimization framework called LEarning the perTurbation radiuS (LETS) to learn the perturbation radius for sharpness-aware minimization algorithms. Specifically, in the proposed LETS method, the upper-level problem aims at seeking a good perturbation radius by minimizing the squared generalization gap between the training and validation losses, while the lower-level problem is the SAM optimization problem. Moreover, the LETS method can be combined with any variant of SAM. Experimental results on various architectures and benchmark datasets in computer vision and natural language processing demonstrate the effectiveness of the proposed LETS method in improving the performance of SAM.

8/16/2024

Locally Adaptive Federated Learning

Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich

Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model, without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function, which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms that leverage the local geometric information for each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings. We validate our theoretical claims by performing illustrative experiments for both i.i.d. and non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS for non-convex experiments, and come with superior generalization performance.

5/15/2024