Adaptive Decentralized Federated Learning in Energy and Latency Constrained Wireless Networks

2403.20075

Published 4/1/2024 by Zhigang Yan, Dong Li

🐍

Abstract

In Federated Learning (FL), with parameter aggregated by a central node, the communication overhead is a substantial concern. To circumvent this limitation and alleviate the single point of failure within the FL framework, recent studies have introduced Decentralized Federated Learning (DFL) as a viable alternative. Considering the device heterogeneity, and energy cost associated with parameter aggregation, in this paper, the problem on how to efficiently leverage the limited resources available to enhance the model performance is investigated. Specifically, we formulate a problem that minimizes the loss function of DFL while considering energy and latency constraints. The proposed solution involves optimizing the number of local training rounds across diverse devices with varying resource budgets. To make this problem tractable, we first analyze the convergence of DFL with edge devices with different rounds of local training. The derived convergence bound reveals the impact of the rounds of local training on the model performance. Then, based on the derived bound, the closed-form solutions of rounds of local training in different devices are obtained. Meanwhile, since the solutions require the energy cost of aggregation as low as possible, we modify different graph-based aggregation schemes to solve this energy consumption minimization problem, which can be applied to different communication scenarios. Finally, a DFL framework which jointly considers the optimized rounds of local training and the energy-saving aggregation scheme is proposed. Simulation results show that, the proposed algorithm achieves a better performance than the conventional schemes with fixed rounds of local training, and consumes less energy than other traditional aggregation schemes.

Create account to get full access

Introduction

The paper discusses the challenges and opportunities presented by the proliferation of Internet of Things (IoT) devices and the need for efficient strategies to leverage the vast amounts of data generated. It highlights the emergence of federated learning (FL) as a distributed learning approach to address communication and privacy concerns. The authors then introduce the concept of decentralized federated learning (DFL), which aims to mitigate the limitations of the conventional FL paradigm by enabling direct communication between devices for parameter aggregation.

The paper identifies a gap in the existing research on DFL, specifically regarding device heterogeneity. This heterogeneity can involve statistical heterogeneity (non-i.i.d. data distribution across devices) and system heterogeneity (variability in computation and communication resources). The authors note that the impact of this heterogeneity on DFL performance has not been sufficiently explored, particularly in the context of limited energy budgets.

To address this research gap, the paper proposes a DFL framework that allows for adaptive local training rounds based on the individual resource budgets of the devices. The authors formulate an optimization problem to minimize the DFL loss function, considering joint energy and latency constraints. They then analyze the convergence of the proposed framework and use the derived convergence bound to reformulate the original problem as a tractable optimization problem. Additionally, the paper explores energy-saving aggregation schemes, such as those based on the Minimum Spanning Tree (MST) algorithm and the Ring-AllReduce algorithm, to reduce the energy consumption during the aggregation process under different communication conditions.

The paper concludes by stating that the simulation results confirm the theoretical analyses and demonstrate the benefits of the proposed adaptive local training rounds scheme and energy-efficient aggregation schemes compared to conventional approaches.

System Model and Problem Formulation

This paper proposes a decentralized federated learning (DFL) system consisting of N devices. Each device has its own dataset, and the full dataset is the union of all the individual datasets. The goal of DFL is to find an optimal parameter vector that minimizes the global loss function across all devices.

The DFL system is modeled as an undirected graph, where each node represents a device and the edges represent connections between devices. The DFL process involves two key steps: local training and parameter aggregation. In the local training step, each device independently updates its local model parameters using gradient descent or stochastic gradient descent on its local dataset. In the parameter aggregation step, the updated parameters from each device are averaged among neighboring devices to synchronize the model parameters across the system.

The paper also considers the computation and communication costs associated with the DFL system. The latency and energy costs of local training and inter-device communication are modeled. An optimization problem is formulated to minimize the global loss function while satisfying constraints on the overall system latency and the energy consumption of individual devices. The goal is to achieve a cost-efficient DFL system that balances performance and resource utilization.

Convergence Analysis of DFL

This paper analyzes the influence of variable rounds of local training on the global loss function in a Distributed Federated Learning (DFL) system. Due to the diversity of loss functions across different machine learning models and scenarios, obtaining a general closed-form expression for the loss function about the local training rounds is challenging.

To address this, the paper proposes an alternative approach by deriving the convergence bound of DFL. The convergence bound serves as an evaluation metric to assess model performance and offers insights into how the number of local training rounds affects the overall DFL system performance.

The analysis makes several assumptions about the local and global loss functions and their gradients. It then provides two key theorems:

Theorem 1 derives an upper bound on the gap between the global loss function after T iterations and the global optimum. This bound can be divided into three parts: the impact of the whole local training, the impact of local training in each iteration, and the impact of the aggregation.
Theorem 2 provides an upper bound on the impact of the aggregation, which is determined by the non-i.i.d. level and the number of participating devices. A higher non-i.i.d. level and more devices lead to greater performance damage.

Finally, the paper presents a corollary that characterizes the convergence rate of DFL in the best case scenario, showing that DFL converges linearly when the strongly convex and Lipschitz smooth assumptions hold in the i.i.d. case.

V Algorithm Design for Adaptive DFL

This section reformulates the original problem (15) by deriving a convergence bound and transforming it into N sets of problems, each containing two sub-problems. By solving these sub-problems and obtaining closed-form expressions, the optimized solutions of the original problem (15) and an energy-saving aggregation scheme can be obtained.

The section first shows that the upper bound of the original objective function is tight, so the problem can be reformulated as minimizing this upper bound. This reformulated problem has two parts - minimizing the communication energy cost between devices and minimizing the number of local training iterations.

For the first part, the section considers two cases - when channel information between devices is known and when it is not. When known, a graph-based minimum spanning tree (MST) algorithm is used to find the minimum energy communication links. When unknown, a ring-based aggregation scheme or randomized gossip algorithm is used.

For the second part, the section derives a closed-form solution for the optimal number of local training iterations on each device, which depends on the total training budget and the individual device constraints. This adaptive local training approach aims to enhance performance and utilize resources efficiently.

Finally, the section proposes the Adaptive DFL algorithm that incorporates the solutions from both parts, dynamically selecting the aggregation scheme based on communication conditions and adaptively setting the local training iterations on each device.

Simulation Results

The provided text evaluates the convergence, performance, and energy cost of a Distributed Federated Learning (DFL) system with an adaptive local training algorithm and energy-saving aggregation schemes through simulations.

Key findings:

Convergence:
- DFL exhibits consistent convergence behavior across different datasets and models.
- Aggregation schemes based on Minimum Spanning Tree (MST) or Ring-AllReduce algorithm achieve better performance after convergence.
- The impact of non-i.i.d. data distribution is observed, with DFL systems with fewer devices performing better.
Convergence rate:
- DFL with fewer devices converges faster compared to configurations with more devices, consistent with the theoretical analysis.
Aggregation energy cost:
- The proposed aggregation schemes based on MST and Ring-AllReduce demonstrate superior energy efficiency compared to the randomized gossip algorithm.
Model performance:
- The proposed adaptive local training scheme outperforms fixed local training schemes and several baselines, including Gossip Learning, Balanced DFL, and PENS.
- The advantages of the proposed scheme are more pronounced in complex tasks compared to simpler datasets.

In summary, the provided text evaluates the DFL system's convergence, performance, and energy efficiency, highlighting the benefits of the proposed adaptive local training algorithm and energy-saving aggregation schemes.

Conclusions

This paper presents a Distributed Federated Learning (DFL) framework that aims to enhance the global model performance by efficiently utilizing the limited computational and communication resources of each device. The key contributions are:

Formulating an optimization problem to minimize the global loss function by allocating the limited total rounds of local training across different iterations.
Deriving convergence bounds and rates for the DFL framework under different local training rounds, which allows reformulating the problem as a tractable one.
Proposing various aggregation schemes based on the device connection graph topology to reduce energy costs during aggregation under different communication conditions.
Deriving closed-form solutions for allocating the limited total local training rounds among different iterations and devices, providing guidelines for improving DFL performance through efficient resource utilization.

The theoretical analysis is validated through simulation results, which show the proposed adaptive local training scheme outperforms benchmarks and the graph-based aggregation scheme reduces energy costs compared to conventional randomized gossip algorithms.

Appendix A Proof of Theorem 1

The provided text proves Theorem 1 using two lemmas. Lemma 1 states that the L2 norm of the gradient of the loss function is lower bounded by a function of the loss value. Lemma 2 provides an upper bound on the expected L2 norm of the stochastic gradient.

The proof first considers a bound on the difference between the loss at the current iterate and the optimal loss. It then recursively applies this bound to obtain an overall bound on the difference between the final loss and the optimal loss. The key steps are:

Bound the loss difference using the lower bound from Lemma 1 and the upper bound from Lemma 2.
Repeatedly apply this bound across the iterates to get an overall bound.
Combine the bounds for all clients to obtain the final result.

The proof relies on several technical inequalities and manipulations to arrive at the final bound stated in equation (43).

Appendix B Proof of Theorem 2

The paper provides an analysis of the convergence of a stochastic optimization algorithm. The key result is an upper bound on the expected difference between the function value at the current iterate and the function value at the average of the past iterates.

The analysis relies on several assumptions, including Lipschitz continuity and bounded gradient assumptions. Using these assumptions, the authors derive the following upper bound:

E[F(w_T) - F(w̃_T)] ≤ (η/2)[(δ_m^2 + G)^2 + σ(1 + (η + δ_m√η)M)ϕ(N) + δ_m ϕ(N)] ⋅ Σ_t=1^T 1/t^2

Where:

η is the step size
δ_m is a bound on the gradient noise
G is a Lipschitz constant
σ is a bound on the variance of the stochastic gradients
M is a Lipschitz constant
N is the number of samples used in the stochastic gradient
ϕ(N) is a function of N that measures the quality of the preconditioner matrix P

The authors show this bound holds under certain choices of the step size η. The full proof is provided in the text.

Appendix C Proof of Corollary 1

The provided text presents a convergence bound for the objective function in an optimization problem. Key points are:

The expression for the bound is derived, involving terms related to the objective function and certain parameters.
The bound is shown to be tight, as the limit of the terms goes to 0 as the iteration count goes to infinity.
The bound can be rewritten in a simpler form that is equivalent to the original expression.
After some basic manipulations, the final convergence bound is arrived at, as stated in equation (25) of the paper.

The text does not provide a summary for any other sections.

Appendix D Proof of Theorem 4

Appendix E Proof of Lemma 1

Appendix F Proof of Lemma 2

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛠️

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton

Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning (DSpodFL), a DFL methodology built on a generalized notion of sporadicity in both local gradient and aggregation processes. DSpodFL subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing heterogeneous and time-varying computation/communication scenarios. We analytically characterize the convergence behavior of DSpodFL for both convex and non-convex models, for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises, and show how our bounds recover existing results as special cases. Experiments demonstrate that DSpodFL consistently achieves improved training speeds compared with baselines under various system settings.

6/4/2024

cs.LG cs.DC

Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme

Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

Decentralized Federated Learning (DFL) has become popular due to its robustness and avoidance of centralized coordination. In this paradigm, clients actively engage in training by exchanging models with their networked neighbors. However, DFL introduces increased costs in terms of training and communication. Existing methods focus on minimizing communication often overlooking training efficiency and data heterogeneity. To address this gap, we propose a novel textit{sparse-to-sparser} training scheme: DA-DPFL. DA-DPFL initializes with a subset of model parameters, which progressively reduces during training via textit{dynamic aggregation} and leads to substantial energy savings while retaining adequate information during critical learning periods. Our experiments showcase that DA-DPFL substantially outperforms DFL baselines in test accuracy, while achieving up to $5$ times reduction in energy costs. We provide a theoretical analysis of DA-DPFL's convergence by solidifying its applicability in decentralized and personalized learning. The code is available at:https://github.com/EricLoong/da-dpfl

4/26/2024

cs.LG cs.AI

Federated Learning With Energy Harvesting Devices: An MDP Framework

Kai Zhang, Xuanyu Cao

Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.

5/20/2024

cs.LG eess.SP

🤿

Decentralized Federated Learning Over Imperfect Communication Channels

Weicai Li, Tiejun Lv, Wei Ni, Jingbo Zhao, Ekram Hossain, H. Vincent Poor

This paper analyzes the impact of imperfect communication channels on decentralized federated learning (D-FL) and subsequently determines the optimal number of local aggregations per training round, adapting to the network topology and imperfect channels. We start by deriving the bias of locally aggregated D-FL models under imperfect channels from the ideal global models requiring perfect channels and aggregations. The bias reveals that excessive local aggregations can accumulate communication errors and degrade convergence. Another important aspect is that we analyze a convergence upper bound of D-FL based on the bias. By minimizing the bound, the optimal number of local aggregations is identified to balance a trade-off with accumulation of communication errors in the absence of knowledge of the channels. With this knowledge, the impact of communication errors can be alleviated, allowing the convergence upper bound to decrease throughout aggregations. Experiments validate our convergence analysis and also identify the optimal number of local aggregations on two widely considered image classification tasks. It is seen that D-FL, with an optimal number of local aggregations, can outperform its potential alternatives by over 10% in training accuracy.

5/22/2024

cs.DC cs.IT cs.LG