A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings

2405.15079

Published 5/27/2024 by Madison Threadgill, Andreas Gerstlauer

Abstract

In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and training stages. To address this challenge, distributed learning has emerged as a crucial approach, employing parallelization across various devices and environments. This survey explores the landscape of distributed learning, encompassing cloud and edge settings. We delve into the core concepts of data and model parallelism, examining how models are partitioned across different dimensions and layers to optimize resource utilization and performance. We analyze various partitioning schemes for different layer types, including fully connected, convolutional, and recurrent layers, highlighting the trade-offs between computational efficiency, communication overhead, and memory constraints. This survey provides valuable insights for future research and development in this rapidly evolving field by comparing and contrasting distributed learning approaches across diverse contexts.

Overview

Provides a comprehensive survey of distributed learning techniques across cloud, mobile, and edge computing settings
Examines the challenges and opportunities in deploying machine learning models in decentralized environments
Covers a range of distributed learning approaches, from communication-efficient large-scale distributed deep learning to distributed threat intelligence at edge devices

Plain English Explanation

This paper explores how machine learning can be used in distributed computing environments, like cloud, mobile, and edge devices. In these settings, the data and computing resources are spread out across different locations, which presents both challenges and opportunities for training and deploying AI models.

The paper examines various distributed learning techniques that have been developed to address these challenges. For example, communication-efficient distributed deep learning methods aim to reduce the amount of data that needs to be shared between devices, which is important when bandwidth is limited. Similarly, distributed threat intelligence at the edge explores ways to leverage the compute power of edge devices (like smartphones or IoT sensors) to detect and respond to security threats in a more decentralized way.

By surveying this landscape of distributed learning approaches, the paper provides insights into the trade-offs and considerations involved in bringing machine learning to these more fragmented computing environments. This is an important topic as AI becomes increasingly embedded in our everyday devices and infrastructure.

Technical Explanation

The paper begins by highlighting the growing importance of distributed computing settings, such as cloud, mobile, and edge environments, for machine learning applications. In these decentralized scenarios, data and computing resources are spread across different locations, which introduces unique challenges for training and deploying AI models.

The authors then provide an in-depth review of various distributed learning techniques that have been proposed to address these challenges. This includes approaches such as communication-efficient distributed deep learning, which aims to reduce the amount of data that must be shared between devices, and distributed threat intelligence at the edge, which leverages the compute power of edge devices to detect and respond to security threats in a more decentralized way.
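To make the idea of reducing shared data concrete, here is a minimal sketch of top-k gradient sparsification, one common family of communication-efficient techniques. It is an illustration chosen for this summary, not a method attributed to the survey; the function names `sparsify_top_k` and `densify` and the sample gradient are hypothetical.

```python
from typing import Dict, List


def sparsify_top_k(gradient: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries as {index: value}."""
    if k <= 0:
        return {}
    # Rank indices by absolute gradient value, largest first.
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in ranked[:k]}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Rebuild a dense vector on the receiver; dropped entries become 0."""
    dense = [0.0] * size
    for index, value in sparse.items():
        dense[index] = value
    return dense


# Hypothetical local gradient on one device.
gradient = [0.02, -1.5, 0.3, 0.0, 2.1, -0.01]

# Only 2 of the 6 entries (plus their indices) are sent over the network.
message = sparsify_top_k(gradient, k=2)
restored = densify(message, len(gradient))
```

The sender trades accuracy of the transmitted gradient for bandwidth. Practical schemes typically also accumulate the dropped entries locally and add them to later updates, which this sketch omits.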

The paper also covers other distributed learning methods, such as embedded distributed inference for deep neural networks and distributed learning for WiFi AP load prediction. These approaches explore different strategies for partitioning the learning process, optimizing communication, and managing the trade-offs between model accuracy, latency, and resource constraints in these distributed settings.
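As a rough illustration of partitioning, the sketch below splits a fully connected layer across devices by output neuron and checks that the gathered result matches the unpartitioned layer. It assumes one particular scheme (row-wise splitting of the weight matrix); the helper names, the device count, and the toy weights are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_forward(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Compute y = W x + b, with one row of W per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def split_rows(weights: Matrix, bias: List[float], num_devices: int) -> List[Tuple[Matrix, List[float]]]:
    """Assign contiguous blocks of output neurons to each hypothetical device."""
    base, extra = divmod(len(weights), num_devices)
    shards, start = [], 0
    for device in range(num_devices):
        end = start + base + (1 if device < extra else 0)
        shards.append((weights[start:end], bias[start:end]))
        start = end
    return shards


def partitioned_forward(weights: Matrix, bias: List[float], x: List[float], num_devices: int) -> List[float]:
    # Every device needs the full input x, but stores only its slice of W and b.
    partial_outputs = [dense_forward(w, b, x) for w, b in split_rows(weights, bias, num_devices)]
    # Concatenating the slices stands in for the gather (communication) step.
    return [y for part in partial_outputs for y in part]


# Toy layer with 3 output neurons and 2 inputs.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 1.0, -1.0]
x = [2.0, 3.0]

full = dense_forward(W, b, x)
split = partitioned_forward(W, b, x, num_devices=2)
```

In this scheme each device holds only part of the weights, which lowers per-device memory, while the input must be replicated and the outputs gathered. That is one instance of the compute, memory, and communication trade-off the abstract describes.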

Throughout the survey, the authors highlight the key challenges, design considerations, and empirical findings that have emerged from this body of research. This provides a comprehensive overview of the state of the art in distributed learning and the trade-offs involved in deploying machine learning in cloud, mobile, and edge computing environments.

Critical Analysis

The paper provides a thorough and well-structured survey of the distributed learning landscape, covering a diverse range of techniques and use cases. The authors do a commendable job of synthesizing the key insights and trade-offs from this rapidly evolving field of research.

One potential limitation of the paper is that it does not delve deeply into the specific technical details or implementation challenges of the various distributed learning approaches. While the high-level descriptions are informative, readers looking for more hands-on guidance or implementation-level details may need to consult the original research papers.

Additionally, the paper does not address some of the broader societal implications and ethical considerations associated with the widespread deployment of distributed machine learning systems. As these technologies become more ubiquitous, it will be important to consider issues around data privacy, algorithmic bias, and the equitable distribution of the benefits and risks.

Overall, this survey serves as a valuable resource for researchers and practitioners working in the field of distributed machine learning. By highlighting the current state of the art and identifying key areas for further exploration, the paper lays the groundwork for continued advances in this domain.

Conclusion

This survey paper provides a detailed overview of the state of the art in distributed learning across cloud, mobile, and edge computing settings. The authors examine a range of techniques, from communication-efficient distributed deep learning to distributed threat intelligence at the edge, highlighting the unique challenges and trade-offs involved in deploying machine learning in these more fragmented computing environments.

By surveying this broad landscape of distributed learning approaches, the paper offers valuable insights for researchers and practitioners working to bring the power of AI to a wide range of distributed applications, from smart city infrastructure to IoT-enabled devices. As the use of machine learning becomes increasingly ubiquitous, understanding how to effectively leverage distributed computing resources will be crucial for realizing the full potential of these transformative technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Embedded Distributed Inference of Deep Neural Networks: A Systematic Review

Federico Nicolás Peccia, Oliver Bringmann

Embedded distributed inference of Neural Networks has emerged as a promising approach for deploying machine-learning models on resource-constrained devices in an efficient and scalable manner. The inference task is distributed across a network of embedded devices, with each device contributing to the overall computation by performing a portion of the workload. In some cases, more powerful devices such as edge or cloud servers can be part of the system and take responsibility for the most demanding layers of the network. As the demand for intelligent systems and the complexity of the deployed neural network models increase, this approach is becoming more relevant in a variety of applications such as robotics, autonomous vehicles, smart cities, Industry 4.0 and smart health. We present a systematic review of papers published during the last six years which describe techniques and methods to distribute Neural Networks across these kinds of systems. We provide an overview of the current state of the art by analysing more than 100 papers, present a new taxonomy to characterize them, and discuss trends and challenges in the field.

5/7/2024

cs.DC

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu

With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resources. Due to intensive synchronization of models and sharing of data across GPUs and computing nodes during distributed training and inference processes, communication efficiency becomes the bottleneck for achieving high performance at a large scale. This article surveys the literature over the period of 2018-2023 on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning at various levels, including algorithms, frameworks, and infrastructures. Specifically, we first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training. Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference. After that, we present the latest technologies pertaining to modern communication infrastructures used in distributed deep learning with a focus on examining the impact of the communication overhead in a large-scale and heterogeneous setting. Finally, we conduct a case study on the distributed training of large language models at a large scale to illustrate how to apply these technologies in real cases. This article aims to offer researchers a comprehensive understanding of the current landscape of large-scale distributed deep learning and to reveal promising future research directions toward communication-efficient solutions in this scope.

4/10/2024

cs.DC cs.AI

Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach

Syed Mhamudul Hasan, Alaa M. Alotaibi, Sajedul Talukder, Abdur R. Shahid

With the proliferation of edge devices, there is a significant increase in attack surface on these devices. The decentralized deployment of threat intelligence on edge devices, coupled with adaptive machine learning techniques such as the in-context learning feature of Large Language Models (LLMs), represents a promising paradigm for enhancing cybersecurity on resource-constrained edge devices. This approach involves the deployment of lightweight machine learning models directly onto edge devices to analyze local data streams, such as network traffic and system logs, in real-time. Additionally, distributing computational tasks to an edge server reduces latency and improves responsiveness while also enhancing privacy by processing sensitive data locally. LLM servers can enable these edge servers to autonomously adapt to evolving threats and attack patterns, continuously updating their models to improve detection accuracy and reduce false positives. Furthermore, collaborative learning mechanisms facilitate peer-to-peer secure and trustworthy knowledge sharing among edge devices, enhancing the collective intelligence of the network and enabling dynamic threat mitigation measures such as device quarantine in response to detected anomalies. The scalability and flexibility of this approach make it well-suited for diverse and evolving network environments, as edge devices only send suspicious information such as network traffic and system log changes, offering a resilient and efficient solution to combat emerging cyber threats at the network edge. Thus, our proposed framework can improve edge computing security by providing better security in cyber threat detection and mitigation by isolating the edge devices from the network.

5/28/2024

cs.CR cs.AI cs.LG

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh, Sina Ebrahimi, David López-Pérez

Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computational load, FL targets the resolution of privacy issues and the reduction of communication costs simultaneously. To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data. Specifically, individuals train ML models locally using their own data and then upload the results in the form of weights and gradients to the cloud for aggregation into the global model. This strategy is also advantageous in environments with limited bandwidth or high communication costs, as it prevents the transmission of large data volumes. With the increasing volume of data and rising privacy concerns, alongside the emergence of large-scale ML models like Large Language Models (LLMs), FL presents itself as a timely and relevant solution. It is therefore essential to review current FL algorithms to guide future research that meets the rapidly evolving ML demands. This survey provides a comprehensive analysis and comparison of the most recent FL algorithms, evaluating them on various fronts including mathematical frameworks, privacy protection, resource allocation, and applications. Beyond summarizing existing FL methods, this survey identifies potential gaps, open areas, and future challenges based on the performance reports and algorithms used in recent studies. This survey enables researchers to readily identify existing limitations in the FL field for further exploration.

5/28/2024

cs.LG cs.AI cs.CR cs.DC
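
The federated learning abstract above describes clients training locally and uploading weights for aggregation into a global model. The sketch below shows one common way to do that aggregation, a sample-weighted average in the style of federated averaging. The abstract does not specify the aggregation rule, so the weighting, the function name `federated_average`, and the client data are assumptions made for illustration.

```python
from typing import List, Tuple


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average client weights, weighting each client by its sample count.

    Each update is (weights, num_samples); raw training data never leaves
    the client, only these model weights do.
    """
    total = sum(num_samples for _, num_samples in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    global_weights = [0.0] * len(updates[0][0])
    for weights, num_samples in updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * num_samples / total
    return global_weights


# Two hypothetical clients: one trained on 10 samples, one on 30.
client_updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
new_global = federated_average(client_updates)
```

Weighting by sample count lets clients with more data influence the global model more. An unweighted mean is a simpler alternative when sample counts are unknown or should not be disclosed.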