Flow Optimization at Inter-Datacenter Networks for Application Run-time Acceleration

2406.12567

Published 6/19/2024 by Berta Serracanta, Alberto Rodriguez-Natal, Fabio Maino, Albert Cabellos


Abstract

In the present-day, distributed applications are commonly spread across multiple datacenters, reaching out to edge and fog computing locations. The transition away from single datacenter hosting is driven by capacity constraints in datacenters and the adoption of hybrid deployment strategies, combining on-premise and public cloud facilities. However, the performance of such applications is often limited by extended Flow Completion Times (FCT) for short flows due to queuing behind bursts of packets from concurrent long flows. To address this challenge, we propose a solution to prioritize short flows over long flows in the Software-Defined Wide-Area Network (SD-WAN) interconnecting the distributed computing platforms. Our solution utilizes eBPF to segregate short and long flows, transmitting them over separate tunnels with the same properties. By effectively mitigating queuing delays, we consistently achieve a 1.5 times reduction in FCT for short flows, resulting in improved application response times. The proposed solution works with encrypted traffic and is application-agnostic, making it deployable in diverse distributed environments without modifying the applications themselves. Our testbed evaluation demonstrates the effectiveness of our approach in accelerating the run-time of distributed applications, providing valuable insights for optimizing multi-datacenter and edge deployments.


Overview

This paper targets long Flow Completion Times (FCT) for short flows in distributed applications that span multiple datacenters as well as edge and fog locations.
The proposed solution uses eBPF to separate short flows from long flows in the software-defined wide-area network (SD-WAN) and transmits them over separate tunnels with the same properties.
The authors report a consistent 1.5 times reduction in FCT for short flows in a testbed evaluation, with support for encrypted traffic and no changes to the applications.

Plain English Explanation

When applications are spread across multiple data centers, the network connections between them can significantly impact the runtime and performance of those applications. The researchers in this paper explored ways to optimize the data flow across these inter-datacenter networks.

Their approach runs in the software-defined wide-area network (SD-WAN) that interconnects the sites. According to the abstract, short flows are slowed down because their packets queue behind bursts of packets from long flows that are active at the same time. The system uses eBPF to tell short and long flows apart and sends each class over its own tunnel, so short flows no longer wait behind long ones. This helps accelerate the runtime of distributed applications that are running across multiple data centers.

For example, if an application sends a small request while a large data set is being transferred between the same two sites, the packets of the request would normally wait in a queue behind the bulk transfer. With the proposed separation, the request travels through a different tunnel than the bulk transfer and completes sooner. Because the separation happens in the network, the approach works with encrypted traffic and requires no changes to the applications.
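The queuing effect named in the abstract (short flows waiting behind bursts from long flows) can be illustrated with a toy calculation. The sketch below is not the authors' model: the link speed, packet size, burst size, and the 50% link share given to the short-flow queue are all assumptions chosen only to make the arithmetic easy to follow.

```python
LINK_BPS = 1_000_000_000  # assumed 1 Gbit/s bottleneck link (illustrative)
PKT_BYTES = 1500          # assumed fixed packet size (illustrative)


def tx_time(n_packets: int) -> float:
    """Seconds needed to serialize n_packets onto the link."""
    return n_packets * PKT_BYTES * 8 / LINK_BPS


def fct_shared_fifo(long_burst_pkts: int, short_pkts: int) -> float:
    """FCT of a short flow that arrives right behind a long-flow burst
    in a single FIFO queue: it waits for the burst, then is sent."""
    return tx_time(long_burst_pkts) + tx_time(short_pkts)


def fct_separate_queues(short_pkts: int, short_share: float = 0.5) -> float:
    """FCT of the same short flow when it has its own queue that is
    served at a fixed share of the link (assumed 50% by default)."""
    return tx_time(short_pkts) / short_share


if __name__ == "__main__":
    shared = fct_shared_fifo(long_burst_pkts=1000, short_pkts=10)
    separate = fct_separate_queues(short_pkts=10)
    print(f"shared FIFO:     {shared * 1e3:.2f} ms")
    print(f"separate queues: {separate * 1e3:.2f} ms")
```

With these made-up numbers the short flow finishes in about 12.12 ms behind the burst and in about 0.24 ms with its own queue. The toy ignores congestion control, propagation delay, and tunnel overhead, so it is not meant to reproduce the 1.5 times reduction reported in the abstract.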

Technical Explanation

The researchers propose a solution that prioritizes short flows over long flows in the SD-WAN interconnecting distributed computing platforms. Based on the abstract, the key components include:

Flow Segregation: The system uses eBPF to distinguish short flows from long flows.
Separate Tunnels: Short and long flows are transmitted over separate tunnels with the same properties, which mitigates the queuing delay that short flows experience behind bursts of packets from long flows.
Application-Agnostic Operation: The solution works with encrypted traffic and does not require modifying the applications themselves.

Through this approach, the system mitigates queuing delays for short flows. The authors report a consistent 1.5 times reduction in FCT for short flows, which translates into improved application response times in multi-datacenter and edge deployments.
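The abstract states that eBPF is used to segregate short and long flows onto separate tunnels, but it does not say how a flow is judged short or long. The sketch below shows one common way such a classifier could work, written in plain Python rather than eBPF: every flow starts on the short-flow tunnel and is moved to the long-flow tunnel once its byte count passes a threshold. The `FlowClassifier` class, the tunnel names, and the 100 kB threshold are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# 5-tuple: source IP, destination IP, source port, destination port, protocol
FlowKey = Tuple[str, str, int, int, str]

SHORT_TUNNEL = "tunnel-short"  # hypothetical tunnel name
LONG_TUNNEL = "tunnel-long"    # hypothetical tunnel name


@dataclass
class FlowClassifier:
    """Byte-count classifier (an assumption, not the paper's stated method)."""
    threshold_bytes: int = 100_000  # assumed cutoff between short and long
    bytes_seen: Dict[FlowKey, int] = field(default_factory=dict)

    def select_tunnel(self, key: FlowKey, packet_len: int) -> str:
        """Account for one packet and return the tunnel it should use."""
        total = self.bytes_seen.get(key, 0) + packet_len
        self.bytes_seen[key] = total
        # A flow stays "short" until its cumulative bytes exceed the cutoff.
        return SHORT_TUNNEL if total <= self.threshold_bytes else LONG_TUNNEL


if __name__ == "__main__":
    clf = FlowClassifier(threshold_bytes=3000)
    flow = ("10.0.0.1", "10.0.1.1", 40000, 443, "tcp")
    for _ in range(3):
        print(clf.select_tunnel(flow, 1500))
```

A classifier of this kind reads only packet headers and lengths, never payloads, which is compatible with the abstract's claim that the solution works with encrypted traffic and is application-agnostic.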

Critical Analysis

The paper provides a promising approach for optimizing inter-datacenter network performance to accelerate distributed applications. However, some potential limitations and areas for further research include:

The evaluation is based on a testbed, so challenges that only appear in production-scale deployments may not be fully addressed.
The approach focuses on optimizing network-level performance, but does not consider other factors that could impact application runtime, such as resource provisioning or workload scheduling.
SD-WAN deployments commonly rely on a logically centralized controller, which may introduce scalability concerns as the number of data centers and applications grows; the abstract does not discuss this aspect.

Exploring collaborative fog computing ecosystems that incorporate both network and compute resource optimization could be a valuable direction for future research in this area.

Conclusion

This paper presents a novel approach for optimizing data flow across inter-datacenter networks to accelerate the runtime of distributed applications. By using eBPF in the SD-WAN to send short and long flows over separate tunnels, the proposed solution mitigates queuing delays and reduces FCT for short flows.

While the evaluation is limited to a testbed, the underlying concepts demonstrate the potential for software-defined networking to enhance the performance of complex, geographically distributed application architectures. Further research exploring the integration of network and compute resource optimization could yield even greater benefits for collaborative edge computing and distributed application scheduling.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Queue-aware Network Control Algorithm with a High Quantum Computing Readiness-Evaluated in Discrete-time Flow Simulator for Fat-Pipe Networks

Arthur Witt

The emerging technology of quantum computing has the potential to change how problems will be solved in the future. This work presents a centralized network control algorithm executable on already existing quantum computers that are based on the principle of quantum annealing, like the D-Wave Advantage. We introduce a resource reoccupation algorithm for traffic engineering in wide-area networks. The proposed optimization algorithm changes traffic steering and resource allocation in case of overloaded transceivers. Settings of active components like fiber amplifiers and transceivers are not changed for the reason of stability. This algorithm is beneficial in situations when the network traffic is fluctuating in time scales of seconds or spontaneous bursts occur. Further, we developed a discrete-time flow simulator to study the algorithm's performance in wide-area networks. Our network simulator considers backlog and loss modeling of buffered transmission lines. Concurring flows are handled equally in case of a backlog. This work provides an ILP-based network configuring algorithm that is applicable on quantum annealing computers. We show that traffic losses can be reduced significantly by a factor of 2 if a resource reoccupation algorithm is applied in a network with bursty traffic. As resources are used more efficiently by reoccupation in heavy load situations, overprovisioning of networks can be reduced. Thus, this new form of network operation leads toward a zero-margin network. We show that our newly introduced network simulator enables analyses of short-time effects like buffering within fat-pipe networks. As the calculation of network configurations in real-sized networks is typically time-consuming, quantum computing can enable the proposed network configuration algorithm for application in real-sized wide-area networks.

5/21/2024

eess.SY cs.ET cs.SY

Multi-Source Coflow Scheduling in Collaborative Edge Computing with Multihop Network

Yuvraj Sahni, Jiannong Cao, Lei Yang, Shengwei Wang

Collaborative edge computing has become a popular paradigm where edge devices collaborate by sharing resources. Data dissemination is a fundamental problem in CEC to decide what data is transmitted from which device and how. Existing works on data dissemination have not focused on coflow scheduling in CEC, which involves deciding the order of flows within and across coflows at network links. Coflow implies a set of parallel flows with a shared objective. The existing works on coflow scheduling in data centers usually assume a non-blocking switch and do not consider congestion at different links in the multi-hop path in CEC, leading to increased coflow completion time (CCT). Furthermore, existing works do not consider multiple flow sources that cannot be ignored, as data can have duplicate copies at different edge devices. This work formulates the multi-source coflow scheduling problem in CEC, which includes jointly deciding the source and flow ordering for multiple coflows to minimize the sum of CCT. This problem is shown to be NP-hard and challenging as each flow can have multiple dependent conflicts at multiple links. We propose a source and coflow-aware search and adjust (SCASA) heuristic that first provides an initial solution considering the coflow characteristics. SCASA further improves the initial solution using the source search and adjust heuristic by leveraging the knowledge of both coflows and network congestion at links. Evaluation done using simulation experiments shows that SCASA leads to up to 83% reduction in the sum of CCT compared to benchmarks without a joint solution.

5/30/2024

cs.NI cs.DC


Scheduling of Distributed Applications on the Computing Continuum: A Survey

Narges Mehran, Dragi Kimovski, Hermann Hellwagner, Dumitru Roman, Ahmet Soylu, Radu Prodan

The demand for distributed applications has significantly increased over the past decade, with improvements in machine learning techniques fueling this growth. These applications predominantly utilize Cloud data centers for high-performance computing and Fog and Edge devices for low-latency communication for small-size machine learning model training and inference. The challenge of executing applications with different requirements on heterogeneous devices requires effective methods for solving NP-hard resource allocation and application scheduling problems. The state-of-the-art techniques primarily investigate conflicting objectives, such as the completion time, energy consumption, and economic cost of application execution on the Cloud, Fog, and Edge computing infrastructure. Therefore, in this work, we review these research works considering their objectives, methods, and evaluation tools. Based on the review, we provide a discussion on the scheduling methods in the Computing Continuum.

5/2/2024

cs.DC

A Paradigm For Collaborative Pervasive Fog Computing Ecosystems at the Network Edge

Abderrahmen Mtibaa

While the success of edge and fog computing increased with the proliferation of Internet of Things (IoT) solutions, such a novel computing paradigm, which moves compute resources closer to the source of data and services, must address many challenges such as reducing communication overhead to/from datacenters, the latency to compute and receive results, as well as energy consumption at the mobile and IoT devices. Fog-to-fog (f2f) cooperation has recently been proposed to increase the computation capacity at the network edge through cooperation across multiple stakeholders. In this paper we adopt an analytical approach to studying the f2f cooperation paradigm. We highlight the benefits of using such a new paradigm in comparison with traditional three-tier fog computing paradigms. We use a Continuous Time Markov Chain (CTMC) model for the N f2f cooperating nodes and cast cooperation as an optimization problem, which we solve using the proposed model.

4/19/2024

cs.NI