FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs

Read original: arXiv:2407.14331 - Published 7/22/2024 by Daniel Barcelona-Pons, Aitor Arjona, Pedro García-López, Enrique Molina-Giménez, Stepan Klymonchuk

Overview

Serverless computing (Function-as-a-Service or FaaS) is a cloud computing model where developers can run code without managing servers.
The paper argues that FaaS is not enough for handling "burst-parallel" jobs, which are many short-lived tasks that need to be executed in parallel.
It proposes a new serverless approach, built around a group invocation primitive that launches all of a job's workers at once, to address the limitations of FaaS in this scenario.

Plain English Explanation

Serverless computing, also known as Function-as-a-Service (FaaS), allows developers to run code without having to worry about managing servers. This can be very useful for certain types of workloads. However, the paper argues that FaaS may not be sufficient for handling "burst-parallel" jobs - these are many short-lived tasks that need to be executed in parallel, such as processing a large number of images or analyzing sensor data from multiple sources.

The key issue is that FaaS platforms are not designed to handle this kind of workload efficiently. Each task has to be started as a separate, independent invocation, so launching a large job is slow and the platform has no view of the job as a whole, which increases latency and cost. To address this, the paper proposes a serverless approach built specifically for burst-parallel jobs.

Technical Explanation

The paper begins by motivating the need for better support for burst-parallel jobs in serverless computing. It explains that while FaaS is well-suited for event-driven, individual function invocations, it can struggle when faced with large numbers of short-lived, parallel tasks that need to be executed quickly.

To address this, the paper proposes a new architecture that combines FaaS with additional components to better handle burst-parallel workloads. This includes a job scheduler to coordinate the parallel tasks, a function pool to maintain a ready pool of pre-warmed function instances, and a locality manager to optimize the placement of tasks based on data locality.
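The placement idea can be illustrated with a small sketch. It is not the paper's implementation: `Container`, `pack_workers`, and `remote_pair_fraction` are hypothetical names, and the fixed per-container capacity is an assumption made for illustration. The sketch packs a job's workers into as few containers as possible and then measures what fraction of worker pairs would have to communicate across containers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container that can host up to `capacity` workers."""
    container_id: int
    capacity: int
    workers: list[int] = field(default_factory=list)


def pack_workers(num_workers: int, capacity: int) -> list[Container]:
    """Place worker IDs 0..num_workers-1 into as few containers as possible.

    Assumption: every container has the same capacity, so filling them
    one after another yields the minimum number of containers.
    """
    if num_workers < 0 or capacity <= 0:
        raise ValueError("num_workers must be >= 0 and capacity must be > 0")
    containers: list[Container] = []
    for worker_id in range(num_workers):
        # Open a new container only when the current one is full.
        if not containers or len(containers[-1].workers) == capacity:
            containers.append(Container(len(containers), capacity))
        containers[-1].workers.append(worker_id)
    return containers


def remote_pair_fraction(containers: list[Container]) -> float:
    """Fraction of distinct worker pairs that sit in different containers."""
    total = sum(len(c.workers) for c in containers)
    all_pairs = total * (total - 1) // 2
    if all_pairs == 0:
        return 0.0
    local_pairs = sum(
        len(c.workers) * (len(c.workers) - 1) // 2 for c in containers
    )
    return (all_pairs - local_pairs) / all_pairs


if __name__ == "__main__":
    for capacity in (1, 4, 8):
        placement = pack_workers(8, capacity)
        print(capacity, len(placement), round(remote_pair_fraction(placement), 3))
```

With 8 workers and a capacity of 4, the sketch uses two containers, and 16 of the 28 worker pairs are remote. With a capacity of 1, which roughly corresponds to every worker being an independent function invocation, every pair is remote. Pair counting is only a stand-in for real traffic; actual savings depend on the application's communication pattern.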

The paper then describes the design and implementation of this architecture, including how the components interact to execute burst-parallel jobs. According to the abstract, the prototype is built on OpenWhisk, and the evaluation against traditional FaaS reports a 2× speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (a 13× speed-up).

Critical Analysis

The paper does a good job of identifying a real-world problem with serverless computing - the difficulty in handling burst-parallel workloads efficiently. The proposed architecture seems well-designed to address this issue, incorporating various techniques like function pooling and data locality optimization.

However, the paper does not delve into some potential limitations or areas for further research. For example, it does not discuss how the approach would scale to extremely large numbers of parallel tasks or how it would handle heterogeneous task types. Additionally, the paper does not explore potential challenges around maintaining a pool of pre-warmed function instances, such as the trade-offs between performance and cost.

It would also be interesting to see how this approach compares to other emerging serverless architectures that aim to address similar challenges, such as those incorporating edge computing or data pre-fetching techniques.

Conclusion

The paper presents a promising approach to addressing the limitations of FaaS in handling burst-parallel workloads. By combining FaaS with additional components like a job scheduler and function pool, the proposed architecture aims to provide better performance and efficiency for this type of workload.

While the paper does not cover all potential issues or alternative solutions, it makes a compelling case for the need to go beyond the basic FaaS model to support a wider range of serverless use cases. The insights and techniques presented here could inform the development of more robust and versatile serverless platforms in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs

Daniel Barcelona-Pons, Aitor Arjona, Pedro García-López, Enrique Molina-Giménez, Stepan Klymonchuk

Function-as-a-Service (FaaS) struggles with burst-parallel jobs due to needing multiple independent invocations to start a job. The lack of a group invocation primitive complicates application development and overlooks crucial aspects like locality and worker communication. We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job-level isolation using a group invocation primitive, allowing large groups of workers to be launched simultaneously. This method optimizes resource allocation by consolidating workers into fewer containers, speeding up their initialization and enhancing locality. Enhanced locality drastically reduces remote communication compared to FaaS, and combined with simultaneity, it enables workers to communicate synchronously via message passing and group collectives. This makes applications that are impractical with FaaS feasible. We implemented our solution on OpenWhisk, providing a communication middleware that efficiently uses locality with zero-copy messaging. Evaluations show that it reduces job invocation and communication latency, resulting in a 2× speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (13× speed-up) compared to traditional FaaS.

7/22/2024

Dirigent: Lightweight Serverless Orchestration

Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic

While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. We find that the current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delay at high sandbox churn, which is typical in FaaS clusters. While generic cluster managers use hierarchical abstractions and multiple internal components to manage and reconcile state with frequent persistent updates, this becomes a bottleneck for FaaS, where cluster state frequently changes as sandboxes are created on the critical path of requests. Based on our root cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration with three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandboxes from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda and can spin up 2500 sandboxes per second at low latency, which is 1250x more than with Knative.

4/26/2024

GeoFF: Federated Serverless Workflows with Data Pre-Fetching

Valentin Carl, Trever Schirmer, Tobias Pfandzelter, David Bermbach

Function-as-a-Service (FaaS) is a popular cloud computing model in which applications are implemented as workflows of multiple independent functions. While cloud providers usually offer composition services for such workflows, they do not support cross-platform workflows, forcing developers to hardcode the composition logic. Furthermore, FaaS workflows tend to be slow due to cascading cold starts, inter-function latency, and data download latency on the critical path. In this paper, we propose GeoFF, a serverless choreography middleware that executes FaaS workflows across different public and private FaaS platforms, including ad-hoc workflow recomposition. Furthermore, GeoFF supports function pre-warming and data pre-fetching. This minimizes end-to-end workflow latency by taking cold starts and data download latency off the critical path. In experiments with our proof-of-concept prototype and a realistic application, we were able to reduce end-to-end latency by more than 50%.

5/24/2024

Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications

Tim C. Rese, Nils Japke, Sebastian Koch, Tobias Pfandzelter, David Bermbach

In a continuous deployment setting, Function-as-a-Service (FaaS) applications frequently receive updated releases, each of which can cause a performance regression. While continuous benchmarking, i.e., comparing benchmark results of the updated and the previous version, can detect such regressions, performance variability of FaaS platforms necessitates thousands of function calls, thus making continuous benchmarking time-intensive and expensive. In this paper, we propose DuetFaaS, an approach which adapts duet benchmarking to FaaS applications. With DuetFaaS, we deploy two versions of a FaaS function in a single cloud function instance and execute them in parallel to reduce the impact of platform variability. We evaluate our approach against state-of-the-art approaches, running on AWS Lambda. Overall, DuetFaaS requires fewer invocations to accurately detect performance regressions than other state-of-the-art approaches. In 98.41% of evaluated cases, our approach provides equal or smaller confidence interval size. DuetFaaS achieves an interval size reduction in 59.06% of all evaluated sample sizes when compared to the competitive approaches.

8/20/2024