Dirigent: Lightweight Serverless Orchestration

Read original: arXiv:2404.16393 - Published 4/26/2024 by Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic

Overview

  • FaaS (Function as a Service) platforms can quickly initialize function sandboxes, but scheduling functions in real FaaS clusters can take much longer.
  • The current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delays under high sandbox churn, that is, when sandboxes are created and torn down at a high rate.
  • Existing cluster managers use complex abstractions and components that become a bottleneck for the frequent state changes in FaaS environments.

Plain English Explanation

When you use a FaaS service like AWS Lambda, the platform can quickly create a small, isolated "sandbox" environment to run your code. However, scheduling that function onto a machine in a real FaaS cluster can take far longer, often orders of magnitude longer than creating the sandbox itself.

The researchers found that the current way of building FaaS cluster management systems, using existing orchestration platforms like Kubernetes, leads to these high scheduling delays. These general-purpose cluster managers use complex internal structures and processes to manage and keep track of the state of the entire cluster. But in a FaaS environment, the cluster state is constantly changing as new function sandboxes are quickly created and destroyed. This frequent flux in the cluster state becomes a bottleneck for the cluster manager.
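To see why state updates can cap scheduling throughput, consider a rough back-of-the-envelope model. This is only an illustrative sketch: it assumes every sandbox creation triggers a fixed number of persistent writes that are applied one after another, and both the function name and the numbers in the usage lines are hypothetical, not measurements from the paper.

```python
def max_creations_per_second(writes_per_creation: int, write_latency_ms: float) -> float:
    """Upper bound on sandbox creations per second when each creation needs
    `writes_per_creation` persistent writes that are applied serially."""
    if writes_per_creation == 0:
        # No persistent writes on the critical path: this model imposes no cap.
        return float("inf")
    return 1000.0 / (writes_per_creation * write_latency_ms)


# Hypothetical numbers chosen only to illustrate the shape of the bound.
with_writes = max_creations_per_second(writes_per_creation=5, write_latency_ms=10.0)
without_writes = max_creations_per_second(writes_per_creation=0, write_latency_ms=10.0)
print(with_writes, without_writes)
```

Under these assumptions the bound depends only on how many serialized writes sit on the critical path, which is why removing them matters more than making sandbox startup itself faster.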

Technical Explanation

The researchers analyzed the root causes of performance issues in existing FaaS cluster managers and proposed a new system called Dirigent to address them. Dirigent has three key principles:

  1. Optimized Abstractions: Dirigent simplifies the internal cluster manager abstractions to streamline state management, unlike the hierarchical and complex designs of generic cluster managers.

  2. Eliminating Persistent Updates: Dirigent avoids making persistent state updates on the critical path of function invocations. This is possible because FaaS abstracts away the sandboxes from users, relaxing the need for exact state reconstruction.

  3. Monolithic Design: Dirigent runs monolithic control and data planes, minimizing internal communication overhead and maximizing throughput, unlike the multi-component architectures of existing platforms.
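The second principle can be sketched as a toy control plane. This is a minimal illustration under stated assumptions, not Dirigent's actual code: the class `ControlPlane`, its methods, and the image name are all hypothetical. It persists only what users register, keeps sandbox placement in memory, and rebuilds that placement from worker reports after a restart.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy control plane; every name here is hypothetical, not Dirigent's API."""

    # Stands in for a durable store: function registrations survive a restart.
    durable_functions: dict = field(default_factory=dict)
    # Volatile state: sandbox id -> worker. Lost if the process restarts.
    sandboxes: dict = field(default_factory=dict)
    # Counts simulated persistent writes so the effect is visible.
    persistent_writes: int = 0

    def register_function(self, name: str, image: str) -> None:
        # Rare and user-visible, so it is persisted (off the critical path).
        self.durable_functions[name] = image
        self.persistent_writes += 1

    def create_sandbox(self, function: str, worker: str) -> str:
        # On the critical path of an invocation: memory update only.
        if function not in self.durable_functions:
            raise KeyError(f"unknown function: {function}")
        sandbox_id = f"{function}-{uuid.uuid4().hex}"
        self.sandboxes[sandbox_id] = worker
        return sandbox_id

    def recover(self, worker_reports: dict) -> None:
        # After a restart, accept whatever sandboxes the workers still report
        # instead of replaying an exact log of every state change.
        self.sandboxes = dict(worker_reports)


cp = ControlPlane()
cp.register_function("resize", "registry.example/resize:1")
ids = [cp.create_sandbox("resize", f"worker-{i % 2}") for i in range(1000)]
print(cp.persistent_writes)  # one write for the registration, none per sandbox

# Simulated restart: durable data is kept, sandbox state comes from workers.
restarted = ControlPlane(durable_functions=dict(cp.durable_functions))
restarted.recover({ids[0]: "worker-0"})
```

The design choice this illustrates is that users never address individual sandboxes, so losing or approximating sandbox state on failure does not break any user-visible guarantee.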

The researchers compared Dirigent to state-of-the-art FaaS platforms and found that, for a production workload, it reduces the 99th percentile per-function scheduling latency by 2.79x compared to AWS Lambda. Dirigent can also spin up 2500 sandboxes per second at low latency, which is 1250x more than Knative.
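The throughput figures above imply a rate for Knative that the summary does not state directly. The short calculation below derives it from the two reported numbers; the 10,000-sandbox burst is a hypothetical workload, and the timing assumes each system sustains its rate for the whole burst, which is a simplification.

```python
DIRIGENT_SANDBOXES_PER_SEC = 2500  # reported above
SPEEDUP_OVER_KNATIVE = 1250        # reported above

# Implied Knative rate under the same "low latency" condition.
knative_sandboxes_per_sec = DIRIGENT_SANDBOXES_PER_SEC / SPEEDUP_OVER_KNATIVE


def seconds_to_create(sandboxes: int, rate_per_sec: float) -> float:
    """Time to create `sandboxes` at a constant sustained rate."""
    return sandboxes / rate_per_sec


burst = 10_000  # hypothetical burst size, for illustration only
print(knative_sandboxes_per_sec)
print(seconds_to_create(burst, DIRIGENT_SANDBOXES_PER_SEC))
print(seconds_to_create(burst, knative_sandboxes_per_sec))
```

In other words, the reported ratio corresponds to roughly two sandboxes per second for Knative, so a burst that takes seconds at Dirigent's rate would take over an hour at the implied Knative rate under this simplified model.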

Critical Analysis

The paper provides a thorough analysis of the performance issues in current FaaS cluster management systems and proposes a thoughtful solution in Dirigent. However, the researchers acknowledge that Dirigent's design choices, such as the monolithic architecture, may have limitations in terms of scalability and fault tolerance compared to more distributed systems.

Additionally, the paper focuses on optimizing function scheduling latency, but does not address other important FaaS considerations, such as resource utilization, cost-efficiency, or support for advanced features like stateful functions. Further research may be needed to evaluate Dirigent's performance and applicability in a broader range of FaaS use cases.

Conclusion

The Dirigent system proposed in this paper presents a novel approach to FaaS cluster orchestration, addressing the performance bottlenecks of existing solutions. By simplifying internal abstractions, eliminating persistent state updates, and using a monolithic design, Dirigent achieves significant improvements in function scheduling latency and sandbox creation throughput.

These advancements in FaaS cluster management could have important implications for the adoption and performance of serverless computing, enabling more responsive and efficient execution of cloud-hosted functions. As demand for serverless architectures continues to grow, designs like Dirigent could help reduce the scheduling overhead that bursty, short-lived workloads incur.





Related Papers


Dirigent: Lightweight Serverless Orchestration

Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic

While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. We find that the current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delay at high sandbox churn, which is typical in FaaS clusters. While generic cluster managers use hierarchical abstractions and multiple internal components to manage and reconcile state with frequent persistent updates, this becomes a bottleneck for FaaS, where cluster state frequently changes as sandboxes are created on the critical path of requests. Based on our root cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration with three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandboxes from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda and can spin up 2500 sandboxes per second at low latency, which is 1250x more than with Knative.


4/26/2024

FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs

Daniel Barcelona-Pons, Aitor Arjona, Pedro García-López, Enrique Molina-Giménez, Stepan Klymonchuk

Function-as-a-Service (FaaS) struggles with burst-parallel jobs due to needing multiple independent invocations to start a job. The lack of a group invocation primitive complicates application development and overlooks crucial aspects like locality and worker communication. We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job-level isolation using a group invocation primitive, allowing large groups of workers to be launched simultaneously. This method optimizes resource allocation by consolidating workers into fewer containers, speeding up their initialization and enhancing locality. Enhanced locality drastically reduces remote communication compared to FaaS, and combined with simultaneity, it enables workers to communicate synchronously via message passing and group collectives. This makes applications that are impractical with FaaS feasible. We implemented our solution on OpenWhisk, providing a communication middleware that efficiently uses locality with zero-copy messaging. Evaluations show that it reduces job invocation and communication latency, resulting in a 2× speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (13× speed-up) compared to traditional FaaS.


7/22/2024

FunLess: Functions-as-a-Service for Private Edge Cloud Systems

Giuseppe De Palma, Saverio Giallorenzo, Jacopo Mauro, Matteo Trentin, Gianluigi Zavattaro

We present FunLess, a Function-as-a-Service (FaaS) platform tailored for the private edge cloud system. FunLess responds to recent trends that advocate for extending the coverage of serverless computing to private edge cloud systems and enhancing latency, security, and privacy while improving resource usage. Unlike existing solutions that rely on containers for function invocation, FunLess leverages WebAssembly (Wasm) as its runtime environment. Wasm's lightweight, sandboxed runtime is crucial to have functions run on constrained devices at the edge. Moreover, the advantages of using Wasm in FunLess include a consistent development and deployment environment for users and function portability (write once, run everywhere). We validate FunLess under different deployment scenarios, characterised by the presence/absence of constrained-resource devices (Raspberry Pi 3B+) and the (in)accessibility of container orchestration technologies - Kubernetes. We compare FunLess with three production-ready, widely adopted open-source FaaS platforms - OpenFaaS, Fission, and Knative. Our benchmarks confirm that FunLess is a proper solution for FaaS private edge cloud systems since it achieves performance comparable to the considered FaaS alternatives while it is the only fully-deployable alternative on constrained-resource devices, thanks to its small memory footprint.


6/3/2024


FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example

Marcin Copik, Alexandru Calotoiu, Pengyu Zhou, Konstantin Taranov, Torsten Hoefler

FaaS (Function-as-a-Service) revolutionized cloud computing by replacing persistent virtual machines with dynamically allocated resources. This shift trades locality and statefulness for a pay-as-you-go model more suited to variable and infrequent workloads. However, the main challenge is to adapt services to the serverless paradigm while meeting functional, performance, and consistency requirements. In this work, we push the boundaries of FaaS computing by designing a serverless variant of ZooKeeper, a centralized coordination service with a safe and wait-free consensus mechanism. We define synchronization primitives to extend the capabilities of scalable cloud storage and outline a set of requirements for efficient computing with serverless. In FaaSKeeper, the first coordination service built on serverless functions and cloud-native services, we explore the limitations of serverless offerings and propose improvements essential for complex and latency-sensitive applications. We share serverless design lessons based on our experiences of implementing a ZooKeeper model deployable to clouds today. FaaSKeeper maintains the same consistency guarantees and interface as ZooKeeper, with a serverless price model that lowers costs up to 110-719x on infrequent workloads.


5/2/2024