Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching

Read original: arXiv:2404.09267 - Published 4/16/2024 by Haosong Peng, Yufeng Zhan, Peng Li, Yuanqing Xia

Overview

Tangram is a serverless cloud-edge system for high-resolution video analytics that uses SLO-aware batching to cut bandwidth use and computation cost.
The paper presents the design and evaluation of Tangram, which targets the highly fluctuating inference workloads that fixed Infrastructure as a Service (IaaS) configurations handle poorly.
Key features include adaptive alignment of regions of interest (RoIs) into patches at the edge, a "stitching" method that batches patches of various sizes, and an online batching algorithm that uses service-level objectives (SLOs) to decide when to invoke the serverless function.

Plain English Explanation

Tangram is a new system designed to make it easier and cheaper to analyze high-quality video using cloud computing services. Analyzing high-resolution video can be challenging and expensive, but Tangram tries to solve this problem.

The core idea behind Tangram is to send only the interesting parts of each frame to the cloud and to process many of them in one go. Edge cameras crop regions of interest into patches, and a scheduler in the cloud groups patches from different cameras into a batch before invoking a serverless function. Tangram considers the service-level objectives (SLOs) - the target response times that users expect - and times each invocation so that those targets are met while the overall computing cost stays low.

By using this smart batching approach, Tangram can run video analytics more efficiently on serverless cloud platforms compared to traditional methods. Serverless platforms are attractive because you only pay for the computing resources you actually use, but they also come with some unique challenges that Tangram addresses.

Overall, Tangram aims to make high-quality video analysis more accessible and affordable by leveraging the benefits of serverless computing while overcoming its limitations through intelligent task batching.

Technical Explanation

Tangram is designed to address the challenges of running large-scale video analytics on serverless computing platforms. Serverless platforms, such as AWS Lambda or Azure Functions, offer the ability to automatically scale computing resources up and down based on demand, and only charge users for the resources they actually consume.

However, serverless platforms also have some limitations that make them less suitable for video analytics workloads, which tend to have high computational requirements and strict latency constraints. Tangram tackles these challenges through two key innovations:

SLO-aware Batching: Tangram's scheduler groups patches from multiple edge cameras before invoking the serverless function. Batching lets the function exploit parallel processing, but waiting to fill a batch adds delay. An online SLO-aware batching algorithm therefore chooses the invocation time for each batch so that SLO violations stay low while computing cost is minimized.
High-Resolution Video Handling: Rather than transmitting full high-resolution frames, Tangram extracts regions of interest (RoIs) at the edge, adaptively aligns them into patches, and transmits only those patches to the cloud scheduler. A "stitching" method then combines patches of various sizes from different cameras into a single batched input.
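
The trade-off behind SLO-aware batching can be made concrete with a small sketch. This is not the paper's algorithm: the `Patch` and `SloAwareBatcher` names, the pixel budget per stitched input, and the linear latency model are all assumptions chosen for illustration. The sketch only shows the general shape of the decision: keep collecting patches until either the stitched input is full or waiting any longer would make the earliest deadline unreachable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Patch:
    """A cropped region of interest waiting to be batched (hypothetical type)."""
    patch_id: int
    width: int
    height: int
    deadline: float  # absolute time in seconds by which the result is due

    @property
    def area(self) -> int:
        return self.width * self.height


@dataclass
class SloAwareBatcher:
    """Illustrative batcher; every parameter below is an assumed model input."""
    canvas_area: int          # pixel budget of one stitched input
    base_latency: float       # fixed latency of one function invocation (s)
    latency_per_pixel: float  # extra latency per stitched pixel (s)
    pending: List[Patch] = field(default_factory=list)

    def estimated_latency(self, patches: List[Patch]) -> float:
        # Assumed linear model: fixed overhead plus time proportional to pixels.
        return self.base_latency + self.latency_per_pixel * sum(p.area for p in patches)

    def latest_safe_start(self) -> Optional[float]:
        # Latest time the pending batch can start and still meet its earliest deadline.
        if not self.pending:
            return None
        earliest = min(p.deadline for p in self.pending)
        return earliest - self.estimated_latency(self.pending)

    def flush(self) -> List[Patch]:
        # Hand the pending patches over for invocation and start a new batch.
        batch, self.pending = self.pending, []
        return batch

    def add(self, patch: Patch, now: float) -> Optional[List[Patch]]:
        """Queue a patch. If it does not fit with the pending patches, return
        the pending batch for immediate invocation and queue the patch alone."""
        candidate = self.pending + [patch]
        too_big = sum(p.area for p in candidate) > self.canvas_area
        earliest = min(p.deadline for p in candidate)
        too_slow = now + self.estimated_latency(candidate) > earliest
        flushed = None
        if self.pending and (too_big or too_slow):
            flushed = self.flush()
        self.pending.append(patch)
        return flushed

    def poll(self, now: float) -> Optional[List[Patch]]:
        """Called periodically: invoke once waiting longer would risk the SLO."""
        start = self.latest_safe_start()
        if start is not None and now >= start:
            return self.flush()
        return None


if __name__ == "__main__":
    batcher = SloAwareBatcher(canvas_area=1_000_000, base_latency=0.1,
                              latency_per_pixel=1e-7)
    batcher.add(Patch(1, 500, 500, deadline=1.0), now=0.0)
    batcher.add(Patch(2, 500, 500, deadline=1.2), now=0.1)
    batch = batcher.poll(now=0.9)
    print([p.patch_id for p in batch] if batch else "keep waiting")
```

In this sketch the batcher waits as long as the earliest deadline allows, which gives later patches a chance to join the batch. A real scheduler would also need a latency model fitted to the deployed function and a packing step that respects patch shapes rather than only total area.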

The paper presents the design and implementation of Tangram, along with an evaluation on the authors' prototype. The experiments show that Tangram reduces bandwidth consumption by up to 74.30% and computation cost by up to 66.35%, while keeping SLO violations within 5% and accuracy loss negligible.

Critical Analysis

The Tangram paper presents a well-designed and thoughtful approach to running high-resolution video analytics on serverless platforms. The SLO-aware batching mechanism is a particularly novel and promising contribution, as it addresses a key challenge in balancing performance and cost in serverless environments.

However, the paper could be strengthened by addressing a few potential limitations and areas for further research:

Generalizability: The paper focuses on a specific set of video analytics workloads and may not capture the full diversity of use cases that users might have. It would be valuable to understand how well Tangram's techniques would generalize to other types of video analytics tasks, or even to other data-intensive workloads beyond video.
Real-world Deployment: While the paper presents a thorough evaluation, it would be helpful to see evidence of Tangram being deployed and used in real-world production environments. Understanding the practical challenges and lessons learned from such deployments could provide valuable insights for improving the system.
Adaptability: Tangram's batching algorithm makes its decisions online, but it would be worthwhile to examine how well the system tracks changes in workload characteristics or user requirements over time.

Overall, the Tangram paper presents a compelling approach to addressing an important problem in the field of video analytics. With further research and real-world validation, the techniques developed in this work could have a significant impact on making high-quality video analysis more accessible and cost-effective.

Conclusion

Tangram is a serverless cloud-edge system designed to enable efficient and cost-effective high-resolution video analytics. By combining RoI patch stitching with SLO-aware batching, Tangram addresses key obstacles that have made video analytics workloads difficult to run on serverless infrastructure.

The evaluation results presented in the paper demonstrate the effectiveness of Tangram's approach, showing substantial reductions in bandwidth consumption and computation cost while keeping SLO violations within 5%. Although some areas remain open for further research, the core ideas behind Tangram represent an important step forward in making high-quality video analytics more accessible and scalable in the cloud era.

As video data continues to grow in volume and importance across various domains, systems like Tangram will play a crucial role in unlocking the full potential of video-based applications and insights.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching

Haosong Peng, Yufeng Zhan, Peng Li, Yuanqing Xia

Cloud-edge collaborative computing paradigm is a promising solution to high-resolution video analytics systems. The key lies in reducing redundant data and managing fluctuating inference workloads effectively. Previous work has focused on extracting regions of interest (RoIs) from videos and transmitting them to the cloud for processing. However, a naive Infrastructure as a Service (IaaS) resource configuration falls short in handling highly fluctuating workloads, leading to violations of Service Level Objectives (SLOs) and inefficient resource utilization. Besides, these methods neglect the potential benefits of RoIs batching to leverage parallel processing. In this work, we introduce Tangram, an efficient serverless cloud-edge video analytics system fully optimized for both communication and computation. Tangram adaptively aligns the RoIs into patches and transmits them to the scheduler in the cloud. The system employs a unique "stitching" method to batch the patches with various sizes from the edge cameras. Additionally, we develop an online SLO-aware batching algorithm that judiciously determines the optimal invoking time of the serverless function. Experiments on our prototype reveal that Tangram can reduce bandwidth consumption and computation cost up to 74.30% and 66.35%, respectively, while maintaining SLO violations within 5% and the accuracy loss negligible.

4/16/2024

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

Jiabin Chen, Fei Xu, Yikun Gu, Li Chen, Fangming Liu, Zhi Zhou

Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes a long batching time and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) in serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention. In this paper, we present HarmonyBatch, a cost-efficient resource provisioning framework designed to achieve predictable performance for multi-SLO DNN inference with heterogeneous serverless functions. Specifically, we construct an analytical performance and cost model of DNN inference on both CPU and GPU functions, by explicitly considering the GPU time-slicing scheduling mechanism and request arrival rate distribution. Based on such a model, we devise a two-stage merging strategy in HarmonyBatch to judiciously batch the multi-SLO DNN inference requests into application groups. It aims to minimize the budget of function provisioning for each application group while guaranteeing diverse performance SLOs of inference applications. We have implemented a prototype of HarmonyBatch on Alibaba Cloud Function Compute. Extensive prototype experiments with representative DNN inference workloads demonstrate that HarmonyBatch can provide predictable performance to serverless DNN inference workloads while reducing the monetary cost by up to 82.9% compared to the state-of-the-art methods.

5/10/2024

Tangram: A Challenging Benchmark for Geometric Element Recognizing

Jiamin Tang, Chao Zhang, Xudong Zhu, Mengchi Liu

Significant advancements in Large Multimodal Models (LMMs) have enabled them to tackle complex problems involving visual-mathematical reasoning. However, their ability to identify geometric elements remains understudied. To bridge this gap, we introduce Tangram, a novel benchmark designed to evaluate the performance of LMMs on geometric element recognition. Tangram includes 1,080 diverse geometric diagrams sourced from primary and secondary school exams, competitions, and textbooks, covering from simple basic geometric shapes to complex combinations. Each diagram is associated with four questions, resulting in a total of 4,320 visual-question-answer pairs. Unlike existing benchmarks that seek higher-level cognition and reasoning, Tangram focuses on the understanding of geometric elements, requiring models to perform a simple but interesting counting task. Systematic evaluation of 10 prominent LMMs, such as GPT-4o and Claude 3.5 Sonnet, shows that even in the seemingly simple task, these models still face significant challenges. Notably, the overall accuracy of the top performer across all tested models is only 56.8%, marking a significant gap when compared to human performance. These findings highlight the limitations of current multimodal artificial intelligence systems in handling basic perception tasks, and will inspire the development of the next generation of expert-level multimodal foundational models. The Tangram and evaluation code will be available soon.

8/27/2024

Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming

Xinqi Jin, Zhui Zhu, Xikai Sun, Fan Dang, Jiangchuan Liu, Jingao Xu, Kebin Liu, Xinlei Chen, Yunhao Liu

Neural enhancement through super-resolution (SR) deep neural networks (DNNs) opens up new possibilities for ultra-high-definition (UHD) live streaming over existing encoding and networking infrastructure. Yet, the heavy SR DNN inference overhead leads to severe deployment challenges. To reduce the overhead, existing systems propose to apply DNN-based SR only on carefully selected anchor frames while upscaling non-anchor frames via the lightweight reusing-based SR approach. However, frame-level scheduling is coarse-grained and fails to deliver optimal efficiency. In this work, we propose Palantir, the first neural-enhanced UHD live streaming system with fine-grained patch-level scheduling. Two novel techniques are incorporated into Palantir to select the most beneficial anchor patches and support latency-sensitive UHD live streaming applications. Firstly, under the guidance of our pioneering and theoretical analysis, Palantir constructs a directed acyclic graph (DAG) for lightweight yet accurate SR quality estimation under any possible anchor patch set. Secondly, to further optimize the scheduling latency, Palantir improves parallelizability by refactoring the computation subprocedure of the estimation process into a sparse matrix-matrix multiplication operation. The evaluation results suggest that Palantir incurs a negligible scheduling latency accounting for less than 5.7% of the end-to-end latency requirement. When compared to the naive method of applying DNN-based SR on all the frames, Palantir can reduce the SR DNN inference overhead by 20 times (or 60 times) while preserving 54.0-82.6% (or 32.8-64.0%) of the quality gain. When compared to the state-of-the-art real-time frame-level scheduling strategy, Palantir can reduce the SR DNN inference overhead by 80.1% at most (and 38.4% on average) without sacrificing the video quality.

9/4/2024