An LLM Compiler for Parallel Function Calling

Read original: arXiv:2312.04511 - Published 6/6/2024 by Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Overview

Recent large language models (LLMs) can overcome their limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data, by executing external function calls.
This allows LLMs to select and coordinate multiple functions based on the context to tackle more complex problems.
However, current methods for function calling often require sequential reasoning and acting for each function, leading to high latency, cost, and sometimes inaccurate behavior.
To address this, the researchers introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. However, they can have limitations, such as not having up-to-date knowledge or struggling with complex calculations. To overcome these limitations, researchers have found a way for LLMs to call on external functions, like searching the internet or using a calculator.

This allows LLMs to tackle more complex problems by selecting and coordinating multiple functions based on the context. For example, an LLM could use a search function to find relevant information, then a math function to perform calculations, and finally a writing function to generate a summary.

However, the current methods for calling these functions often require the LLM to do them one after the other, which can be slow, expensive, and sometimes lead to inaccurate results. To solve this, the researchers created a new system called LLMCompiler that can execute the functions in parallel, meaning at the same time. This allows the LLM to orchestrate multiple function calls more efficiently, leading to faster results, lower costs, and more accurate outputs.
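The difference between calling functions one after the other and calling them at the same time can be illustrated with a small sketch. It is only an illustration under stated assumptions, not code from the paper: `fake_tool` is a hypothetical stand-in for any external function (a search, a calculator), and the delays are made up.

```python
import asyncio
import time

async def fake_tool(name: str, delay: float) -> str:
    """Hypothetical stand-in for an external function call (search, calculator, ...)."""
    await asyncio.sleep(delay)  # simulated latency of the external call
    return f"{name}: done"

async def run_sequential(calls):
    # One call after another: total time is roughly the sum of the delays.
    return [await fake_tool(name, delay) for name, delay in calls]

async def run_parallel(calls):
    # Independent calls issued together: total time is roughly the longest delay.
    return await asyncio.gather(*(fake_tool(name, delay) for name, delay in calls))

def timed(coro_fn, calls):
    """Run one of the strategies above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(calls))
    return results, time.perf_counter() - start

# Three independent calls with made-up latencies.
CALLS = [("search_a", 0.2), ("search_b", 0.2), ("search_c", 0.2)]

if __name__ == "__main__":
    _, seq_time = timed(run_sequential, CALLS)
    _, par_time = timed(run_parallel, CALLS)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The speedup only applies when the calls do not depend on each other's results. Calls that need an earlier output still have to wait for it.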

Technical Explanation

The researchers introduce LLMCompiler, a system that enables parallel function calling for LLMs. Inspired by the principles of classical compilers, LLMCompiler consists of three key components:

Function Calling Planner: This component formulates an execution plan for the function calls, identifying which calls depend on one another and which can run in parallel.
Task Fetching Unit: This unit dispatches function calling tasks to the Executor once they are ready to run.
Executor: This component executes the dispatched tasks in parallel.

Because LLMCompiler automatically generates an optimized orchestration for the function calls, it can be used with both open-source and closed-source models. The researchers benchmarked LLMCompiler on a range of tasks with different patterns of function calling and report consistent latency speedups of up to 3.7x, cost savings of up to 6.7x, and accuracy improvements of up to ~9% compared to ReAct.
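The three components described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Task` class, the `$k` placeholder syntax, the toy tools, and the hard-coded plan are all hypothetical, and in the real system an LLM would produce the plan. The sketch also dispatches tasks in waves, a simplification of a scheduler that releases each task as soon as its own dependencies finish.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

HEIGHTS_M = {"Everest": 8849, "K2": 8611}  # toy data for the sketch

async def lookup_height(name: str) -> int:
    await asyncio.sleep(0.05)  # simulated tool latency
    return HEIGHTS_M[name]

async def subtract(a: int, b: int) -> int:
    return a - b

@dataclass
class Task:
    """One planned function call; `deps` lists the task ids whose outputs it needs."""
    id: int
    tool: Callable[..., Any]
    args: List[Any]  # the string "$k" stands for the output of task k
    deps: List[int] = field(default_factory=list)

def plan() -> List[Task]:
    # Stand-in for the Function Calling Planner (hard-coded here, assumption).
    return [
        Task(1, lookup_height, ["Everest"]),
        Task(2, lookup_height, ["K2"]),
        Task(3, subtract, ["$1", "$2"], deps=[1, 2]),
    ]

async def run_task(task: Task, results: Dict[int, Any]) -> Any:
    # Replace "$k" placeholders with the actual output of task k, then call the tool.
    args = [
        results[int(a[1:])] if isinstance(a, str) and a.startswith("$") else a
        for a in task.args
    ]
    return await task.tool(*args)

async def execute(tasks: List[Task]) -> Dict[int, Any]:
    results: Dict[int, Any] = {}
    pending = {t.id: t for t in tasks}
    while pending:
        # Task Fetching Unit: select every task whose dependencies are resolved.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("plan has a cycle or a missing dependency")
        # Executor: run all ready tasks concurrently.
        outputs = await asyncio.gather(*(run_task(t, results) for t in ready))
        for t, out in zip(ready, outputs):
            results[t.id] = out
            del pending[t.id]
    return results

if __name__ == "__main__":
    print(asyncio.run(execute(plan()))[3])  # prints 238
```

In this plan, tasks 1 and 2 have no dependencies and run together, while task 3 waits for both of their outputs.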

Critical Analysis

The researchers have addressed an important challenge in the field of large language models by introducing a system that can efficiently orchestrate multiple function calls in parallel. This is a significant advancement, as it can help LLMs overcome their inherent limitations and tackle more complex problems.

However, the paper does not discuss the limitations of LLMCompiler or potential areas for further research. For example, the system may not work as well with certain types of tasks or function calls, or there may be scenarios where the parallel execution introduces new challenges or bottlenecks.

Additionally, the researchers do not address the potential security and privacy concerns that may arise from allowing LLMs to make arbitrary function calls, which could potentially expose sensitive information or enable malicious behavior. Further research is needed to understand and mitigate these risks.

Overall, the LLMCompiler system represents an important step forward in enhancing the capabilities of large language models, but there are still opportunities for improvement and further exploration of the implications of this technology.

Conclusion

The researchers have developed a novel system called LLMCompiler that enables large language models to efficiently orchestrate multiple function calls in parallel. This allows LLMs to overcome their inherent limitations and tackle more complex problems, leading to significant improvements in latency, cost, and accuracy compared to previous approaches.

While this is an important advancement in the field of large language models, further research is needed to address the potential limitations and security concerns of the system. Nonetheless, LLMCompiler is a promising step towards enhancing the capabilities of LLMs and paves the way for more powerful and versatile AI systems that can tackle a wide range of real-world challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

An LLM Compiler for Parallel Function Calling

Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, formulating execution plans for function calling; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler.

6/6/2024

An LLM-Tool Compiler for Fused Parallel Function Calling

Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis

State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional prompting to segment tasks into multiple steps, each requiring a round-trip to the GPT APIs, leads to increased system latency and costs. Although recent advancements in parallel function calling have improved tool execution per API call, they may necessitate more detailed in-context instructions and task breakdown at the prompt level, resulting in higher engineering and production costs. Inspired by the hardware design principles of multiply-add (MAD) operations, which fuse multiple arithmetic operations into a single task from the compiler's perspective, we propose LLM-Tool Compiler, which selectively fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM. This selective fusion inherently enhances parallelization and efficiency. Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.

5/29/2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, David Lo, Binyuan Hui, Niklas Muennighoff, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra

Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

6/27/2024

Performance-Aligned LLMs for Generating Fast Code

Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor performance can originate from disparate sources and be difficult to diagnose. Recent years have seen a multitude of work that use large language models (LLMs) to assist in software development tasks. However, these tools are trained to model the distribution of code as text, and are not specifically designed to understand performance aspects of code. In this work, we introduce a reinforcement learning based methodology to align the outputs of code LLMs with performance. This allows us to build upon the current code modeling capabilities of LLMs and extend them to generate better performing code. We demonstrate that our fine-tuned model improves the expected speedup of generated code over base models for a set of benchmark tasks from 0.9 to 1.6 for serial code and 1.9 to 4.5 for OpenMP code.

4/30/2024