TinyAgent: Function Calling at the Edge

Read original: arXiv:2409.00608 - Published 9/4/2024 by Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami

Overview

The paper introduces TinyAgent, an end-to-end framework for training and deploying small, task-specific language models that perform function calling at the edge.
Function calling lets a language model invoke external tools and APIs, so it can complete tasks that go beyond the knowledge in its training data.
To keep inference efficient on local hardware, the framework pairs fine-tuned small models with a tool retrieval method that shortens the input prompt and with quantization.

Plain English Explanation

TinyAgent: Function Calling at the Edge presents an approach for giving small language models, running on local devices, a capability usually associated with large language models (LLMs). LLMs are powerful AI systems trained on vast amounts of data, but on their own they are limited to the information and skills in that training data, and their size typically ties them to cloud-based infrastructure.

TinyAgent aims to address this limitation by enabling LLMs to call external functions and APIs. This allows the LLMs to perform complex tasks that go beyond their original training, such as accessing real-time data, executing complex computations, or interacting with external systems.

The key idea behind TinyAgent is to use a lightweight "agent" architecture to manage the function calling process. The agent sits between the language model and the external functions: the model decides which function to call and with what arguments, and the agent invokes that function and returns its output to the model. This division lets the model concentrate on language understanding and generation while the agent handles the mechanics of each call.
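
To make the intermediary role concrete, here is a minimal Python sketch of such an agent loop. It is an illustration under stated assumptions, not TinyAgent's implementation: the JSON call format, the `fake_model` stand-in for a language model, and the `add` and `word_count` tools are all hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical tools; a real deployment would wrap actual APIs.
def add(a: float, b: float) -> float:
    return a + b

def word_count(text: str) -> int:
    return len(text.split())

# The agent's registry: tool names the model may refer to.
TOOLS: Dict[str, Callable] = {"add": add, "word_count": word_count}

def fake_model(query: str) -> str:
    """Stand-in for a language model (assumption): emits one call as JSON text."""
    if "how many words" in query.lower():
        return json.dumps({
            "name": "word_count",
            "arguments": {"text": "function calling at the edge"},
        })
    return json.dumps({"name": "add", "arguments": {"a": 2, "b": 3}})

def run_agent(query: str) -> str:
    """Ask the model for a call, execute it, and report the result."""
    call = json.loads(fake_model(query))      # parse the model's request
    tool = TOOLS[call["name"]]                # look up the requested function
    result = tool(**call["arguments"])        # invoke it with the given arguments
    return f"{call['name']} returned {result}"
```

Calling `run_agent("What is 2 plus 3?")` routes the request to `add`. In a real system the result would usually be passed back to the model so it can compose the final answer.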

By combining the powerful language understanding of LLMs with the ability to call external functions, TinyAgent opens up new possibilities for these models to be applied in a wide range of real-world scenarios, from data analysis and task automation to interactive applications and specialized problem-solving.

Technical Explanation

TinyAgent: Function Calling at the Edge introduces a system for enabling large language models (LLMs) to call external functions and APIs, a capability known as "function calling."

The key components of the TinyAgent system include:

LLM Integration: TinyAgent enables accurate function calling for open-source models through the LLMCompiler framework, so the language model can request external functions as part of its processing pipeline. Two small models, TinyAgent-1.1B and TinyAgent-7B, are fine-tuned on a curated function-calling dataset.
Lightweight Agent: A lightweight "agent" module manages the function calling process. It passes the model's requests to the external functions and returns their outputs to the model.
Function Execution and Integration: The agent invokes external functions, such as APIs and custom code, and feeds their results back into the model's processing. A tool retrieval method limits the prompt to the relevant tools, which shortens the input, and quantization further speeds up inference.
Security and Sandboxing: To protect the system, TinyAgent incorporates security measures, such as sandboxing and access control, that block unauthorized or malicious function calls.
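
The access-control idea in the last item can be illustrated with a small allowlist guard. This is a sketch under assumptions, not the paper's mechanism: the tool names, the `ALLOWED` set, and the `ToolNotAllowed` exception are hypothetical, and an allowlist only restricts which functions may run; it is not a sandbox.

```python
from typing import Any, Callable, Dict

class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the allowlist."""

# Hypothetical tools: one harmless, one destructive.
def read_note(title: str) -> str:
    return f"contents of {title}"

def delete_all_notes() -> str:
    return "all notes deleted"

REGISTRY: Dict[str, Callable[..., Any]] = {
    "read_note": read_note,
    "delete_all_notes": delete_all_notes,
}

# Only tools named here may be executed on the model's behalf.
ALLOWED = {"read_note"}

def guarded_call(name: str, arguments: Dict[str, Any]) -> Any:
    """Run a registered tool only if it is on the allowlist."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    if name not in ALLOWED:
        raise ToolNotAllowed(name)
    return REGISTRY[name](**arguments)
```

Checking the allowlist before dispatch means a malformed or malicious model output fails loudly instead of reaching the destructive function.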

The researchers fine-tune two small language models, TinyAgent-1.1B and TinyAgent-7B, and evaluate their function-calling ability. They report that these models can match, and even surpass, the function-calling capabilities of larger models such as GPT-4-Turbo while being fully deployed at the edge. As a driving application, they demonstrate a local Siri-like assistant for Apple's MacBook that executes user commands given by text or voice.

Critical Analysis

The TinyAgent: Function Calling at the Edge paper presents a promising approach for expanding what language models can do. By enabling models to call external functions and APIs, the system lets them perform tasks that their training data alone does not support.

One limitation of the TinyAgent system concerns security and reliability. Measures like sandboxing help, but integrating external functions can still introduce new attack vectors or points of failure that need to be carefully managed. The authors should consider conducting more extensive security assessments and providing guidelines for deploying TinyAgent in production environments.

Additionally, performance and scalability deserve closer study. The paper reduces inference cost with a tool retrieval method that shortens the input prompt and with quantization, but as the number of external functions and the complexity of the tasks increase, the overhead and latency of the function calling pipeline could still become a bottleneck. Further research is needed to understand the system's performance characteristics and to optimize the design for different deployment scenarios.
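
One way to limit prompt overhead as the tool set grows is tool retrieval, which the paper's abstract names: only the tools relevant to the current query are placed in the prompt. The sketch below illustrates the general idea with bag-of-words cosine similarity. That scoring method, the tool names, and their descriptions are assumptions chosen for illustration; this summary does not describe how the paper's own retrieval method works.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical tool catalogue: name -> natural-language description.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "create_calendar_event": "create a calendar event with a title, date and attendees",
    "send_email": "compose and send an email to recipients with a subject and body",
    "open_note": "open an existing note by its title",
    "get_directions": "get map directions between two locations",
}

def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve_tools(query: str, k: int = 2) -> List[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = tokenize(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(q, tokenize(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]
```

Only the descriptions of the retrieved tools would then be written into the prompt, so prompt length depends on `k` rather than on the size of the full catalogue.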

Overall, the TinyAgent system represents an important step forward in enhancing the capabilities of LLMs. By bridging the gap between language understanding and external functionality, the system opens up new possibilities for these models to be applied in a wide range of real-world applications. However, more work is needed to address potential security and scalability concerns to ensure the robustness and reliability of the TinyAgent approach.

Conclusion

TinyAgent: Function Calling at the Edge introduces a framework that enables small language models running at the edge to call external functions and APIs, expanding their capabilities beyond their original training data. The lightweight agent architecture at the core of TinyAgent manages the function calling process, allowing the model to integrate external functionality into its processing pipeline.

The potential impact of TinyAgent is significant, as it could enable LLMs to be applied in a wide range of real-world scenarios, from data analysis and task automation to specialized problem-solving and interactive applications. By bridging the gap between language understanding and external functionality, TinyAgent represents an important step forward in the evolution of large language models and their ability to tackle complex, multi-faceted problems.

While the paper highlights the promising capabilities of the TinyAgent system, it also raises questions about potential security and scalability concerns that warrant further investigation. As the adoption of TinyAgent and similar function calling approaches grows, ongoing research and robust deployment strategies will be crucial to ensuring the safety, reliability, and performance of these systems in real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TinyAgent: Function Calling at the Edge

Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami

Recent large language models (LLMs) have enabled the development of advanced agentic systems that can integrate various tools and APIs to fulfill user queries through function calling. However, the deployment of these LLMs on the edge has not been explored since they typically require cloud-based infrastructure due to their substantial model size and computational demands. To this end, we present TinyAgent, an end-to-end framework for training and deploying task-specific small language model agents capable of function calling for driving agentic systems at the edge. We first show how to enable accurate function calling for open-source models via the LLMCompiler framework. We then systematically curate a high-quality dataset for function calling, which we use to fine-tune two small language models, TinyAgent-1.1B and 7B. For efficient inference, we introduce a novel tool retrieval method to reduce the input prompt length and utilize quantization to further accelerate the inference speed. As a driving application, we demonstrate a local Siri-like system for Apple's MacBook that can execute user commands through text or voice input. Our results show that our models can achieve, and even surpass, the function-calling capabilities of larger models like GPT-4-Turbo, while being fully deployed at the edge. We open-source our dataset, models, and installable package and provide a demo video for our MacBook assistant agent.

9/4/2024

Octopus v2: On-device language model for super agent

Wei Chen, Zhiyuan Li

Language models have shown effectiveness in a variety of software applications, particularly in tasks related to automatic workflow. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, and decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method enhances latency by 35-fold. This method reduces the latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requisites for real-world applications.

4/17/2024

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Sadhana Kumaravel, Matthew Stallone, Rameswar Panda, Yara Rizk, GP Bhargav, Maxwell Crouse, Chulaka Gunasekara, Shajith Ikbal, Sachin Joshi, Hima Karanam, Vineet Kumar, Asim Munawar, Sumit Neelam, Dinesh Raghu, Udit Sharma, Adriana Meza Soria, Dheeraj Sreedhar, Praveen Venkateswaran, Merve Unuvar, David Cox, Salim Roukos, Luis Lastras, Pavan Kapanipathi

Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (APIs) to complete complex tasks. These tasks together are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling, those being Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other best proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.

7/2/2024

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

9/4/2024