Octopus v4: Graph of language models

2404.19296

Published 5/1/2024 by Wei Chen, Zhiyuan Li

💬

Abstract

Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs textit{functional tokens} to integrate textbf{multiple open-source models}, each optimized for particular tasks. Our newly developed Octopus v4 model leverages textit{functional tokens} to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in selection and parameter understanding and reformatting. Additionally, we explore the use of graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and textit{functional tokens}. Use our open-sourced GitHub (url{https://www.nexa4ai.com/}) to try Octopus v4 models (url{https://huggingface.co/NexaAIDev/Octopus-v4}), and contrite to a larger graph of language models. By activating models less than 10B parameters, we achieved SOTA MMLU score of 74.8 among the same level models.

Get summaries of the top AI research delivered straight to your inbox:

Overview

Language models have become widely used, but the most advanced models are often proprietary and resource-intensive
Open-source language models like Llama3 can be competitive, and niche-specific models may outperform proprietary ones
This paper introduces a novel approach using "functional tokens" to integrate multiple open-source models, each optimized for particular tasks
The Octopus v4 model leverages functional tokens to direct queries to the most appropriate vertical model and reformat the query for best performance
Octopus v4 builds on previous Octopus models (v1, v2, v3) and excels at selection, parameter understanding, and reformatting
The paper also explores using a graph data structure to coordinate multiple open-source models through the Octopus model and functional tokens

Plain English Explanation

Language models are AI systems that can understand and generate human-like text. The most advanced language models, like GPT-4 from OpenAI and models from Anthropic, are often proprietary and expensive to use, requiring a lot of computing power and energy.

In contrast, the open-source community has developed competitive language models, like Llama3, that are freely available. Interestingly, smaller language models tailored for specific tasks, such as legal, medical, or financial applications, can sometimes outperform their larger, proprietary counterparts.

This paper introduces a new approach called Octopus v4 that aims to take advantage of multiple open-source language models. Octopus v4 uses "functional tokens" to intelligently route user queries to the most appropriate specialized model and reformats the query to get the best possible result.

Octopus v4 is an evolution of the previous Octopus models (v1, v2, v3) and is particularly good at selecting the right model, understanding the user's intent, and reformatting the query. The paper also explores using a graph data structure to coordinate the different open-source models through the Octopus system and functional tokens.

By leveraging multiple smaller, open-source models instead of a single large proprietary one, the researchers were able to achieve state-of-the-art performance on a benchmark test while using less computing power.

Technical Explanation

The paper introduces a novel approach that employs "functional tokens" to integrate multiple open-source language models, each optimized for particular tasks. The researchers developed the Octopus v4 model, which builds on the previous Octopus v1, v2, and v3 models (Octopus v1, Octopus v2, Octopus v3).

Octopus v4 leverages functional tokens to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. The model excels at selection, parameter understanding, and reformatting, allowing it to effectively coordinate multiple open-source models.

The researchers also explore the use of a graph as a versatile data structure to coordinate the capabilities of the Octopus model and functional tokens, harnessing the capabilities of various open-source models. By activating models with less than 10 billion parameters, the Octopus v4 system achieved a state-of-the-art MMLU (Multimodel Language Understanding) score of 74.8 among models of the same size.

Critical Analysis

The paper presents a promising approach to leveraging multiple open-source language models to achieve state-of-the-art performance while using less computing power than large proprietary models. The use of functional tokens to intelligently route queries and reformat them is a novel and potentially effective strategy.

However, the paper does not provide extensive details on the architectural details or training process of the Octopus v4 model, making it difficult to fully assess the technical merits of the approach. Additionally, the paper does not address potential challenges or limitations, such as the scalability of the graph-based coordination system or the performance of the Octopus model on a broader range of tasks beyond the MMLU benchmark.

Further research and empirical evaluation would be needed to better understand the strengths, weaknesses, and broader applicability of the Octopus v4 model and the functional token-based approach to integrating multiple open-source language models.

Conclusion

This paper presents a novel approach to leveraging multiple open-source language models through the use of "functional tokens" and the Octopus v4 model. By intelligently routing queries to specialized models and reformatting them for optimal performance, the Octopus v4 system was able to achieve state-of-the-art results on a benchmark test while using less computing power than large proprietary models.

The exploration of a graph-based data structure to coordinate the different models is also an interesting direction that could have broader implications for the field of language model integration and optimization. While the technical details are not fully fleshed out, the paper introduces a promising approach that could pave the way for more efficient and accessible language models in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Octopus: On-device language model for function calling of software APIs

Wei Chen, Zhiyuan Li, Mingyuan Ma

In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play a crucial role due to their advanced text processing and generation abilities. This study introduces a new strategy aimed at harnessing on-device LLMs in invoking software APIs. We meticulously compile a dataset derived from software API documentation and apply fine-tuning to LLMs with capacities of 2B, 3B and 7B parameters, specifically to enhance their proficiency in software API interactions. Our approach concentrates on refining the models' grasp of API structures and syntax, significantly enhancing the accuracy of API function calls. Additionally, we propose textit{conditional masking} techniques to ensure outputs in the desired formats and reduce error rates while maintaining inference speeds. We also propose a novel benchmark designed to evaluate the effectiveness of LLMs in API interactions, establishing a foundation for subsequent research. Octopus, the fine-tuned model, is proved to have better performance than GPT-4 for the software APIs calling. This research aims to advance automated software development and API integration, representing substantial progress in aligning LLM capabilities with the demands of practical software engineering applications.

4/3/2024

cs.CL cs.SE

Octopus v2: On-device language model for super agent

Wei Chen, Zhiyuan Li

Language models have shown effectiveness in a variety of software applications, particularly in tasks related to automatic workflow. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, and decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method enhances latency by 35-fold. This method reduces the latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requisites for real-world applications.

4/17/2024

cs.CL

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent

Wei Chen, Zhiyuan Li

A multimodal AI agent is characterized by its ability to process and learn from various types of data, including natural language, visual, and audio inputs, to inform its actions. Despite advancements in large language models that incorporate visual data, such as GPT-4V, effectively translating image-based data into actionable outcomes for AI agents continues to be challenging. In this paper, we introduce a multimodal model that incorporates the concept of functional token specifically designed for AI agent applications. To ensure compatibility with edge devices, our model is optimized to a compact size of less than 1B parameters. Like GPT-4, our model can process both English and Chinese. We demonstrate that this model is capable of operating efficiently on a wide range of edge devices, including as constrained as a Raspberry Pi.

4/19/2024

cs.CL cs.CV

💬

Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge

Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Paul Denny

Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as an automated evaluator by comparing its evaluations with those of a human expert. We observe that GPT-4 demonstrates a bias toward positively rating feedback while exhibiting moderate agreement with human raters, showcasing its potential as a feedback evaluator. Second, we explore the quality of feedback generated by several leading open-source LLMs by using GPT-4 to evaluate the feedback. We find that some models offer competitive performance with popular proprietary LLMs, such as ChatGPT, indicating opportunities for their responsible use in educational settings.

5/9/2024

cs.CL cs.AI cs.CY