Nexusflow

Models by this creator

NexusRaven-V2-13B

Total Score: 417

The NexusRaven-V2-13B is an open-source and commercially viable large language model (LLM) developed by Nexusflow that surpasses the state of the art in function calling capabilities. It can generate single function calls, nested calls, and parallel calls across many challenging cases. The model has been fine-tuned on a large corpus of function calls and can provide detailed explanations for the function calls it generates. Compared to GPT-4, NexusRaven-V2-13B achieves a 7% higher function calling success rate on human-generated use cases involving nested and composite functions. Notably, the model was never trained on the specific functions used in the evaluation, demonstrating strong generalization to unseen functions. The training data for the model does not include any proprietary data from models like GPT-4, giving users full control when deploying it in commercial applications.

**Model inputs and outputs**

Inputs:

- **List of Python functions**: The model accepts a list of Python functions as input. The functions can perform any task, including sending GET/POST requests to external APIs.
- **Function signatures and docstrings**: To enable the model to generate function calls, the input must include each Python function's signature and an appropriate docstring.
- **Function arguments**: The model performs best on functions that require arguments, so users should provide functions with arguments.

Outputs:

- **Function calls**: The primary output of the model is function calls, which can be single, nested, or parallel.
- **Detailed explanations**: The model can also generate detailed explanations for the function calls it produces, though this behavior can be turned off to save tokens during inference.

**Capabilities**

The NexusRaven-V2-13B model excels at zero-shot function calling, surpassing the performance of GPT-4 by a significant margin. It can handle a wide range of function call types, from simple single calls to complex nested and parallel calls. Its ability to generalize to unseen functions is particularly impressive and points to its versatility in real-world applications.

**What can I use it for?**

The NexusRaven-V2-13B model is well suited for a variety of applications that require function calling capabilities, such as:

- **Automated software development**: The model can assist developers in writing and orchestrating complex software systems by generating function calls on the fly.
- **Intelligent virtual assistants**: The model's function calling abilities can be leveraged to build virtual assistants that perform a wide range of tasks by dynamically calling relevant functions.
- **Data processing and analysis**: The model's function calling capabilities can be used to build pipelines for data processing and analysis, automating complex workflows.

**Things to try**

One interesting thing to try with the NexusRaven-V2-13B model is to provide it with a diverse set of custom functions and observe how it handles the function calling process. You can experiment with different types of functions, including those that interact with external APIs, to see the model's versatility and adaptability. You can also explore the detailed explanations the model generates for its function calls and how this feature can be leveraged in various applications. A minimal usage sketch follows below.
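As a concrete starting point, here is a minimal sketch of invoking the model through the Hugging Face `transformers` pipeline. The `get_weather` function and the exact prompt wording are illustrative assumptions, and the "Function: ... User Query: ...<human_end>" layout should be checked against the prompt template on the model card before relying on it.

```python
# Minimal sketch: zero-shot function calling with NexusRaven-V2-13B.
# The get_weather tool and the prompt layout below are illustrative; verify
# the exact template against the model card.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Nexusflow/NexusRaven-V2-13B",
    device_map="auto",  # spread the 13B weights across available devices
)

prompt = '''
Function:
def get_weather(city: str, unit: str = "celsius"):
    """
    Fetches the current weather for a city.

    Args:
        city: name of the city, e.g. "Seattle"
        unit: "celsius" or "fahrenheit"
    """

User Query: What's the weather like in Seattle right now?<human_end>
'''

result = pipe(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
print(result[0]["generated_text"])  # e.g. a call such as get_weather(city='Seattle')
```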

Updated 5/28/2024

Starling-LM-7B-beta

Total Score: 318

Starling-LM-7B-beta is an open large language model (LLM) developed by the Nexusflow team. It is trained with Reinforcement Learning from AI Feedback (RLAIF) and finetuned from the Openchat-3.5-0106 model, which is in turn based on Mistral-7B-v0.1. Training uses the berkeley-nest/Nectar ranking dataset, the Nexusflow/Starling-RM-34B reward model, and the PPO policy optimization method from Fine-Tuning Language Models from Human Preferences. This yields an improved score of 8.12 on the MT Bench evaluation with GPT-4 as the judge, compared to 7.81 for the original Openchat-3.5-0106 model.

**Model inputs and outputs**

Inputs:

- A conversational prompt following the exact chat template provided for the Openchat-3.5-0106 model.

Outputs:

- A natural language response to the input prompt.

**Capabilities**

Starling-LM-7B-beta is a capable language model that can engage in open-ended conversations, provide informative responses, and assist with a variety of tasks. It has demonstrated strong performance on benchmarks like MT Bench, outperforming several other prominent language models.

**What can I use it for?**

Starling-LM-7B-beta can be used for a wide range of applications, such as:

- **Conversational AI**: The model can power chatbots and virtual assistants that engage in natural conversations.
- **Content generation**: The model can generate written content like articles, stories, or scripts.
- **Question answering**: The model can answer questions on a variety of topics.
- **Task assistance**: The model can help with tasks like summarization, translation, and code generation.

**Things to try**

One interesting aspect of Starling-LM-7B-beta is its ability to perform well while maintaining a consistent conversational format. By adhering to the prescribed chat template, the model produces coherent, on-topic responses without deviating from the expected structure. This can be particularly useful in applications where a specific interaction style is required, such as customer service or educational chatbots. A sketch of applying the template correctly is shown below.
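Since the required chat format is the main practical gotcha, here is a minimal sketch that delegates formatting to `tokenizer.apply_chat_template`, assuming the hosted tokenizer bundles the Openchat-3.5-0106 template (the usual arrangement for chat models); the example prompt is illustrative.

```python
# Minimal sketch: querying Starling-LM-7B-beta while honoring its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nexusflow/Starling-LM-7B-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain RLAIF in two sentences."}]
# apply_chat_template renders the Openchat-style turn markers for us.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```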

Updated 5/28/2024

Athene-70B

Total Score: 148

Athene-70B is an open-source large language model developed by the Nexusflow team. It is based on the Llama-3-70B-Instruct model and is further trained using reinforcement learning from human feedback (RLHF) to achieve high performance on the Arena-Hard-Auto benchmark, a proxy for the Chatbot Arena. Compared to other open-source and proprietary models, Athene-70B demonstrates strong performance on the Arena-Hard benchmark, scoring 77.8%, versus 79.2% for the proprietary GPT-4o model and 46.6% for the open-source Llama-3-70B model.

**Model inputs and outputs**

Inputs:

- Text-based conversational prompts, similar to the Llama-3-70B-Instruct model.

Outputs:

- Natural language text responses that aim to be helpful, informative, and engaging in conversation.

**Capabilities**

Athene-70B is a capable chat model that can handle a variety of conversational tasks. It has been trained to engage in natural dialogue, answer questions, and assist with various information-seeking and task-completion queries. The model demonstrates strong performance on benchmarks that measure a model's ability to provide helpful and relevant responses in a conversational setting.

**What can I use it for?**

Athene-70B could be a useful tool for developers and researchers working on conversational AI applications, such as virtual assistants, chatbots, and dialogue systems. The model's strong performance on the Arena-Hard benchmark suggests it may be particularly well suited for building engaging and user-friendly chat interfaces.

**Things to try**

Developers could experiment with Athene-70B in a variety of conversational scenarios, such as customer service, task planning, open-ended discussions, and information lookup. The model's flexibility and strong performance make it an interesting candidate for further exploration and development; a minimal chat sketch follows below.
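Because the model inherits the Llama-3-Instruct chat format, recent versions of the `transformers` text-generation pipeline can apply the chat template directly to a list of role/content messages. The sketch below assumes that behavior plus enough GPU memory (or quantization) for a 70B model; the prompts are illustrative.

```python
# Minimal sketch: a chat turn with Athene-70B via the transformers pipeline.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Nexusflow/Athene-70B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a 70B model spans multiple GPUs at bf16
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Outline a three-step onboarding flow for a note-taking app."},
]

out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant's reply
```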

Updated 8/23/2024

NexusRaven-13B

Total Score: 97

NexusRaven-13B is an open-source and commercially viable function calling language model developed by Nexusflow that surpasses the state of the art in function calling capabilities. It was fine-tuned from the codellama/CodeLlama-13b-Instruct-hf model. NexusRaven-13B achieves a 95% success rate in using cybersecurity tools such as CVE/CPE Search and VirusTotal, versus 64% for GPT-4, at significantly lower cost and with faster inference. It also generalizes well to tools never seen during training, achieving performance comparable to GPT-3.5 in zero-shot settings and outperforming other open-source LLMs of similar size.

**Model inputs and outputs**

NexusRaven-13B is a function calling language model that takes in a list of Python functions with their docstrings and generates JSON outputs with the function name and arguments. The model works best when provided with well-documented functions that have arguments, whether required or optional.

Inputs:

- **Functions**: A list of Python functions with their docstrings
- **User query**: A prompt for the model to generate a function call response to

Outputs:

- **Function call**: A JSON object with the function name and argument values
- **Explanation (optional)**: A detailed explanation of the generated function call

**Capabilities**

NexusRaven-13B is capable of generating single function calls, nested calls, and parallel calls in many challenging cases. It can also provide detailed explanations for the function calls it generates, which can be turned off to save tokens during inference.

**What can I use it for?**

NexusRaven-13B can be used in a variety of applications that require interacting with APIs or executing functions based on user prompts. For example, you could use it to build a chatbot that can perform web scraping, make API calls, or execute other programmatic tasks on demand. The model's strong performance on cybersecurity tools makes it a promising candidate for building security-focused applications.

**Things to try**

One interesting thing to try with NexusRaven-13B is to provide it with a set of functions that interact with external APIs, such as fetching weather data or geolocating a city. You can then prompt the model to generate function calls that combine these capabilities to answer complex user queries, like "What's the weather like in Seattle right now?". The model's ability to chain together function calls and provide detailed explanations can make it a powerful tool for building conversational AI applications. A sketch of post-processing its output follows below.
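Once the model emits a call, you still need to turn that string into structured data before executing anything. Here is a hypothetical post-processing sketch using only the standard library; the `parse_call` helper and the example call string are illustrative, not the model's guaranteed output format.

```python
# Hypothetical sketch: convert a generated call string such as
# "searchcve(keyword='llama')" into a JSON-ready dict of name and arguments,
# so the call can be validated before it is executed.
import ast

def parse_call(call_str: str) -> dict:
    """Parse a Python-style call like "foo(a=1, b='x')" into its parts."""
    node = ast.parse(call_str.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("model output is not a single function call")
    return {
        "name": node.func.id,  # assumes a plain function name, not attribute access
        "arguments": {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords},
    }

print(parse_call("searchcve(keyword='llama')"))
# -> {'name': 'searchcve', 'arguments': {'keyword': 'llama'}}
```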

Updated 5/28/2024

Starling-RM-34B

Total Score: 67

The Starling-RM-34B is a reward model trained from the Yi-34B-Chat language model. Following the reward-model training method in the InstructGPT paper, the last layer of Yi-34B-Chat was removed and replaced with a linear layer that outputs a scalar for any pair of input prompt and response. The reward model was trained on the berkeley-nest/Nectar preference dataset using a K-wise maximum likelihood estimator. It produces a scalar score indicating how helpful and non-harmful a given response is, with higher scores for more helpful and less harmful responses.

**Model inputs and outputs**

Inputs:

- **Prompt**: The input text that the candidate response answers.
- **Response**: The candidate response to be scored by the reward model.

Outputs:

- **Reward score**: A scalar value indicating the helpfulness and lack of harm in the given response.

**Capabilities**

The Starling-RM-34B reward model can be used to evaluate the quality and safety of language model outputs. By scoring responses based on their helpfulness and lack of harm, the reward model can help identify potentially harmful or undesirable outputs. This is particularly useful in Reinforcement Learning from Human Feedback (RLHF), where the reward model provides feedback to a language model during training.

**What can I use it for?**

The Starling-RM-34B reward model can be used for a variety of applications, including:

- **Evaluating language model outputs**: By scoring responses based on their helpfulness and lack of harm, the reward model can assess the quality and safety of outputs from large language models.
- **Reinforcement Learning from Human Feedback (RLHF)**: The reward model can be used as part of an RLHF pipeline to provide feedback to a language model during training, helping to align the model's outputs with human preferences.
- **Content moderation**: The reward model can be used to identify potentially harmful or undesirable content.

**Things to try**

One interesting aspect of the Starling-RM-34B reward model is that it was trained using a preference dataset based on GPT-4 outputs, so it may be biased towards the types of responses and formatting that GPT-4 tends to produce. Researchers and developers could explore how the model's performance and biases change when it is used with language models other than GPT-4, or when applied to different tasks and domains. The K-wise maximum likelihood estimator used for training is also an interesting technical detail: researchers could investigate how this approach compares to other methods for training reward models, and whether it offers unique advantages or challenges. A conceptual sketch of the scalar-head design is shown below.
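To make the architecture concrete, here is a conceptual sketch (not the official loading code) of the design described above: the backbone chat model's hidden state at the last real token of a prompt-plus-response sequence is projected through a single linear layer to one scalar reward. The class and variable names are illustrative.

```python
# Conceptual sketch of a scalar reward head on top of a chat backbone.
import torch
import torch.nn as nn
from transformers import AutoModel

class ScalarRewardHead(nn.Module):
    def __init__(self, base_model_id: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model_id)
        # A 1-dimensional projection replaces the usual vocabulary head.
        self.score = nn.Linear(self.backbone.config.hidden_size, 1, bias=False)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                   # (batch, seq, hidden)
        last = attention_mask.sum(dim=1) - 1                  # last real token per row
        pooled = hidden[torch.arange(hidden.size(0)), last]   # (batch, hidden)
        return self.score(pooled).squeeze(-1)                 # (batch,) reward scores
```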

Updated 4/28/2024