mlabonne

Models by this creator

phixtral-4x2_8

mlabonne

Total Score: 204

The phixtral-4x2_8 is a Mixture of Experts (MoE) model made with four microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture. This model performs better than each individual expert.

Model inputs and outputs
The phixtral-4x2_8 model takes text inputs and generates text outputs. It is a generative language model capable of producing coherent and contextual responses to prompts.

Inputs
- Text prompts that the model uses to generate relevant and meaningful output.

Outputs
- Coherent and contextual text responses generated based on the input prompts.

Capabilities
The phixtral-4x2_8 model demonstrates improved performance compared to individual models like dolphin-2_6-phi-2, phi-2-dpo, and phi-2-coder on benchmarks such as AGIEval, GPT4All, TruthfulQA, and Bigbench.

What can I use it for?
The phixtral-4x2_8 model can be used for a variety of text-to-text tasks, such as:
- General language understanding and generation
- Question answering
- Summarization
- Code generation
- Creative writing
Its strong performance on various benchmarks suggests it could be a capable model for many natural language processing applications.

Things to try
You can try fine-tuning the phixtral-4x2_8 model on specific datasets or tasks to further improve its performance for your use case. The model's modular nature, with multiple experts, also provides an opportunity to explore different expert configurations and observe their impact on the model's capabilities.
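To get a feel for the model before any fine-tuning, a minimal loading-and-generation sketch along these lines should work. It assumes the transformers and accelerate packages are installed and a GPU with enough memory; the "Instruct:/Output:" prompt style is borrowed from phi-2 and is an assumption rather than a documented template. Because the MoE wrapper around phi-2 ships as custom code on the Hugging Face Hub, trust_remote_code=True is needed.

```python
# Minimal sketch for trying mlabonne/phixtral-4x2_8 locally (not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/phixtral-4x2_8"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # the MoE implementation lives in the Hub repo
)

# phi-2-style prompt format; adjust if the model card recommends something else.
prompt = "Instruct: Explain the Fermi paradox in two sentences.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```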

Updated 5/28/2024

Meta-Llama-3-120B-Instruct

mlabonne

Total Score: 182

Meta-Llama-3-120B-Instruct is a large language model created by mlabonne that builds upon Meta's Meta-Llama-3-70B-Instruct model. It was inspired by other large language models like alpindale/goliath-120b, nsfwthrowitaway69/Venus-120b-v1.0, cognitivecomputations/MegaDolphin-120b, and wolfram/miquliz-120b-v2.0.

Model inputs and outputs

Inputs
- Text: The model takes text as input.

Outputs
- Text: The model outputs generated text based on the input.

Capabilities
Meta-Llama-3-120B-Instruct is particularly well-suited for creative writing tasks. It uses the Llama 3 chat template with a default context window of 8K tokens that can be extended. The model generally has a strong writing style, but it can sometimes produce typos and tends to overuse uppercase.

What can I use it for?
This model is recommended for creative writing projects. It outperforms many open-source chat models on common benchmarks, though it may struggle in tasks outside of creative writing compared to more specialized models like GPT-4. Developers should test the model thoroughly for their specific use case and consider incorporating safety tools like Llama Guard to mitigate risks.

Things to try
Try using this model to generate creative fiction, poetry, or other imaginative text. Experiment with different temperature and top-p settings to find the right balance of creativity and coherence. You can also try fine-tuning the model on your own dataset to adapt it for your specific needs.
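For the temperature and top-p experimentation mentioned above, a hedged sketch using the standard transformers chat-template API might look like this. The model ID, system prompt, and sampling values are illustrative, and the full 120B model needs substantial GPU memory (or a quantized variant).

```python
# Sketch: creative-writing generation with the Llama 3 chat template and a
# small sweep over sampling settings. Not an official usage recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Meta-Llama-3-120B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a novelist with a vivid, lyrical style."},
    {"role": "user", "content": "Write the opening paragraph of a story set in a lighthouse."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sweep sampling settings to trade creativity against coherence.
for temperature, top_p in [(0.7, 0.9), (1.0, 0.95), (1.2, 0.98)]:
    out = model.generate(
        input_ids,
        max_new_tokens=200,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
    )
    print(f"--- temperature={temperature}, top_p={top_p} ---")
    print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```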

Updated 6/4/2024

NeuralBeagle14-7B

mlabonne

Total Score: 151

The NeuralBeagle14-7B is a 7B parameter language model developed by mlabonne, based on a merge of several large language models, including fblgit/UNA-TheBeagle-7b-v1 and argilla/distilabeled-Marcoro14-7B-slerp. It was fine-tuned using the argilla/distilabel-intel-orca-dpo-pairs dataset and Direct Preference Optimization (DPO). This model is claimed to be one of the best-performing 7B models available.

Model inputs and outputs

Inputs
- Text inputs of up to 8,192 tokens

Outputs
- Fluent text outputs generated in response to the input

Capabilities
The NeuralBeagle14-7B model demonstrates strong performance on instruction-following and reasoning tasks compared to other 7B language models. It can also be used for roleplaying and storytelling.

What can I use it for?
The NeuralBeagle14-7B model can be used for a variety of text-to-text tasks, such as language generation, question answering, and text summarization. Its capabilities make it well-suited for applications like interactive storytelling, virtual assistants, and educational tools.

Things to try
You can experiment with the NeuralBeagle14-7B model by using it to generate creative fiction, engage in open-ended conversations, or tackle challenging reasoning problems. Its strong performance on instruction following and reasoning suggests it may be a useful tool for developing advanced language applications.
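As a quick way to probe the instruction-following behaviour described above, a short pipeline-based sketch could look like the following. The model ID is taken from the creator's Hub namespace; passing chat-style message lists directly to the pipeline requires a reasonably recent transformers release, so treat the exact call pattern as an assumption.

```python
# Sketch: probing instruction following with a text-generation pipeline.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mlabonne/NeuralBeagle14-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "List three steps to debug a failing unit test, one sentence each."},
]
# Recent transformers versions apply the model's chat template automatically
# when the input is a list of message dicts.
result = generator(messages, max_new_tokens=150, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
```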

Updated 5/28/2024

NeuralHermes-2.5-Mistral-7B

mlabonne

Total Score: 148

The NeuralHermes-2.5-Mistral-7B model is a fine-tuned version of the OpenHermes-2.5-Mistral-7B model. It was developed by mlabonne and further trained using Direct Preference Optimization (DPO) on the mlabonne/chatml_dpo_pairs dataset. The model surpasses the original OpenHermes-2.5-Mistral-7B on most benchmarks, ranking as one of the best 7B models on the Open LLM leaderboard.

Model inputs and outputs
The NeuralHermes-2.5-Mistral-7B model is a text-to-text model that can be used for a variety of natural language processing tasks. It accepts text input and generates relevant text output.

Inputs
- Text: The model takes in text-based input, such as prompts, questions, or instructions.

Outputs
- Text: The model generates text-based output, such as responses, answers, or completions.

Capabilities
The NeuralHermes-2.5-Mistral-7B model has demonstrated strong performance on a range of tasks, including instruction following, reasoning, and question answering. It can engage in open-ended conversations, provide creative responses, and assist with tasks like writing, analysis, and code generation.

What can I use it for?
The NeuralHermes-2.5-Mistral-7B model can be useful for a wide range of applications, such as:
- Conversational AI: Develop chatbots and virtual assistants that can engage in natural language interactions.
- Content generation: Create text-based content, such as articles, stories, or product descriptions.
- Task assistance: Provide support for tasks like research, analysis, code generation, and problem-solving.
- Educational applications: Develop interactive learning tools and tutoring systems.

Things to try
One interesting thing to try with the NeuralHermes-2.5-Mistral-7B model is to use the provided quantized models to explore its capabilities on different hardware setups. The quantized versions can be deployed on a wider range of devices, making the model more accessible for a variety of use cases.
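For the quantized route mentioned under "Things to try", a rough llama-cpp-python sketch might look like the following. The GGUF filename is hypothetical (use whichever quantization you have downloaded), and the ChatML-style prompt reflects the model's OpenHermes lineage rather than a documented guarantee.

```python
# Sketch: running a quantized GGUF build of NeuralHermes-2.5-Mistral-7B
# via llama-cpp-python. The model_path is a placeholder for a downloaded file.
from llama_cpp import Llama

llm = Llama(
    model_path="./neuralhermes-2.5-mistral-7b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)

# ChatML-style prompt, matching the OpenHermes family's usual format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSummarize the idea behind Direct Preference Optimization "
    "in two sentences.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=200, temperature=0.7, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```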

Updated 5/28/2024

phixtral-2x2_8

mlabonne

Total Score: 145

phixtral-2x2_8 is a Mixture of Experts (MoE) model made with two microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture. It performs better than each individual expert model. The model was created by mlabonne. A similar MoE model is the phixtral-4x2_8, which uses four microsoft/phi-2 models instead of two.

Model inputs and outputs
phixtral-2x2_8 is a text-to-text model that can handle a variety of input formats, including question answering, chatbot conversation, and code generation. The model takes in raw text prompts and generates relevant output text.

Inputs
- Free-form text prompts for tasks like:
  - Question answering (e.g. "What is the Fermi paradox?")
  - Chatbot conversations (e.g. "I'm struggling to focus while studying. Any suggestions?")
  - Code generation (e.g. "def print_prime(n):")

Outputs
- Relevant text responses to the input prompts, ranging from short answers to longer generated text.

Capabilities
The phixtral-2x2_8 model has shown strong performance on benchmarks like AGIEval, GPT4All, TruthfulQA, and Bigbench, outperforming its individual expert models. It demonstrates capabilities in areas like language understanding, logical reasoning, and code generation.

What can I use it for?
Given its diverse capabilities, phixtral-2x2_8 could be useful for a variety of applications, such as:
- Building chatbots or virtual assistants that can engage in open-ended conversations
- Developing question-answering systems for educational or research purposes
- Automating code generation for prototyping or productivity tasks

Things to try
Some interesting things to explore with phixtral-2x2_8 include:
- Experimenting with different prompting techniques to see how the model responds
- Comparing the model's performance to other language models on specific tasks
- Investigating ways to further fine-tune or adapt the model for specialized use cases
Overall, phixtral-2x2_8 is a capable and versatile model that could be a valuable tool for researchers and developers working on a variety of natural language processing and generation projects.

Updated 5/28/2024

AlphaMonarch-7B

mlabonne

Total Score: 145

AlphaMonarch-7B is a DPO fine-tuned model based on a merge of several other models, including NeuralMonarch-7B, OmniTruthyBeagle-7B-v0, NeuBeagle-7B, and NeuralOmniBeagle-7B. The model was trained using the argilla/OpenHermes2.5-dpo-binarized-alpha preference dataset. It is maintained by mlabonne.

Model inputs and outputs
AlphaMonarch-7B is a text-to-text AI model that can generate responses to a wide variety of prompts. It uses a context window of 8,000 tokens, making it well-suited for conversational tasks.

Inputs
- Text prompts of up to 8,000 tokens

Outputs
- Coherent, contextual text responses

Capabilities
The model displays strong reasoning and instruction-following abilities, making it well-suited for tasks like conversation, roleplaying, and storytelling. It has a formal and sophisticated writing style, though this can be adjusted by modifying the prompt.

What can I use it for?
AlphaMonarch-7B is recommended for use with the Mistral Instruct chat template, which works well with the model's capabilities. It can be used for a variety of applications, such as:
- Open-ended conversations
- Roleplaying and creative writing
- Answering questions and following instructions

Things to try
Since AlphaMonarch-7B has a large context window, it can be particularly useful for tasks that require long-form reasoning or generation, such as:
- Engaging in multi-turn dialogues and maintaining context
- Generating longer pieces of text, like stories or reports
- Answering complex questions that require synthesizing information
Additionally, the model's formal and sophisticated style can be an interesting contrast to explore in creative writing or roleplaying scenarios.
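Since the card recommends the Mistral Instruct chat template, a hedged multi-turn sketch could look like the following. The model ID and the reliance on the tokenizer's built-in chat template (which should render Mistral-style [INST] ... [/INST] turns) are assumptions based on the description.

```python
# Sketch: multi-turn chat with AlphaMonarch-7B, keeping history in the context window.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/AlphaMonarch-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_turns = [
    "Let's co-write a short mystery. Set the scene in one paragraph.",
    "Continue the story and introduce a suspect.",
]
history = []
for turn in user_turns:
    history.append({"role": "user", "content": turn})
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=250, do_sample=True, temperature=0.8)
    reply = tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    print(reply)
```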

Updated 5/28/2024

Beyonder-4x7B-v2

mlabonne

Total Score: 120

The Beyonder-4x7B-v2 is a Mixture of Experts (MoE) model created by mlabonne using the mergekit tool. It combines four base models: openchat/openchat-3.5-1210, beowolx/CodeNinja-1.0-OpenChat-7B, maywell/PiVoT-0.1-Starling-LM-RP, and WizardLM/WizardMath-7B-V1.1. This MoE architecture lets the model leverage the strengths of these diverse base models, potentially leading to improved capabilities.

Model inputs and outputs

Inputs
- Text prompts; the recommended context length for Beyonder-4x7B-v2 is 8k tokens.

Outputs
- Natural language responses generated from the provided input.

Capabilities
The Beyonder-4x7B-v2 model shows competitive performance on the Open LLM Leaderboard compared to the larger 8-expert Mixtral-8x7B-Instruct-v0.1 model, despite having only 4 experts. It also shows significant improvements over the individual expert models. In addition, Beyonder-4x7B-v2 performs very well on the Nous benchmark suite, coming close to the performance of the much larger 34B-parameter Yi-34B fine-tuned model while using only around 12B parameters.

What can I use it for?
The Beyonder-4x7B-v2 model can be used for a variety of natural language processing tasks, such as open-ended conversation, question answering, and task completion. Its strong performance on the Nous benchmark suggests it may be particularly well-suited for instruction-following and reasoning tasks.

Things to try
Experiment with the model's capabilities by prompting it to complete a wide range of tasks, from creative writing to analytical problem-solving. Pay attention to how it handles different types of inputs and whether its responses demonstrate strong reasoning and language understanding abilities.
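To see how a merge like this is typically assembled, here is a heavily hedged sketch of a mergekit MoE merge over the same four base models. The YAML schema and the mergekit-moe command reflect common usage of mergekit around the time these models were published, but exact keys and flags vary between versions, and the positive_prompts lists are purely illustrative; this is not the configuration the author actually used.

```python
# Sketch: a Beyonder-style MoE merge driven from Python. Check the mergekit
# documentation for your installed version before relying on this.
import subprocess
from pathlib import Path

config = """
base_model: openchat/openchat-3.5-1210
gate_mode: hidden          # route tokens by hidden-state similarity to the prompts
dtype: bfloat16
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts: ["chat", "assistant", "explain"]
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts: ["code", "python", "function", "debug"]
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts: ["story", "roleplay", "character"]
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts: ["math", "solve", "reason step by step"]
"""

Path("beyonder-moe.yaml").write_text(config)

# Writes the merged MoE checkpoint into ./beyonder-4x7b-merge
subprocess.run(["mergekit-moe", "beyonder-moe.yaml", "beyonder-4x7b-merge"], check=True)
```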

Updated 5/28/2024

NeuralDaredevil-8B-abliterated

mlabonne

Total Score: 105

NeuralDaredevil-8B-abliterated is a DPO fine-tune of mlabonne/Daredevil-8B-abliterated, trained for one epoch on mlabonne/orpo-dpo-mix-40k. The DPO fine-tuning successfully recovers the performance lost to the abliteration process, making it an excellent uncensored model.

Model inputs and outputs

Inputs
- Text prompts

Outputs
- Uncensored text generation

Capabilities
The NeuralDaredevil-8B-abliterated model performs better than the Instruct model on the reported tests and can be used for applications that don't require alignment, such as role-playing.

What can I use it for?
You can use NeuralDaredevil-8B-abliterated for any application that doesn't require alignment, like role-playing. The model has been tested in LM Studio using the "Llama 3" preset.

Things to try
Thanks to QuantFactory, Zoyd, and solidrust, several quantized versions of the NeuralDaredevil-8B-abliterated model are available, including GGUF, EXL2, and AWQ formats.

Updated 7/8/2024

Meta-Llama-3.1-8B-Instruct-abliterated

mlabonne

Total Score: 94

The Meta-Llama-3.1-8B-Instruct-abliterated is an uncensored version of the Llama 3.1 8B Instruct model, created by mlabonne using a technique called "abliteration" (see this article for more details). This model was built on top of the original Llama 3.1 8B Instruct model released by Meta. It uses the same architecture and training data as the original, but with the content filtering and safety constraints removed, resulting in an "uncensored" language model. Similar models like the Meta-Llama-3-8B-Instruct-GGUF and the Meta-Llama-3-70B-Instruct-GGUF have also been created by the community, often with quantization techniques applied to optimize model size and inference speed.

Model inputs and outputs

Inputs
- Text.

Outputs
- Generated text, which can include natural language, code, and other types of content.

Capabilities
The Meta-Llama-3.1-8B-Instruct-abliterated model has a wide range of capabilities, including natural language generation, question answering, summarization, and code generation. As an uncensored version of the Llama 3.1 8B Instruct model, it is not constrained by the same safety and content filtering mechanisms, allowing it to generate a broader range of content.

What can I use it for?
Given its unconstrained nature, the Meta-Llama-3.1-8B-Instruct-abliterated model could be useful for applications where the user wants more open-ended and less filtered responses, such as creative writing, research, and exploratory analysis. However, the lack of safety constraints also means the model may generate potentially offensive or harmful content, so it should be used with caution and appropriate safeguards.

Things to try
One interesting thing to try with the Meta-Llama-3.1-8B-Instruct-abliterated model is to explore the boundaries of its capabilities by providing prompts that push the limits of its training, such as requests for very long-form content, highly technical or specialized topics, or tasks that require strong reasoning and inference skills. This can help uncover the model's strengths and limitations, as well as potential areas for further development and refinement.

Updated 8/29/2024

Meta-Llama-3.1-8B-Instruct-abliterated-GGUF

mlabonne

Total Score: 91

Meta-Llama-3.1-8B-Instruct-abliterated-GGUF provides GGUF quantizations of Meta-Llama-3.1-8B-Instruct-abliterated, an uncensored version of the Llama 3.1 8B Instruct model created by mlabonne using a technique called "abliteration". The model was developed in collaboration with FailSpy, who provided the original code and technique. The underlying Llama 3.1 8B model is larger and more capable than the earlier Llama 2 models, with 8 billion parameters and pretraining on over 15 trillion tokens of data. Similar models include Meta-Llama-3-8B-Instruct-GGUF and Meta-Llama-3-120B-Instruct, which are quantized and merged versions of the original Llama 3 models, respectively.

Model inputs and outputs

Inputs
- Text data, such as prompts, instructions, or conversation history

Outputs
- Generated text, including responses, continuations, and completions

Capabilities
Meta-Llama-3.1-8B-Instruct-abliterated is a powerful language model capable of a wide range of text generation tasks. It excels at task-oriented dialogue, with the ability to follow instructions and provide helpful, coherent responses. The model also demonstrates strong capabilities in areas like creative writing, open-ended conversation, and code generation.

What can I use it for?
You can use Meta-Llama-3.1-8B-Instruct-abliterated for a variety of applications that involve natural language processing and generation. Some potential use cases include:
- Building interactive chatbots or virtual assistants
- Generating creative writing, stories, or scripts
- Providing code completion and generation assistance
- Summarizing or paraphrasing text
- Engaging in open-ended conversations on a wide range of topics
The model's capabilities make it well-suited for commercial and research applications that require fluent, coherent language generation.

Things to try
One interesting aspect of Meta-Llama-3.1-8B-Instruct-abliterated is its ability to generate text in diverse styles and tones. Try providing the model with different system prompts or persona descriptions and observe how it adapts its language and personality to the given context; for example, instruct it to respond as a pirate, a scientist, or a historical figure and watch how its vocabulary, syntax, and tone change. Another interesting experiment is to explore the model's capabilities in code generation and programming tasks: provide it with programming prompts or problem statements and see how it generates relevant code snippets or solutions. This could be a useful tool for developers looking to streamline their coding workflow.
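For the persona experiments suggested above, a rough sketch with llama-cpp-python and a GGUF file from this repository might look like this. The filename is hypothetical (use whichever quantization you downloaded), and the chat formatting is assumed to be picked up from the GGUF metadata.

```python
# Sketch: steering the abliterated Llama 3.1 model with a persona system prompt,
# using a GGUF quantization via llama-cpp-python. The file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./meta-llama-3.1-8b-instruct-abliterated.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,
    n_gpu_layers=-1,  # set to 0 for CPU-only inference
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a 17th-century ship's captain. Stay in character."},
        {"role": "user", "content": "Describe how you would plan a voyage across the Atlantic."},
    ],
    max_tokens=300,
    temperature=0.8,
)
print(response["choices"][0]["message"]["content"])
```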

Updated 9/4/2024