MN-12B-Lyra-v1

Maintainer: Sao10K

Total Score

57

Last updated 9/18/2024


Property         Value
Run this model   Run on HuggingFace
API spec         View on HuggingFace
Github link      No Github link provided
Paper link       No paper link provided


Model overview

The MN-12B-Lyra-v1 is an experimental general roleplaying model developed by Sao10K. It is a merge of two different Mistral-Nemo 12B models, one focused on instruction-following and the other on roleplay and creative writing. The model scored well on EQ-Bench, ranking just below the Nemomix v4 model. Sao10K found that a temperature of 1.2 and a minimum probability (min_p) of 0.1 work well for this model, though they also note that it can perform well at lower temperatures.
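
For reference, here is a minimal sketch of those sampling settings using the Hugging Face transformers library. The repository id (Sao10K/MN-12B-Lyra-v1) and min_p support (available in recent transformers releases) are assumptions, so treat this as a starting point rather than the maintainer's exact setup.

```python
# Minimal sketch: generate with the sampler settings Sao10K suggests
# (temperature 1.2, min_p 0.1). The repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/MN-12B-Lyra-v1"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "[INST] Write the opening scene of a heist in a floating city. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.2,  # suggested temperature
    min_p=0.1,        # minimum-probability sampling (needs a recent transformers version)
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```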

The model was created from two differently formatted training datasets - one in the Mistral Instruct format and one in ChatML. Sao10K found that training on each dataset separately and then combining the results with the della_linear merge method worked better than mixing the datasets together. They also note that the base Nemo 12B model was difficult to train on their datasets, and that some stage-wise fine-tuning will likely be needed in the future.

Model inputs and outputs

Inputs

  • Either [INST] or ChatML input formats work well for this model; both are sketched after this section.

Outputs

  • The MN-12B-Lyra-v1 model generates text outputs in a general roleplaying and creative writing style.
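
To make the two input formats concrete, here is a rough sketch of what each prompt style looks like. The exact system-prompt handling is an assumption; these are templates to adapt, not canonical strings from the model card.

```python
# Rough templates for the two prompt formats the model accepts.
# System-prompt placement is an assumption; adjust as needed.

def mistral_instruct_prompt(user_message: str) -> str:
    """Mistral Instruct style: the user turn is wrapped in [INST] ... [/INST]."""
    return f"[INST] {user_message} [/INST]"

def chatml_prompt(system_message: str, user_message: str) -> str:
    """ChatML style: each turn is delimited by <|im_start|>role ... <|im_end|>."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(mistral_instruct_prompt("Describe the tavern my character just walked into."))
print(chatml_prompt("You are the narrator of a fantasy roleplay.",
                    "Describe the tavern my character just walked into."))
```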

Capabilities

The MN-12B-Lyra-v1 model excels at general roleplaying tasks, with good performance on the EQ-Bench. Sao10K notes that the model can handle a context length of up to 16K tokens, which is sufficient for most roleplaying use cases.
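
Because the usable window tops out around 16K tokens, it can help to check how much of it a long roleplay transcript consumes before sending it. Below is a small sketch using the model's tokenizer; the repo id and the exact 16,384-token limit are assumptions.

```python
# Check whether a prompt, plus room for the reply, fits in the ~16K-token
# window the maintainer reports as usable. Repo id and limit are assumptions.
from transformers import AutoTokenizer

CONTEXT_LIMIT = 16_384  # approximate usable window per Sao10K's notes

tokenizer = AutoTokenizer.from_pretrained("Sao10K/MN-12B-Lyra-v1")

def fits_in_context(prompt: str, reserve_for_reply: int = 512) -> bool:
    """Return True if the prompt plus a reply budget fits in the usable window."""
    n_tokens = len(tokenizer.encode(prompt))
    return n_tokens + reserve_for_reply <= CONTEXT_LIMIT

chat_history = "..."  # accumulated roleplay transcript goes here
if not fits_in_context(chat_history):
    print("Prompt is too long; trim earlier turns or summarize the history.")
```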

What can I use it for?

The MN-12B-Lyra-v1 model would be well-suited for creative writing, storytelling, and roleplaying applications. Its ability to generate coherent and engaging text could make it useful for applications like interactive fiction, collaborative worldbuilding, or even as a foundation for more advanced AI-driven narratives.

Things to try

One interesting aspect of the MN-12B-Lyra-v1 model is Sao10K's observation that the base Nemo 12B model was difficult to train on their datasets, and that they would likely need to do some stage-wise fine-tuning in the future. This suggests that the model may benefit from a more iterative or multi-stage training process to optimize its performance on specific types of tasks or datasets.

Sao10K also notes that the model's effective context length of 16K tokens may be a limitation for some applications, and that they are working on further iterations to improve upon this. Trying the model with longer context lengths or more advanced prompt engineering techniques could be an interesting area of exploration.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


MN-12B-Celeste-V1.9

nothingiisreal

Total Score

93

The MN-12B-Celeste-V1.9 is a story writing and roleplaying AI model developed by nothingiisreal on the Mistral Nemo 12B Instruct base model. It has been trained on a variety of datasets including Reddit Writing Prompts, Kalo's Opus 25K Instruct, and c2 logs to improve its NSFW handling, narration, and use of ChatML tokens. The model is available in several variations, including Dynamic by Auri, EXL2 models by Kingbri, and GGUF models with static and IMatrix quantizations. There are also several API endpoints available, including Featherless, Infermatic, and OpenRouter.

Model inputs and outputs

Inputs

  • Text prompts for creative writing and roleplaying scenarios

Outputs

  • Coherent and engaging story continuations and responses to roleplaying prompts

Capabilities

The MN-12B-Celeste-V1.9 model excels at creative writing and roleplaying tasks. It can generate immersive narratives, develop complex characters, and respond to open-ended prompts with vivid and imaginative prose. The model's improved NSFW handling and active narration make it well-suited for writing stories with mature themes or fantasy/sci-fi settings.

What can I use it for?

The MN-12B-Celeste-V1.9 model is a great tool for authors, game developers, and creative writers looking to generate inspirational story ideas or expand on existing narratives. It could be used to brainstorm plots, flesh out characters, or generate content for interactive fiction or tabletop roleplaying games. In a professional setting, this model could be leveraged to produce marketing copy, product descriptions, or other creative business content. Its ability to generate engaging and varied text makes it a potentially valuable asset for companies looking to enhance their digital presence and connect with customers in a more compelling way.

Things to try

One interesting aspect of the MN-12B-Celeste-V1.9 model is its ability to maintain coherence and continuity over multiple turns of a roleplaying scenario. Try engaging the model in an extended back-and-forth conversation, exploring different narrative arcs or character interactions. The model's improved narration and NSFW handling may also make it suitable for crafting more mature or fantastical stories. Additionally, consider experimenting with the provided sampling settings, as the "Stable" and "Creative" options may result in different styles of output that could be suited to different writing tasks or creative goals.

Read more


Mistral-Nemo-Base-2407

mistralai

Total Score

232

The Mistral-Nemo-Base-2407 is a 12 billion parameter Large Language Model (LLM) jointly developed by Mistral AI and NVIDIA. It significantly outperforms existing models of similar size, thanks to its large training dataset that includes a high proportion of multilingual and code data. The model is released under the Apache 2 License and offers both pre-trained and instructed versions. Compared to similar models from Mistral, such as the Mistral-7B-v0.1 and Mistral-7B-v0.3, the Mistral-Nemo-Base-2407 has more than 12 billion parameters and a larger 128k context window. It also incorporates architectural choices like Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer.

Model inputs and outputs

The Mistral-Nemo-Base-2407 is a text-to-text model, meaning it takes text as input and generates text as output. The model can be used for a variety of natural language processing tasks, such as language generation, text summarization, and question answering.

Inputs

  • Text prompts

Outputs

  • Generated text

Capabilities

The Mistral-Nemo-Base-2407 model has demonstrated strong performance on a range of benchmarks, including HellaSwag, Winogrande, OpenBookQA, CommonSenseQA, TruthfulQA, and MMLU. It also exhibits impressive multilingual capabilities, scoring well on MMLU benchmarks across multiple languages such as French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.

What can I use it for?

The Mistral-Nemo-Base-2407 model can be used for a variety of natural language processing tasks, such as:

  • Content Generation: The model can be used to generate high-quality text, such as articles, stories, or product descriptions.
  • Question Answering: The model can be used to answer questions on a wide range of topics, making it useful for building conversational agents or knowledge-sharing applications.
  • Text Summarization: The model can be used to summarize long-form text, such as news articles or research papers, into concise and informative summaries.
  • Code Generation: The model's training on a large proportion of code data makes it a potential candidate for tasks like code completion or code generation.

Things to try

One interesting aspect of the Mistral-Nemo-Base-2407 model is its large 128k context window, which allows it to maintain coherence and understanding over longer stretches of text. This could be particularly useful for tasks that require reasoning over extended context, such as multi-step problem-solving or long-form dialogue. Researchers and developers may also want to explore the model's multilingual capabilities and see how it performs on specialized tasks or domains that require cross-lingual understanding or generation.

Read more


Mistral-Nemo-Instruct-2407

mistralai

Total Score

972

The Mistral-Nemo-Instruct-2407 is a Large Language Model (LLM) that has been fine-tuned for instructional tasks. It is an instruct version of the Mistral-Nemo-Base-2407 model, which was jointly trained by Mistral AI and NVIDIA. The Mistral-Nemo-Instruct-2407 model significantly outperforms existing models of similar or smaller size.

Model inputs and outputs

The Mistral-Nemo-Instruct-2407 model takes text inputs and generates text outputs. It can be used for a variety of natural language processing tasks.

Inputs

  • Free-form text prompts

Outputs

  • Coherent, contextual text completions
  • Responses to instructions or prompts

Capabilities

The Mistral-Nemo-Instruct-2407 model has strong capabilities in areas such as reasoning, knowledge, and coding. It performs well on a variety of benchmark tasks, including HellaSwag, Winogrande, OpenBookQA, CommonSenseQA, and TriviaQA.

What can I use it for?

The Mistral-Nemo-Instruct-2407 model can be used for a wide range of natural language processing applications, such as:

  • Content Generation: Generating coherent and contextual text, including stories, articles, and other creative content.
  • Question Answering: Answering questions on a variety of topics by drawing upon its broad knowledge base.
  • Instructional Tasks: Following and executing complex instructions or prompts, such as those related to coding, math, or task planning.

Things to try

Some interesting things to try with the Mistral-Nemo-Instruct-2407 model include:

  • Experimenting with different prompting strategies to see how the model responds to various types of instructions or queries.
  • Exploring the model's multilingual capabilities by providing prompts in different languages.
  • Testing the model's coding and reasoning abilities by presenting it with math problems, coding challenges, or open-ended questions that require logical thinking.

Read more


L3-8B-Stheno-v3.1

Sao10K

Total Score

100

The Llama-3-8B-Stheno-v3.1 model is an experimental roleplay-focused model created by Sao10K. It was fine-tuned using outputs from the Claude-3-Opus model along with human-generated data, with the goal of being well-suited for one-on-one roleplay scenarios, RPGs, and creative writing. Compared to the original LLaMA-3 model, this version has been optimized for roleplay use cases. The model is known as L3-RP-v2.1 on the Chaiverse platform, where it performed well with an Elo rating over 1200. Sao10K notes that the model handles character personalities effectively for one-on-one roleplay sessions, but may require some additional context and examples when used for broader narrative or RPG scenarios. The model leans toward NSFW content, so users should explicitly indicate in their prompts if they want to avoid that.

Model inputs and outputs

Inputs

  • Textual prompts for chatting, roleplaying, or creative writing

Outputs

  • Textual responses generated by the model to continue the conversation or narrative

Capabilities

The Llama-3-8B-Stheno-v3.1 model excels at immersive one-on-one roleplaying, with the ability to maintain consistent character personalities and flowing prose. It can handle a variety of roleplay scenarios, from fantasy RPGs to more intimate interpersonal interactions. The model also demonstrates creativity in its narrative outputs, making it well-suited for collaborative storytelling and worldbuilding.

What can I use it for?

This model would be well-suited for applications focused on interactive roleplay and creative writing. Game developers could leverage it to power NPCs and interactive storytelling in RPGs or narrative-driven games. Writers could use it to aid in collaborative worldbuilding and character development for their stories. The model's uncensored nature also makes it potentially useful for adult-oriented roleplaying and creative content, though users should be mindful of potential risks and legal considerations.

Things to try

Try using the model to engage in open-ended roleplaying scenarios, either one-on-one or in a group setting. Experiment with providing it with detailed character backstories and see how it responds, maintaining consistent personalities. You could also challenge the model with more complex narrative prompts, such as worldbuilding exercises or branching storylines, to explore its creative writing capabilities.

Read more
