MosaicML

Models by this creator

🛸

mpt-7b

mosaicml

Total Score

1.1K

The mpt-7b is a large language model developed by MosaicML, a company focused on building efficient AI models. It is part of the MosaicPretrainedTransformer (MPT) family of models, which use a modified transformer architecture optimized for efficient training and inference. The model was trained on 1 trillion tokens of English text and code, making it one of the larger open-source language models available. The key differences between mpt-7b and similar models like LLaMA and Pythia are:

* It is licensed for commercial use, unlike LLaMA.
* It was trained on a significantly larger dataset of 1 trillion tokens, compared to 300 billion for Pythia and 800 billion for StableLM.
* It can handle extremely long inputs of up to 84,000 tokens, thanks to the use of Attention with Linear Biases (ALiBi), compared to only 2,000-4,000 tokens for other open-source models.
* It supports fast training and inference, leveraging techniques like FlashAttention and FasterTransformer.

Model inputs and outputs

Inputs

* Text data, including natural language and source code

Outputs

* Generated text, which can be used for a variety of language modeling tasks

Capabilities

The mpt-7b model is a powerful language model with impressive capabilities. It can be used for tasks like text generation, summarization, and translation. The model's large training dataset and long context length make it well-suited for working with long-form text, such as writing stories or generating technical documentation.

What can I use it for?

The mpt-7b model can be used for a variety of natural language processing tasks, such as:

* **Content creation**: Use the model to generate draft text for blogs, articles, or stories, which can then be edited and refined.
* **Technical writing**: Leverage the model's knowledge of code and technical concepts to assist in generating technical documentation or other software-related content.
* **Chatbots and virtual assistants**: Fine-tune the model for conversational tasks to create more engaging and capable chatbots and virtual assistants.

The model's commercial licensing also makes it suitable for use in commercial applications, unlike some other open-source language models.

Things to try

One interesting aspect of the mpt-7b model is its ability to handle extremely long inputs, thanks to the use of ALiBi. This could be leveraged to generate long-form content, such as novels or academic papers, by providing the model with detailed outlines or prompts as input. The model's efficiency and speed also make it a good candidate for experimentation with different prompt engineering techniques or fine-tuning approaches.
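The long-context claims above hinge on ALiBi, which replaces positional embeddings with a fixed linear penalty on attention scores, so the model can attend at distances it never saw in training. As a rough illustration, here is a minimal sketch of the per-head slopes and causal bias in plain Python (not MosaicML's implementation; the function names are made up for this example):

```python
def alibi_slopes(n_heads):
    # Geometric sequence from the ALiBi paper: head h (1-indexed) gets
    # slope 2^(-8h / n_heads); valid as written for power-of-two head counts.
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Causal bias added to attention logits: -slope * (i - j) for query
    # position i and key position j <= i. Distance 0 gets no penalty.
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

slopes = alibi_slopes(8)         # [0.5, 0.25, ..., 2**-8]
bias = alibi_bias(4, slopes[0])  # bias[3] == [-1.5, -1.0, -0.5, 0.0]
```

Because the penalty is a simple function of distance rather than a learned embedding, nothing breaks when the sequence length exceeds the training length; the penalty just keeps growing linearly, which is what enables the extrapolation described above.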


Updated 5/28/2024

👨‍🏫

mpt-7b-storywriter

mosaicml

Total Score

793

The mpt-7b-storywriter is a large language model developed by MosaicML that is designed to read and write fictional stories with very long context lengths. It was built by fine-tuning the base MPT-7B model on a filtered fiction subset of the books3 dataset. The model uses ALiBi to extrapolate beyond its 65k-token training context length, demonstrating generations of up to 84k tokens.

The mpt-7b-storywriter model is part of the MosaicPretrainedTransformer (MPT) family, which uses a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits. The MPT models can be served efficiently with both standard Hugging Face pipelines and NVIDIA's FasterTransformer.

Model Inputs and Outputs

Inputs

* Text prompts of up to 65,536 tokens in length, thanks to the use of ALiBi

Outputs

* Continued story text, with the ability to extrapolate beyond the 65k-token training context length up to 84k tokens

Capabilities

The mpt-7b-storywriter model is designed to excel at generating long-form fictional stories. It can handle extremely long input contexts and produce coherent, extended narratives. This makes it well-suited for tasks like creative writing assistance, story generation, and even interactive storytelling applications.

What Can I Use It For?

The mpt-7b-storywriter model can be used for a variety of creative writing and storytelling applications. Some potential use cases include:

* Generating original story ideas and plot outlines
* Assisting human writers by producing narrative continuations and story extensions
* Creating interactive fiction or choose-your-own-adventure style narratives
* Developing conversational storytelling agents or interactive characters

Things to Try

One interesting aspect of the mpt-7b-storywriter model is its ability to handle extremely long input context lengths and produce cohesive, extended narratives. You could try providing the model with a short story prompt and see how it continues and develops the story over many thousands of tokens. Alternatively, you could experiment with giving the model partial story outlines or character descriptions and see how it fleshes out the narrative. Another intriguing possibility is to fine-tune or adapt the mpt-7b-storywriter model for specific genres, styles, or storytelling formats. This could involve further training on domain-specific datasets or incorporating custom prompting techniques to tailor the model's outputs.


Updated 5/28/2024

⚙️

mpt-7b-chat

mosaicml

Total Score

512

mpt-7b-chat is a chatbot-like model for dialogue generation. It was built by fine-tuning MPT-7B on several datasets, including ShareGPT-Vicuna, HC3, Alpaca, HH-RLHF, and Evol-Instruct. This allows the model to engage in more natural, open-ended dialogue than the base MPT-7B model.

Model Inputs and Outputs

Inputs

* Text prompts that the model will use to generate a response

Outputs

* Generated text responses that continue the dialogue based on the input prompt

Capabilities

mpt-7b-chat can engage in freeform dialogue on a wide range of topics. It demonstrates strong language generation abilities and can provide detailed, contextual responses. For example, it can discuss programming concepts, generate gourmet meal recipes, and even roleplay as characters from fiction.

What Can I Use It For?

The mpt-7b-chat model could be used to power chatbots, virtual assistants, or other applications that require natural language interaction. Its ability to sustain a conversation and provide relevant, engaging responses makes it well-suited for customer service, education, entertainment, and other applications where users need to interact with an AI system.

Things to Try

One interesting aspect of mpt-7b-chat is its ability to maintain context and persona over multiple turns of a conversation. Try providing the model with a detailed system prompt that establishes its identity and goals, then see how it responds to a series of follow-up questions or requests. This can help you explore the model's conversational capabilities and understand how it uses the provided context to inform its responses.
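To experiment with system prompts as suggested above, the input has to be laid out in the chat template the finetune expects. MosaicML's chat models are typically prompted with a ChatML-style layout; the helper below is a hedged sketch of that format (the exact template should be confirmed against the model card, and `build_chat_prompt` is an illustrative name, not part of any library):

```python
def build_chat_prompt(system, turns):
    # Assemble a ChatML-style prompt: a system turn, alternating
    # (role, content) turns, and an open assistant turn for the
    # model to complete.
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chat_prompt(
    "You are a terse, helpful assistant.",
    [("user", "Suggest a name for a pet quoll.")],
)
```

Generation would then be stopped when the model emits the `<|im_end|>` delimiter, keeping each assistant turn cleanly separated.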


Updated 5/28/2024

📊

mpt-7b-instruct

mosaicml

Total Score

461

mpt-7b-instruct is a model for short-form instruction following. It was built by fine-tuning MPT-7B on a dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless (HH-RLHF) datasets. This model was trained by MosaicML.

Model Inputs and Outputs

This is a text-to-text model, taking in natural language text and generating new text in response. The model can handle a wide range of input prompts and produce diverse outputs, from succinct factual answers to engaging stories.

Inputs

* Natural language text prompts, which can include instructions, questions, or open-ended requests

Outputs

* Generated text relevant to the input prompt, ranging from short factual responses to longer narrative pieces

Capabilities

mpt-7b-instruct demonstrates strong performance on a variety of language tasks, including question answering, summarization, and open-ended generation. For example, when given the prompt "What is a quoll?", the model provides a detailed explanation of this Australian marsupial. The model can also generate creative stories and engage in open-ended dialogue when prompted.

What Can I Use It For?

The mpt-7b-instruct model could be useful for a variety of applications that require natural language processing, such as:

* Building chatbots or virtual assistants that can understand and respond to user instructions
* Automating content creation tasks like writing summaries, articles, or creative fiction
* Enhancing search engines or question-answering systems with more natural language understanding

Things to Try

One interesting aspect of the mpt-7b-instruct model is that, thanks to ALiBi, it can accept input sequences longer than the context length it was trained on. You could try providing the model with long passages of text, such as lengthy articles, and see how it responds to open-ended prompts or generates continuations. This capacity for longer-form content makes it a compelling tool for tasks like story generation or summarization.
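Instruction finetunes like this one are sensitive to prompt layout: the Dolly-derived training data wraps each request in an "Instruction/Response" template. The sketch below shows the commonly documented format for mpt-7b-instruct (worth double-checking against the model card; `format_instruction` is an illustrative helper, not a library function):

```python
# Dolly-style instruction template used by MPT instruct finetunes.
INSTRUCT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:\n"
)

def format_instruction(instruction):
    # Wrap a raw instruction in the template; the model's answer is
    # whatever it generates after the "### Response:" marker.
    return INSTRUCT_TEMPLATE.format(instruction=instruction)

prompt = format_instruction("What is a quoll?")
```

Sending the bare instruction without this wrapper usually still works, but responses tend to track the expected format much more reliably when the training-time template is reproduced.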


Updated 5/28/2024

🤔

mpt-30b

mosaicml

Total Score

338

The mpt-30b is a large language model trained by MosaicML, a company focused on developing cutting-edge AI models. It is part of the Mosaic Pretrained Transformer (MPT) family of models, which use a modified transformer architecture optimized for efficient training and inference. The mpt-30b model was trained on 1 trillion tokens of English text and code, significantly more data than models like Pythia (300 billion tokens) and StableLM (800 billion), and on par with LLaMA and OpenLLaMA. This allows the mpt-30b to perform strongly across a wide range of natural language tasks. Additionally, the mpt-30b includes several architectural innovations that set it apart, like support for an 8k-token context window (which can be further extended via finetuning), context-length extrapolation via ALiBi, and efficient inference and training via FlashAttention. These features enable the model to handle very long inputs and generate coherent text, making it well-suited for tasks like long-form writing.

Model inputs and outputs

Inputs

* **Text**: The mpt-30b model takes in natural language text as input, which can range from short prompts to long-form passages.

Outputs

* **Generated text**: The primary output of the mpt-30b model is a continuation of the input text that is coherent and contextually relevant. The model can be used for a variety of text generation tasks, from creative writing to question answering.

Capabilities

The mpt-30b model has shown strong performance on a wide range of language tasks, including text generation, question answering, and code generation. Its large scale and architectural innovations allow it to handle long-form inputs and outputs effectively. For example, the model can be used to generate multi-paragraph stories or long-form instructional content.

What can I use it for?

The mpt-30b model is well-suited for a variety of natural language processing applications, particularly those that require handling long-form text. Some potential use cases include:

* **Content creation**: The model can be used to assist with writing tasks like creative fiction, technical documentation, or marketing copy.
* **Question answering**: With its strong understanding of language, the mpt-30b can be used to build chatbots or virtual assistants that can engage in informative and contextual conversations.
* **Code generation**: Because it was trained on a mix of text and code, the model can be used to generate or assist with writing code.

Companies looking to leverage large language models for their business could consider finetuning the mpt-30b on their own data to create custom AI assistants or content generation tools. The MosaicML Platform provides tools and services to help with this process.

Things to try

One interesting aspect of the mpt-30b model is its ability to handle very long inputs and outputs thanks to the ALiBi architecture. This could make it well-suited for tasks like long-form story generation or summarization of lengthy documents. Experimenting with pushing the boundaries of the model's context window could yield compelling results. Additionally, the model's strong performance on both text and code suggests it could be a powerful tool for developing AI-assisted programming workflows. Prompting the model with high-level instructions or pseudocode and seeing how it translates that into working code could be an illuminating exercise. Overall, the mpt-30b represents a significant step forward in the development of large language models, and its combination of scale, capability, and efficiency makes it an intriguing model to explore and experiment with.


Updated 5/28/2024

👁️

mpt-30b-chat

mosaicml

Total Score

199

mpt-30b-chat is a chatbot-like model for dialogue generation developed by MosaicML. It was built by fine-tuning the larger MPT-30B model on several datasets, including ShareGPT-Vicuna, Camel-AI, GPTeacher, Guanaco, and Baize. This model follows a modified decoder-only transformer architecture and is licensed for non-commercial use only.

Model inputs and outputs

The mpt-30b-chat model is designed for text-to-text tasks, taking in natural language prompts and generating relevant responses. It has an 8k-token context window, which can be further extended via fine-tuning, and supports context-length extrapolation via ALiBi.

Inputs

* Natural language prompts for conversation or dialogue

Outputs

* Generated text responses that continue a conversation or provide relevant information

Capabilities

The mpt-30b-chat model excels at engaging in multi-turn conversations and following short-form instructions. Its large 30B parameter size and fine-tuning on specialized datasets give it strong coding abilities and the capacity to handle a wide range of conversational topics.

What can I use it for?

The mpt-30b-chat model can be used to power conversational AI assistants, chatbots, and interactive applications. Its capabilities make it well-suited for tasks like customer service, educational applications, and creative writing assistance. While licensed for non-commercial use only, interested parties can explore the model's potential on the MosaicML platform.

Things to try

One interesting aspect of mpt-30b-chat is its ability to extrapolate beyond its 8k-token context window through the use of ALiBi. This allows the model to maintain coherence and context over longer dialogues, opening up possibilities for more substantive and engaging conversations.


Updated 5/28/2024

🌐

mpt-30b-instruct

mosaicml

Total Score

99

The mpt-30b-instruct model is a powerful open-source language model developed by MosaicML that is designed for short-form instruction following. This model was built by fine-tuning the larger MPT-30B model on several datasets, including Dolly HHRLHF, Competition Math, Duorc, and more. Compared to similar open-source models like mpt-7b-instruct and mpt-30b-chat, the mpt-30b-instruct model is significantly larger, with 30 billion parameters, providing enhanced capabilities for tasks like instruction following. It uses the same modified decoder-only transformer architecture as other MPT models, which incorporates performance-boosting techniques like FlashAttention and ALiBi.

Model inputs and outputs

Inputs

* **Text prompts**: The model accepts natural language text prompts that describe a task or provide instructions for the model to follow.

Outputs

* **Text responses**: The model generates text responses that complete the given task or follow the provided instructions.

Capabilities

The mpt-30b-instruct model excels at a variety of short-form instruction-following tasks, such as answering questions, solving math problems, summarizing texts, and more. It demonstrates strong language understanding and reasoning abilities, allowing it to interpret complex instructions and provide relevant, coherent responses.

What can I use it for?

Developers and researchers can leverage the mpt-30b-instruct model for a wide range of applications that require natural language processing and generation capabilities. Some potential use cases include:

* **Question-answering systems**: Build chatbots or virtual assistants that can comprehend and respond to user queries.
* **Automated task completion**: Develop applications that can follow written instructions to perform various tasks, such as writing reports, generating code snippets, or solving math problems.
* **Content summarization**: Use the model to automatically summarize long-form text, such as articles or research papers, into concise summaries.

Things to try

One interesting aspect of the mpt-30b-instruct model is its ability to handle long-form inputs and outputs, thanks to the use of ALiBi in its architecture. Developers can experiment with extending the model's context length during fine-tuning or inference to see how it performs on tasks that require generating or comprehending longer passages of text. Additionally, the model's strong coding abilities, gained from its pretraining data mixture, make it a compelling choice for applications that involve code generation or analysis. Researchers and engineers can explore using the mpt-30b-instruct model for tasks like code completion, code summarization, or even automated programming.


Updated 5/28/2024

👁️

mpt-1b-redpajama-200b

mosaicml

Total Score

89

The mpt-1b-redpajama-200b is a 1.3 billion parameter decoder-only transformer model trained by MosaicML on the RedPajama dataset. It follows a modified decoder-only transformer architecture, using techniques like FlashAttention, ALiBi, and QK LayerNorm. The model was trained for 200 billion tokens, with a dataset mix similar to that of the Llama series of models.

Model inputs and outputs

The mpt-1b-redpajama-200b is a causal language model that takes in text and generates continuations of that text. It can be used for a variety of natural language processing tasks, such as text generation, summarization, and translation.

Inputs

* Raw text that the model will use to generate continuations

Outputs

* Continued text generated by the model based on the input

Capabilities

The mpt-1b-redpajama-200b model has been trained on a large and diverse corpus of text, giving it broad capabilities in natural language understanding and generation. It can be used for tasks like creative writing, summarization, and open-ended conversation.

What can I use it for?

The mpt-1b-redpajama-200b model can be used for a variety of natural language processing tasks, such as:

* **Text generation**: Use the model to generate coherent and contextually relevant text continuations, such as stories, articles, or dialogue.
* **Summarization**: Feed the model long-form text and have it generate concise summaries.
* **Conversational AI**: Fine-tune the model on conversational data to create chatbots and virtual assistants.

Things to try

One interesting thing to try with the mpt-1b-redpajama-200b model is to experiment with its architectural modifications, such as the use of ALiBi and the elimination of positional embeddings, to understand how these choices affect the model's performance and capabilities. Another idea is to fine-tune the model on a specific domain or task, leveraging its broad knowledge base to create a specialized model tailored to your needs. The MosaicML Platform offers tools and resources to help with this process.


Updated 5/28/2024

📶

mpt-1b-redpajama-200b-dolly

mosaicml

Total Score

78

mpt-1b-redpajama-200b-dolly is a 1.3 billion parameter decoder-only transformer model that was pre-trained on the RedPajama dataset and then fine-tuned on the Databricks Dolly instruction dataset. This model was trained by MosaicML, a company focused on developing efficient and capable AI models. The mpt-1b-redpajama-200b model, which serves as the base for this fine-tuned version, was pre-trained for 200 billion tokens using the same data proportions as the Llama series of models. The architecture follows a modified decoder-only transformer design, incorporating features like FlashAttention, ALiBi, and QK LayerNorm.

Model Inputs and Outputs

Inputs

* Text prompts that describe a task or request

Outputs

* Responses that appropriately complete the requested task

Capabilities

mpt-1b-redpajama-200b-dolly is an instruction-following model that can perform a wide variety of tasks based on the input prompt, such as answering questions, writing reports, generating creative stories, and providing analysis. The model's training on the Databricks Dolly dataset helps it understand and follow complex instructions reliably.

What Can I Use It For?

This model could be useful for automating various text-based workflows within a company, such as customer service, content creation, or data analysis. By providing clear instructions, employees can leverage the model to save time and improve consistency. Additionally, the model's open-source nature and commercial-use license make it accessible for companies to fine-tune on their own proprietary data.

Things to Try

One interesting aspect of mpt-1b-redpajama-200b-dolly is its ability to handle long input contexts, thanks to the use of ALiBi. This could allow for tasks that require synthesizing information from large amounts of text, such as summarizing research papers or generating long-form creative writing. Experimenting with providing the model with extended context and observing its responses could yield interesting results.


Updated 5/27/2024

🔍

mosaic-bert-base

mosaicml

Total Score

42

The mosaic-bert-base model is a custom BERT architecture and training recipe optimized for fast pretraining. Developed by MosaicML, MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy than Hugging Face's bert-base-uncased model. This work informed many of the architecture choices behind MosaicML's larger models like MPT-7B and MPT-30B.

Model Inputs and Outputs

The mosaic-bert-base model uses the same tokenizer as the standard BERT model, so the inputs and outputs are similar. It takes in tokenized text sequences and outputs logits for masked language modeling.

Inputs

* Tokenized text sequences

Outputs

* Logits for masked language modeling

Capabilities

The mosaic-bert-base model achieves higher pretraining and finetuning accuracy than the standard BERT model. This makes it well-suited for a variety of natural language tasks that can benefit from enhanced language modeling capabilities, such as text generation, classification, and question answering.

What Can I Use It For?

The mosaic-bert-base model can be used as a strong base model for finetuning on specialized NLP tasks. For example, the DNABERT-2-117M model was created by finetuning mosaic-bert-base for genome classification.

Things to Try

One interesting aspect of the mosaic-bert-base model is its ability to extrapolate to longer sequence lengths through the use of ALiBi (Attention with Linear Biases). By adjusting the alibi_starting_size flag in the config, you can experiment with increasing the maximum sequence length the model can handle during inference, potentially unlocking new use cases.
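The masked-language-modeling objective mentioned above (logits predicted at masked positions) works by corrupting a fraction of the input tokens before the forward pass. Below is a minimal, framework-free sketch of the standard BERT-style 80/10/10 masking rule; it is illustrative only, not MosaicBERT's actual data pipeline, and the function name is made up:

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, p=0.15, seed=0):
    # BERT-style masking: select a fraction p of positions; of those,
    # 80% become [MASK], 10% become a random token, 10% stay unchanged.
    # Returns (inputs, labels) where labels hold the original token at
    # selected positions and -100 elsewhere (ignored by the loss).
    rng = random.Random(seed)
    inputs, labels = [], []
    for t in token_ids:
        if rng.random() < p:
            labels.append(t)
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_id)
            elif r < 0.9:
                inputs.append(rng.randrange(vocab_size))
            else:
                inputs.append(t)
        else:
            labels.append(-100)
            inputs.append(t)
    return inputs, labels
```

The pretraining loss is then cross-entropy over only the selected positions, which is why the model's output is a full grid of vocabulary logits but only a sparse subset contributes to training.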


Updated 9/6/2024