Togethercomputer

Models by this creator

🎲

GPT-NeoXT-Chat-Base-20B

togethercomputer

Total Score

694

GPT-NeoXT-Chat-Base-20B is a 20 billion parameter language model developed by Together Computer. It is based on EleutherAI's GPT-NeoX model and has been fine-tuned on over 43 million high-quality conversational instructions. The fine-tuning process focused on tasks such as question answering, classification, extraction, and summarization. Additionally, the model has undergone further fine-tuning on a small amount of feedback data to better adapt to human preferences in conversations.

Model inputs and outputs

Inputs

- Text prompt to generate a response from the model

Outputs

- Generated text continuation of the input prompt

Capabilities

GPT-NeoXT-Chat-Base-20B is capable of engaging in open-ended dialog, answering questions, and generating human-like text across a variety of topics. Its fine-tuning on conversational data allows it to produce more coherent and contextually appropriate responses than a general language model.

What can I use it for?

The GPT-NeoXT-Chat-Base-20B model can be used as a foundation for building conversational AI applications, such as chatbots, virtual assistants, and interactive educational tools. Its large size and specialized training make it well-suited for tasks that require in-depth understanding and generation of natural language. You can fine-tune this model further on domain-specific data to create custom AI assistants for your business or organization. The OpenChatKit feedback app provided by the maintainers is a good starting point for experimenting with the model's capabilities.

Things to try

Try using the model to engage in open-ended dialog on a wide range of topics and observe how it maintains context and coherence across multiple turns of conversation. You can also experiment with different prompting techniques, such as providing detailed instructions or personas, to see how the model adapts its responses. Another interesting aspect to explore is the model's ability to perform tasks like question answering, text summarization, and content generation: provide the model with appropriate prompts and evaluate the quality and relevance of its outputs.
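Below is a minimal sketch of prompting the model through the Hugging Face transformers library. The `<human>:`/`<bot>:` turn markers follow the OpenChatKit chat convention; the repo id, sampling settings, and example prompt are illustrative assumptions rather than an official recipe from the maintainers.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# OpenChatKit-style dialog prompt: one human turn, then an open <bot>: turn for the model to complete.
prompt = "<human>: Explain what a language model is in two sentences.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```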


Updated 5/28/2024

🤔

LLaMA-2-7B-32K

togethercomputer

Total Score

522

LLaMA-2-7B-32K is an open-source, long-context language model developed by Together, fine-tuned from Meta's original Llama-2 7B model. It extends the context length to 32K with position interpolation, enabling applications such as multi-document QA and long text summarization. Compared to similar models like Llama-2-13b-chat-hf, Llama-2-7b-hf, Llama-2-13b-hf, and Llama-2-70b-chat-hf, this model focuses on handling longer contexts.

Model inputs and outputs

Inputs

- Text input

Outputs

- Generated text

Capabilities

LLaMA-2-7B-32K can handle context lengths up to 32K tokens, making it suitable for applications that require processing long-form content, such as multi-document question answering and long text summarization. The model has been fine-tuned on a mixture of pre-training and instruction-tuning data to improve its few-shot capabilities under long context.

What can I use it for?

You can use LLaMA-2-7B-32K for a variety of natural language generation tasks that benefit from long-form context, such as:

- Multi-document question answering
- Long-form text summarization
- Generating coherent and informative responses to open-ended prompts that require drawing on a large context

The model's extended context length and fine-tuning on long-form data make it well-suited for these kinds of applications.

Things to try

One interesting aspect of LLaMA-2-7B-32K is its ability to leverage long-range context to generate more coherent and informative responses. Try providing the model with multi-paragraph prompts or documents and see how it performs on tasks like summarization or open-ended question answering, where the additional context can help it generate more relevant and substantive outputs.
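A rough sketch of multi-document question answering with the extended context is shown below. The file paths and question are placeholders, and `trust_remote_code=True` is passed on the assumption that the checkpoint ships custom attention code; it may be unnecessary with recent transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/LLaMA-2-7B-32K"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Concatenate several (hypothetical) documents into one long prompt, then ask a question over all of them.
documents = [open(path).read() for path in ["doc1.txt", "doc2.txt", "doc3.txt"]]
prompt = "\n\n".join(documents) + "\n\nQuestion: Which document discusses position interpolation?\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # may be tens of thousands of tokens, up to 32K
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```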


Updated 5/27/2024

GPT-JT-6B-v1

togethercomputer

Total Score

301

GPT-JT-6B-v1 is a language model developed by togethercomputer. It is a fork of EleutherAI's GPT-J (6B) model that has been fine-tuned using a new decentralized training algorithm, and the resulting model outperforms many 100B+ parameter models on classification benchmarks. GPT-JT-6B-v1 was trained on a large collection of diverse data, including Chain-of-Thought (CoT), the Public Pool of Prompts (P3) dataset, and the Natural-Instructions (NI) dataset. The model also uses the UL2 training objective, which allows the model to see bidirectional context of the prompt.

Model inputs and outputs

Inputs

- Text prompts of varying lengths

Outputs

- Continued text output based on the input prompt

Capabilities

GPT-JT-6B-v1 has shown strong performance on a variety of classification benchmarks compared to larger 100B+ parameter models. The model is particularly adept at tasks that require reasoning and understanding of context, such as question answering and natural language inference.

What can I use it for?

GPT-JT-6B-v1 can be a powerful tool for a variety of text-based applications, such as:

- **Content generation**: generating coherent and contextually relevant text, such as stories, articles, or dialogue.
- **Question answering**: answering questions by drawing on its broad knowledge base and understanding of language.
- **Text classification**: classifying text into categories such as sentiment, topic, or intent.

Things to try

One interesting aspect of GPT-JT-6B-v1 is its use of the UL2 training objective, which allows the model to see bidirectional context of the prompt. This can be particularly useful for tasks that require a deep understanding of the input text, such as summarization or natural language inference. Try experimenting with prompts that require the model to reason about the relationships between different parts of the input text.

Another interesting avenue to explore is the model's performance on few-shot learning tasks. The description mentions that the model performs well on few-shot prompts for both classification and extraction tasks, so try designing a few-shot learning experiment and see how the model performs.
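As one concrete starting point, here is a small few-shot classification sketch using the transformers text-generation pipeline; the sentiment examples and labels are illustrative only.

```python
from transformers import pipeline

# Few-shot sentiment classification phrased as text completion.
generator = pipeline("text-generation", model="togethercomputer/GPT-JT-6B-v1", device_map="auto")

prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: The battery lasts all day.\nSentiment: Positive\n"
    "Review: It broke after a week.\nSentiment: Negative\n"
    "Review: The screen is gorgeous and the speakers are loud.\nSentiment:"
)
result = generator(prompt, max_new_tokens=3, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())  # a well-behaved run should print "Positive"
```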


Updated 5/28/2024

🚀

Llama-2-7B-32K-Instruct

togethercomputer

Total Score

160

Llama-2-7B-32K-Instruct is an open-source, long-context chat model fine-tuned from Llama-2-7B-32K on high-quality instruction and chat data. The model was built by togethercomputer using less than 200 lines of Python script and the Together API. It extends the capabilities of Llama-2-7B-32K to handle longer context and focuses on few-shot instruction following.

Model inputs and outputs

Inputs

- Llama-2-7B-32K-Instruct takes text as input.

Outputs

- The model generates text outputs, including code.

Capabilities

Llama-2-7B-32K-Instruct can engage in long-form conversations and follow instructions effectively, leveraging its extended context length of 32,000 tokens. The model has demonstrated strong performance on tasks like multi-document question answering and long-form text summarization.

What can I use it for?

You can use Llama-2-7B-32K-Instruct for a variety of language understanding and generation tasks, such as:

- Building conversational AI assistants that can engage in multi-turn dialogues
- Summarizing long documents or articles
- Answering questions that require reasoning across multiple sources
- Generating code or technical content based on prompts

Things to try

One interesting aspect of this model is its ability to effectively leverage in-context examples to improve its few-shot performance on various tasks. Experiment with providing relevant examples within the input prompt to see how the model's outputs adapt and improve.
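A minimal long-document summarization sketch follows. The `[INST] ... [/INST]` wrapper follows the common Llama-2 instruction convention and is an assumption here, as are the repo id, file path, and sampling settings; check the model card for the exact template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/Llama-2-7B-32K-Instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

article = open("long_article.txt").read()  # hypothetical long input document

# Llama-2-style instruction wrapper; the exact prompt template is an assumption.
prompt = f"[INST]\nSummarize the following article in five bullet points.\n\n{article}\n[/INST]\n\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```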


Updated 5/28/2024

⛏️

RedPajama-INCITE-Chat-3B-v1

togethercomputer

Total Score

144

The RedPajama-INCITE-Chat-3B-v1 is a 2.8B parameter language model developed by Together and leaders from the open-source AI community. It is fine-tuned on datasets like OASST1 and Dolly2 to enhance its chatting ability. The model is part of the RedPajama-INCITE series, which includes the base model RedPajama-INCITE-Base-3B-v1 and an instruction-tuned version RedPajama-INCITE-Instruct-3B-v1. The chat version, RedPajama-INCITE-Chat-3B-v1, is designed to excel at dialog-style interactions.

Model inputs and outputs

The RedPajama-INCITE-Chat-3B-v1 model takes in text prompts in a conversational format, where the human message is prefixed with `<human>:` and the model's response is prefixed with `<bot>:`. The model outputs text continuations that continue the dialog.

Inputs

- Text prompts in a conversational format, with the human message prefixed by `<human>:` and the model's response prefixed by `<bot>:`.

Outputs

- Continuation of the dialog, output as text.

Capabilities

The RedPajama-INCITE-Chat-3B-v1 model handles several tasks well out of the box, including:

- Summarization and question answering within context
- Extraction
- Classification

The model also performs well on few-shot prompts, with improved performance on classification and extraction tasks compared to the base model.

What can I use it for?

The RedPajama-INCITE-Chat-3B-v1 model is intended for research purposes, such as safe deployment of models with the potential to generate harmful content, probing and understanding the limitations and biases of dialogue models, and use in educational or creative tools. The maintainer, togethercomputer, provides the model under an Apache 2.0 license.

Things to try

One interesting thing to try with the RedPajama-INCITE-Chat-3B-v1 model is exploring its few-shot capabilities. The model performs better on classification and extraction tasks when provided with a few examples in the prompt, compared to the base model. This suggests the model has learned to effectively leverage in-context information, which could be useful for a variety of applications.
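The sketch below shows the conversational prompt format in practice; the sampling settings and the habit of cutting the reply at the next `<human>:` marker are illustrative choices, not a prescribed recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "<human>: Give me three ideas for a science-fair project.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.7, top_k=50)

reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply.split("<human>:")[0].strip())  # trim if the model starts writing the next human turn itself
```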


Updated 5/27/2024

📉

StripedHyena-Nous-7B

togethercomputer

Total Score

135

The StripedHyena-Nous-7B (SH-N 7B) is a state-of-the-art chat model developed by Together Computer in collaboration with Nous Research. It is part of the StripedHyena model family, which uses a hybrid architecture of multi-head, grouped-query attention and gated convolutions arranged in Hyena blocks, a departure from traditional decoder-only Transformer models. The StripedHyena models are designed to improve on Transformers in terms of long-context processing, training, and inference performance. Compared to optimized Transformer models like LLaMA-2, SH-N 7B offers constant-memory decoding, lower latency, and faster throughput. It is also trained on sequences of up to 32k tokens, allowing it to handle longer prompts than typical chatbots.

The model is similar in scale and capabilities to other open-source chatbots like Pythia-Chat-Base-7B and Nous-Hermes-13b, which are also fine-tuned on large instruction datasets to excel at open-ended dialogue and task completion.

Model inputs and outputs

Inputs

- **Prompt**: The text that the model is asked to continue or respond to.

Outputs

- **Response**: The model's generated text output, continuing or responding to the provided prompt.

Capabilities

The StripedHyena-Nous-7B model is designed for open-ended chat and task completion. It can engage in freeform dialogue, answer questions, summarize information, and complete a variety of other language-based tasks. Its long-context processing capabilities allow it to maintain coherence and memory over longer interactions.

What can I use it for?

The SH-N 7B model is well-suited for building chatbots, virtual assistants, and other conversational AI applications. Its strong performance on language tasks makes it applicable to use cases like customer service, tutoring, content generation, and research. The long-context abilities could also enable applications in areas like multi-document summarization and question answering.

Things to try

One interesting aspect of the SH-N 7B model is its hybrid architecture, which aims to address the limitations of standard Transformer models. You could experiment with prompts that require long-range reasoning or coherence to see how the model performs compared to other chatbots. Additionally, you could try fine-tuning the model on domain-specific datasets to enhance its capabilities for your particular use case.
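A sketch of loading and prompting the model is shown below. Because the hybrid Hyena architecture is not a stock transformers class, `trust_remote_code=True` is assumed to be required; the instruction/response prompt template is likewise an assumption and should be checked against the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/StripedHyena-Nous-7B"  # assumed Hugging Face repo id
# trust_remote_code lets transformers load the custom StripedHyena modeling code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Instruction-style prompt; the exact template is an assumption.
prompt = "### Instruction:\nSummarize the key ideas behind gated convolutions.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```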


Updated 5/28/2024

🛸

m2-bert-80M-32k-retrieval

togethercomputer

Total Score

117

The m2-bert-80M-32k-retrieval model, developed by Together Computer, is an 80 million parameter checkpoint of M2-BERT that has been pretrained with a sequence length of 32,768 and fine-tuned for long-context retrieval tasks. This model builds upon the Monarch Mixer architecture, which aims to improve upon the standard BERT model for handling long sequences. Similar models include all-mpnet-base-v2 from the Sentence-Transformers library, which maps sentences and paragraphs to a 768-dimensional vector space for tasks like clustering and semantic search, and the LLaMA-2-7B-32K model, which also extends the context length to 32,768 tokens.

Model inputs and outputs

Inputs

- **Text**: Single sentences or longer passages of text, up to 32,768 tokens in length.

Outputs

- **Sentence embeddings**: 768-dimensional vector representations of the input text, which can be used for tasks like retrieval, clustering, or similarity search.

Capabilities

The m2-bert-80M-32k-retrieval model is particularly well-suited for long-context tasks that require understanding and relating large amounts of text. Its extended 32,768-token context length allows it to capture and leverage relationships between distant parts of a document or corpus.

What can I use it for?

This model can be useful for applications that involve searching, ranking, or clustering large text corpora, such as academic papers, book chapters, or long-form web content. The long-context embeddings it generates could power semantic search engines, content recommendation systems, or document organization tools.

Things to try

One interesting aspect of this model is its ability to handle very long input sequences. You could experiment with feeding it excerpts from novels, technical manuals, or other long-form content and see how the model's understanding and representations of the text evolve as the context length increases. This could provide insights into the model's reasoning and help identify its strengths and limitations for real-world applications.
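Below is a rough sketch of embedding a query and ranking documents by cosine similarity. The `AutoModelForSequenceClassification` loading path, the bert-base-uncased tokenizer, the max-length padding, and the `sentence_embedding` output key are assumptions about the checkpoint's custom code; verify them against the model card before relying on this.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Loading path and output key are assumptions based on the checkpoint's custom remote code.
model_id = "togethercomputer/m2-bert-80M-32k-retrieval"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=32768)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)

def embed(text: str) -> torch.Tensor:
    # Pad to the full 32k window; the custom architecture is assumed to expect fixed-length inputs.
    inputs = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=32768)
    with torch.no_grad():
        return model(**inputs)["sentence_embedding"].squeeze(0)  # assumed 768-dimensional vector

query = embed("How does position interpolation extend context length?")
docs = [embed(d) for d in ["Document about RoPE scaling...", "Document about image classification..."]]

# Rank documents by cosine similarity to the query embedding.
scores = [torch.nn.functional.cosine_similarity(query, d, dim=0).item() for d in docs]
print(sorted(enumerate(scores), key=lambda pair: -pair[1]))
```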


Updated 5/27/2024

RedPajama-INCITE-7B-Instruct

togethercomputer

Total Score

104

The RedPajama-INCITE-7B-Instruct model is a 6.9 billion parameter language model developed by Together Computer. It was fine-tuned from the RedPajama-INCITE-7B-Base model with a focus on few-shot learning applications, using data from GPT-JT while excluding tasks that overlap with the HELM core scenarios. The model family also includes the base version (RedPajama-INCITE-7B-Base) and a chat version (RedPajama-INCITE-7B-Chat); these variants are designed for specific use cases and may have different capabilities.

Model inputs and outputs

Inputs

- Text prompts for language generation tasks, such as open-ended questions, instructions, or dialogue starters.

Outputs

- Coherent and contextual text responses generated by the model, based on the input prompt.

Capabilities

The RedPajama-INCITE-7B-Instruct model is particularly adept at few-shot learning tasks, where it can quickly adapt to new prompts and scenarios with limited training data. It has been shown to perform well on a variety of classification, extraction, and summarization tasks.

What can I use it for?

The RedPajama-INCITE-7B-Instruct model can be used for a wide range of language generation and understanding tasks, such as:

- Question answering
- Dialogue and chat applications
- Content generation (e.g., articles, stories, poems)
- Summarization
- Text classification

Due to its few-shot learning capabilities, the model could be particularly useful for applications that require rapid adaptation to new domains or tasks.

Things to try

One interesting thing to try with the RedPajama-INCITE-7B-Instruct model is exploring its few-shot learning abilities. Provide the model with prompts that are outside its core training data and see how it adapts and responds. You can also experiment with different prompt formats and techniques to further tune the model for your specific use case.
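As a small illustration of the few-shot style this model targets, here is an extraction prompt phrased as completion; the sentences, labels, and generation settings are illustrative only.

```python
from transformers import pipeline

# Few-shot entity extraction phrased as text completion.
generator = pipeline(
    "text-generation",
    model="togethercomputer/RedPajama-INCITE-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)

prompt = (
    "Extract the city mentioned in each sentence.\n"
    "Sentence: The conference was held in Montreal last spring.\nCity: Montreal\n"
    "Sentence: She moved to Lisbon for a new job.\nCity: Lisbon\n"
    "Sentence: Our next offsite will take place in Nairobi.\nCity:"
)
result = generator(prompt, max_new_tokens=5, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())  # a well-behaved run should print "Nairobi"
```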


Updated 5/28/2024

📉

RedPajama-INCITE-7B-Base

togethercomputer

Total Score

94

RedPajama-INCITE-7B-Base is a 6.9B parameter pretrained language model developed by Together and leaders from the open-source AI community, including Ontocord.ai, ETH DS3Lab, AAI CERC, Université de Montréal, MILA - Québec AI Institute, the Stanford Center for Research on Foundation Models (CRFM), the Stanford Hazy Research group, and LAION. Training was done on 3,072 V100 GPUs provided as part of the INCITE 2023 project on Scalable Foundation Models for Transferrable Generalist AI, awarded to MILA, LAION, and EleutherAI in fall 2022, with support from the Oak Ridge Leadership Computing Facility (OLCF) and the INCITE program.

Similar models developed by Together include RedPajama-INCITE-Chat-3B-v1, which is fine-tuned for chatting ability, and RedPajama-INCITE-Instruct-3B-v1, which is fine-tuned for few-shot applications.

Model inputs and outputs

Inputs

- Text prompts for language modeling tasks

Outputs

- Predicted text continuation based on the input prompt

Capabilities

RedPajama-INCITE-7B-Base is a powerful language model that can be used for a variety of text-based tasks, such as text generation, summarization, and question answering. The model has been trained on a large corpus of text data, giving it broad knowledge and language understanding capabilities.

What can I use it for?

RedPajama-INCITE-7B-Base can be used for a variety of applications, such as chatbots, content generation, and language understanding. For example, you could use the model to build a chatbot that engages in natural conversations, or to generate coherent and relevant text for tasks like creative writing or content creation.

Things to try

One interesting thing to try with RedPajama-INCITE-7B-Base is using it for few-shot learning tasks. The model has been trained on a large amount of data, but it can also be fine-tuned on smaller datasets for specific applications. This can help the model adapt to new tasks and domains while maintaining its strong language understanding capabilities.
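For domain adaptation, a minimal sketch of attaching LoRA adapters with the peft library is shown below. The `target_modules` value assumes the GPT-NeoX-style attention projection name used by this architecture, and the dataset and training loop are omitted; treat it as a starting point rather than an official fine-tuning recipe.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/RedPajama-INCITE-7B-Base", torch_dtype=torch.float16, device_map="auto"
)

# LoRA keeps the 6.9B base weights frozen and trains small low-rank adapters instead.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # assumed GPT-NeoX-style attention projection name
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable

# From here, train with transformers.Trainer (or a similar trainer) on your domain-specific text.
```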


Updated 5/28/2024

🤷

RedPajama-INCITE-7B-Chat

togethercomputer

Total Score

92

The RedPajama-INCITE-7B-Chat model was developed by Together and leaders from the open-source AI community, including Ontocord.ai, ETH DS3Lab, AAI CERC, Université de Montréal, MILA - Québec AI Institute, the Stanford Center for Research on Foundation Models (CRFM), the Stanford Hazy Research group, and LAION. It is a 6.9B parameter pretrained language model that has been fine-tuned on the OASST1 and Dolly2 datasets to enhance its chatting abilities. The model is available in three versions: RedPajama-INCITE-7B-Base, RedPajama-INCITE-7B-Instruct, and RedPajama-INCITE-7B-Chat.

The RedPajama-INCITE-Chat-3B-v1 model is a smaller 2.8B parameter version of the RedPajama-INCITE-7B-Chat model, developed by Together and the same community and fine-tuned on the same datasets to enhance its chatting abilities.

Model inputs and outputs

The RedPajama-INCITE-7B-Chat model accepts text prompts as input and generates relevant text responses. The model is designed for conversational tasks, such as engaging in open-ended dialogue, answering questions, and providing informative responses.

Inputs

- **Text prompts**: The model takes text prompts as input, which can be a single sentence, a paragraph, or a multi-turn conversation.

Outputs

- **Text responses**: The model generates text responses relevant to the input prompt. Responses can vary in length and complexity, depending on the nature of the input.

Capabilities

The RedPajama-INCITE-7B-Chat model excels at a variety of conversational tasks, such as question answering, summarization, and task completion. For example, the model can provide informative responses to questions about a given topic, summarize long passages of text, and assist with completing open-ended tasks.

What can I use it for?

The RedPajama-INCITE-7B-Chat model can be used in a wide range of applications, such as chatbots, virtual assistants, and content generation tools. Developers can integrate the model into their applications to provide users with a more natural and engaging conversational experience. For example, the model could be used to create a virtual customer service agent that assists customers with product inquiries and troubleshooting, to generate summaries of news articles or research papers, or to assist with creative writing tasks.

Things to try

One interesting thing to try with the RedPajama-INCITE-7B-Chat model is to engage it in a multi-turn conversation and observe how it maintains context and understanding throughout the dialogue. You could also provide the model with prompts that require it to draw insights or make inferences, rather than just provide factual information. Additionally, you could experiment with the model's ability to adapt to different styles of communication, such as formal versus casual language, or different levels of complexity in the prompts.
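A simple multi-turn sketch follows: each user turn and model reply is appended to a running transcript so the model sees the full conversation. The `<human>:`/`<bot>:` markers mirror the format used by the smaller RedPajama-INCITE chat model, and the example turns and sampling settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-7B-Chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

transcript = ""
for user_turn in ["What causes tides?", "And why are there two high tides a day?"]:
    transcript += f"<human>: {user_turn}\n<bot>:"
    inputs = tokenizer(transcript, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_p=0.7)
    reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    reply = reply.split("<human>:")[0].strip()  # keep only the bot's turn
    transcript += f" {reply}\n"
    print(f"BOT: {reply}")
```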


Updated 5/28/2024