BigScience

Models by this creator


bloom

bigscience

Total Score

4.6K

BLOOM is a large language model developed by the BigScience collective, a group of over 1,000 researchers from around the world. It is a 176 billion parameter decoder-only transformer model trained on a dataset of over 1.5 TB of text data in 46 natural languages and 13 programming languages. Like other GPT-style models, BLOOM is trained to continue text from a prompt, producing coherent and contextually relevant output. Similar models include the smaller bloom-7b1 model and the bloomz family, which is finetuned from BLOOM for instruction following. The BLOOMChat-176B-v1 model, developed by SambaNova Systems, is an instruction-tuned version of BLOOM for conversational tasks.

Model inputs and outputs

BLOOM takes a text prompt as input and generates continuation text as output. The model can understand and generate text in 46 natural languages and 13 programming languages. Some key highlights include the large scale of the model, its multilingual capabilities, and the use of ALiBi positional embeddings to enable modeling of long-range dependencies.

Inputs

- **Text prompt:** A sequence of text, which the model will use to generate a continuation.
- **Sequence length:** BLOOM accepts sequences up to 2048 tokens in length.

Outputs

- **Generated text:** A text continuation, where each token is generated from the model's predicted probabilities given the prompt and the tokens produced so far.
- **Likelihood:** A measure of how likely the generated text is, based on the model's internal probabilities.

Capabilities

BLOOM is a highly capable language model that can be used for a wide variety of text-related tasks. It can be used for open-ended text generation, such as creative writing or story generation. It can also be used for more structured tasks like translation, summarization, and question answering by framing them as text generation problems.

What can I use it for?

BLOOM's large scale and multilingual capabilities make it a powerful tool for research and development in natural language processing. Researchers can use BLOOM as a starting point for fine-tuning on specific tasks, or analyze its internal representations to gain insights into language learning. Developers can also integrate BLOOM into applications that require language understanding and generation, such as chatbots, virtual assistants, and language learning tools. However, it's important to note that BLOOM is not intended for use in high-stakes or safety-critical applications, as it can produce incorrect or biased information. Users should carefully evaluate the model's outputs and take appropriate precautions when deploying BLOOM-based systems.

Things to try

One interesting aspect of BLOOM is its ability to generate text in multiple languages. You could try prompting the model with a phrase in one language and see what it generates in another. Another interesting experiment would be to explore BLOOM's performance on programming language tasks, such as code generation or explanation. Additionally, you could investigate BLOOM's few-shot or zero-shot learning capabilities by framing tasks as text generation problems and seeing how the model performs without fine-tuning. This could provide insights into the model's general language understanding abilities.
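A minimal sketch of prompting BLOOM with the Hugging Face transformers library and of scoring a candidate continuation. The prompt text and generation settings are illustrative; the full 176B checkpoint needs a multi-GPU setup, and a smaller sibling such as bigscience/bloom-560m works the same way on modest hardware.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Any BLOOM-family checkpoint uses the same API; "bigscience/bloom" is the
# full 176B model, smaller siblings like "bigscience/bloom-560m" fit on one GPU.
model_id = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

# Continue a prompt (open-ended generation).
prompt = "Les grands modèles de langue sont"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Score a candidate continuation: the loss is the mean negative log-likelihood
# per token, a rough proxy for the "likelihood" output described above.
text = prompt + " devenus des outils de recherche incontournables."
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    nll = model(ids, labels=ids).loss
print(f"mean NLL per token: {nll.item():.3f}")
```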


Updated 5/28/2024


bloomz

bigscience

Total Score

491

The bloomz model is a family of multilingual language models trained by the BigScience workshop. It is based on the BLOOM model and fine-tuned on the cross-lingual task mixture (xP3) dataset, giving it the capability to follow human instructions in dozens of languages without additional training. The model comes in a range of sizes, from 300M to 176B parameters, allowing users to choose the appropriate size for their needs. The bloomz-mt variants are further fine-tuned on the xP3mt dataset and are recommended for prompting in non-English languages. The bloomz model is similar to other large language models like BELLE-7B-2M, which is also based on Bloomz-7b1-mt and fine-tuned on Chinese and English data. Another related model is xlm-roberta-base, a multilingual version of RoBERTa pre-trained on 100 languages.

Model inputs and outputs

Inputs

- **Prompts:** The bloomz model takes natural language prompts as input, which can be in any of the supported languages.

Outputs

- **Generated text:** The model outputs generated text that responds to the input prompt, following the instructions provided. The output can be in the same language as the input or in a different supported language.

Capabilities

The bloomz model is capable of understanding and generating text in dozens of languages, including both high-resource and low-resource languages. It can follow a wide range of instructions, such as translation, question answering, and task completion, without additional fine-tuning. This makes it a versatile tool for multilingual natural language processing tasks.

What can I use it for?

The bloomz model can be used for a variety of multilingual natural language processing tasks, such as:

- **Machine translation:** Use the model to translate text between different languages.
- **Question answering:** Ask the model questions and have it provide relevant answers.
- **Task completion:** Give the model instructions for a task, and have it generate the required output.
- **Text generation:** Use the model to generate coherent and contextually appropriate text.

The different model sizes available allow users to choose the appropriate model for their needs, balancing performance and resource requirements.

Things to try

One interesting aspect of the bloomz model is its ability to generalize across languages. Try providing prompts in different languages and observe how the model responds. You can also experiment with mixing languages within a single prompt to see how the model handles code-switching. Additionally, the bloomz-mt variants may be particularly useful for applications where the input or output language is not English. Explore the performance of these models on non-English tasks and compare them to the original bloomz versions.
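A hedged sketch of zero-shot instruction prompting with a BLOOMZ checkpoint via transformers. The model id, prompt, and size choice (the lightweight bigscience/bloomz-560m) are illustrative; any size in the family uses the same interface.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Smallest BLOOMZ checkpoint; swap in a larger one for better output quality.
model_id = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# BLOOMZ follows natural-language instructions zero-shot.
prompt = "Translate to Spanish: The weather is lovely today."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```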


Updated 5/28/2024


T0pp

bigscience

Total Score

390

The T0pp model, pronounced "T Zero Plus Plus", is an encoder-decoder language model developed by the BigScience workshop. It shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks while being 16x smaller. The T0pp model is part of the T0 series, a set of models trained on a large mixture of different NLP tasks specified through natural language prompts. The T0 and T0p models are similar variants that were trained on different dataset mixtures. The T0_3B model is a 3 billion parameter version of the T0 series.

Model inputs and outputs

Inputs

- Natural language prompts describing a task or query

Outputs

- Predictions or responses generated by the model to complete the task described in the input prompt

Capabilities

The T0pp model can perform a wide variety of NLP tasks by interpreting natural language prompts, including:

- Question answering
- Sentiment analysis
- Paraphrasing
- Natural language inference
- Word sense disambiguation
- And more

For example, you can ask the model "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", and it will likely generate the response "Positive".

What can I use it for?

The T0pp model can be used to build applications that understand and complete a diverse range of natural language tasks without needing to be specifically trained on each task. This makes it useful for building flexible, multi-purpose AI assistants and chatbots. Some potential use cases include:

- Customer service chatbots that can handle a wide variety of inquiries
- Writing assistants that can help with tasks like proofreading, ideation, and summarization
- Intelligent search and question-answering systems
- Educational and language learning tools

The model's ability to generalize to new tasks through natural language prompts makes it a powerful tool for quickly deploying new AI capabilities.

Things to try

One interesting aspect of the T0pp model is its ability to perform well on tasks with minimal or varying prompting. You can experiment with rephrasing the same task in different ways to see how the model's performance is affected. This can provide insights into the model's understanding and the importance of prompt engineering. Additionally, the T0pp model can be further fine-tuned on specific tasks or datasets to improve its performance in those areas. This fine-tuning process and the resulting model's capabilities would be an interesting area to explore.
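A minimal sketch of running the sentiment example above with transformers. T0pp is an encoder-decoder model of roughly 11B parameters, so it needs substantial memory; the loading options below are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "bigscience/T0pp"  # ~11B parameters; T0_3B is a lighter drop-in
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # expected: "Positive"
```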


Updated 5/28/2024


bloom-560m

bigscience

Total Score

326

The bloom-560m is a large language model developed by the BigScience research collective. It is a transformer-based model trained on a vast multilingual dataset spanning 45 natural languages and 12 programming languages. The model is part of the BLOOM family of language models, which also includes the larger bloom-1b1 and bloom-1b7 models. These models are designed to enable public research on large language models and can be used for a variety of text generation tasks.

Model inputs and outputs

The bloom-560m model takes text prompts as input and generates coherent text outputs in response. The model was trained on a diverse dataset, allowing it to understand and generate text in multiple languages. It can be used for tasks like text generation, language modeling, and exploring the characteristics of language generated by a large language model.

Inputs

- Text prompts in a variety of languages, including natural languages and programming languages

Outputs

- Generated text in response to the input prompts
- The generated text can be in the same language as the input prompt, or in a different language if the model is instructed to translate or generate text in a specific language

Capabilities

The bloom-560m model is capable of generating coherent and contextually relevant text in a wide range of languages. It can be used for tasks like language translation, text summarization, and even creative writing. The model's multilingual capabilities make it a valuable tool for researchers and developers working on multilingual applications.

What can I use it for?

The bloom-560m model can be used for a variety of text-based tasks, such as:

- **Text generation:** Generating coherent text in response to prompts, which can be used for creative writing, content generation, and more.
- **Language modeling:** Exploring the characteristics of the language generated by the model, which can provide insights into language use and patterns.
- **Language translation:** Translating text from one language to another, leveraging the model's multilingual capabilities.
- **Downstream tasks:** Using the bloom-560m model as a pre-trained base for fine-tuning on specific tasks, such as question answering, information extraction, or summarization.

Researchers and developers can use the bloom-560m model to explore the capabilities of large language models and develop applications that leverage these capabilities.

Things to try

One interesting aspect of the bloom-560m model is its ability to generate text in a wide range of programming languages. Developers can experiment with using the model to generate code snippets, explore how the model represents programming concepts, or even try to fine-tune the model on specific programming tasks. Another interesting direction to explore is the model's multilingual capabilities. Users can try providing prompts in different languages and observe how the model generates text in response, or experiment with using the model for cross-lingual tasks like translating between languages. Overall, the bloom-560m model offers a rich set of capabilities for researchers and developers to explore, and the provided links to similar models and related research papers can serve as a valuable starting point for further investigation.
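A small, CPU-friendly sketch of continuing both a natural-language prompt and a code prompt with bloom-560m through transformers; the prompts and sampling settings are illustrative.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloom-560m"  # small enough to run on CPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Natural-language continuation.
inputs = tokenizer("El aprendizaje automático es", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Code continuation: the training data also covers programming languages.
code_prompt = "def fibonacci(n):\n    "
inputs = tokenizer(code_prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```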


Updated 5/28/2024


tr11-176B-logs

bigscience

Total Score

249

The tr11-176B-logs model is a large language model being developed by the BigScience research workshop. It is a 176 billion parameter decoder-only model trained on a multilingual dataset of 46 languages and over 341 billion tokens. The model uses a GPT-like architecture with 70 layers, 112 attention heads per layer, and a hidden dimensionality of 14,336. Similar to GPT-2 and GPT-3, the tr11-176B-logs model is designed for general-purpose natural language tasks.

The training data for the tr11-176B-logs model comes from a diverse set of web-crawled sources, including Wikipedia, news articles, and other web pages in 46 languages. The dataset totals 341.6 billion tokens, making it one of the largest public language model training sets available. The model uses a 250,680 token vocabulary.

In comparison to other large language models, the tr11-176B-logs model is similar in scale to GPT-3, with slightly more parameters than the 175 billion parameter GPT-3 model. However, the focus on multilingual training sets it apart from models like GPT-3 that are primarily trained on English data. The BigScience workshop is also taking a more open and collaborative approach to the development of this model compared to the closed-source nature of GPT-3.

Model Inputs and Outputs

Inputs

- **Text:** The tr11-176B-logs model takes raw text as input, with a maximum sequence length of 2,048 tokens.

Outputs

- **Text generation:** The primary output of the tr11-176B-logs model is the generation of natural language text. Given a prompt, the model can continue generating additional text in a coherent and contextual manner.

Capabilities

The massive scale and multilingual training of the tr11-176B-logs model enable a wide range of natural language processing capabilities. The model can be used for tasks like language translation, question answering, text summarization, and general text generation across many languages. For example, the model could be used to generate coherent and informative text on a wide variety of topics in multiple languages. It could also be used to translate text between languages or answer questions based on provided context.

What Can I Use It For?

The tr11-176B-logs model is primarily intended for research purposes, to further the development of large language models and their applications. Researchers and developers could fine-tune or adapt the model for a variety of natural language tasks, leveraging the model's strong performance and broad knowledge. Some potential use cases include:

- Developing multilingual chatbots or virtual assistants
- Enhancing machine translation systems
- Powering content generation for multilingual websites or applications
- Providing a foundation for research into ethical and responsible AI development

However, due to the model's large scale and lack of fine-tuning on specific tasks, it may not be immediately ready for deployment in production environments without additional safety and robustness testing.

Things to Try

One interesting aspect of the tr11-176B-logs model is its ability to handle a wide range of languages. Developers could experiment with providing prompts in different languages and observing the model's response quality and coherence. This could help uncover strengths, weaknesses, or biases in the model's multilingual capabilities. Researchers could also investigate methods for fine-tuning or adapting the tr11-176B-logs model for specific downstream tasks, such as question answering or text summarization. By leveraging the model's strong general-purpose capabilities, it may be possible to achieve high performance on these tasks with relatively little additional training data or fine-tuning. Overall, the tr11-176B-logs model represents an exciting development in the field of large language models and opens up many possibilities for future research and applications.
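The logs repository itself hosts training telemetry rather than weights. A hedged sketch of probing multilingual behaviour by giving the same prompt in several languages, under the assumption that the checkpoint produced by this training run is the released bigscience/bloom model; the prompts and generation settings are illustrative, and a smaller BLOOM checkpoint can be substituted to run this on modest hardware.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumption: the finished 176B checkpoint is published as "bigscience/bloom".
model_id = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Same prompt in three languages to compare response quality and coherence.
prompts = [
    "The most important open problem in machine translation is",
    "Le problème ouvert le plus important en traduction automatique est",
    "La cuestión abierta más importante en la traducción automática es",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True), "\n")
```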


Updated 5/28/2024


bloom-7b1

bigscience

Total Score

184

bloom-7b1 is a 7 billion parameter multilingual language model developed by the BigScience collaborative research workshop. It was pretrained on a large, diverse dataset of 341.6 billion tokens in 46 languages. The model uses a transformer-based architecture similar to GPT-2, with modifications such as layer normalization on the word embeddings, ALiBi positional encodings, and GeLU activation functions. bloom-7b1 is part of the larger BLOOM model family, which includes variants ranging from 560 million to 176 billion parameters. The BLOOMZ model is a finetuned version of bloom-7b1 that has been optimized for cross-lingual tasks and understanding.

Model inputs and outputs

bloom-7b1 is a text-to-text model that can be used for a variety of natural language processing tasks. It takes text as input and generates relevant text as output.

Inputs

- Free-form text in multiple languages, such as prompts, instructions, or questions

Outputs

- Relevant text responses generated based on the input
- The model can be used for tasks like translation, question answering, and open-ended text generation

Capabilities

bloom-7b1 has strong multilingual capabilities, able to understand and generate text in 46 different languages. The model has shown promising performance on a variety of benchmarks, including translation, language understanding, and open-ended generation tasks.

What can I use it for?

bloom-7b1 can be used for a wide range of natural language processing applications, such as:

- **Translation:** Translating text between supported languages
- **Question Answering:** Answering questions based on provided context
- **Summarization:** Generating concise summaries of longer text
- **Text Generation:** Producing coherent, human-like text based on prompts

The model's multilingual capabilities make it particularly useful for projects that involve working with text in multiple languages. Developers and researchers can fine-tune bloom-7b1 on domain-specific data to adapt it for their particular use cases.

Things to try

Some interesting things to try with bloom-7b1 include:

- Experimenting with different prompting techniques to see how the model responds to various types of input
- Evaluating the model's performance on specialized benchmarks or datasets relevant to your application
- Exploring the model's ability to handle long-form text, such as generating multi-paragraph responses
- Investigating how the model's performance varies across different languages and language pairs

By leveraging the capabilities of bloom-7b1, you can unlock new possibilities for your natural language processing projects.
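A hedged sketch of few-shot translation prompting with bloom-7b1 via transformers. The in-context examples and decoding settings are illustrative; in half precision the ~7B model needs roughly 16 GB of accelerator memory.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Frame translation as text continuation with a couple of in-context examples.
prompt = (
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: Where is the train station?\nFrench: Où est la gare ?\n"
    "English: I would like a cup of coffee.\nFrench:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```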


Updated 5/28/2024


bloomz-7b1-mt

bigscience

Total Score

133

The bloomz-7b1-mt model is a multilingual language model developed by the BigScience research workshop. It is a variant of the BLOOM model that has been fine-tuned on a machine-translated cross-lingual task mixture (xP3mt) dataset to improve its ability to follow human instructions and perform tasks in multiple languages. The model has 7.1 billion parameters and was trained using a variety of computational resources, including the Jean Zay public supercomputer.

Model inputs and outputs

Inputs

- Natural language prompts or instructions in a wide range of languages, including English, Mandarin Chinese, Spanish, Hindi, and many others.

Outputs

- Coherent text continuations or responses in the same language as the input prompt, following the given instructions or completing the requested task.

Capabilities

The bloomz-7b1-mt model is capable of understanding and generating text in dozens of languages, allowing it to perform a variety of cross-lingual tasks. It can translate between languages, answer questions, summarize text, and even generate creative content like stories and poems. The model's multilingual capabilities make it a powerful tool for language learning, international communication, and multilingual applications.

What can I use it for?

The bloomz-7b1-mt model can be used for a wide range of natural language processing tasks, including:

- Machine translation between languages
- Question answering in multiple languages
- Text summarization across languages
- Creative writing assistance in different languages
- Language learning and practice

Developers and researchers can fine-tune the model for more specific use cases, or use it as a starting point for building multilingual AI applications.

Things to try

Some interesting things to try with the bloomz-7b1-mt model include:

- Providing prompts in different languages and observing the model's ability to understand and respond appropriately.
- Experimenting with the model's code generation capabilities by giving it prompts to write code in various programming languages.
- Exploring the model's ability to maintain coherence and consistency when responding to multi-turn conversations or tasks that span multiple languages.
- Evaluating the model's performance on specialized tasks or domains, such as scientific or legal text, to assess its broader applicability.

By testing the model's capabilities and limitations, users can gain valuable insights into the current state of multilingual language models and help drive future advancements in this important area of AI research.
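A brief sketch of non-English instruction prompting with bloomz-7b1-mt through transformers; the -mt variants are the ones recommended when the prompt itself is not in English. The prompt and loading options are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Instruction written in Spanish; the -mt finetuning targets non-English prompts.
prompt = "Escribe un haiku sobre la lluvia de verano."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```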


Updated 5/28/2024


bloomz-7b1

bigscience

Total Score

133

The bloomz-7b1 is a large language model developed by the BigScience research workshop. It is part of the BLOOMZ and mT0 model family, which are capable of following human instructions in dozens of languages zero-shot. The model was created by fine-tuning the BLOOM and mT5 pre-trained multilingual language models on the xP3 crosslingual task mixture dataset. This resulted in a model that can generalize to unseen tasks and languages.

Model inputs and outputs

The bloomz-7b1 model is a text-to-text transformer that can take natural language prompts as input and generate coherent text responses. It has been trained on a vast multilingual dataset spanning 46 natural languages and 13 programming languages. The model can understand both the languages used in pre-training as well as the additional languages introduced during fine-tuning.

Inputs

- Natural language prompts in a variety of languages, including instructions, questions, and open-ended text generation tasks.

Outputs

- Fluent text responses in the same languages as the input prompts, demonstrating the model's ability to understand and generate content across many languages.

Capabilities

The bloomz-7b1 model has shown strong zero-shot performance on a wide range of tasks, including translation, question answering, and few-shot learning. It can be prompted to perform tasks it was not explicitly trained for by framing them as text generation problems. For example, the model can be asked to "Translate to English: Je t'aime" and generate the response "I love you."

What can I use it for?

The bloomz-7b1 model is well-suited for research and exploration of large language models, particularly in the areas of multilingual and crosslingual learning. Developers and researchers can use the model as a foundation for building applications that require natural language understanding and generation in multiple languages. Some potential use cases include:

- Building multilingual chatbots and virtual assistants
- Developing crosslingual information retrieval and question answering systems
- Exploring the capabilities and limitations of zero-shot learning in language models

Things to try

One interesting aspect of the bloomz-7b1 model is its ability to understand and generate text in dozens of languages. Experiment with prompting the model in different languages to see how it responds. You can also try providing the model with more context about the desired language or task, such as "Explain in Telugu what is backpropagation in neural networks." Another area to explore is the model's performance on specific downstream tasks. The paper accompanying the model release provides some initial zero-shot evaluation results, but there may be opportunities to fine-tune or adapt the model for more specialized applications.
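A short sketch of reproducing the translation example above with transformers; the loading options are illustrative, and a GPU with roughly 16 GB of memory is assumed for half precision.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloomz-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))  # expected to end with "I love you."
```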


Updated 5/27/2024


bloom-1b7

bigscience

Total Score

115

bloom-1b7 is a large open-access multilingual language model developed by the BigScience research workshop. It is a transformer-based model trained on 45 natural languages and 12 programming languages, with 1.7 billion parameters. The model is based on a modified version of the Megatron-LM GPT2 architecture, with an autoregressive decoder-only design. Similar models in the BigScience ecosystem include the bloom-7b1 model, which has more parameters and was trained on a larger corpus, as well as the BLOOMZ family of models that have been further fine-tuned on cross-lingual tasks.

Model inputs and outputs

Inputs

- Natural language text prompts in a wide range of languages
- Programming language code snippets

Outputs

- Continued natural language text, generating coherent passages
- Translations between supported languages
- Responses to open-ended prompts and questions

Capabilities

bloom-1b7 is a highly capable language model that can generate fluent text in dozens of languages, perform translation tasks, and even write original content like stories and explanations. It demonstrates strong cross-lingual understanding, allowing it to generalize to new tasks and languages beyond its training data.

What can I use it for?

The bloom-1b7 model is well-suited for a variety of text-based applications and research projects. Potential use cases include:

- Text generation and creative writing assistance
- Multilingual chatbots and virtual assistants
- Language learning and educational tools
- Exploratory analysis of model capabilities and biases

Researchers may also find the model useful as a pre-trained base for further fine-tuning on specific tasks or domains.

Things to try

One interesting aspect of bloom-1b7 is its ability to generate text in a wide range of programming languages, not just natural languages. You could try prompting the model with code snippets and seeing how it continues or modifies the code. Another fun experiment would be to give the model open-ended prompts in different languages and see how it responds, exploring its cross-lingual reasoning and generation abilities. For example, you could prompt it to "Write a fairy tale about a troll saving a princess from a dangerous dragon" in Spanish and see the resulting story.
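A compact sketch of the Spanish open-ended prompt suggested above, run with transformers. The prompt wording and sampling settings are illustrative; at roughly 1.7 billion parameters the model fits on a single consumer GPU or CPU.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloom-1b7"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The fairy-tale prompt from "Things to try", phrased in Spanish.
prompt = ("Escribe un cuento de hadas sobre un trol que salva "
          "a una princesa de un dragón peligroso. Érase una vez")
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```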


Updated 5/28/2024


T0_3B

bigscience

Total Score

95

The T0_3B model is the 3 billion parameter member of the T0 series of encoder-decoder models, which were trained on a large set of different natural language processing tasks. The series was developed by the BigScience research workshop, and its models outperform GPT-3 on many tasks while being up to 16 times smaller. The T0 family also includes variants like T0pp and T0_single_prompt. These models show strong zero-shot task generalization, meaning they can perform unseen tasks specified in natural language prompts.

Model inputs and outputs

The T0_3B model is designed to accept natural language prompts as input and generate corresponding predictions as output. For example, you could provide the prompt "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy" and the model would output "Positive".

Inputs

- Natural language prompts specifying various tasks, such as:
  - Question answering
  - Sentiment analysis
  - Textual entailment
  - Language understanding

Outputs

- Textual responses to the input prompts, such as:
  - Answer to a question
  - Sentiment label (positive, negative, etc.)
  - Entailment prediction (entailment, contradiction, neutral)
  - Explanations or reasoning about the input

Capabilities

The T0_3B model demonstrates strong zero-shot task generalization, meaning it can perform a wide variety of natural language processing tasks without any task-specific fine-tuning. This is achieved by training the model on a large set of diverse tasks specified through natural language prompts. The model is able to understand and complete tasks like answering trivia questions, identifying duplicate questions, and analyzing word usage, all from a single, general-purpose model.

What can I use it for?

You can use the T0_3B model to quickly prototype and experiment with a variety of natural language processing applications. The model's zero-shot capabilities make it useful for quickly evaluating different task formulations and prompting strategies. Some potential use cases include:

- Building chatbots or virtual assistants that can handle diverse user queries
- Developing text analysis tools for sentiment analysis, topic classification, and more
- Augmenting existing NLP pipelines with a flexible, general-purpose model

Things to try

Try providing the T0_3B model with prompts that involve logical reasoning, common sense understanding, or task descriptions that are quite different from the training data. Observe how the model performs and explore ways to improve the prompting for better results. Additionally, experiment with different model variants like T0pp to see how the performance and capabilities change.
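A minimal sketch of a zero-shot textual entailment prompt with T0_3B via transformers; the prompt wording is illustrative, and in full precision the 3B encoder-decoder model needs on the order of 12 GB of memory.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "bigscience/T0_3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Zero-shot textual entailment, phrased as a natural-language prompt.
prompt = (
    "Premise: The chef is preparing dinner for twenty guests.\n"
    "Hypothesis: The chef is cooking.\n"
    "Does the premise entail the hypothesis? Answer entailment, contradiction, or neutral."
)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```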


Updated 5/28/2024