polylm-13b

Maintainer: DAMO-NLP-MT

Total Score: 51

Last updated 5/27/2024


  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • Github link: No Github link provided
  • Paper link: No paper link provided


Model overview

PolyLM is a multilingual large language model developed by DAMO-NLP-MT. It is trained on 640 billion tokens across 18 languages, including Chinese, English, Spanish, German, French, and more. The model improves upon existing multilingual models like LLaMA and BLOOM by integrating bilingual data into the training corpus and by using a curriculum learning strategy that increases the proportion of non-English data over time. PolyLM is available in two sizes: 1.7 billion and 13 billion parameters.

Model inputs and outputs

PolyLM is a decoder-only language model that can be used for a variety of text-to-text tasks. It takes in natural language prompts or instructions and generates relevant text outputs.

Inputs

  • Natural language prompts or instructions in any of the 18 supported languages

Outputs

  • Generated text outputs in the same language as the input prompt
  • Outputs can be used for tasks like language generation, translation, question answering, and more
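
The model follows the standard Hugging Face causal language model interface, so a generation call looks like any other `transformers` decoder-only model. Below is a minimal sketch, assuming the checkpoint is published on the Hub as `DAMO-NLP-MT/polylm-13b` (check the model card for the exact id and recommended tokenizer settings) and that a GPU with enough memory for the 13B weights in half precision is available:

```python
# Minimal generation sketch for PolyLM-13B (Hub id and tokenizer options assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DAMO-NLP-MT/polylm-13b"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halve memory; use float32 on CPU
    device_map="auto",          # spread layers across available devices
)

# Prompts can be written in any of the 18 supported languages.
prompt = "Beijing is the capital of China.\nTranslate this sentence from English to Spanish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```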

Capabilities

PolyLM demonstrates strong multilingual capabilities, outperforming other open-source models like LLaMA and BLOOM on various multilingual tasks while maintaining comparable performance in English. It can be used for tasks like multilingual understanding, question answering, generation, and translation.

What can I use it for?

PolyLM can be used as a powerful multilingual language model for a variety of natural language processing applications. Some potential use cases include:

  • Multilingual content generation: Automatically generating high-quality text in multiple languages for websites, marketing materials, product descriptions, and more.
  • Machine translation: Fine-tuning the model for machine translation between any of the 18 supported languages.
  • Multilingual question answering: Building chatbots or virtual assistants that can understand and respond to queries in multiple languages.
  • Multilingual text summarization: Summarizing long-form content in various languages.

Things to try

One interesting thing to try with PolyLM is its multilingual self-instruction capabilities. The model was trained using a method that automatically generates over 132,000 diverse multilingual instructions, allowing it to better understand and follow instructions across languages. You could experiment with providing the model with prompts or instructions in different languages and see how it responds.
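
One simple way to explore this is to send the same instruction in several languages and compare the completions. The sketch below reuses the tokenizer and model loaded in the earlier example; for stronger instruction following, the instruction-tuned variant (assumed to be published as `DAMO-NLP-MT/polylm-multialpaca-13b`) can be swapped in the same way:

```python
# Probe multilingual instruction following by issuing the same request in several languages.
# Reuses `tokenizer` and `model` from the generation sketch above.
prompts = {
    "English": "Write one sentence about the benefits of reading books.",
    "Spanish": "Escribe una frase sobre los beneficios de leer libros.",
    "German":  "Schreibe einen Satz über die Vorteile des Lesens von Büchern.",
    "Chinese": "写一句话说明读书的好处。",
}

for lang, prompt in prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
    # Strip the prompt tokens so only the newly generated continuation is printed.
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"[{lang}] {completion}\n")
```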

Another idea is to fine-tune PolyLM on a specific multilingual task or domain to further improve its performance. The flexibility of the model allows it to be adapted for a wide range of applications beyond just open-ended language generation.
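
A practical way to do this without updating all 13 billion parameters is parameter-efficient fine-tuning, for example LoRA adapters via the `peft` library. The sketch below is illustrative only: the dataset file, hyperparameters, and `target_modules` names are placeholders that depend on your task and on PolyLM's exact layer naming.

```python
# Illustrative LoRA fine-tuning sketch (dataset path, hyperparameters, and
# target_modules are placeholders; the Hub id is assumed).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "DAMO-NLP-MT/polylm-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Attach low-rank adapters so only a small fraction of the weights is trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["c_attn"],  # placeholder; match the real attention modules
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Any plain-text corpus works; "my_multilingual_corpus.jsonl" is a stand-in for your own data.
dataset = load_dataset("json", data_files="my_multilingual_corpus.jsonl", split="train")
tokenized = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                        batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="polylm-13b-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, fp16=True, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```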



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

madlad400-10b-mt

Maintainer: google

Total Score: 62

The madlad400-10b-mt model is a multilingual machine translation model based on the T5 architecture. It was trained by Google on 250 billion tokens covering over 450 languages using publicly available data, and it is competitive with significantly larger models in terms of performance. The model was converted and documented by Juarez Bochi, who was not involved in the original research. Similar models include the two madlad400-3b-mt checkpoints listed below, which are smaller versions of the madlad400-10b-mt model. The PolyLM-13B model is another large multilingual language model, trained by DAMO-NLP-MT.

Model inputs and outputs

Inputs

  • Text to be translated, potentially in any of over 400 supported languages

Outputs

  • Translated text in the target language

Capabilities

The madlad400-10b-mt model is a powerful multilingual machine translation model that can translate text between over 400 different languages. It achieves strong performance, even compared to much larger models, through the use of a large and diverse training dataset.

What can I use it for?

The primary intended use of the madlad400-10b-mt model is machine translation and other multilingual NLP tasks. Researchers and developers working on projects that require translation between a wide range of languages may find this model particularly useful.

Things to try

Some interesting things to try with the madlad400-10b-mt model include:

  • Exploring the model's performance on low-resource language pairs, which can be a key challenge for machine translation.
  • Analyzing the model's outputs to better understand its strengths, weaknesses, and potential biases.
  • Fine-tuning the model on domain-specific data to see if it can be adapted for specialized translation tasks.
  • Comparing the model's performance to other large multilingual models, such as PolyLM-13B, to gain insights into the state of the art in this field.
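
As a rough illustration of the translation interface described above, here is a minimal sketch using the standard `transformers` seq2seq API. The checkpoint id `google/madlad400-10b-mt` and the `<2xx>` target-language prefix are assumptions to verify against the model card:

```python
# Minimal MADLAD-400 translation sketch (checkpoint id and <2xx> prefix are assumed).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/madlad400-10b-mt"   # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

# The target language is selected by prepending a token such as <2de> (German).
text = "<2de> The weather is beautiful today."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```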


madlad400-3b-mt

Maintainer: google

Total Score: 69

The madlad400-3b-mt is a multilingual machine translation model based on the T5 architecture that was trained on 1 trillion tokens covering over 450 languages using publicly available data. Developed by Google, this model is competitive with significantly larger models in terms of performance. The model was trained using a similar approach to the Flan-T5 models, which involved fine-tuning the T5 architecture on a mixture of tasks and datasets to improve zero-shot and few-shot performance. Like Flan-T5, the madlad400-3b-mt model can be used for a variety of natural language processing tasks, with a focus on machine translation and multilingual applications.

Model inputs and outputs

Inputs

  • Text to be translated or processed, with a language token prepended to indicate the target language

Outputs

  • Translated text or output for the given natural language processing task

Capabilities

The madlad400-3b-mt model has been trained on a massive multilingual dataset, allowing it to perform well on a wide range of languages. It can be used for tasks like machine translation, question answering, and text generation, with competitive performance compared to much larger models.

What can I use it for?

The madlad400-3b-mt model is primarily intended for research purposes, where it can be used to explore the capabilities and limitations of large language models in a multilingual setting. Researchers may find it useful for tasks like zero-shot and few-shot learning, as well as investigating bias and fairness issues in language models.

Things to try

One interesting aspect of the madlad400-3b-mt model is its ability to handle long sequences of text, thanks to the use of ALiBi position embeddings. You could try generating or processing text with longer context lengths to see how the model performs. Additionally, the model's multilingual capabilities make it a good candidate for exploring cross-lingual transfer learning, where you fine-tune the model on a task in one language and then evaluate its performance on the same task in another language.
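
To make the target-language token concrete, the sketch below translates one sentence into several languages by swapping the prefix. The checkpoint id `google/madlad400-3b-mt` and the `<2xx>` prefix convention are assumptions to check against the model card:

```python
# Translate one sentence into several languages by swapping the <2xx> prefix
# (checkpoint id and prefix convention are assumed).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/madlad400-3b-mt"   # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

sentence = "Machine translation keeps getting better."
for lang in ["fr", "de", "pt", "ja"]:
    inputs = tokenizer(f"<2{lang}> {sentence}", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(lang, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```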


madlad400-3b-mt

Maintainer: jbochi

Total Score: 115

The madlad400-3b-mt model is a multilingual machine translation model based on the T5 architecture. It was trained on over 1 trillion tokens covering more than 450 languages using publicly available data. Despite its comparatively small size, the model is competitive with significantly larger models in terms of performance. The model was converted from the original checkpoints and the model card was written by the maintainer Juarez Bochi, who was not involved in the original research. The model is similar to other large multilingual models like distilbert-base-multilingual-cased, btlm-3b-8k-base, nllb-200-3.3B, and flan-t5-xl in that they are all large, multilingual language models. However, the madlad400-3b-mt model is unique in its breadth of coverage, spanning over 450 languages.

Model inputs and outputs

Inputs

  • Text: The model takes text as input, which can be in any of the 450+ supported languages.

Outputs

  • Translated text: The model outputs translated text, with the target language determined by the input prompt.

Capabilities

The madlad400-3b-mt model is capable of translating text between a wide range of languages, making it useful for tasks like multilingual communication, content localization, and language learning. Its training on over 1 trillion tokens gives it strong performance, allowing it to compete with much larger models in terms of translation quality.

What can I use it for?

The madlad400-3b-mt model could be useful for a variety of applications that require multilingual text translation, such as:

  • Content localization: Translating website content, marketing materials, or product information into multiple languages to reach a global audience.
  • Multilingual communication: Enabling communication between speakers of different languages, such as in business meetings, customer support, or personal conversations.
  • Language learning: Providing translation support for language learners to help them understand and practice in their target language.
  • Research: Exploring the capabilities and limitations of large multilingual language models, and using the model as a foundation for further research and development.

Things to try

One interesting aspect of the madlad400-3b-mt model is its ability to handle a very large number of languages. You could experiment with translating text between less common language pairs to see the model's performance and limitations. Additionally, you could try fine-tuning the model on domain-specific data to improve its performance for specialized applications, such as medical or legal translation.


decapoda-research-llama-7B-hf

Maintainer: baffo32

Total Score: 49

The decapoda-research-llama-7B-hf model is a 7B parameter version of the LLaMA language model developed by the FAIR team at Meta AI. It was converted to work with the Transformers/HuggingFace library by the maintainer baffo32. This model is similar to other open-source LLaMA-based models like llama-7b-hf-transformers-4.29 and llama-7b-hf, which also provide HuggingFace-compatible versions of the 7B LLaMA model.

Model inputs and outputs

The decapoda-research-llama-7B-hf model is an autoregressive language model that takes text as input and generates text as output. It can be used for a variety of natural language processing tasks such as language generation, question answering, and text summarization.

Inputs

  • Arbitrary text in a supported language (primarily English, but the model was also trained on 19 other languages)

Outputs

  • Generated text in the same language as the input

Capabilities

The decapoda-research-llama-7B-hf model is capable of generating coherent and fluent text across a wide range of domains, from creative writing to technical documentation. It can also be fine-tuned for more specialized tasks like question answering or code generation. The model's performance is competitive with other open-source large language models of similar size.

What can I use it for?

The decapoda-research-llama-7B-hf model can be used for a variety of natural language processing applications, such as:

  • Text generation: The model can be used to generate human-like text on a wide range of topics, which can be useful for applications like content creation, story writing, and dialogue systems.
  • Question answering: The model can be fine-tuned on question-answering datasets to provide accurate responses to queries on a variety of subjects.
  • Summarization: The model can be used to generate concise summaries of longer text documents, which can be helpful for applications like news digests or research paper reviews.
  • Language translation: While the model was primarily trained on English, its multilingual capabilities allow it to be used for translation between the 20 languages it was trained on.

Things to try

One interesting aspect of the decapoda-research-llama-7B-hf model is its ability to generate coherent and relevant text based on relatively short prompts. This can be useful for exploring the model's knowledge and reasoning capabilities, as well as its potential biases and limitations. For example, you could try prompting the model with open-ended questions or hypothetical scenarios and observe the quality and consistency of its responses.

Another interesting avenue to explore is the model's few-shot learning capabilities. By fine-tuning the model on small, domain-specific datasets, it may be possible to adapt the model for specialized tasks like code generation, legal document summarization, or medical diagnosis assistance. The transferability of the model's learned representations could make it a powerful starting point for building custom language models.
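
As a quick way to probe the few-shot behaviour mentioned above without any fine-tuning, you can pack a few labelled examples into the prompt and let the model continue the pattern. A minimal sketch, assuming the checkpoint id `baffo32/decapoda-research-llama-7B-hf`:

```python
# Few-shot sentiment labelling by prompt continuation (checkpoint id assumed).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="baffo32/decapoda-research-llama-7B-hf",  # assumed Hub id
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = (
    "Review: The plot was predictable and the acting was flat.\nSentiment: negative\n\n"
    "Review: A beautiful, moving film with a stellar cast.\nSentiment: positive\n\n"
    "Review: I laughed the whole way through.\nSentiment:"
)
result = generator(prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"][len(prompt):].strip())
```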
