sarvam-2b-v0.5

Maintainer: sarvamai

Total Score: 69

Last updated 9/18/2024


  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • Github link: No Github link provided
  • Paper link: No paper link provided


Model overview

The sarvam-2b-v0.5 model is an early checkpoint of the sarvam-2b language model, a small yet powerful model pre-trained from scratch on 2 trillion tokens. It is trained to perform well in 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu) as well as English. The final checkpoint of sarvam-2b, to be released soon, will be trained on a 4-trillion-token data mixture containing equal parts English (2T) and Indic (2T) tokens.

This early checkpoint has not undergone any post-training, but you can see its current capabilities in this video. The model was trained with the NVIDIA NeMo Framework on the Yotta Shakti Cloud using HGX H100 systems.

Similar models include the OpenHathi-7B-Hi-v0.1-Base and the orca_mini_3b, both of which are based on the LLaMA-2 architecture.

Model inputs and outputs

Inputs

  • Text prompts: The model accepts text prompts as input, which can be in any of the 11 supported languages (10 Indic languages plus English).

Outputs

  • Text completions: The model generates text completions based on the input prompt, continuing the sequence of text.
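
As a quick illustration of this text-in, text-out interface, here is a minimal sketch using the Hugging Face transformers library; the repository id sarvamai/sarvam-2b-v0.5 and the generation settings are assumptions based on this page rather than an official quick-start:

```python
# Minimal text-completion sketch for sarvam-2b-v0.5 (assumed repo id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-2b-v0.5"  # assumption: maintainer/model name from this page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A Hindi prompt; any of the 10 supported Indic languages or English should work.
prompt = "भारत की राजधानी"
inputs = tokenizer(prompt, return_tensors="pt")

# This checkpoint has no post-training, so expect a raw completion, not a chat reply.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```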

Capabilities

The sarvam-2b-v0.5 model has demonstrated strong performance on a variety of Indic language tasks, including text generation, translation, and understanding. Its tokenizer is designed to be efficient for Indic languages, with an average fertility score (the number of tokens produced per word, where lower is better) significantly below that of other popular models such as LLaMA-3.1, Gemma-2, and GPT-4. This allows the model to represent Indic text with fewer tokens and handle these languages more effectively than some of its counterparts.
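
To make the fertility comparison concrete, here is a rough sketch of how you might measure it yourself; the sample sentence and the gpt2 baseline are arbitrary choices, not benchmarks from the source:

```python
# Rough tokenizer-fertility check: average tokens per whitespace-separated word.
from transformers import AutoTokenizer

sentence = "मैं आज बाज़ार जा रहा हूँ"  # arbitrary Hindi sample


def fertility(tokenizer_id: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(tokenizer_id)
    return len(tok.tokenize(text)) / len(text.split())


print("sarvam-2b:", fertility("sarvamai/sarvam-2b-v0.5", sentence))
print("gpt2:     ", fertility("gpt2", sentence))  # arbitrary baseline for contrast
```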

What can I use it for?

The sarvam-2b-v0.5 model can be used for a variety of natural language processing tasks in the Indic language domain, such as:

  • Text generation: The model can be used to generate coherent and fluent text in any of the 10 Indic languages or English.
  • Translation: The model can be fine-tuned for translation tasks between Indic languages and English.
  • Question answering: The model can be fine-tuned on question-answering datasets to provide accurate answers in Indic languages.

To get started with using the model, you can check out this notebook on Google Colab.
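
If you would rather start locally, a task-specific fine-tune such as English-to-Hindi translation could look roughly like the sketch below; the toy dataset, prompt format, and hyperparameters are illustrative assumptions, not a recommended recipe:

```python
# Illustrative causal-LM fine-tuning sketch (dataset and format are assumptions).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "sarvamai/sarvam-2b-v0.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

# Toy English->Hindi pairs written as plain completions (hypothetical format).
pairs = [{"text": "English: Hello\nHindi: नमस्ते"},
         {"text": "English: Thank you\nHindi: धन्यवाद"}]
ds = Dataset.from_list(pairs).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sarvam-2b-translate",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```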

Things to try

One interesting thing to try with the sarvam-2b-v0.5 model is to explore its multilingual capabilities. Since it is trained on a mix of Indic languages and English, you could experiment with prompts that combine multiple languages, or try generating text that seamlessly transitions between different languages. This could be useful for applications that need to handle code-switching or multilingual content.
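
For instance, a code-switched prompt might look like the following speculative snippet, reusing the model and tokenizer from the quick-start sketch above:

```python
# Code-switched (Hinglish-style) prompt mixing English and Hindi in one string.
prompt = "The weather in Mumbai today is बहुत गरम, so I decided to"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```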

Another area to explore is the model's performance on different Indic language tasks, such as translation, summarization, or dialogue generation. By fine-tuning the model on task-specific datasets, you could unlock its full potential for real-world applications in the Indic language domain.



This summary was produced with help from an AI and may contain inaccuracies. Check out the links to read the original source documents!

Related Models

OpenHathi-7B-Hi-v0.1-Base

Maintainer: sarvamai

Total Score: 89

OpenHathi-7B-Hi-v0.1-Base is a large language model developed by Sarvam AI that is based on Llama2 and trained on Hindi, English, and Hinglish data. It is a 7 billion parameter model, making it a mid-sized model compared to similar offerings like the alpaca-30b and PMC_LLAMA_7B models. This base model is designed to be fine-tuned on specific tasks, rather than used directly.

Model inputs and outputs

OpenHathi-7B-Hi-v0.1-Base is a text-to-text model, meaning it takes in text and generates new text in response. The model can handle a variety of language inputs, including Hindi, English, and code.

Inputs

  • Text prompts in Hindi, English, or Hinglish

Outputs

  • Generated text in response to the input prompt

Capabilities

OpenHathi-7B-Hi-v0.1-Base has broad capabilities in language generation, from open-ended conversation to task-oriented outputs. The model can be used for tasks like text summarization, question answering, and creative writing. It also has the potential to be fine-tuned for more specialized use cases, such as code generation or domain-specific language modeling.

What can I use it for?

The OpenHathi-7B-Hi-v0.1-Base model could be useful for a variety of applications that require language understanding and generation in Hindi, English, or a mix of the two. Some potential use cases include:

  • Building virtual assistants or chatbots that can communicate in Hindi and English
  • Generating content like news articles, product descriptions, or creative writing in multiple languages
  • Translating between Hindi and English
  • Providing language support for applications targeting Indian users

Things to try

One interesting thing to try with OpenHathi-7B-Hi-v0.1-Base would be to fine-tune it on a specific domain or task, such as customer service, technical writing, or programming. This could help the model learn the nuances and specialized vocabulary of that area, allowing it to generate more relevant and useful text. Additionally, exploring the model's performance on code-switching between Hindi and English could yield insights into its language understanding capabilities.


gpt2

Maintainer: openai-community

Total Score: 2.0K

gpt2 is a transformer-based language model created and released by OpenAI. It is the smallest version of the GPT-2 model, with 124 million parameters. Like other GPT-2 models, gpt2 is a causal language model pretrained on a large corpus of English text using a self-supervised objective to predict the next token in a sequence. This allows the model to learn a general understanding of the English language that can be leveraged for a variety of downstream tasks. The gpt2 model is related to larger GPT-2 variants such as GPT2-Medium, GPT2-Large, and GPT2-XL, which have 355 million, 774 million, and 1.5 billion parameters respectively. These larger models were also developed and released by the OpenAI community.

Model inputs and outputs

Inputs

  • Text sequence: The model takes a sequence of text as input, which it uses to generate additional text.

Outputs

  • Generated text: The model outputs a continuation of the input text sequence, generating new text one token at a time in an autoregressive fashion.

Capabilities

The gpt2 model is capable of generating fluent, coherent text in English on a wide variety of topics. It can be used for tasks like creative writing, text summarization, and language modeling. However, as the OpenAI team notes, the model does not distinguish fact from fiction, so it should not be used for applications that require the generated text to be truthful.

What can I use it for?

The gpt2 model can be used for a variety of text generation tasks. Researchers may use it to better understand the behaviors, capabilities, and biases of large-scale language models. The model could also be fine-tuned for applications like grammar assistance, auto-completion, creative writing, and chatbots. However, users should be aware of the model's limitations and potential for biased or harmful output, as discussed in the OpenAI model card.

Things to try

One interesting aspect of the gpt2 model is its ability to generate diverse and creative text from a given prompt. You can experiment with providing the model with different types of starting prompts, such as the beginning of a story, a description of a scene, or even a single word, and see what kind of coherent and imaginative text it generates in response. Additionally, you can try fine-tuning the model on a specific domain or task to see how its performance and output changes compared to the base model.
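
A quick way to run such prompt experiments is the transformers pipeline API; this is a minimal sketch and the sampling settings are arbitrary:

```python
# One-call text generation with the 124M-parameter gpt2 checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time,", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```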


SambaLingo-Hungarian-Chat

Maintainer: sambanovasystems

Total Score: 40

SambaLingo-Hungarian-Chat is a human-aligned chat model trained in Hungarian and English. It is fine-tuned from the base model SambaLingo-Hungarian-Base, which adapts the Llama-2-7b model to Hungarian by training on 59 billion tokens from the Hungarian split of the CulturaX dataset. The chat model is further trained using direct preference optimization. Similar chat models are available for Russian and Arabic.

Model inputs and outputs

Inputs

  • Natural language text prompts in Hungarian or English

Outputs

  • Natural language text responses in Hungarian or English

Capabilities

The SambaLingo-Hungarian-Chat model can engage in open-ended conversations in both Hungarian and English. It can understand and generate fluent and coherent responses on a wide range of topics. Some example capabilities include:

  • Answering questions and providing information in Hungarian or English
  • Generating creative stories or dialogue in either language
  • Translating between Hungarian and English
  • Providing instructions or explaining complex topics

What can I use it for?

The SambaLingo-Hungarian-Chat model can be useful for a variety of applications that require natural language interaction in both Hungarian and English, such as:

  • Chatbots or virtual assistants for customer service, education, or entertainment
  • Language learning tools to practice conversational Hungarian or English
  • Multilingual content generation for blogs, articles, or creative writing
  • Text translation between Hungarian and English

Things to try

Some interesting things to try with the SambaLingo-Hungarian-Chat model include:

  • Engaging in open-ended conversations and observing how the model handles topics that go beyond its training data
  • Exploring the model's code-switching capabilities by mixing Hungarian and English within a single prompt
  • Providing prompts that require factual information retrieval or complex reasoning to see the model's limitations
  • Fine-tuning or further training the model on specialized data to enhance its performance on domain-specific tasks
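
A minimal chat-style call for these experiments might look like the sketch below, assuming the repository ships a chat template usable via tokenizer.apply_chat_template (if it does not, the prompt would need to be formatted manually):

```python
# Speculative chat sketch for SambaLingo-Hungarian-Chat.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sambanovasystems/SambaLingo-Hungarian-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# "What is the capital of Hungary?"
messages = [{"role": "user", "content": "Mi Magyarország fővárosa?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(prompt_ids.to(model.device), max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```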


Meta-Llama-3.1-8B-bnb-4bit

Maintainer: unsloth

Total Score: 67

The Meta-Llama-3.1-8B-bnb-4bit model is part of the Meta Llama 3.1 collection of multilingual large language models developed by Meta. This 8B parameter model is optimized for multilingual dialogue use cases and outperforms many open source and closed chat models on common industry benchmarks. It uses an auto-regressive transformer architecture and is trained on a mix of publicly available online data. The model supports text input and output in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Similar models in the Llama 3.1 family include the Meta-Llama-3.1-70B and Meta-Llama-3.1-405B, which offer larger model sizes for more demanding applications. Other related models include the llama-3-8b from Unsloth, which provides a finetuned version of the original Llama 3 8B model.

Model inputs and outputs

Inputs

  • Multilingual text: The model accepts text input in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Multilingual code: The model can also accept code snippets in various programming languages.

Outputs

  • Multilingual text: The model generates text output in the same supported languages as the inputs.
  • Multilingual code: The model can generate code outputs in various programming languages.

Capabilities

The Meta-Llama-3.1-8B-bnb-4bit model is particularly well-suited for multilingual dialogue and conversational tasks, outperforming many open source and closed chat models. It can engage in natural discussions, answer questions, and complete a variety of text generation tasks across different languages. The model also demonstrates strong capabilities in areas like reading comprehension, knowledge reasoning, and code generation.

What can I use it for?

This model could be used to power multilingual chatbots, virtual assistants, and other conversational AI applications. It could also be fine-tuned for specialized tasks like language translation, text summarization, or creative writing. Developers could leverage the model's outputs to generate synthetic data or distill knowledge into smaller models. The Llama Impact Grants program from Meta also highlights compelling applications of Llama models for societal benefit.

Things to try

One interesting aspect of this model is its ability to handle code generation in multiple programming languages, in addition to natural language tasks. Developers could experiment with using the model to assist with coding projects, generating test cases, or even drafting technical documentation. The model's multilingual capabilities also open up possibilities for cross-cultural communication and international collaboration.
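
Because this checkpoint ships pre-quantized bitsandbytes 4-bit weights, loading it should follow the standard transformers 4-bit workflow; the sketch below mirrors that workflow, with the compute dtype and prompt chosen as assumptions:

```python
# Loading a bnb-4bit checkpoint with transformers + bitsandbytes (needs a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# A French prompt, since the model advertises multilingual support.
inputs = tokenizer("La capitale de la France est", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```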
