Sheared-LLaMA-1.3B

Maintainer: princeton-nlp

Total Score

85

Last updated 5/28/2024


Run this model: Run on HuggingFace
API spec: View on HuggingFace
Github link: No Github link provided
Paper link: No paper link provided


Model overview

Sheared-LLaMA-1.3B is a model pruned and further pre-trained from the meta-llama/Llama-2-7b-hf model. The maintainer, princeton-nlp, dynamically loaded data from different domains of the RedPajama dataset during both stages, using 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model.

This model is smaller in scale than the original LLaMA models but shares the same vocabulary. It was derived with a budget of only 50B tokens by leveraging an existing strong large language model rather than training from scratch.

Model inputs and outputs

Inputs

  • Natural language text

Outputs

  • Continued generation of natural language text
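
A minimal sketch of this text-in, text-out interface, assuming the Hugging Face transformers library and the princeton-nlp/Sheared-LLaMA-1.3B checkpoint on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pruned model and its LLaMA-compatible tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")

# Natural language text in ...
inputs = tokenizer("The capital of France is", return_tensors="pt")

# ... continued natural language text out.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```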

Capabilities

The Sheared-LLaMA-1.3B model outperforms open-source language models of comparable size, such as OPT-1.3B and Pythia-1.4B, on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks.

What can I use it for?

The Sheared-LLaMA-1.3B model can be used for a variety of natural language processing tasks, such as text generation, question answering, and language modeling. Its strong performance on downstream tasks makes it a viable option for projects that require robust language understanding and generation capabilities.
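
For quick experiments with these tasks, a text-generation pipeline is a compact way to drive the model. A sketch follows; the prompt is illustrative, and as a base (non-instruction-tuned) model it responds best to completion-style prompts rather than chat instructions:

```python
from transformers import pipeline

# Wraps tokenization, generation, and decoding in one call.
generator = pipeline("text-generation", model="princeton-nlp/Sheared-LLaMA-1.3B")

result = generator("Q: What is the capital of Japan?\nA:", max_new_tokens=16)
print(result[0]["generated_text"])
```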

Things to try

Given the model's smaller size compared to the original LLaMA models, it could be an interesting option to explore for deployments with more constrained computational resources. The maintainer's approach of pruning and continued pre-training on diverse datasets also suggests that the model may have unique strengths, such as improved efficiency or specialized knowledge, that could be worth investigating further.
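
One way to exploit that smaller footprint is half-precision loading, sketched below. This assumes a recent transformers release with the accelerate package installed; it is not part of the maintainer's documentation:

```python
import torch
from transformers import AutoModelForCausalLM

# float16 weights roughly halve the memory footprint of the ~1.3B parameters
# versus float32; device_map="auto" spreads layers across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "princeton-nlp/Sheared-LLaMA-1.3B",
    torch_dtype=torch.float16,
    device_map="auto",
)
```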



This summary was produced with help from an AI and may contain inaccuracies; check out the links to read the original source documents!

Related Models


Sheared-LLaMA-2.7B

princeton-nlp

Total Score

54

Sheared-LLaMA-2.7B is a pruned and further pre-trained model derived from the meta-llama/Llama-2-7b-hf model. The model was developed by the princeton-nlp team and is available on the Hugging Face Hub. Like the original LLaMA model, Sheared-LLaMA-2.7B is a large language model based on the transformer architecture. However, this model was pruned and further trained on the RedPajama dataset using a budget of 50 billion tokens.

Model inputs and outputs

Inputs

  • Text prompts

Outputs

  • Continuation of the input text, generating coherent and relevant text

Capabilities

The Sheared-LLaMA-2.7B model has demonstrated strong performance across a variety of downstream tasks, including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. On average across these benchmarks it outperforms existing large language models such as OPT-2.7B and Pythia-2.8B (see the evaluation sketch after this entry).

What can I use it for?

The Sheared-LLaMA-2.7B model can be used for a wide range of natural language processing tasks, such as text generation, question answering, summarization, and content creation. Developers and researchers can fine-tune the model for specific applications or use it as a strong baseline for further research and development.

Things to try

One interesting aspect of the Sheared-LLaMA-2.7B model is that it was trained with a budget of only 50 billion tokens, significantly less than the 1 trillion tokens used to train the original LLaMA models. This suggests that the model's performance can be achieved with a more efficient and cost-effective training process, making it an attractive option for those with limited computational resources.
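
One way to reproduce such benchmark comparisons is EleutherAI's lm-evaluation-harness. This is a sketch assuming harness version 0.4+ and a GPU; the task selection is illustrative:

```python
import lm_eval

# Score the model on two common-sense benchmarks; swap in the OPT or
# Pythia checkpoints to reproduce a side-by-side comparison.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=princeton-nlp/Sheared-LLaMA-2.7B",
    tasks=["hellaswag", "winogrande"],
    batch_size=8,
)
print(results["results"])
```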


llama3-42b-v0

chargoddard

Total Score

111

The llama3-42b-v0 model is a pruned version of Meta's Llama 3 70B foundation model. It was created by chargoddard using the methodology described in The Unreasonable Ineffectiveness of the Deeper Layers to prune the base Llama 3 model down to 42B parameters. The model was then further trained on around 100M tokens from the JeanKaddour/minipile dataset using QLoRA (sketched after this entry). This pruned model is intended to be used as an untrained foundation, with appropriate prompts, as injecting random noise into the latent space will produce "deranged results".

Model inputs and outputs

Inputs

  • Text only

Outputs

  • Generated text and code

Capabilities

The llama3-42b-v0 model has been evaluated on a variety of benchmarks, including MMLU, Winogrande, and HellaSwag, where it achieves respectable performance. However, the maintainer notes that the model is still being evaluated and may exhibit "incredibly dumb" behavior, so it should be treated as an untrained foundation model.

What can I use it for?

Given the model's status as an untrained foundation, it is likely most useful for researchers and developers looking to experiment with pruning techniques or continue pre-training on additional data. The maintainer cautions against using the model with Llama 3's instruction format, as this will lead to "deranged results". Instead, users should focus on developing appropriate prompts to leverage the model's capabilities.

Things to try

Developers interested in exploring the llama3-42b-v0 model could try fine-tuning it on specific downstream tasks or datasets to evaluate its performance. Additionally, experimenting with different pruning techniques and training regimes could yield interesting insights about the model's behavior and potential.
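
The QLoRA recipe referenced above combines a 4-bit-quantized frozen base model with small trainable low-rank adapters. Here is a minimal sketch using the transformers, bitsandbytes, and peft integrations; the rank, alpha, and target modules are illustrative assumptions, not chargoddard's actual settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit NF4, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "chargoddard/llama3-42b-v0",
    quantization_config=bnb_config,
    device_map="auto",
)

# Train only low-rank adapters on the attention projections (assumed targets).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # adapters are a tiny fraction of 42B
```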



llama-7b-hf-transformers-4.29

elinas

Total Score

53

The llama-7b-hf-transformers-4.29 is an open-source large language model developed by the FAIR team of Meta AI. It is a 7-billion parameter model based on the transformer architecture, and is part of the larger LLaMA family of models that also includes 13B, 33B, and 65B parameter versions. The model was trained between December 2022 and February 2023 on a mix of publicly available online data, including data from sources like CCNet, C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange.

The llama-7b-hf-transformers-4.29 model was converted to work with the latest Transformers library on Hugging Face, resolving some issues with the EOS token. It is licensed under a non-commercial bespoke license, and can be used for research on large language models, including exploring potential applications, understanding model capabilities and limitations, and developing techniques to improve them.

Model inputs and outputs

Inputs

  • Text prompts of arbitrary length

Outputs

  • Continuation of the input text, generating coherent and contextually relevant language

Capabilities

The llama-7b-hf-transformers-4.29 model exhibits strong performance on a variety of natural language understanding and generation tasks, including commonsense reasoning, reading comprehension, and question answering. It was evaluated on benchmarks like BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, and others, demonstrating capabilities comparable to or better than other large language models like GPT-J.

The model also shows promising results in terms of mitigating biases, with lower average bias scores across categories like gender, religion, race, and sexual orientation compared to the original LLaMA models. However, as with any large language model, it may still exhibit biases and generate inaccurate or unsafe content, so it should be used with appropriate caution and safeguards.

What can I use it for?

The primary intended use of the llama-7b-hf-transformers-4.29 model is for research on large language models, such as exploring potential applications, understanding model capabilities and limitations, and developing techniques to improve them. Researchers in natural language processing, machine learning, and artificial intelligence would be the main target users for this model.

While the model is not recommended for direct deployment in production applications without further risk evaluation and mitigation, it could potentially be used as a starting point for fine-tuning on specific tasks or domains, or as a general-purpose language model for prototyping and experimentation.

Things to try

One interesting aspect of the llama-7b-hf-transformers-4.29 model is its performance on commonsense reasoning tasks, which can provide insight into the model's understanding of the world and its ability to make inferences. Prompting the model with questions that require commonsense knowledge, such as "What is the largest animal?" or "What do you need to do to make a cake?", and analyzing its responses could be a fruitful area of exploration.

Additionally, given the model's potential biases, it could be worthwhile to investigate the model's behavior on prompts related to sensitive topics, such as gender, race, or religion, and to develop techniques for mitigating these biases.



llama2-22b

chargoddard

Total Score

46

The llama2-22b model is a large language model developed by Meta's researchers and released by the creator chargoddard. It is a version of Llama 2 with some additional attention heads from the original 33B Llama model. The model has been fine-tuned on around 10 million tokens from the RedPajama dataset to help the added components settle in. This model is not intended for use as-is, but rather to serve as a base for further tuning and adaptation, with the goal of providing greater capacity for learning than the 13B Llama 2 model.

The llama2-22b model sits alongside the other models in the Llama 2 family, which range in size from 7 billion to 70 billion parameters and include the Llama-2-13b-hf and Llama-2-13b-chat-hf models. Those models were developed and released by Meta's AI research team.

Model inputs and outputs

Inputs

  • Text

Outputs

  • Generated text

Capabilities

The Llama 2 family has been evaluated on various academic benchmarks covering commonsense reasoning, world knowledge, reading comprehension, and math, and performs well on these tasks, with the 70B version achieving the best results among the Llama 2 models. The family also exhibits good performance on safety metrics, such as truthfulness and low toxicity, especially in the fine-tuned Llama-2-Chat versions.

What can I use it for?

The llama2-22b model is intended for commercial and research use in English. While the fine-tuned Llama-2-Chat models are optimized for assistant-like dialogue, the pretrained llama2-22b model can be adapted for a variety of natural language generation tasks, such as text summarization, language translation, and content creation. However, developers should perform thorough safety testing and tuning before deploying any applications of the model, as its potential outputs cannot be fully predicted.

Things to try

One interesting aspect of the llama2-22b model is its use of additional attention heads from the original 33B Llama model. This architectural change may allow the model to better capture certain linguistic patterns or relationships, potentially leading to improved performance on specific tasks. Researchers and developers could explore fine-tuning the model on domain-specific datasets or incorporating it into novel application architectures to unlock its full potential.
