Chargoddard

Models by this creator

llama3-42b-v0

chargoddard

Total Score

111

The llama3-42b-v0 model is a pruned version of Meta's Llama 3 70B foundation model. It was created by chargoddard using the methodology described in "The Unreasonable Ineffectiveness of the Deeper Layers" to prune the base Llama 3 model down to 42B parameters, and was then further trained on around 100M tokens from the JeanKaddour/minipile dataset using QLoRA. It is intended to be used as an untrained foundation model with appropriate prompts; the maintainer warns that treating it as an instruct model will produce "deranged results".

Model inputs and outputs

Inputs

The llama3-42b-v0 model accepts text input only.

Outputs

The model generates text and code output.

Capabilities

The llama3-42b-v0 model has been evaluated on a variety of benchmarks, including MMLU, Winogrande, and HellaSwag, where it achieves respectable performance. However, the maintainer notes that evaluation is still in progress and the model may exhibit "incredibly dumb" behavior, so it should be treated as an untrained foundation model.

What can I use it for?

Given the model's status as an untrained foundation, it is likely most useful for researchers and developers looking to experiment with pruning techniques or to continue pre-training on additional data. The maintainer cautions against using the model with Llama 3's instruction format, as this will lead to "deranged results"; users should instead develop appropriate plain-completion prompts to leverage the model's capabilities.

Things to try

Developers interested in exploring the llama3-42b-v0 model could try fine-tuning it on specific downstream tasks or datasets to evaluate its performance. Experimenting with different pruning techniques and training regimes could also yield interesting insights about the model's behavior and potential.
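As a rough illustration, here is a minimal sketch of loading the model with Hugging Face transformers and using it as a plain completion model, avoiding the Llama 3 instruct template. The repo id chargoddard/llama3-42b-v0, the bfloat16 dtype, and the sampling settings are assumptions for illustration rather than settings documented by the maintainer.

```python
# Minimal sketch: load the pruned model as a plain base model and complete text.
# Assumes the weights are published on the Hugging Face Hub as
# "chargoddard/llama3-42b-v0" (assumed repo id) and that enough GPU memory is
# available for a 42B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/llama3-42b-v0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Plain completion-style prompt; do NOT wrap it in Llama 3's instruct template,
# since the maintainer warns that the instruct format yields deranged results.
prompt = "Layer pruning removes the deepest transformer blocks because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```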

Updated 5/28/2024

📉

Yi-34B-Llama

chargoddard

Total Score

56

Yi-34B-Llama is a conversion of 01-ai's Yi-34B language model, whose architecture closely follows Llama, to the standard Llama tensor layout. The model has had its tensors renamed to match the standard Llama modeling code, allowing it to be loaded without the need for trust_remote_code, and the llama-tokenizer branch also uses the Llama tokenizer class. This model shares similarities with other Llama-based models like Llama-2-7b-longlora-100k-ft, llama2-7b-chat-hf-codeCherryPop-qLoRA-merged, llama-13b, llama-65b, and llama-7b-hf, all of which are based on the Llama architecture.

Model inputs and outputs

Yi-34B-Llama is a text-to-text model, meaning it takes text as input and generates text as output. It can be used for a variety of natural language processing tasks, such as language generation, question answering, and text summarization.

Inputs

Text prompts that the model uses to generate output.

Outputs

Generated text based on the input prompts.

Capabilities

Yi-34B-Llama can generate coherent, contextual responses to prompts, answer questions, and summarize text. The model has been trained on a large corpus of text data and can leverage that knowledge to produce human-like outputs.

What can I use it for?

The Yi-34B-Llama model can be used for a wide range of applications, such as chatbots, content generation, and language understanding. Researchers and developers can use it as a starting point for building more specialized AI systems or fine-tune it on specific tasks. Its capabilities make it a useful tool for projects involving natural language processing and generation.

Things to try

Researchers and developers can experiment with the Yi-34B-Llama model by prompting it with different types of text and evaluating its performance on various tasks. They can also explore ways to fine-tune or adapt the model to their specific needs, such as by incorporating additional training data or adjusting the model architecture.
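A minimal sketch of what the renamed tensors buy you in practice: the model should load with the stock Llama classes and no trust_remote_code flag. The repo id chargoddard/Yi-34B-Llama is assumed from the creator and model name, and the llama-tokenizer revision comes from the description above; verify both before relying on them.

```python
# Minimal sketch: because the tensors follow the standard Llama naming scheme,
# the model loads with stock transformers classes and no trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_id = "chargoddard/Yi-34B-Llama"  # assumed repo id

# The llama-tokenizer branch uses the Llama tokenizer class directly.
tokenizer = LlamaTokenizer.from_pretrained(model_id, revision="llama-tokenizer")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # no trust_remote_code needed
)

prompt = "Summarize the main idea of sparse attention in two sentences:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```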

Updated 5/28/2024

🗣️

mixtralnt-4x7b-test

chargoddard

Total Score

56

The mixtralnt-4x7b-test model is an experimental AI model created by the maintainer chargoddard. It is a sparse Mixture of Experts (MoE) model that combines parts from several pre-trained Mistral models, including Q-bert/MetaMath-Cybertron-Starling, NeverSleep/Noromaid-7b-v0.1.1, teknium/Mistral-Trismegistus-7B, meta-math/MetaMath-Mistral-7B, and PocketDoc/Dans-AdventurousWinds-Mk2-7b. The maintainer is experimenting with a hack to populate the MoE gates in order to take advantage of the experts.

Model inputs and outputs

The mixtralnt-4x7b-test model is a text-to-text model, meaning it takes text as input and generates text as output. The specific prompt format is not clearly defined; the maintainer suggests the model may expect an "alpaca??? or chatml??? format".

Inputs

Text prompts, potentially in an Alpaca- or ChatML-style format.

Outputs

Generated text in response to the input prompts.

Capabilities

The mixtralnt-4x7b-test model is capable of generating coherent text, drawing on the experts from the combined Mistral models. However, the maintainer is still experimenting with the hack used to populate the MoE gates, so the full capabilities of the model are not yet known.

What can I use it for?

The mixtralnt-4x7b-test model could potentially be used for a variety of text generation tasks, such as creative writing, conversational responses, or other applications that require generating coherent text. Since the model is still at an experimental stage, however, it is unclear how it would perform compared to more established language models.

Things to try

One interesting aspect of the mixtralnt-4x7b-test model is the maintainer's approach of combining parts of several pre-trained Mistral models into a sparse Mixture of Experts. This technique could improve the model's performance and capabilities, but the results are still unknown. It would be worth probing the model's output quality, coherence, and consistency to see how it compares to other language models.
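Since the maintainer only tentatively suggests an Alpaca or ChatML prompt format, the sketch below tries a hand-rolled ChatML template; the repo id chargoddard/mixtralnt-4x7b-test and the generation settings are illustrative assumptions rather than documented choices.

```python
# Minimal sketch of prompting the merged MoE model with a ChatML-style template,
# one of the two formats the maintainer tentatively suggests.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/mixtralnt-4x7b-test"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hand-rolled ChatML prompt; swap in an Alpaca-style template if this one
# produces incoherent output.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite a short scene set on a stormy coastline.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
# Strip the prompt tokens before decoding the reply.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```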

Updated 5/28/2024

↗️

llama2-22b

chargoddard

Total Score

46

The llama2-22b model is a 22B-parameter expansion of Meta's Llama 2, released by the creator chargoddard. It is a version of the 13B Llama 2 model with some additional attention heads grafted in from the original 33B Llama model, fine-tuned on around 10 million tokens from the RedPajama dataset to help the added components settle in. The model is not intended for use as-is, but rather to serve as a base for further tuning and adaptation, with the goal of providing greater learning capacity than the 13B Llama 2 model. It is related to other models in the Llama 2 family, such as the Llama-2-13b-hf and Llama-2-13b-chat-hf models; the family ranges in size from 7 billion to 70 billion parameters and was developed and released by Meta's AI research team.

Model inputs and outputs

Inputs

The llama2-22b model takes in text as its input.

Outputs

The model generates text as its output.

Capabilities

The Llama 2 family has been evaluated on various academic benchmarks covering commonsense reasoning, world knowledge, reading comprehension, and math, and performs well on these tasks, with the 70B version achieving the best results. The family also scores well on safety metrics such as truthfulness and low toxicity, especially in the fine-tuned Llama-2-Chat versions.

What can I use it for?

The llama2-22b model is intended for commercial and research use in English. While the fine-tuned Llama-2-Chat models are optimized for assistant-like dialogue, the pretrained llama2-22b model can be adapted for a variety of natural language generation tasks, such as text summarization, language translation, and content creation. Developers should, however, perform thorough safety testing and tuning before deploying any application of the model, as its potential outputs cannot be fully predicted.

Things to try

One interesting aspect of the llama2-22b model is its use of additional attention heads from the original 33B Llama model. This architectural change may allow the model to better capture certain linguistic patterns or relationships, potentially leading to improved performance on specific tasks. Researchers and developers could explore fine-tuning the model on domain-specific datasets or incorporating it into novel application architectures to unlock its full potential.
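Because the model is meant as a base for further tuning rather than direct use, a natural first experiment is attaching a parameter-efficient adapter. The sketch below does this with LoRA via the peft library; the repo id chargoddard/llama2-22b, the target modules, and all hyperparameters are illustrative assumptions, not recommendations from the maintainer.

```python
# Minimal sketch: treat llama2-22b as a base model and attach a LoRA adapter
# for further tuning. All hyperparameters here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "chargoddard/llama2-22b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Target the attention projections, which include the grafted attention heads.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train the adapter with your preferred trainer (e.g. transformers'
# Trainer or trl's SFTTrainer) on a domain-specific dataset, then evaluate.
```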

Updated 9/6/2024