kaist-ai

Models by this creator

prometheus-13b-v1.0

prometheus-13b-v1.0 is an open alternative to GPT-4 for fine-grained evaluation of language model outputs and for use as a reward model in Reinforcement Learning from Human Feedback (RLHF). It was developed by kaist-ai and is based on the Llama-2-Chat model. prometheus-13b-v1.0 was fine-tuned on 100K feedback samples from the Feedback Collection dataset, which specializes it for evaluating long-form responses. It outperforms GPT-3.5-Turbo and Llama-2-Chat 70B on various benchmarks and is on par with GPT-4.

Model inputs and outputs

Inputs

- **Instruction**: the task or prompt the response is answering
- **Response**: the long-form text to be evaluated
- **Reference answer**: the expected or target response
- **Score rubric**: the criteria for evaluating the response on a 1-5 scale

Outputs

- **Score**: a score between 1 and 5 rating the quality of the provided response against the given rubric

Capabilities

prometheus-13b-v1.0 is specialized for fine-grained evaluation of language model outputs, outperforming GPT-3.5-Turbo and Llama-2-Chat 70B on various benchmarks. Because the rubric is supplied at inference time, it can evaluate LLMs against customized criteria such as child readability, cultural sensitivity, or creativity. It can also serve as a reward model for training LLMs with RLHF.

What can I use it for?

prometheus-13b-v1.0 can be a powerful and cost-effective alternative to GPT-4 for evaluating LLMs and for training reward models for RLHF. Developers can use it to assess the quality of LLM outputs against their specific requirements, such as the readability or cultural sensitivity of generated text. This could be valuable for applications in education, content moderation, or personalized recommendation.

Things to try

One interesting aspect of prometheus-13b-v1.0 is its ability to perform fine-grained evaluation of LLM outputs. You could use it to assess different LLMs against specific criteria such as factual accuracy, logical reasoning, or creativity, which can help identify the strengths and weaknesses of each model and guide further development or fine-tuning. Another potential application is using prometheus-13b-v1.0 as a reward model for training LLMs with RLHF: by scoring the quality of model outputs against a rubric, it can shape the learning signal and steer a model toward higher-quality responses. Minimal sketches of both uses follow.
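As a rough illustration of the first use, here is a minimal sketch of rubric-based scoring, assuming the Hugging Face checkpoint kaist-ai/prometheus-13b-v1.0 and the transformers library. The exact prompt template is documented on the model card; the simplified template below only mirrors its structure (instruction, response, reference answer, and rubric in; feedback and a `[RESULT] <score>` tag out).

```python
# A minimal sketch of fine-grained evaluation with prometheus-13b-v1.0.
# The prompt template here is a simplified stand-in for the official one
# on the model card; it keeps the same four inputs and output convention.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kaist-ai/prometheus-13b-v1.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def build_prompt(instruction, response, reference_answer, rubric):
    """Assemble the four evaluation inputs into one prompt.

    Prometheus is a Llama-2-Chat fine-tune, so the [INST] wrapper applies.
    """
    return (
        "[INST] ###Task Description:\n"
        "An instruction, a response to evaluate, a reference answer that "
        "gets a score of 5, and a score rubric are given. Write feedback, "
        'then a score from 1 to 5 in the format "[RESULT] <score>".\n\n'
        f"###The instruction to evaluate:\n{instruction}\n\n"
        f"###Response to evaluate:\n{response}\n\n"
        f"###Reference Answer (Score 5):\n{reference_answer}\n\n"
        f"###Score Rubrics:\n{rubric}\n\n"
        "###Feedback: [/INST]"
    )

def score(instruction, response, reference_answer, rubric):
    """Return the 1-5 score parsed from the model's feedback, or None."""
    prompt = build_prompt(instruction, response, reference_answer, rubric)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # The model emits written feedback followed by "[RESULT] <score>".
    match = re.search(r"\[RESULT\]\s*([1-5])", text)
    return int(match.group(1)) if match else None
```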
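A similarly hedged sketch of the second use wraps the evaluator in a reward function for an RLHF loop. The rubric text and the mapping from the 1-5 score to a [0, 1] reward are illustrative choices, not part of the model's interface; `score` is the helper from the previous sketch.

```python
# Illustrative rubric: the criteria text is an assumption, written in the
# general style Prometheus rubrics use (a criterion plus per-score notes).
READABILITY_RUBRIC = """[Is the response easy for a child to read?]
Score 1: The response uses complex vocabulary and long, dense sentences.
Score 2: The response is mostly complex, with occasional simple phrasing.
Score 3: The response mixes simple and complex language evenly.
Score 4: The response is mostly simple, with a few difficult passages.
Score 5: The response is simple, clear, and engaging for a child."""

def reward_fn(instruction: str, response: str, reference_answer: str) -> float:
    """Map the evaluator's 1-5 score to a scalar reward in [0, 1]."""
    s = score(instruction, response, reference_answer, READABILITY_RUBRIC)
    # Treat an unparseable generation as the lowest reward, not an error.
    return 0.0 if s is None else (s - 1) / 4.0
```

Normalizing to [0, 1] keeps the reward scale consistent across rubrics, and treating an unparseable output as the lowest reward keeps a training loop from crashing on a malformed generation.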

Updated 5/28/2024