kaist-ai

Models by this creator

prometheus-13b-v1.0

prometheus-13b-v1.0 is an open alternative to GPT-4 for fine-grained evaluation of language model outputs and for use as a reward model in Reinforcement Learning from Human Feedback (RLHF). It was developed by kaist-ai and is based on the Llama-2-Chat model. prometheus-13b-v1.0 was fine-tuned on 100K feedback samples from the Feedback Collection dataset, which specializes it for evaluating long-form responses. It outperforms GPT-3.5-Turbo and Llama-2-Chat 70B on various benchmarks and is on par with GPT-4.

Model inputs and outputs

Inputs

- **Instruction**: the task or prompt the response is answering
- **Response**: the long-form text to be evaluated
- **Reference answer**: the expected or target response
- **Score rubric**: the criteria for evaluating the response on a 1-5 scale

Outputs

- **Score**: a score between 1 and 5 rating the quality of the provided response against the given rubric

Capabilities

prometheus-13b-v1.0 is specialized for fine-grained evaluation of language model outputs, outperforming GPT-3.5-Turbo and Llama-2-Chat 70B on various benchmarks. Because the rubric is supplied at inference time, it can evaluate LLMs against customized criteria such as child readability, cultural sensitivity, or creativity. It can also serve as a reward model for training LLMs with RLHF.

What can I use it for?

prometheus-13b-v1.0 can be a powerful and cost-effective alternative to GPT-4 for evaluating LLMs and for training reward models for RLHF. Developers can use it to assess the quality of LLM outputs against their specific requirements, such as the readability or cultural sensitivity of generated text. This could be valuable for applications in education, content moderation, or personalized recommendation.

Things to try

One interesting aspect of prometheus-13b-v1.0 is its ability to perform fine-grained evaluation of LLM outputs. You could use it to assess different LLMs against specific criteria such as factual accuracy, logical reasoning, or creativity, which can help identify the strengths and weaknesses of each model and guide further development or fine-tuning. Another potential application is using prometheus-13b-v1.0 as a reward model for training LLMs with RLHF: by scoring the quality of model outputs against a rubric, it can shape the learning signal and steer a model toward higher-quality responses. Minimal sketches of both uses follow.
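As a rough illustration of the first use, here is a minimal sketch of rubric-based scoring, assuming the Hugging Face checkpoint kaist-ai/prometheus-13b-v1.0 and the transformers library. The exact prompt template is documented on the model card; the simplified template below only mirrors its structure (instruction, response, reference answer, and rubric in; feedback and a `[RESULT] <score>` tag out).

```python
# A minimal sketch of fine-grained evaluation with prometheus-13b-v1.0.
# The prompt template here is a simplified stand-in for the official one
# on the model card; it keeps the same four inputs and output convention.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kaist-ai/prometheus-13b-v1.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def build_prompt(instruction, response, reference_answer, rubric):
    """Assemble the four evaluation inputs into one prompt.

    Prometheus is a Llama-2-Chat fine-tune, so the [INST] wrapper applies.
    """
    return (
        "[INST] ###Task Description:\n"
        "An instruction, a response to evaluate, a reference answer that "
        "gets a score of 5, and a score rubric are given. Write feedback, "
        'then a score from 1 to 5 in the format "[RESULT] <score>".\n\n'
        f"###The instruction to evaluate:\n{instruction}\n\n"
        f"###Response to evaluate:\n{response}\n\n"
        f"###Reference Answer (Score 5):\n{reference_answer}\n\n"
        f"###Score Rubrics:\n{rubric}\n\n"
        "###Feedback: [/INST]"
    )

def score(instruction, response, reference_answer, rubric):
    """Return the 1-5 score parsed from the model's feedback, or None."""
    prompt = build_prompt(instruction, response, reference_answer, rubric)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # The model emits written feedback followed by "[RESULT] <score>".
    match = re.search(r"\[RESULT\]\s*([1-5])", text)
    return int(match.group(1)) if match else None
```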
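A similarly hedged sketch of the second use wraps the evaluator in a reward function for an RLHF loop. The rubric text and the mapping from the 1-5 score to a [0, 1] reward are illustrative choices, not part of the model's interface; `score` is the helper from the previous sketch.

```python
# Illustrative rubric: the criteria text is an assumption, written in the
# general style Prometheus rubrics use (a criterion plus per-score notes).
READABILITY_RUBRIC = """[Is the response easy for a child to read?]
Score 1: The response uses complex vocabulary and long, dense sentences.
Score 2: The response is mostly complex, with occasional simple phrasing.
Score 3: The response mixes simple and complex language evenly.
Score 4: The response is mostly simple, with a few difficult passages.
Score 5: The response is simple, clear, and engaging for a child."""

def reward_fn(instruction: str, response: str, reference_answer: str) -> float:
    """Map the evaluator's 1-5 score to a scalar reward in [0, 1]."""
    s = score(instruction, response, reference_answer, READABILITY_RUBRIC)
    # Treat an unparseable generation as the lowest reward, not an error.
    return 0.0 if s is None else (s - 1) / 4.0
```

Normalizing to [0, 1] keeps the reward scale consistent across rubrics, and treating an unparseable output as the lowest reward keeps a training loop from crashing on a malformed generation.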

Updated 5/28/2024