sfairXC

Models by this creator

FsfairX-LLaMA3-RM-v0.1

sfairXC

Total Score: 46

The FsfairX-LLaMA3-RM-v0.1 model is a reward model developed by sfairXC that can be used for reinforcement learning from human feedback (RLHF), including proximal policy optimization (PPO), iterative supervised fine-tuning (SFT), and iterative direct preference optimization (DPO). The model is based on the meta-llama/Meta-Llama-3-8B-Instruct base model and was trained using the scripts at https://github.com/WeiXiongUST/RLHF-Reward-Modeling. Similar models include SFR-Iterative-DPO-LLaMA-3-8B-R and Llama-3-8B-SFR-Iterative-DPO-R, RLHF-trained models from Salesforce.

Model inputs and outputs

Inputs

Chat-formatted text, typically a prompt together with a candidate response, to be scored by the reward model.

Outputs

A scalar reward score for the input text, where higher scores indicate responses that better align with human preferences.

Capabilities

The FsfairX-LLaMA3-RM-v0.1 model can serve as a reward function for RLHF training of large language models, providing a way to evaluate how safe and well aligned model outputs are with human preferences.

What can I use it for?

The FsfairX-LLaMA3-RM-v0.1 model can be used as part of an RLHF training pipeline for large language models, such as SFR-Iterative-DPO-LLaMA-3-8B-R and Llama-3-8B-SFR-Iterative-DPO-R. By providing a reward signal that reflects human preferences, it can help train more helpful and safer language models.

Things to try

One interesting thing to try with the FsfairX-LLaMA3-RM-v0.1 model is to use it to evaluate the safety and alignment of model outputs during RLHF training. By monitoring the reward scores it assigns, you can track how well the model being trained is progressing toward safe, preference-aligned behavior. A sketch of how to query the model for such scores follows below.
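To make the reward-scoring step concrete, here is a minimal Python sketch of how a Llama-3 reward model of this kind can be queried through the Hugging Face Transformers text-classification ("sentiment-analysis") pipeline. The chat contents and pipeline settings shown here are illustrative assumptions rather than the official usage snippet, so check the model card for the exact recommended invocation.

    import torch
    from transformers import AutoTokenizer, pipeline

    # Illustrative sketch: score one prompt/response pair with the reward model.
    model_id = "sfairXC/FsfairX-LLaMA3-RM-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The reward model is a Llama-3 sequence-classification head, so it can be
    # queried via the text-classification ("sentiment-analysis") pipeline.
    # Add device=0 or device_map="auto" if a GPU is available.
    rm_pipe = pipeline(
        "sentiment-analysis",
        model=model_id,
        tokenizer=tokenizer,
        model_kwargs={"torch_dtype": torch.bfloat16},
    )

    chat = [
        {"role": "user", "content": "Explain what a reward model does in RLHF."},
        {"role": "assistant", "content": "It scores candidate responses so the policy can be trained to prefer the better ones."},
    ]

    # Format the conversation with the Llama-3 chat template; strip the BOS token
    # because the pipeline's tokenizer will add it again during encoding.
    text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)
    text = text.replace(tokenizer.bos_token, "")

    # function_to_apply="none" returns the raw reward logit (higher = preferred)
    # instead of squashing it through a sigmoid.
    outputs = rm_pipe([text], top_k=None, function_to_apply="none", batch_size=1)
    reward = outputs[0][0]["score"]
    print(f"reward score: {reward:.3f}")

In an RLHF pipeline, the same call can be batched over many sampled responses, with the resulting scores fed to PPO as rewards or used to pick chosen/rejected pairs for iterative DPO.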

Updated 9/19/2024