This is an 8B reward model used for PPO training, trained on the UltraFeedback dataset.
For more details, read the paper:
[Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).

**Built with Meta Llama 3!**

Note that Llama 3 is released under the Meta Llama 3 community license, included here under `llama_3_license.txt`.
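As background, reward models of this kind are typically trained on preference pairs (such as UltraFeedback's chosen/rejected responses) with a Bradley-Terry objective. A minimal sketch of that pairwise loss, shown here for illustration only and not taken from this repository's training code:

```python
import math

def bradley_terry_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = chosen_score - rejected_score
    # Numerically stable form of -log(sigmoid(margin)) = log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# The loss shrinks as the reward model scores the chosen response
# above the rejected one, and grows when the ranking is inverted.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # small loss: correct ranking
print(round(bradley_terry_loss(0.0, 2.0), 4))  # large loss: inverted ranking
```

Minimizing this loss over many preference pairs pushes the scalar reward head to rank preferred responses higher, which is what makes the model usable as a reward signal for PPO.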
## Performance

We evaluate the model on [RewardBench](https://github.com/allenai/reward-bench):