Text Classification
Transformers
Safetensors
English
llama
text-generation-inference
Inference Endpoints
hamishivi committed on
Commit
5f82170
1 Parent(s): 7235b11

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -22,6 +22,9 @@ This is a 8B reward model used for PPO training trained on the UltraFeedback dat
  For more details, read the paper:
  [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
+ **Built with Meta Llama 3!**
+ Note that Llama 3 is released under the Meta Llama 3 community license, included here under `llama_3_license.txt`.
+
  ## Performance
 
  We evaluate the model on [RewardBench](https://github.com/allenai/reward-bench):