Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ datasets:
 - allenai/tulu-v2-sft-mixture
 language:
 - en
-base_model: allenai/tulu-
+base_model: allenai/tulu-v2.5-13b-preference-mix-rm
 license: apache-2.0
 ---
 <center>
@@ -19,6 +19,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
 This model is trained on the UltraFeedback dataset (using the per-aspect/fine-grained scores for deciding chosen and rejected) using PPO.
+It was initialised from the [Tulu v2.5 13B preference mixture RM](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm).
 We used a 13B RM trained on our preference data mix, and then used the UltraFeedback prompts during PPO training.
 
 For more details, read the paper: