Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ datasets:
 - allenai/tulu-v2-sft-mixture
 language:
 - en
-base_model: allenai/tulu-
+base_model: allenai/tulu-v2.5-13b-preference-mix-rm
 license: apache-2.0
 ---
 <center>
@@ -19,6 +19,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
 This model is trained on the UltraFeedback dataset (using the per-aspect/fine-grained scores for deciding chosen and rejected) using PPO.
+It was initialised from the [Tulu v2.5 13B preference mixture RM](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm).
 We used a 13B RM trained on our preference data mix, and then used the UltraFeedback prompts during PPO training.
 
 For more details, read the paper: