Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ license: apache-2.0
 
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
-This is a **value** model produced during the PPO training of [this](
+This is a **value** model produced during the PPO training of [this](hamishivi/tulu-v2.5-7b-uf-mean-7b-uf-rm) model.
 It was initialised from the [Tulu v2.5 7B UltraFeedback RM](https://huggingface.co/hamishivi/tulu-v2.5-7b-uf-rm).
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.
 