hamishivi/tulu-v2.5-7b-uf-mean-7b-uf-rm-value


hamishivi committed on Jun 25

Commit bacacd2 (1 parent: b8fdd2e)

Update README.md

Files changed (1)

README.md +1 -1


@@ -18,7 +18,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
-This is a **value** model produced during the PPO training of [this](https://huggingface.co/hamishivi/tulu-v2.5-ppo-7b-uf-mean) model.
+This is a **value** model produced during the PPO training of [this](hamishivi/tulu-v2.5-7b-uf-mean-7b-uf-rm) model.
 It was initialised from the [Tulu v2.5 7B UltraFeedback RM](https://huggingface.co/hamishivi/tulu-v2.5-7b-uf-rm).
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.