Text Generation
Transformers
PyTorch
English
llama
conversational
text-generation-inference
Inference Endpoints
hamishivi committed on
Commit
8c0698c
1 Parent(s): 64a533e

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -7,7 +7,7 @@ datasets:
 - allenai/tulu-v2-sft-mixture
 language:
 - en
-base_model: allenai/tulu-2-dpo-13b
+base_model: allenai/tulu-v2.5-13b-preference-mix-rm
 license: apache-2.0
 ---
 <center>
@@ -19,6 +19,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
 This model is trained on the UltraFeedback dataset (using the per-aspect/fine-grained scores for deciding chosen and rejected) using PPO.
+It was initialised from the [Tulu v2.5 13B preference mixture RM](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm).
 We used a 13B RM trained on our preference data mix, and then used the UltraFeedback prompts during PPO training.

 For more details, read the paper:
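
The card tags the model as conversational. As a minimal sketch, Tulu-family models document a plain-text chat template with `<|user|>` and `<|assistant|>` turn headers, each turn ending in a newline; verify against the chat template shipped in the model's tokenizer config before relying on it:

```python
def format_tulu_prompt(messages):
    """Render a chat as Tulu-style plain text.

    Assumes the <|user|>/<|assistant|> template documented for the
    Tulu 2 suite; check the model's tokenizer chat template to confirm.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    # Append the assistant header so generation continues from there.
    parts.append("<|assistant|>\n")
    return "".join(parts)

# format_tulu_prompt([{"role": "user", "content": "Hello"}])
# -> "<|user|>\nHello\n<|assistant|>\n"
```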