Update README.md
README.md
CHANGED
@@ -23,6 +23,8 @@ This is an RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [O
 
 DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
 
+Errata: Due to an issue with the DPO-only version failing to generate an eos token, this model was additional SFT with 7000 rows from the openhermes dataset to teach the model to use the eos_token again to end the turn. This resulted in lower benchmark scores. You can find the original DPO-only model in the `dpo-v0` branch.
+
 # Training Details
 
 DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.