openaccess-ai-collective/DPOpenHermes-7B


winglian committed on Dec 3, 2023

Commit 9541d42 • 1 Parent(s): 7b15fa2

Update README.md

Files changed (1)

README.md +2 -0


@@ -23,6 +23,8 @@ This is an RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [O
 DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
+Errata: Because the DPO-only version failed to generate an eos token, this model received additional SFT on 7000 rows from the openhermes dataset to teach it to use the eos_token again to end the turn. This resulted in lower benchmark scores. You can find the original DPO-only model in the `dpo-v0` branch.
 # Training Details
 DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
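
The errata names a `dpo-v0` branch that holds the original DPO-only model. The following is a minimal sketch of how one might pick between the two versions, assuming the Hugging Face `transformers` library and its `revision` argument to `from_pretrained`. The helper names (`load_kwargs`, `load`, `ended_with_eos`) are hypothetical and are not part of the repository; the last one only illustrates the kind of check that would reveal the missing-eos issue described above.

```python
REPO_ID = "openaccess-ai-collective/DPOpenHermes-7B"  # repo named on this page
DPO_ONLY_REVISION = "dpo-v0"  # branch named in the errata


def load_kwargs(dpo_only: bool = False) -> dict:
    """Build extra keyword arguments for `from_pretrained` (hypothetical helper)."""
    # With no revision given, the default branch (DPO plus the extra SFT) is used.
    return {"revision": DPO_ONLY_REVISION} if dpo_only else {}


def load(dpo_only: bool = False):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so the pure helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = load_kwargs(dpo_only)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, **kwargs)
    return tokenizer, model


def ended_with_eos(generated_ids, eos_token_id) -> bool:
    """Return True if a generated token sequence stops on the eos token."""
    return len(generated_ids) > 0 and generated_ids[-1] == eos_token_id
```

Calling `load(dpo_only=True)` would fetch the DPO-only weights, and `ended_with_eos(output_ids, tokenizer.eos_token_id)` could then be applied to a generated sequence to see whether the turn was terminated.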