Text Generation
Transformers
Safetensors
English
mistral
conversational
text-generation-inference
Inference Endpoints
winglian committed on
Commit
9541d42
1 Parent(s): 7b15fa2

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -23,6 +23,8 @@ This is an RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [O
 
 DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
 
+Errata: Because the DPO-only version failed to generate an EOS token, this model received additional SFT on 7,000 rows from the OpenHermes dataset to teach it to emit the eos_token again at the end of each turn. This resulted in lower benchmark scores. The original DPO-only model is available in the `dpo-v0` branch.
+
 # Training Details
 
 DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.