openaccess-ai-collective/DPOpenHermes-7B

winglian committed on Dec 2, 2023

Commit c27e00c (1 parent: 2d77846)

Update README.md

Files changed (1)

README.md +11 -2

@@ -3,10 +3,19 @@ library_name: peft
 base_model: teknium/OpenHermes-2.5-Mistral-7B
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
+# DPOpenHermes 7B
+## OpenHermes x Notus x Neural
+This is an RL fine-tuned OpenHermes, trained with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences preference datasets.
+DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
+# Training Details
+DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
+https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2
 ## Model Details
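
The added README text names Direct Preference Optimization (DPO) as the training objective but does not show it. As a rough illustration only, the per-pair DPO loss can be sketched in plain Python as below. This is not the training code used for this model: the function name `dpo_loss`, its argument names, and the `beta=0.1` default are assumptions made for the example, and the inputs stand for summed token log-probabilities of a chosen and a rejected response under the trained policy and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative DPO loss for one preference pair.

    beta=0.1 is an assumed value for this sketch, not a setting
    taken from the commit or the README.
    """
    # How much more (in log space) the policy likes each response
    # than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# When the policy raises the chosen response relative to the reference,
# the loss drops below that baseline.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

In this sketch the loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the behavior the preference datasets are meant to encourage.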