metadata
base_model: teknium/OpenHermes-2.5-Mistral-7B
license: apache-2.0
datasets:
- teknium/openhermes
- argilla/ultrafeedback-binarized-preferences
- Intel/orca_dpo_pairs
language:
- en
library_name: transformers
pipeline_tag: text-generation
DPOpenHermes 7B
OpenHermes x Notus x Neural
This is an RL fine-tuned version of OpenHermes-2.5-Mistral-7B, trained with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences preference datasets.
DPOpenHermes was trained using QLoRA. The adapter is also provided in this model repo.
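For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions chosen for the example, and the inputs are summed sequence log-probabilities from the policy being trained and from a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; beta=0.1 is an assumed default)."""
    # Implicit rewards: how much more likely the policy makes each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), computed stably as softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, so the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy shifts probability toward the chosen response: the loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the chosen response relative to the rejected one, measured against the reference model, which is what lets DPO learn from preference pairs without a separate reward model.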
Training Details
DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
Training run report (Weights & Biases): https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2
Benchmarks
AGIEval
| Task |Version| Metric |Value | |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat | 0|acc |0.2480|± |0.0272|
| | |acc_norm|0.2520|± |0.0273|
|agieval_logiqa_en | 0|acc |0.3810|± |0.0190|
| | |acc_norm|0.3856|± |0.0191|
|agieval_lsat_ar | 0|acc |0.2348|± |0.0280|
| | |acc_norm|0.2304|± |0.0278|
|agieval_lsat_lr | 0|acc |0.5118|± |0.0222|
| | |acc_norm|0.5196|± |0.0221|
|agieval_lsat_rc | 0|acc |0.5948|± |0.0300|
| | |acc_norm|0.5688|± |0.0303|
|agieval_sat_en | 0|acc |0.7427|± |0.0305|
| | |acc_norm|0.7427|± |0.0305|
|agieval_sat_en_without_passage| 0|acc |0.4563|± |0.0348|
| | |acc_norm|0.4515|± |0.0348|
|agieval_sat_math | 0|acc |0.3818|± |0.0328|
| | |acc_norm|0.3682|± |0.0326|
Average: 0.4399
GPT4All
| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.5930|± |0.0144|
| | |acc_norm|0.6323|± |0.0141|
|arc_easy | 0|acc |0.8443|± |0.0074|
| | |acc_norm|0.8295|± |0.0077|
|boolq | 1|acc |0.8599|± |0.0061|
|hellaswag | 0|acc |0.6548|± |0.0047|
| | |acc_norm|0.8365|± |0.0037|
|openbookqa | 0|acc |0.3520|± |0.0214|
| | |acc_norm|0.4640|± |0.0223|
|piqa | 0|acc |0.8210|± |0.0089|
| | |acc_norm|0.8335|± |0.0087|
|winogrande | 0|acc |0.7466|± |0.0122|
Average: 0.7431
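The card does not say how the AGIEval and GPT4All averages are computed. A minimal sketch, assuming each average is the unweighted mean over tasks using acc_norm where it is reported and acc otherwise (as for boolq and winogrande), gives values that match the reported figures to within rounding. The dictionary and function names are illustrative only; the numbers are copied from the two tables above.

```python
# Per-task scores copied from the tables above.
# Assumption: acc_norm is used where reported, otherwise acc.
agieval_scores = {
    "agieval_aqua_rat": 0.2520,
    "agieval_logiqa_en": 0.3856,
    "agieval_lsat_ar": 0.2304,
    "agieval_lsat_lr": 0.5196,
    "agieval_lsat_rc": 0.5688,
    "agieval_sat_en": 0.7427,
    "agieval_sat_en_without_passage": 0.4515,
    "agieval_sat_math": 0.3682,
}

gpt4all_scores = {
    "arc_challenge": 0.6323,
    "arc_easy": 0.8295,
    "boolq": 0.8599,       # acc only
    "hellaswag": 0.8365,
    "openbookqa": 0.4640,
    "piqa": 0.8335,
    "winogrande": 0.7466,  # acc only
}


def unweighted_mean(scores):
    """Plain arithmetic mean over tasks, with every task weighted equally."""
    return sum(scores.values()) / len(scores)


agieval_avg = unweighted_mean(agieval_scores)
gpt4all_avg = unweighted_mean(gpt4all_scores)

print(f"AGIEval mean over {len(agieval_scores)} tasks: {agieval_avg:.5f}")
print(f"GPT4All mean over {len(gpt4all_scores)} tasks: {gpt4all_avg:.5f}")
```

Because every task is weighted equally, small suites such as agieval_sat_math count as much toward the average as larger ones.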
TruthfulQA
hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
| Task |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc| 1|mc1 |0.4186|± |0.0173|
| | |mc2 |0.5847|± |0.0153|
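The Stderr column can be turned into an approximate confidence interval when comparing these scores with other models. The sketch below assumes a normal approximation with z = 1.96 for a 95% interval; the helper name `normal_ci` is hypothetical, and the values are taken from the TruthfulQA table above.

```python
def normal_ci(value, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation (assumed)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (value, stderr) pairs copied from the TruthfulQA table above.
truthfulqa = {
    "mc1": (0.4186, 0.0173),
    "mc2": (0.5847, 0.0153),
}

for metric, (value, stderr) in truthfulqa.items():
    low, high = normal_ci(value, stderr)
    print(f"{metric}: {value:.4f}, approx. 95% CI {low:.4f} to {high:.4f}")
```

Differences between models that are smaller than these interval widths should be read with caution.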