---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-v1
  results: []
license: artistic-2.0
---
# juanako-7b-v1
This model is a fine-tuned version of fblgit/zephyr-lora-dpo-b1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
## Model description
It seems to outperform the original Zephyr on most tasks.

I trained Juanako with the same datasets and trainer as alignment-handbook/zephyr-7b-sft-lora:
- 1 epoch of DPO with transformers-UNA; the result is fblgit/zephyr-lora-dpo-b1 after merging with the FastChat converter.
- finally, 1 epoch of DPO with transformers-UNA applied to fblgit/zephyr-lora-dpo-b1.

Other experiments were also performed to test the capabilities of transformers-UNA on diverse scenarios and models.

This is the complete (merged) version of the model, obtained by converting and merging the LoRA adapters.
## Intended uses & limitations
Research purposes.
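
For convenience, a minimal inference sketch using the standard transformers API is shown below. The chat template call assumes the tokenizer ships a Zephyr-style template; the system/user messages are placeholders, so adjust them to your use case.

```python
# Minimal inference sketch (assumes a Zephyr-style chat template in the tokenizer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/juanako-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "Explain DPO in one sentence."},    # placeholder
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```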
## Training and evaluation data
The alignment-handbook DPO recipe with UNA, applied on top of the SFT LoRA.
## Evaluation (lm-evaluation-harness)

### 0-Shot

`hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8`
| Tasks | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---:|---|---:|
| arc_challenge | Yaml | none | acc | 0.5691 | ± | 0.0145 |
| | | none | acc_norm | 0.6041 | ± | 0.0143 |
| arc_easy | Yaml | none | acc | 0.8363 | ± | 0.0076 |
| | | none | acc_norm | 0.8161 | ± | 0.0079 |
| hellaswag | Yaml | none | acc | 0.6554 | ± | 0.0047 |
| | | none | acc_norm | 0.8411 | ± | 0.0036 |
| boolq | Yaml | none | acc | 0.8355 | ± | 0.0065 |
| lambada | N/A | none | perplexity | 3.3607 | ± | 0.1398 |
| | | none | acc | 0.7309 | ± | 0.0137 |
| piqa | Yaml | none | acc | 0.8194 | ± | 0.0090 |
| | | none | acc_norm | 0.8335 | ± | 0.0087 |
| sciq | Yaml | none | acc | 0.9480 | ± | 0.0070 |
| | | none | acc_norm | 0.8960 | ± | 0.0097 |
| truthfulqa | N/A | none | bleu_max | 26.0803 | ± | 0.6528 |
| - truthfulqa_mc1 | Yaml | none | acc | 0.4198 | ± | 0.0173 |
| - truthfulqa_mc2 | Yaml | none | acc | 0.5847 | ± | 0.0153 |
| winogrande | Yaml | none | acc | 0.7609 | ± | 0.0120 |
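
A hedged sketch of how this run could be reproduced with the lm-evaluation-harness Python API follows; the `simple_evaluate` entry point and its arguments assume a recent harness release, and the task list below is illustrative rather than exhaustive.

```python
# Sketch: reproducing the 0-shot evaluation with lm-evaluation-harness.
# Assumes a recent release exposing lm_eval.simple_evaluate; task names may
# differ slightly between harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "boolq",
           "piqa", "sciq", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```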
### 1-Shot

`hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8`
| Tasks | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---:|---|---:|
| arc_challenge | Yaml | none | acc | 0.6084 | ± | 0.0143 |
| | | none | acc_norm | 0.6357 | ± | 0.0141 |
| arc_easy | Yaml | none | acc | 0.8645 | ± | 0.0070 |
| | | none | acc_norm | 0.8645 | ± | 0.0070 |
| hellaswag | Yaml | none | acc | 0.6475 | ± | 0.0048 |
| | | none | acc_norm | 0.8372 | ± | 0.0037 |
| boolq | Yaml | none | acc | 0.8609 | ± | 0.0061 |
| lambada | N/A | none | perplexity | 3.5484 | ± | 0.1034 |
| | | none | acc | 0.7207 | ± | 0.0107 |
| piqa | Yaml | none | acc | 0.8259 | ± | 0.0088 |
| | | none | acc_norm | 0.8384 | ± | 0.0086 |
| sciq | Yaml | none | acc | 0.9730 | ± | 0.0051 |
| | | none | acc_norm | 0.9740 | ± | 0.0050 |
| truthfulqa | N/A | none | bleu_max | 18.9814 | ± | 0.4805 |
| | | none | acc | 0.4856 | ± | 0.0521 |
| - truthfulqa_mc1 | Yaml | none | acc | 0.4333 | ± | 0.0173 |
| - truthfulqa_mc2 | Yaml | none | acc | 0.5903 | ± | 0.0153 |
| winogrande | Yaml | none | acc | 0.7609 | ± | 0.0120 |
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
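
As a rough illustration only, these hyperparameters might map onto a plain trl-style DPO setup as sketched below. The UNA modifications in transformers-UNA are not reproduced here, and the beta value, sequence lengths, and dataset column handling are assumptions.

```python
# Sketch: a plain trl DPOTrainer setup approximating the listed hyperparameters.
# transformers-UNA specifics are NOT reproduced; beta and max lengths are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("fblgit/zephyr-lora-dpo-b1")
tokenizer = AutoTokenizer.from_pretrained("fblgit/zephyr-lora-dpo-b1")
# The chosen/rejected columns need mapping to plain strings before use (assumption).
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = TrainingArguments(
    output_dir="juanako-7b-v1",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # 1 x 16 x 12 GPUs = 192 effective batch size
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # trl creates a frozen reference copy when None
    args=args,
    beta=0.1,                 # assumption; not stated in the card
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,          # assumption
    max_prompt_length=512,    # assumption
)
trainer.train()
```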
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
| 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
| 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
| 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
| 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
| 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
| - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |
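
For reference, the reward columns above follow the standard DPO definitions (the usual formulation, not an UNA-specific one): for a prompt $x$ with chosen response $y_w$ and rejected response $y_l$,

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right),
$$

so Rewards/chosen and Rewards/rejected are $r_\theta(x, y_w)$ and $r_\theta(x, y_l)$, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs with a positive margin.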
### Framework versions
- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
## MMLU Results

### 1-Shot

`hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1`
| Tasks | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---:|---|---:|
| mmlu | N/A | none | acc | 0.6085 | ± | 0.1321 |
| - humanities | N/A | none | acc | 0.5405 | ± | 0.1478 |
| - formal_logic | Yaml | none | acc | 0.4206 | ± | 0.0442 |
| - high_school_european_history | Yaml | none | acc | 0.7576 | ± | 0.0335 |
| - high_school_us_history | Yaml | none | acc | 0.8186 | ± | 0.0270 |
| - high_school_world_history | Yaml | none | acc | 0.7890 | ± | 0.0266 |
| - international_law | Yaml | none | acc | 0.7438 | ± | 0.0398 |
| - jurisprudence | Yaml | none | acc | 0.8056 | ± | 0.0383 |
| - logical_fallacies | Yaml | none | acc | 0.7791 | ± | 0.0326 |
| - moral_disputes | Yaml | none | acc | 0.7023 | ± | 0.0246 |
| - moral_scenarios | Yaml | none | acc | 0.2145 | ± | 0.0137 |
| - philosophy | Yaml | none | acc | 0.7074 | ± | 0.0258 |
| - prehistory | Yaml | none | acc | 0.7377 | ± | 0.0245 |
| - professional_law | Yaml | none | acc | 0.4361 | ± | 0.0127 |
| - world_religions | Yaml | none | acc | 0.8421 | ± | 0.0280 |
| - other | N/A | none | acc | 0.6894 | ± | 0.1091 |
| - business_ethics | Yaml | none | acc | 0.5600 | ± | 0.0499 |
| - clinical_knowledge | Yaml | none | acc | 0.6981 | ± | 0.0283 |
| - college_medicine | Yaml | none | acc | 0.6185 | ± | 0.0370 |
| - global_facts | Yaml | none | acc | 0.3300 | ± | 0.0473 |
| - human_aging | Yaml | none | acc | 0.6726 | ± | 0.0315 |
| - management | Yaml | none | acc | 0.8058 | ± | 0.0392 |
| - marketing | Yaml | none | acc | 0.8419 | ± | 0.0239 |
| - medical_genetics | Yaml | none | acc | 0.7200 | ± | 0.0451 |
| - miscellaneous | Yaml | none | acc | 0.8033 | ± | 0.0142 |
| - nutrition | Yaml | none | acc | 0.7288 | ± | 0.0255 |
| - professional_accounting | Yaml | none | acc | 0.4929 | ± | 0.0298 |
| - professional_medicine | Yaml | none | acc | 0.6801 | ± | 0.0283 |
| - virology | Yaml | none | acc | 0.5000 | ± | 0.0389 |
| - social_sciences | N/A | none | acc | 0.7195 | ± | 0.0676 |
| - econometrics | Yaml | none | acc | 0.5000 | ± | 0.0470 |
| - high_school_geography | Yaml | none | acc | 0.7879 | ± | 0.0291 |
| - high_school_government_and_politics | Yaml | none | acc | 0.8601 | ± | 0.0250 |
| - high_school_macroeconomics | Yaml | none | acc | 0.6231 | ± | 0.0246 |
| - high_school_microeconomics | Yaml | none | acc | 0.6471 | ± | 0.0310 |
| - high_school_psychology | Yaml | none | acc | 0.8000 | ± | 0.0171 |
| - human_sexuality | Yaml | none | acc | 0.7557 | ± | 0.0377 |
| - professional_psychology | Yaml | none | acc | 0.6552 | ± | 0.0192 |
| - public_relations | Yaml | none | acc | 0.6636 | ± | 0.0453 |
| - security_studies | Yaml | none | acc | 0.7184 | ± | 0.0288 |
| - sociology | Yaml | none | acc | 0.8358 | ± | 0.0262 |
| - us_foreign_policy | Yaml | none | acc | 0.8500 | ± | 0.0359 |
| - stem | N/A | none | acc | 0.5217 | ± | 0.1149 |
| - abstract_algebra | Yaml | none | acc | 0.3000 | ± | 0.0461 |
| - anatomy | Yaml | none | acc | 0.6222 | ± | 0.0419 |
| - astronomy | Yaml | none | acc | 0.6711 | ± | 0.0382 |
| - college_biology | Yaml | none | acc | 0.7361 | ± | 0.0369 |
| - college_chemistry | Yaml | none | acc | 0.4400 | ± | 0.0499 |
| - college_computer_science | Yaml | none | acc | 0.5000 | ± | 0.0503 |
| - college_mathematics | Yaml | none | acc | 0.3100 | ± | 0.0465 |
| - college_physics | Yaml | none | acc | 0.4902 | ± | 0.0497 |
| - computer_security | Yaml | none | acc | 0.7100 | ± | 0.0456 |
| - conceptual_physics | Yaml | none | acc | 0.5362 | ± | 0.0326 |
| - electrical_engineering | Yaml | none | acc | 0.5862 | ± | 0.0410 |
| - elementary_mathematics | Yaml | none | acc | 0.4365 | ± | 0.0255 |
| - high_school_biology | Yaml | none | acc | 0.7129 | ± | 0.0257 |
| - high_school_chemistry | Yaml | none | acc | 0.5074 | ± | 0.0352 |
| - high_school_computer_science | Yaml | none | acc | 0.6500 | ± | 0.0479 |
| - high_school_mathematics | Yaml | none | acc | 0.3259 | ± | 0.0286 |
| - high_school_physics | Yaml | none | acc | 0.3709 | ± | 0.0394 |
| - high_school_statistics | Yaml | none | acc | 0.5139 | ± | 0.0341 |
| - machine_learning | Yaml | none | acc | 0.5089 | ± | 0.0475 |
| Groups | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---:|---|---:|
| mmlu | N/A | none | acc | 0.6085 | ± | 0.1321 |
| - humanities | N/A | none | acc | 0.5405 | ± | 0.1478 |
| - other | N/A | none | acc | 0.6894 | ± | 0.1091 |
| - social_sciences | N/A | none | acc | 0.7195 | ± | 0.0676 |
| - stem | N/A | none | acc | 0.5217 | ± | 0.1149 |