---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: juanako-7b-v1
    results: []
license: artistic-2.0
---

# juanako-7b-v1

This model is a fine-tuned version of fblgit/zephyr-lora-dpo-b1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
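As a sanity check on these DPO metrics, the reported reward margin is simply the chosen reward minus the rejected reward:

```python
# Rewards/margins should equal Rewards/chosen - Rewards/rejected
chosen = -1.1095
rejected = -2.3132
margin = chosen - rejected
print(round(margin, 4))  # 1.2037, matching the reported Rewards/margins
```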

## Model description

It seems to outperform the original Zephyr on most tasks.

I trained Juanako with the same datasets and trainer as alignment-handbook/zephyr-7b-sft-lora.

Some other experiments were also performed to test transformers-UNA capabilities across diverse scenarios and models.

This is the full version of the model, the result of merging the LoRA adapters into the base weights.

## Intended uses & limitations

Research purposes.

## Training and evaluation data

alignment-handbook DPO, with UNA applied on top of the SFT LoRA.

## Evaluation (lm-evaluation-harness)

### 0-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
```
| Tasks            | Version | Filter | Metric     |   Value | Stderr   |
|------------------|---------|--------|------------|--------:|----------|
| arc_challenge    | Yaml    | none   | acc        |  0.5691 | ± 0.0145 |
|                  |         | none   | acc_norm   |  0.6041 | ± 0.0143 |
| arc_easy         | Yaml    | none   | acc        |  0.8363 | ± 0.0076 |
|                  |         | none   | acc_norm   |  0.8161 | ± 0.0079 |
| hellaswag        | Yaml    | none   | acc        |  0.6554 | ± 0.0047 |
|                  |         | none   | acc_norm   |  0.8411 | ± 0.0036 |
| boolq            | Yaml    | none   | acc        |  0.8355 | ± 0.0065 |
| lambada          | N/A     | none   | perplexity |  3.3607 | ± 0.1398 |
|                  |         | none   | acc        |  0.7309 | ± 0.0137 |
| piqa             | Yaml    | none   | acc        |  0.8194 | ± 0.0090 |
|                  |         | none   | acc_norm   |  0.8335 | ± 0.0087 |
| sciq             | Yaml    | none   | acc        |  0.9480 | ± 0.0070 |
|                  |         | none   | acc_norm   |  0.8960 | ± 0.0097 |
| truthfulqa       | N/A     | none   | bleu_max   | 26.0803 | ± 0.6528 |
| - truthfulqa_mc1 | Yaml    | none   | acc        |  0.4198 | ± 0.0173 |
| - truthfulqa_mc2 | Yaml    | none   | acc        |  0.5847 | ± 0.0153 |
| winogrande       | Yaml    | none   | acc        |  0.7609 | ± 0.0120 |

### 1-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
```
| Tasks            | Version | Filter | Metric     |   Value | Stderr   |
|------------------|---------|--------|------------|--------:|----------|
| arc_challenge    | Yaml    | none   | acc        |  0.6084 | ± 0.0143 |
|                  |         | none   | acc_norm   |  0.6357 | ± 0.0141 |
| arc_easy         | Yaml    | none   | acc        |  0.8645 | ± 0.0070 |
|                  |         | none   | acc_norm   |  0.8645 | ± 0.0070 |
| hellaswag        | Yaml    | none   | acc        |  0.6475 | ± 0.0048 |
|                  |         | none   | acc_norm   |  0.8372 | ± 0.0037 |
| boolq            | Yaml    | none   | acc        |  0.8609 | ± 0.0061 |
| lambada          | N/A     | none   | perplexity |  3.5484 | ± 0.1034 |
|                  |         | none   | acc        |  0.7207 | ± 0.0107 |
| piqa             | Yaml    | none   | acc        |  0.8259 | ± 0.0088 |
|                  |         | none   | acc_norm   |  0.8384 | ± 0.0086 |
| sciq             | Yaml    | none   | acc        |  0.9730 | ± 0.0051 |
|                  |         | none   | acc_norm   |  0.9740 | ± 0.0050 |
| truthfulqa       | N/A     | none   | bleu_max   | 18.9814 | ± 0.4805 |
|                  |         | none   | acc        |  0.4856 | ± 0.0521 |
| - truthfulqa_mc1 | Yaml    | none   | acc        |  0.4333 | ± 0.0173 |
| - truthfulqa_mc2 | Yaml    | none   | acc        |  0.5903 | ± 0.0153 |
| winogrande       | Yaml    | none   | acc        |  0.7609 | ± 0.0120 |

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
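The reported total batch sizes follow directly from the per-device settings above:

```python
# Effective batch sizes implied by the per-device settings
train_batch_size = 1           # per device
eval_batch_size = 1            # per device
num_devices = 12
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 192
print(total_eval_batch_size)   # 12
```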

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.4966        | 0.15  | 50   | 0.4893          | -1.1759        | -2.2914          | 0.7485             | 1.1155          | -219.7872      | -218.2148    | -2.5450         | -2.7884       |
| 0.4522        | 0.31  | 100  | 0.4808          | -0.8099        | -1.8893          | 0.7784             | 1.0794          | -215.7659      | -214.5544    | -2.5644         | -2.8095       |
| 0.5048        | 0.46  | 150  | 0.4706          | -1.0526        | -2.1412          | 0.7725             | 1.0887          | -218.2852      | -216.9814    | -2.5638         | -2.8089       |
| 0.4853        | 0.62  | 200  | 0.4640          | -1.0787        | -2.2821          | 0.7725             | 1.2034          | -219.6941      | -217.2426    | -2.5460         | -2.7891       |
| 0.4639        | 0.77  | 250  | 0.4636          | -1.2348        | -2.4583          | 0.8084             | 1.2235          | -221.4559      | -218.8034    | -2.5533         | -2.7970       |
| 0.4634        | 0.93  | 300  | 0.4601          | -1.1370        | -2.3243          | 0.7964             | 1.1873          | -220.1163      | -217.8257    | -2.5540         | -2.7977       |
| -             | 1.00  | 300  | 0.4594          | -1.1095        | -2.3132          | 0.7964             | 1.2037          | -220.0052      | -217.5506    | -2.5535         | -2.7973       |
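The Rewards/* and Loss columns come from the DPO objective: the rewards are beta-scaled log-probability ratios against the reference model, and the per-example loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal sketch, assuming the TRL convention where the logged rewards already include the beta factor (the margin values below are illustrative, not the per-example margins behind the averages above):

```python
import math

def dpo_loss_from_margin(margin: float) -> float:
    """DPO loss given a reward margin (chosen reward minus rejected reward).

    Assuming the logged rewards already include the beta scaling, the
    per-example loss reduces to -log(sigmoid(margin)).
    """
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin between chosen and rejected means a lower loss
print(dpo_loss_from_margin(1.0) > dpo_loss_from_margin(2.0))  # True
# A zero margin gives the chance-level loss log(2) ~= 0.6931
print(round(dpo_loss_from_margin(0.0), 4))  # 0.6931
```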

### Framework versions

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

## MMLU Results

### 1-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
```
| Tasks                                 | Version | Filter | Metric |  Value | Stderr   |
|---------------------------------------|---------|--------|--------|-------:|----------|
| mmlu                                  | N/A     | none   | acc    | 0.6085 | ± 0.1321 |
| - humanities                          | N/A     | none   | acc    | 0.5405 | ± 0.1478 |
| - formal_logic                        | Yaml    | none   | acc    | 0.4206 | ± 0.0442 |
| - high_school_european_history        | Yaml    | none   | acc    | 0.7576 | ± 0.0335 |
| - high_school_us_history              | Yaml    | none   | acc    | 0.8186 | ± 0.0270 |
| - high_school_world_history           | Yaml    | none   | acc    | 0.7890 | ± 0.0266 |
| - international_law                   | Yaml    | none   | acc    | 0.7438 | ± 0.0398 |
| - jurisprudence                       | Yaml    | none   | acc    | 0.8056 | ± 0.0383 |
| - logical_fallacies                   | Yaml    | none   | acc    | 0.7791 | ± 0.0326 |
| - moral_disputes                      | Yaml    | none   | acc    | 0.7023 | ± 0.0246 |
| - moral_scenarios                     | Yaml    | none   | acc    | 0.2145 | ± 0.0137 |
| - philosophy                          | Yaml    | none   | acc    | 0.7074 | ± 0.0258 |
| - prehistory                          | Yaml    | none   | acc    | 0.7377 | ± 0.0245 |
| - professional_law                    | Yaml    | none   | acc    | 0.4361 | ± 0.0127 |
| - world_religions                     | Yaml    | none   | acc    | 0.8421 | ± 0.0280 |
| - other                               | N/A     | none   | acc    | 0.6894 | ± 0.1091 |
| - business_ethics                     | Yaml    | none   | acc    | 0.5600 | ± 0.0499 |
| - clinical_knowledge                  | Yaml    | none   | acc    | 0.6981 | ± 0.0283 |
| - college_medicine                    | Yaml    | none   | acc    | 0.6185 | ± 0.0370 |
| - global_facts                        | Yaml    | none   | acc    | 0.3300 | ± 0.0473 |
| - human_aging                         | Yaml    | none   | acc    | 0.6726 | ± 0.0315 |
| - management                          | Yaml    | none   | acc    | 0.8058 | ± 0.0392 |
| - marketing                           | Yaml    | none   | acc    | 0.8419 | ± 0.0239 |
| - medical_genetics                    | Yaml    | none   | acc    | 0.7200 | ± 0.0451 |
| - miscellaneous                       | Yaml    | none   | acc    | 0.8033 | ± 0.0142 |
| - nutrition                           | Yaml    | none   | acc    | 0.7288 | ± 0.0255 |
| - professional_accounting             | Yaml    | none   | acc    | 0.4929 | ± 0.0298 |
| - professional_medicine               | Yaml    | none   | acc    | 0.6801 | ± 0.0283 |
| - virology                            | Yaml    | none   | acc    | 0.5000 | ± 0.0389 |
| - social_sciences                     | N/A     | none   | acc    | 0.7195 | ± 0.0676 |
| - econometrics                        | Yaml    | none   | acc    | 0.5000 | ± 0.0470 |
| - high_school_geography               | Yaml    | none   | acc    | 0.7879 | ± 0.0291 |
| - high_school_government_and_politics | Yaml    | none   | acc    | 0.8601 | ± 0.0250 |
| - high_school_macroeconomics          | Yaml    | none   | acc    | 0.6231 | ± 0.0246 |
| - high_school_microeconomics          | Yaml    | none   | acc    | 0.6471 | ± 0.0310 |
| - high_school_psychology              | Yaml    | none   | acc    | 0.8000 | ± 0.0171 |
| - human_sexuality                     | Yaml    | none   | acc    | 0.7557 | ± 0.0377 |
| - professional_psychology             | Yaml    | none   | acc    | 0.6552 | ± 0.0192 |
| - public_relations                    | Yaml    | none   | acc    | 0.6636 | ± 0.0453 |
| - security_studies                    | Yaml    | none   | acc    | 0.7184 | ± 0.0288 |
| - sociology                           | Yaml    | none   | acc    | 0.8358 | ± 0.0262 |
| - us_foreign_policy                   | Yaml    | none   | acc    | 0.8500 | ± 0.0359 |
| - stem                                | N/A     | none   | acc    | 0.5217 | ± 0.1149 |
| - abstract_algebra                    | Yaml    | none   | acc    | 0.3000 | ± 0.0461 |
| - anatomy                             | Yaml    | none   | acc    | 0.6222 | ± 0.0419 |
| - astronomy                           | Yaml    | none   | acc    | 0.6711 | ± 0.0382 |
| - college_biology                     | Yaml    | none   | acc    | 0.7361 | ± 0.0369 |
| - college_chemistry                   | Yaml    | none   | acc    | 0.4400 | ± 0.0499 |
| - college_computer_science            | Yaml    | none   | acc    | 0.5000 | ± 0.0503 |
| - college_mathematics                 | Yaml    | none   | acc    | 0.3100 | ± 0.0465 |
| - college_physics                     | Yaml    | none   | acc    | 0.4902 | ± 0.0497 |
| - computer_security                   | Yaml    | none   | acc    | 0.7100 | ± 0.0456 |
| - conceptual_physics                  | Yaml    | none   | acc    | 0.5362 | ± 0.0326 |
| - electrical_engineering              | Yaml    | none   | acc    | 0.5862 | ± 0.0410 |
| - elementary_mathematics              | Yaml    | none   | acc    | 0.4365 | ± 0.0255 |
| - high_school_biology                 | Yaml    | none   | acc    | 0.7129 | ± 0.0257 |
| - high_school_chemistry               | Yaml    | none   | acc    | 0.5074 | ± 0.0352 |
| - high_school_computer_science        | Yaml    | none   | acc    | 0.6500 | ± 0.0479 |
| - high_school_mathematics             | Yaml    | none   | acc    | 0.3259 | ± 0.0286 |
| - high_school_physics                 | Yaml    | none   | acc    | 0.3709 | ± 0.0394 |
| - high_school_statistics              | Yaml    | none   | acc    | 0.5139 | ± 0.0341 |
| - machine_learning                    | Yaml    | none   | acc    | 0.5089 | ± 0.0475 |
| Groups            | Version | Filter | Metric |  Value | Stderr   |
|-------------------|---------|--------|--------|-------:|----------|
| mmlu              | N/A     | none   | acc    | 0.6085 | ± 0.1321 |
| - humanities      | N/A     | none   | acc    | 0.5405 | ± 0.1478 |
| - other           | N/A     | none   | acc    | 0.6894 | ± 0.1091 |
| - social_sciences | N/A     | none   | acc    | 0.7195 | ± 0.0676 |
| - stem            | N/A     | none   | acc    | 0.5217 | ± 0.1149 |