---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: juanako-7b-v1
    results: []
license: artistic-2.0
---

# juanako-7b-v1

This model is a fine-tuned version of fblgit/zephyr-lora-dpo-b1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
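As a sanity check on these DPO metrics, the reported reward margin is simply the chosen reward minus the rejected reward:

```python
# Rewards/margins should equal Rewards/chosen - Rewards/rejected
chosen = -1.1095
rejected = -2.3132
margin = chosen - rejected
print(round(margin, 4))  # 1.2037, matching the reported Rewards/margins
```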

## Model description

It seems to outperform the original Zephyr on most tasks.

I trained Juanako with the same datasets and trainer as alignment-handbook/zephyr-7b-sft-lora.

Some other experiments were also performed to test transformers-UNA capabilities across diverse scenarios and models.

This is the full version of the model, the result of merging the LoRA adapters into the base weights.

## Intended uses & limitations

Research purposes.

## Training and evaluation data

alignment-handbook DPO, with UNA applied on top of the SFT LoRA.

## Evaluation (lm-evaluation-harness)

### 0-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
```
| Tasks            | Version | Filter | Metric     |   Value | Stderr   |
|------------------|---------|--------|------------|--------:|----------|
| arc_challenge    | Yaml    | none   | acc        |  0.5691 | ± 0.0145 |
|                  |         | none   | acc_norm   |  0.6041 | ± 0.0143 |
| arc_easy         | Yaml    | none   | acc        |  0.8363 | ± 0.0076 |
|                  |         | none   | acc_norm   |  0.8161 | ± 0.0079 |
| hellaswag        | Yaml    | none   | acc        |  0.6554 | ± 0.0047 |
|                  |         | none   | acc_norm   |  0.8411 | ± 0.0036 |
| boolq            | Yaml    | none   | acc        |  0.8355 | ± 0.0065 |
| lambada          | N/A     | none   | perplexity |  3.3607 | ± 0.1398 |
|                  |         | none   | acc        |  0.7309 | ± 0.0137 |
| piqa             | Yaml    | none   | acc        |  0.8194 | ± 0.0090 |
|                  |         | none   | acc_norm   |  0.8335 | ± 0.0087 |
| sciq             | Yaml    | none   | acc        |  0.9480 | ± 0.0070 |
|                  |         | none   | acc_norm   |  0.8960 | ± 0.0097 |
| truthfulqa       | N/A     | none   | bleu_max   | 26.0803 | ± 0.6528 |
| - truthfulqa_mc1 | Yaml    | none   | acc        |  0.4198 | ± 0.0173 |
| - truthfulqa_mc2 | Yaml    | none   | acc        |  0.5847 | ± 0.0153 |
| winogrande       | Yaml    | none   | acc        |  0.7609 | ± 0.0120 |

### 1-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
```
| Tasks            | Version | Filter | Metric     |   Value | Stderr   |
|------------------|---------|--------|------------|--------:|----------|
| arc_challenge    | Yaml    | none   | acc        |  0.6084 | ± 0.0143 |
|                  |         | none   | acc_norm   |  0.6357 | ± 0.0141 |
| arc_easy         | Yaml    | none   | acc        |  0.8645 | ± 0.0070 |
|                  |         | none   | acc_norm   |  0.8645 | ± 0.0070 |
| hellaswag        | Yaml    | none   | acc        |  0.6475 | ± 0.0048 |
|                  |         | none   | acc_norm   |  0.8372 | ± 0.0037 |
| boolq            | Yaml    | none   | acc        |  0.8609 | ± 0.0061 |
| lambada          | N/A     | none   | perplexity |  3.5484 | ± 0.1034 |
|                  |         | none   | acc        |  0.7207 | ± 0.0107 |
| piqa             | Yaml    | none   | acc        |  0.8259 | ± 0.0088 |
|                  |         | none   | acc_norm   |  0.8384 | ± 0.0086 |
| sciq             | Yaml    | none   | acc        |  0.9730 | ± 0.0051 |
|                  |         | none   | acc_norm   |  0.9740 | ± 0.0050 |
| truthfulqa       | N/A     | none   | bleu_max   | 18.9814 | ± 0.4805 |
|                  |         | none   | acc        |  0.4856 | ± 0.0521 |
| - truthfulqa_mc1 | Yaml    | none   | acc        |  0.4333 | ± 0.0173 |
| - truthfulqa_mc2 | Yaml    | none   | acc        |  0.5903 | ± 0.0153 |
| winogrande       | Yaml    | none   | acc        |  0.7609 | ± 0.0120 |

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
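The reported total batch sizes follow directly from the per-device settings above:

```python
# Effective batch sizes implied by the per-device settings
train_batch_size = 1           # per device
eval_batch_size = 1            # per device
num_devices = 12
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 192
print(total_eval_batch_size)   # 12
```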

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.4966        | 0.15  | 50   | 0.4893          | -1.1759        | -2.2914          | 0.7485             | 1.1155          | -219.7872      | -218.2148    | -2.5450         | -2.7884       |
| 0.4522        | 0.31  | 100  | 0.4808          | -0.8099        | -1.8893          | 0.7784             | 1.0794          | -215.7659      | -214.5544    | -2.5644         | -2.8095       |
| 0.5048        | 0.46  | 150  | 0.4706          | -1.0526        | -2.1412          | 0.7725             | 1.0887          | -218.2852      | -216.9814    | -2.5638         | -2.8089       |
| 0.4853        | 0.62  | 200  | 0.4640          | -1.0787        | -2.2821          | 0.7725             | 1.2034          | -219.6941      | -217.2426    | -2.5460         | -2.7891       |
| 0.4639        | 0.77  | 250  | 0.4636          | -1.2348        | -2.4583          | 0.8084             | 1.2235          | -221.4559      | -218.8034    | -2.5533         | -2.7970       |
| 0.4634        | 0.93  | 300  | 0.4601          | -1.1370        | -2.3243          | 0.7964             | 1.1873          | -220.1163      | -217.8257    | -2.5540         | -2.7977       |
| -             | 1.00  | 300  | 0.4594          | -1.1095        | -2.3132          | 0.7964             | 1.2037          | -220.0052      | -217.5506    | -2.5535         | -2.7973       |
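The Rewards/* and Loss columns come from the DPO objective: the rewards are beta-scaled log-probability ratios against the reference model, and the per-example loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal sketch, assuming the TRL convention where the logged rewards already include the beta factor (the margin values below are illustrative, not the per-example margins behind the averages above):

```python
import math

def dpo_loss_from_margin(margin: float) -> float:
    """DPO loss given a reward margin (chosen reward minus rejected reward).

    Assuming the logged rewards already include the beta scaling, the
    per-example loss reduces to -log(sigmoid(margin)).
    """
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin between chosen and rejected means a lower loss
print(dpo_loss_from_margin(1.0) > dpo_loss_from_margin(2.0))  # True
# A zero margin gives the chance-level loss log(2) ~= 0.6931
print(round(dpo_loss_from_margin(0.0), 4))  # 0.6931
```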

### Framework versions

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

## MMLU Results

### 1-Shot

```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
```
| Tasks                                 | Version | Filter | Metric |  Value | Stderr   |
|---------------------------------------|---------|--------|--------|-------:|----------|
| mmlu                                  | N/A     | none   | acc    | 0.6085 | ± 0.1321 |
| - humanities                          | N/A     | none   | acc    | 0.5405 | ± 0.1478 |
| - formal_logic                        | Yaml    | none   | acc    | 0.4206 | ± 0.0442 |
| - high_school_european_history        | Yaml    | none   | acc    | 0.7576 | ± 0.0335 |
| - high_school_us_history              | Yaml    | none   | acc    | 0.8186 | ± 0.0270 |
| - high_school_world_history           | Yaml    | none   | acc    | 0.7890 | ± 0.0266 |
| - international_law                   | Yaml    | none   | acc    | 0.7438 | ± 0.0398 |
| - jurisprudence                       | Yaml    | none   | acc    | 0.8056 | ± 0.0383 |
| - logical_fallacies                   | Yaml    | none   | acc    | 0.7791 | ± 0.0326 |
| - moral_disputes                      | Yaml    | none   | acc    | 0.7023 | ± 0.0246 |
| - moral_scenarios                     | Yaml    | none   | acc    | 0.2145 | ± 0.0137 |
| - philosophy                          | Yaml    | none   | acc    | 0.7074 | ± 0.0258 |
| - prehistory                          | Yaml    | none   | acc    | 0.7377 | ± 0.0245 |
| - professional_law                    | Yaml    | none   | acc    | 0.4361 | ± 0.0127 |
| - world_religions                     | Yaml    | none   | acc    | 0.8421 | ± 0.0280 |
| - other                               | N/A     | none   | acc    | 0.6894 | ± 0.1091 |
| - business_ethics                     | Yaml    | none   | acc    | 0.5600 | ± 0.0499 |
| - clinical_knowledge                  | Yaml    | none   | acc    | 0.6981 | ± 0.0283 |
| - college_medicine                    | Yaml    | none   | acc    | 0.6185 | ± 0.0370 |
| - global_facts                        | Yaml    | none   | acc    | 0.3300 | ± 0.0473 |
| - human_aging                         | Yaml    | none   | acc    | 0.6726 | ± 0.0315 |
| - management                          | Yaml    | none   | acc    | 0.8058 | ± 0.0392 |
| - marketing                           | Yaml    | none   | acc    | 0.8419 | ± 0.0239 |
| - medical_genetics                    | Yaml    | none   | acc    | 0.7200 | ± 0.0451 |
| - miscellaneous                       | Yaml    | none   | acc    | 0.8033 | ± 0.0142 |
| - nutrition                           | Yaml    | none   | acc    | 0.7288 | ± 0.0255 |
| - professional_accounting             | Yaml    | none   | acc    | 0.4929 | ± 0.0298 |
| - professional_medicine               | Yaml    | none   | acc    | 0.6801 | ± 0.0283 |
| - virology                            | Yaml    | none   | acc    | 0.5000 | ± 0.0389 |
| - social_sciences                     | N/A     | none   | acc    | 0.7195 | ± 0.0676 |
| - econometrics                        | Yaml    | none   | acc    | 0.5000 | ± 0.0470 |
| - high_school_geography               | Yaml    | none   | acc    | 0.7879 | ± 0.0291 |
| - high_school_government_and_politics | Yaml    | none   | acc    | 0.8601 | ± 0.0250 |
| - high_school_macroeconomics          | Yaml    | none   | acc    | 0.6231 | ± 0.0246 |
| - high_school_microeconomics          | Yaml    | none   | acc    | 0.6471 | ± 0.0310 |
| - high_school_psychology              | Yaml    | none   | acc    | 0.8000 | ± 0.0171 |
| - human_sexuality                     | Yaml    | none   | acc    | 0.7557 | ± 0.0377 |
| - professional_psychology             | Yaml    | none   | acc    | 0.6552 | ± 0.0192 |
| - public_relations                    | Yaml    | none   | acc    | 0.6636 | ± 0.0453 |
| - security_studies                    | Yaml    | none   | acc    | 0.7184 | ± 0.0288 |
| - sociology                           | Yaml    | none   | acc    | 0.8358 | ± 0.0262 |
| - us_foreign_policy                   | Yaml    | none   | acc    | 0.8500 | ± 0.0359 |
| - stem                                | N/A     | none   | acc    | 0.5217 | ± 0.1149 |
| - abstract_algebra                    | Yaml    | none   | acc    | 0.3000 | ± 0.0461 |
| - anatomy                             | Yaml    | none   | acc    | 0.6222 | ± 0.0419 |
| - astronomy                           | Yaml    | none   | acc    | 0.6711 | ± 0.0382 |
| - college_biology                     | Yaml    | none   | acc    | 0.7361 | ± 0.0369 |
| - college_chemistry                   | Yaml    | none   | acc    | 0.4400 | ± 0.0499 |
| - college_computer_science            | Yaml    | none   | acc    | 0.5000 | ± 0.0503 |
| - college_mathematics                 | Yaml    | none   | acc    | 0.3100 | ± 0.0465 |
| - college_physics                     | Yaml    | none   | acc    | 0.4902 | ± 0.0497 |
| - computer_security                   | Yaml    | none   | acc    | 0.7100 | ± 0.0456 |
| - conceptual_physics                  | Yaml    | none   | acc    | 0.5362 | ± 0.0326 |
| - electrical_engineering              | Yaml    | none   | acc    | 0.5862 | ± 0.0410 |
| - elementary_mathematics              | Yaml    | none   | acc    | 0.4365 | ± 0.0255 |
| - high_school_biology                 | Yaml    | none   | acc    | 0.7129 | ± 0.0257 |
| - high_school_chemistry               | Yaml    | none   | acc    | 0.5074 | ± 0.0352 |
| - high_school_computer_science        | Yaml    | none   | acc    | 0.6500 | ± 0.0479 |
| - high_school_mathematics             | Yaml    | none   | acc    | 0.3259 | ± 0.0286 |
| - high_school_physics                 | Yaml    | none   | acc    | 0.3709 | ± 0.0394 |
| - high_school_statistics              | Yaml    | none   | acc    | 0.5139 | ± 0.0341 |
| - machine_learning                    | Yaml    | none   | acc    | 0.5089 | ± 0.0475 |
| Groups            | Version | Filter | Metric |  Value | Stderr   |
|-------------------|---------|--------|--------|-------:|----------|
| mmlu              | N/A     | none   | acc    | 0.6085 | ± 0.1321 |
| - humanities      | N/A     | none   | acc    | 0.5405 | ± 0.1478 |
| - other           | N/A     | none   | acc    | 0.6894 | ± 0.1091 |
| - social_sciences | N/A     | none   | acc    | 0.7195 | ± 0.0676 |
| - stem            | N/A     | none   | acc    | 0.5217 | ± 0.1149 |