---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-v1
results: []
license: artistic-2.0
---
# juanako-7b-v1
This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
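For context, the reward metrics above follow the standard DPO formulation: each reward is the β-scaled log-ratio between the policy and the reference model, and `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected` (here 1.2037 = -1.1095 - (-2.3132)). A sketch of the objective, with notation assumed rather than taken from this card:

$$
r_\theta(x,y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\theta(x,y_w) - r_\theta(x,y_l)\big)\Big]
$$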
## Model description
**It seems to outperform the original Zephyr on most tasks.**
I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
* 1 epoch of DPO with transformers-UNA; the result, merged using the FastChat converter, is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
* finally, 1 more epoch of DPO with transformers-UNA on top of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
Other experiments were also performed to test transformers-UNA's capabilities across diverse scenarios and models.
**This is the full, merged version of the model: the result of merging the LoRA adapters into the base weights.**
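A minimal sketch of such a merge using `peft` (the card says the FastChat converter was used, so this is an illustrative equivalent, not the author's exact procedure; the repo ids below are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder ids: substitute the actual base model and LoRA adapter repos.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
lora = PeftModel.from_pretrained(base, "lora-adapter-id")

# merge_and_unload folds the LoRA deltas into the base weights and
# returns a plain transformers model that can be saved standalone.
merged = lora.merge_and_unload()
merged.save_pretrained("merged-model")
```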
## Intended uses & limitations
Research purposes.
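A minimal inference sketch using the standard `transformers` API (not an official snippet from the author; the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/juanako-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain DPO fine-tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```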
## Training and evaluation data
The alignment-handbook DPO recipe with UNA, applied on top of the SFT LoRA.
### Evaluation (lm-evaluation-harness)
#### 0-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
```
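These numbers should be reproducible with something like the following sketch (assumes lm-evaluation-harness >= 0.4, where `lm_eval.simple_evaluate` exists; task names may differ across harness versions):

```python
import lm_eval

# Mirrors the config string above: float16 weights, 0-shot, batch size 8.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/juanako-7b-v1,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "boolq",
           "lambada", "piqa", "sciq", "truthfulqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```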
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
| | |none |acc_norm | 0.6041|± |0.0143|
|arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
| | |none |acc_norm | 0.8161|± |0.0079|
|hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
| | |none |acc_norm | 0.8411|± |0.0036|
|boolq |Yaml |none |acc | 0.8355|± |0.0065|
|lambada |N/A |none |perplexity | 3.3607|± |0.1398|
| | |none |acc | 0.7309|± |0.0137|
|piqa |Yaml |none |acc | 0.8194|± |0.0090|
| | |none |acc_norm | 0.8335|± |0.0087|
|sciq |Yaml |none |acc | 0.9480|± |0.0070|
| | |none |acc_norm | 0.8960|± |0.0097|
|truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
```
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
| | |none |acc_norm | 0.6357|± |0.0141|
|arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
| | |none |acc_norm | 0.8645|± |0.0070|
|hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
| | |none |acc_norm | 0.8372|± |0.0037|
|boolq |Yaml |none |acc | 0.8609|± |0.0061|
|lambada |N/A |none |perplexity | 3.5484|± |0.1034|
| | |none |acc | 0.7207|± |0.0107|
|piqa |Yaml |none |acc | 0.8259|± |0.0088|
| | |none |acc_norm | 0.8384|± |0.0086|
|sciq |Yaml |none |acc | 0.9730|± |0.0051|
| | |none |acc_norm | 0.9740|± |0.0050|
|truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
| | |none |acc | 0.4856|± |0.0521|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mapped onto `TrainingArguments` in the sketch after the list):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
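These settings map directly onto Hugging Face `TrainingArguments`, as sketched below. The actual run used the alignment-handbook DPO recipe on 12 GPUs; the output path is a placeholder, and the listed Adam betas/epsilon are the optimizer defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="juanako-7b-v1",      # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,   # x 12 GPUs x 16 accumulation = 192 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
)
```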
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
| 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
| 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
| 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
| 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
| 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
| - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |
### Framework versions
- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
## MMLU Results
#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
```
| Tasks |Version|Filter|Metric|Value | |Stderr|
|---------------------------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
| - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
| - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
| - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
| - international_law |Yaml |none |acc |0.7438|± |0.0398|
| - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
| - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
| - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
| - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
| - philosophy |Yaml |none |acc |0.7074|± |0.0258|
| - prehistory |Yaml |none |acc |0.7377|± |0.0245|
| - professional_law |Yaml |none |acc |0.4361|± |0.0127|
| - world_religions |Yaml |none |acc |0.8421|± |0.0280|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
| - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
| - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
| - global_facts |Yaml |none |acc |0.3300|± |0.0473|
| - human_aging |Yaml |none |acc |0.6726|± |0.0315|
| - management |Yaml |none |acc |0.8058|± |0.0392|
| - marketing |Yaml |none |acc |0.8419|± |0.0239|
| - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
| - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
| - nutrition |Yaml |none |acc |0.7288|± |0.0255|
| - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
| - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
| - virology |Yaml |none |acc |0.5000|± |0.0389|
| - social_sciences |N/A |none |acc |0.7195|± |0.0676|
| - econometrics |Yaml |none |acc |0.5000|± |0.0470|
| - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
| - high_school_government_and_politics|Yaml |none |acc |0.8601|± |0.0250|
| - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
| - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
| - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
| - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
| - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
| - public_relations |Yaml |none |acc |0.6636|± |0.0453|
| - security_studies |Yaml |none |acc |0.7184|± |0.0288|
| - sociology |Yaml |none |acc |0.8358|± |0.0262|
| - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
| - stem |N/A |none |acc |0.5217|± |0.1149|
| - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
| - anatomy |Yaml |none |acc |0.6222|± |0.0419|
| - astronomy |Yaml |none |acc |0.6711|± |0.0382|
| - college_biology |Yaml |none |acc |0.7361|± |0.0369|
| - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
| - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
| - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
| - college_physics |Yaml |none |acc |0.4902|± |0.0497|
| - computer_security |Yaml |none |acc |0.7100|± |0.0456|
| - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
| - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
| - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
| - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
| - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
| - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
| - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
| - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
| - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
| - machine_learning |Yaml |none |acc |0.5089|± |0.0475|
| Groups |Version|Filter|Metric|Value | |Stderr|
|------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - social_sciences|N/A |none |acc |0.7195|± |0.0676|
| - stem |N/A |none |acc |0.5217|± |0.1149|