	---
	license: apache-2.0
	library_name: peft
	tags:
	- alignment-handbook
	- generated_from_trainer
	- trl
	- dpo
	datasets:
	- HuggingFaceH4/ultrafeedback_binarized
	base_model: mistralai/Mistral-7B-v0.1
	model-index:
	- name: zephyr-7b-dpo-qlora
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-qlora

	This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.5325
	- Rewards/chosen: -1.2325
	- Rewards/rejected: -2.0565
	- Rewards/accuracies: 0.7656
	- Rewards/margins: 0.8240
	- Logps/rejected: -457.4398
	- Logps/chosen: -373.4022
	- Logits/rejected: 0.7596
	- Logits/chosen: 0.5001
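
As a quick consistency check on the metrics above: in DPO training as implemented in TRL, the reward margin is the chosen reward minus the rejected reward, so the reported averages should agree up to rounding. A minimal sketch using only the numbers listed above (the variable names are illustrative):

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.2325
rewards_rejected = -2.0565
reported_margin = 0.8240

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")  # computed margin: 0.8240

# Allow for rounding in the reported four-decimal values.
assert abs(margin - reported_margin) < 1e-3
```

Both rewards are negative, which is common in DPO runs: what the objective pushes on is the gap between them, not their sign.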

	## Model description

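	Going by the card metadata, this is a PEFT adapter for [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with Direct Preference Optimization (DPO) using TRL; the model name suggests QLoRA was used. More information needed.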

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 32
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
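
The reported batch totals and schedule settings can be sanity-checked with a few lines of Python. The sketch below makes three assumptions that are not stated in this card: gradient accumulation is inferred from the reported totals, the `cosine` schedule is modeled as linear warmup followed by a single half-cosine decay to zero, and `TOTAL_STEPS = 1920` is a hypothetical value chosen only because the results table shows step 1900 at epoch 0.99.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1
PER_DEVICE_TRAIN_BATCH = 4
NUM_DEVICES = 8
TOTAL_TRAIN_BATCH = 32

# Assumption: hypothetical total step count, not stated in the card.
TOTAL_STEPS = 1920

# Gradient accumulation implied by the reported totals (4 * 8 * 1 = 32).
grad_accum = TOTAL_TRAIN_BATCH // (PER_DEVICE_TRAIN_BATCH * NUM_DEVICES)


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` under linear warmup plus half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("implied gradient accumulation steps:", grad_accum)
for step in (0, 100, 500, 1000, 1500, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate reaches 5e-06 about a tenth of the way through the epoch and then decays smoothly toward zero by the final step.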

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6916 | 0.05 | 100 | 0.6912 | 0.0059 | 0.0019 | 0.6484 | 0.0041 | -251.6075 | -249.5596 | -2.2040 | -2.2621 |
	| 0.655 | 0.1 | 200 | 0.6498 | -0.0559 | -0.1762 | 0.7070 | 0.1203 | -269.4106 | -255.7421 | -2.1011 | -2.1614 |
	| 0.6342 | 0.16 | 300 | 0.6146 | -0.3407 | -0.6269 | 0.7031 | 0.2862 | -314.4839 | -284.2224 | -1.9037 | -1.9793 |
	| 0.6121 | 0.21 | 400 | 0.5946 | -0.4657 | -0.8916 | 0.7031 | 0.4259 | -340.9551 | -296.7203 | -1.8717 | -1.9543 |
	| 0.5973 | 0.26 | 500 | 0.5938 | -0.3681 | -0.7766 | 0.7305 | 0.4085 | -329.4522 | -286.9666 | -1.8440 | -1.9282 |
	| 0.5473 | 0.31 | 600 | 0.5774 | -0.6893 | -1.2264 | 0.7344 | 0.5371 | -374.4341 | -319.0812 | -1.6815 | -1.7726 |
	| 0.5792 | 0.37 | 700 | 0.5709 | -0.6635 | -1.2100 | 0.7578 | 0.5465 | -372.7989 | -316.5072 | -1.4783 | -1.5775 |
	| 0.5194 | 0.42 | 800 | 0.5590 | -1.0208 | -1.6453 | 0.7461 | 0.6245 | -416.3269 | -352.2357 | -0.3791 | -0.5486 |
	| 0.5367 | 0.47 | 900 | 0.5492 | -1.1477 | -1.8521 | 0.7266 | 0.7044 | -437.0040 | -364.9276 | -0.0908 | -0.2899 |
	| 0.5575 | 0.52 | 1000 | 0.5450 | -1.1704 | -1.9048 | 0.7344 | 0.7344 | -442.2755 | -367.1964 | 0.2761 | 0.0498 |
	| 0.5507 | 0.58 | 1100 | 0.5429 | -1.1040 | -1.8671 | 0.7422 | 0.7631 | -438.5026 | -360.5551 | 0.5339 | 0.2877 |
	| 0.5305 | 0.63 | 1200 | 0.5366 | -1.1557 | -1.9243 | 0.7578 | 0.7686 | -444.2217 | -365.7241 | 0.7350 | 0.4755 |
	| 0.5171 | 0.68 | 1300 | 0.5304 | -1.3741 | -2.1678 | 0.7656 | 0.7937 | -468.5735 | -387.5681 | 0.7686 | 0.5029 |
	| 0.4875 | 0.73 | 1400 | 0.5321 | -1.3228 | -2.1513 | 0.7578 | 0.8285 | -466.9267 | -382.4329 | 0.8566 | 0.5926 |
	| 0.5216 | 0.78 | 1500 | 0.5326 | -1.2006 | -2.0034 | 0.7617 | 0.8028 | -452.1298 | -370.2103 | 0.7189 | 0.4630 |
	| 0.4894 | 0.84 | 1600 | 0.5327 | -1.2300 | -2.0556 | 0.7656 | 0.8256 | -457.3565 | -373.1585 | 0.7405 | 0.4828 |
	| 0.5179 | 0.89 | 1700 | 0.5326 | -1.2313 | -2.0558 | 0.7656 | 0.8245 | -457.3720 | -373.2860 | 0.7604 | 0.5012 |
	| 0.5534 | 0.94 | 1800 | 0.5325 | -1.2309 | -2.0558 | 0.7656 | 0.8249 | -457.3779 | -373.2437 | 0.7550 | 0.4957 |
	| 0.5539 | 0.99 | 1900 | 0.5325 | -1.2325 | -2.0565 | 0.7656 | 0.8240 | -457.4398 | -373.4022 | 0.7596 | 0.5001 |
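
To read the validation-loss column at a glance, the sketch below copies the (step, validation loss) pairs from the table above and reports the lowest value next to the final one. It is only an illustration of how to inspect the log; whether a checkpoint was saved at any particular step is not stated in this card.

```python
# (step, validation loss) pairs copied from the table above.
eval_loss_by_step = [
    (100, 0.6912), (200, 0.6498), (300, 0.6146), (400, 0.5946),
    (500, 0.5938), (600, 0.5774), (700, 0.5709), (800, 0.5590),
    (900, 0.5492), (1000, 0.5450), (1100, 0.5429), (1200, 0.5366),
    (1300, 0.5304), (1400, 0.5321), (1500, 0.5326), (1600, 0.5327),
    (1700, 0.5326), (1800, 0.5325), (1900, 0.5325),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss = min(eval_loss_by_step, key=lambda row: row[1])

# Last logged evaluation, which matches the headline metrics of this card.
final_step, final_loss = eval_loss_by_step[-1]

print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
print(f"final validation loss  {final_loss:.4f} at step {final_step}")
```

The lowest logged validation loss (0.5304 at step 1300) is slightly below the final value (0.5325 at step 1900); the loss is essentially flat over the last 500 steps.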


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- Pytorch 2.1.2+cu121
	- Datasets 2.14.6
	- Tokenizers 0.15.0
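
With library versions close to those listed above, loading the adapter might look like the sketch below. Several things here are assumptions rather than facts from this card: the repository id `alignment-handbook/zephyr-7b-dpo-qlora` is inferred from the model name and the namespace of the SFT checkpoint, the tokenizer is assumed to be stored alongside the adapter, and `device_map="auto"` assumes the `accelerate` package is installed. The helper name `load_adapter` is hypothetical.

```python
# Assumption: repository id inferred from the model name, not stated in the card.
ADAPTER_ID = "alignment-handbook/zephyr-7b-dpo-qlora"


def load_adapter(adapter_id=ADAPTER_ID):
    """Load the base model with the PEFT adapter applied, plus a tokenizer."""
    # Imported lazily so this snippet can be read without the libraries installed.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the adapter config, loads the base model
    # it points to, and attaches the adapter weights on top.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # assumption: accelerate is available
    )
    # Assumption: tokenizer files are stored in the adapter repository.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_adapter()
    print(type(model).__name__)
```

Loading in this way downloads the full 7B base model, so it needs a GPU with enough memory for bfloat16 weights or a quantized loading configuration.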