	---
	library_name: transformers
	license: other
	base_model: trl-lib/qwen1.5-0.5b-sft
	tags:
	- alignment-handbook
	- trl
	- simpo
	- generated_from_trainer
	datasets:
	- yakazimir/ultrafeedback_binarized
	model-index:
	- name: qwen_cpo_entropy_0_01
	  results: []
	---


	# qwen_cpo_entropy_0_01

	This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the yakazimir/ultrafeedback_binarized dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.5583
	- Sft Loss: 3.4705
	- Rewards/chosen: -3.3285
	- Rewards/rejected: -4.3810
	- Rewards/accuracies: 0.7226
	- Rewards/margins: 1.0525
	- Logps/rejected: -4.3810
	- Logps/chosen: -3.3285
	- Logits/rejected: 0.2811
	- Logits/chosen: 0.1563
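
The reported reward margin can be cross-checked against the chosen and rejected rewards. The sketch below is a sanity check on the numbers listed above, not part of the training code; the variable names are illustrative.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -3.3285
rewards_rejected = -4.3810
reported_margin = 1.0525

# The margin is the gap between the chosen and the rejected reward.
margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
is_consistent = abs(margin - reported_margin) < 1e-4
print(f"computed margin: {margin:.4f}, consistent: {is_consistent}")
```

A positive margin means the model scores chosen responses above rejected ones on average, which fits the reported accuracy of 0.7226.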

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 2
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3.0
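
To show how these settings interact, here is a minimal sketch of the effective batch size and of a linear-warmup, cosine-decay learning rate curve. It is an approximation written in plain Python, not the Trainer's own scheduler. Two values are assumptions that the card does not state: `NUM_DEVICES = 1`, chosen so that the product matches `total_train_batch_size`, and `TOTAL_STEPS = 5600`, approximated by the last logged step.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.1

# Assumptions (not stated in the card).
NUM_DEVICES = 1     # picked so the product below equals 32
TOTAL_STEPS = 5600  # approximated by the last logged step


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * accum * devices


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          base_lr: float = LEARNING_RATE,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to base_lr, then cosine decay towards zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(f"effective batch size: {batch}")
for step in (0, 280, 560, 3080, 5600):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-06 after 560 steps and decays to roughly zero by the end of training.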

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.7019 | 0.2141 | 400 | 0.6977 | 1.4219 | -1.4375 | -1.6032 | 0.5631 | 0.1657 | -1.6032 | -1.4375 | 0.2993 | 0.2138 |
	| 0.6225 | 0.4282 | 800 | 0.6192 | 2.0573 | -2.0770 | -2.5396 | 0.6669 | 0.4626 | -2.5396 | -2.0770 | 0.3429 | 0.2570 |
	| 0.6242 | 0.6422 | 1200 | 0.5882 | 2.6279 | -2.4850 | -3.1039 | 0.6973 | 0.6190 | -3.1039 | -2.4850 | 0.5237 | 0.4102 |
	| 0.5405 | 0.8563 | 1600 | 0.5781 | 2.5442 | -2.4160 | -3.0202 | 0.7092 | 0.6042 | -3.0202 | -2.4160 | 0.4122 | 0.3042 |
	| 0.6195 | 1.0704 | 2000 | 0.5673 | 2.7121 | -2.5451 | -3.2527 | 0.7129 | 0.7076 | -3.2527 | -2.5451 | 0.4573 | 0.3371 |
	| 0.5895 | 1.2845 | 2400 | 0.5590 | 3.0631 | -2.8962 | -3.7486 | 0.7322 | 0.8524 | -3.7486 | -2.8962 | 0.3362 | 0.2174 |
	| 0.5512 | 1.4986 | 2800 | 0.5563 | 2.9053 | -2.7513 | -3.5751 | 0.7203 | 0.8238 | -3.5751 | -2.7513 | 0.2892 | 0.1750 |
	| 0.5766 | 1.7127 | 3200 | 0.5520 | 2.9643 | -2.8134 | -3.6655 | 0.7263 | 0.8522 | -3.6655 | -2.8134 | 0.2677 | 0.1562 |
	| 0.5625 | 1.9267 | 3600 | 0.5478 | 3.0563 | -2.8597 | -3.7385 | 0.7255 | 0.8788 | -3.7385 | -2.8597 | 0.3670 | 0.2441 |
	| 0.4702 | 2.1408 | 4000 | 0.5592 | 3.5119 | -3.3071 | -4.3285 | 0.7240 | 1.0214 | -4.3285 | -3.3071 | 0.2395 | 0.1198 |
	| 0.4882 | 2.3549 | 4400 | 0.5601 | 3.5201 | -3.3795 | -4.4355 | 0.7270 | 1.0560 | -4.4355 | -3.3795 | 0.2852 | 0.1603 |
	| 0.4952 | 2.5690 | 4800 | 0.5580 | 3.4402 | -3.3065 | -4.3570 | 0.7233 | 1.0505 | -4.3570 | -3.3065 | 0.3210 | 0.1936 |
	| 0.4272 | 2.7831 | 5200 | 0.5579 | 3.4523 | -3.3138 | -4.3619 | 0.7233 | 1.0481 | -4.3619 | -3.3138 | 0.3592 | 0.2281 |
	| 0.459 | 2.9972 | 5600 | 0.5583 | 3.4705 | -3.3285 | -4.3810 | 0.7226 | 1.0525 | -4.3810 | -3.3285 | 0.2811 | 0.1563 |
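
The table can also be scanned for the strongest evaluation checkpoints. The sketch below copies the step, validation loss, and reward accuracy columns from the table and picks the step with the lowest validation loss and the step with the highest accuracy. The list name and tuple layout are illustrative, and the card does not say whether intermediate checkpoints were saved.

```python
# (step, validation loss, rewards/accuracies) copied from the table above.
eval_log = [
    (400, 0.6977, 0.5631),
    (800, 0.6192, 0.6669),
    (1200, 0.5882, 0.6973),
    (1600, 0.5781, 0.7092),
    (2000, 0.5673, 0.7129),
    (2400, 0.5590, 0.7322),
    (2800, 0.5563, 0.7203),
    (3200, 0.5520, 0.7263),
    (3600, 0.5478, 0.7255),
    (4000, 0.5592, 0.7240),
    (4400, 0.5601, 0.7270),
    (4800, 0.5580, 0.7233),
    (5200, 0.5579, 0.7233),
    (5600, 0.5583, 0.7226),
]

# Lowest validation loss and highest reward accuracy across evaluations.
best_loss_step, best_loss, _ = min(eval_log, key=lambda row: row[1])
best_acc_step, _, best_acc = max(eval_log, key=lambda row: row[2])

print(f"lowest validation loss: {best_loss} at step {best_loss_step}")
print(f"highest reward accuracy: {best_acc} at step {best_acc_step}")
```

Validation loss bottoms out near the end of the second epoch and rises slightly in the third, while the training loss keeps falling, so the final checkpoint is not the best one by this metric.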


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.2.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.19.1