qwen_l21_entropy

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.6612
Rewards/chosen: -4.9613
Rewards/rejected: -8.3580
Rewards/accuracies: 0.6766
Rewards/margins: 3.3967
Logps/rejected: -8.3580
Logps/chosen: -4.9613
Logits/rejected: 1.3373
Logits/chosen: 0.9296
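
As a quick consistency check on the numbers above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below assumes the convention used by TRL-style preference trainers (margin = chosen reward - rejected reward, averaged over the evaluation set); the variable names are illustrative and not taken from the training code.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -4.9613
rewards_rejected = -8.3580
reported_margin = 3.3967

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"{margin:.4f}")  # 3.3967

# Allow for rounding to four decimals in the reported values.
assert abs(margin - reported_margin) < 2e-4
```

In this card the Rewards/chosen and Rewards/rejected values are identical to the Logps/chosen and Logps/rejected values, so the same check applies to the log-probability columns.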

Model description

More information needed

Intended uses & limitations

More information needed
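
The card does not document intended uses. For experimentation, the checkpoint can presumably be loaded like any other causal language model on the Hub. The sketch below assumes the repository ID `yakazimir/qwen_l21_entropy` and the standard `transformers` Auto classes; the `generate` helper, the prompt, and the generation settings are illustrative choices, not recommendations from the model author.

```python
MODEL_ID = "yakazimir/qwen_l21_entropy"  # assumed Hub repository ID


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a greedy completion for `prompt`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the continuation is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Explain preference optimization in one sentence."))
```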

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
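
To make the batch-size and scheduler settings concrete, here is a minimal sketch of the arithmetic behind them. It assumes the usual relation total batch size = per-device batch size x gradient accumulation steps x number of devices, and a linear warmup followed by a half-cosine decay to zero, which is the shape of the standard cosine schedule in `transformers`. The function names are hypothetical, and the total step count of 5605 is an estimate inferred from the training results table (5600 steps at epoch 2.9972), not a value stated in the card.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def lr_at_step(step: int, total_steps: int,
               base_lr: float = 1e-6, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# The listed values (2, 16, total 32) are consistent with num_devices = 1
# under the assumed formula; the card itself does not list a device count.
print(effective_batch_size(per_device=2, grad_accum=16, num_devices=1))  # 32

TOTAL_STEPS = 5605  # estimate, see lead-in
for step in (0, 561, 2800, 5605):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Because the card reports distributed_type: multi-GPU without a device count, the device count should be treated as unknown rather than read off this calculation.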

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6893	0.2141	400	0.6976	-5.6399	-5.6514	0.5134	0.0115	-5.6514	-5.6399	0.6073	0.4970
0.6905	0.4282	800	0.6888	-9.5942	-10.2217	0.5772	0.6275	-10.2217	-9.5942	0.9367	0.7851
0.6827	0.6422	1200	0.6809	-3.7037	-4.6831	0.6417	0.9794	-4.6831	-3.7037	0.4628	0.3100
0.665	0.8563	1600	0.6737	-4.1597	-6.3017	0.6588	2.1420	-6.3017	-4.1597	0.9087	0.6452
0.674	1.0704	2000	0.6702	-4.7093	-7.4594	0.6677	2.7501	-7.4594	-4.7093	1.0243	0.7072
0.6648	1.2845	2400	0.6651	-4.2327	-7.0267	0.6654	2.7940	-7.0267	-4.2327	0.9760	0.6519
0.6665	1.4986	2800	0.6654	-4.6367	-7.6607	0.6706	3.0240	-7.6607	-4.6367	1.0821	0.7239
0.6746	1.7127	3200	0.6641	-5.1015	-8.2207	0.6803	3.1192	-8.2207	-5.1015	1.0711	0.6993
0.6634	1.9267	3600	0.6629	-4.7411	-7.8576	0.6855	3.1165	-7.8576	-4.7411	1.0738	0.7086
0.6224	2.1408	4000	0.6607	-4.6523	-7.8867	0.6818	3.2344	-7.8867	-4.6523	1.1108	0.7335
0.6604	2.3549	4400	0.6618	-4.7746	-8.0447	0.6780	3.2700	-8.0447	-4.7746	1.2654	0.8695
0.6512	2.5690	4800	0.6615	-4.9147	-8.2777	0.6773	3.3630	-8.2777	-4.9147	1.2819	0.8805
0.6594	2.7831	5200	0.6611	-4.9802	-8.3859	0.6795	3.4057	-8.3859	-4.9802	1.2711	0.8676
0.6402	2.9972	5600	0.6612	-4.9613	-8.3580	0.6766	3.3967	-8.3580	-4.9613	1.3373	0.9296
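
The table can also be used to compare checkpoints. The sketch below copies the step, validation loss, and reward accuracy columns from the rows above and picks the evaluation step with the lowest validation loss and the one with the highest accuracy. The tuple layout and variable names are illustrative only.

```python
# (step, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (400, 0.6976, 0.5134),
    (800, 0.6888, 0.5772),
    (1200, 0.6809, 0.6417),
    (1600, 0.6737, 0.6588),
    (2000, 0.6702, 0.6677),
    (2400, 0.6651, 0.6654),
    (2800, 0.6654, 0.6706),
    (3200, 0.6641, 0.6803),
    (3600, 0.6629, 0.6855),
    (4000, 0.6607, 0.6818),
    (4400, 0.6618, 0.6780),
    (4800, 0.6615, 0.6773),
    (5200, 0.6611, 0.6795),
    (5600, 0.6612, 0.6766),
]

best_by_loss = min(rows, key=lambda r: r[1])
best_by_accuracy = max(rows, key=lambda r: r[2])

print(best_by_loss)      # (4000, 0.6607, 0.6818)
print(best_by_accuracy)  # (3600, 0.6629, 0.6855)
```

By these numbers the final checkpoint (step 5600) is within about 0.0005 validation loss of the best evaluation step, so the differences among the later checkpoints are small.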

Framework versions

Transformers 4.44.2
PyTorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
