ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0

This model is a fine-tuned version of BramVanroy/GEITje-7B-ultra-sft. The training dataset was not recorded in the card metadata (it is listed as None). It achieves the following results on the evaluation set:

Loss: 0.0135
Rewards/real: -1.4818
Rewards/generated: -13.3376
Rewards/accuracies: 0.9963
Rewards/margins: 11.8558
Logps/generated: -410.0757
Logps/real: -427.4978
Logits/generated: -2.7305
Logits/real: -2.7643
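
The reward metrics above follow the usual pattern for preference-style fine-tuning: a margin is the reward of the real response minus the reward of the generated one, and accuracy is the fraction of pairs with a positive margin. The sketch below illustrates that relationship on made-up per-example rewards. The function name `preference_metrics`, the toy values, and the logistic loss form are assumptions for illustration; the card does not document the exact loss or trainer used.

```python
import math

def preference_metrics(real_rewards, generated_rewards):
    """Aggregate per-example rewards into summary metrics.

    ASSUMPTION: the loss is the logistic loss -log(sigmoid(margin)),
    which is common in DPO/SPIN-style training but not confirmed by the card.
    """
    margins = [r - g for r, g in zip(real_rewards, generated_rewards)]
    n = len(margins)
    return {
        "rewards/real": sum(real_rewards) / n,
        "rewards/generated": sum(generated_rewards) / n,
        "rewards/margins": sum(margins) / n,
        # A pair counts as correct when the real response scores higher.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        # -log(sigmoid(m)) == log(1 + exp(-m))
        "loss": sum(math.log1p(math.exp(-m)) for m in margins) / n,
    }

# Hypothetical per-example rewards, not taken from the actual evaluation set.
toy_real = [-1.0, -2.0, 0.5, -1.5]
toy_generated = [-12.0, -14.0, -9.0, -1.0]
metrics = preference_metrics(toy_real, toy_generated)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this assumed loss, a single misranked pair contributes far more to the mean loss than many pairs with large positive margins, which is one way a high mean margin and a small but non-zero loss can coexist.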

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
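
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and shows one plausible shape of a linear schedule with a 0.1 warmup ratio. The helper `linear_warmup_lr` is a hypothetical plain-Python function written for illustration, and `total_steps = 600` is an assumption based on the last logged step in the results table, not a value stated by the card.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch = 8
per_device_eval_batch = 8
num_devices = 4
grad_accum_steps = 2
base_lr = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch = per_device_train_batch * num_devices * grad_accum_steps
total_eval_batch = per_device_eval_batch * num_devices

def linear_warmup_lr(step, total_steps, peak_lr=base_lr, ratio=warmup_ratio):
    """Linear warmup to peak_lr, then linear decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# ASSUMPTION: 600 optimizer steps in total, used only to draw the curve.
total_steps = 600
print("total_train_batch_size:", total_train_batch)
print("total_eval_batch_size:", total_eval_batch)
for step in (0, 30, 60, 330, 600):
    print(step, linear_warmup_lr(step, total_steps))
```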

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.4944	0.08	25	0.2566	0.6645	-0.8427	0.9761	1.5071	-285.1264	-406.0350	-3.0069	-3.0147
0.092	0.16	50	0.0838	0.3983	-3.7771	0.9890	4.1754	-314.4705	-408.6964	-2.9427	-2.9557
0.0601	0.25	75	0.0457	0.2564	-5.6388	0.9963	5.8952	-333.0871	-410.1154	-2.9205	-2.9326
0.0437	0.33	100	0.0336	-0.1853	-7.2451	0.9963	7.0598	-349.1503	-414.5328	-2.8883	-2.9062
0.036	0.41	125	0.0271	-0.1651	-7.7408	0.9945	7.5756	-354.1071	-414.3309	-2.8817	-2.9014
0.0373	0.49	150	0.0264	-0.2384	-7.8312	0.9908	7.5928	-355.0117	-415.0634	-2.8271	-2.8543
0.0198	0.58	175	0.0214	-0.9152	-9.9469	0.9908	9.0317	-376.1681	-421.8315	-2.8052	-2.8326
0.0426	0.66	200	0.0251	-0.9747	-9.1022	0.9908	8.1275	-367.7210	-422.4266	-2.8450	-2.8588
0.0262	0.74	225	0.0189	-0.8414	-9.9318	0.9926	9.0903	-376.0172	-421.0940	-2.8009	-2.8209
0.0142	0.82	250	0.0166	-0.7154	-10.1059	0.9945	9.3905	-377.7586	-419.8336	-2.7973	-2.8201
0.0171	0.9	275	0.0189	-1.0905	-10.9057	0.9945	9.8151	-385.7561	-423.5849	-2.7641	-2.7936
0.0333	0.99	300	0.0168	-1.2797	-11.4866	0.9963	10.2069	-391.5655	-425.4765	-2.7973	-2.8230
0.0061	1.07	325	0.0157	-1.2079	-11.1880	0.9945	9.9801	-388.5797	-424.7587	-2.7974	-2.8231
0.0022	1.15	350	0.0152	-1.0695	-11.2438	0.9908	10.1743	-389.1376	-423.3746	-2.7853	-2.8128
0.0033	1.23	375	0.0148	-1.1767	-11.6618	0.9908	10.4851	-393.3175	-424.4465	-2.7751	-2.8029
0.0043	1.32	400	0.0138	-1.0951	-11.8306	0.9963	10.7354	-395.0049	-423.6307	-2.7703	-2.7976
0.005	1.4	425	0.0136	-1.3179	-12.4674	0.9963	11.1494	-401.3733	-425.8589	-2.7573	-2.7851
0.0031	1.48	450	0.0139	-1.3771	-12.6901	0.9963	11.3130	-403.6003	-426.4503	-2.7544	-2.7815
0.0039	1.56	475	0.0134	-1.3885	-12.8092	0.9963	11.4207	-404.7912	-426.5648	-2.7446	-2.7735
0.001	1.64	500	0.0136	-1.4378	-13.0038	0.9963	11.5660	-406.7370	-427.0571	-2.7404	-2.7701
0.0059	1.73	525	0.0139	-1.5924	-13.4168	0.9945	11.8244	-410.8671	-428.6035	-2.7293	-2.7629
0.0015	1.81	550	0.0136	-1.5136	-13.3984	0.9963	11.8848	-410.6832	-427.8157	-2.7283	-2.7623
0.0078	1.89	575	0.0135	-1.4891	-13.3323	0.9963	11.8432	-410.0224	-427.5704	-2.7309	-2.7645
0.0043	1.97	600	0.0135	-1.4818	-13.3376	0.9963	11.8558	-410.0757	-427.4978	-2.7305	-2.7643
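
As a quick sanity check on the table above, the sketch below copies the step, validation loss, and reward columns, confirms that each reported margin matches Rewards/real minus Rewards/generated up to rounding, and locates the checkpoint with the lowest validation loss. The variable names are illustrative, and the 1e-3 tolerance is an assumption chosen to absorb four-decimal rounding.

```python
# (step, validation_loss, rewards_real, rewards_generated, rewards_margins)
rows = [
    (25, 0.2566, 0.6645, -0.8427, 1.5071),
    (50, 0.0838, 0.3983, -3.7771, 4.1754),
    (75, 0.0457, 0.2564, -5.6388, 5.8952),
    (100, 0.0336, -0.1853, -7.2451, 7.0598),
    (125, 0.0271, -0.1651, -7.7408, 7.5756),
    (150, 0.0264, -0.2384, -7.8312, 7.5928),
    (175, 0.0214, -0.9152, -9.9469, 9.0317),
    (200, 0.0251, -0.9747, -9.1022, 8.1275),
    (225, 0.0189, -0.8414, -9.9318, 9.0903),
    (250, 0.0166, -0.7154, -10.1059, 9.3905),
    (275, 0.0189, -1.0905, -10.9057, 9.8151),
    (300, 0.0168, -1.2797, -11.4866, 10.2069),
    (325, 0.0157, -1.2079, -11.1880, 9.9801),
    (350, 0.0152, -1.0695, -11.2438, 10.1743),
    (375, 0.0148, -1.1767, -11.6618, 10.4851),
    (400, 0.0138, -1.0951, -11.8306, 10.7354),
    (425, 0.0136, -1.3179, -12.4674, 11.1494),
    (450, 0.0139, -1.3771, -12.6901, 11.3130),
    (475, 0.0134, -1.3885, -12.8092, 11.4207),
    (500, 0.0136, -1.4378, -13.0038, 11.5660),
    (525, 0.0139, -1.5924, -13.4168, 11.8244),
    (550, 0.0136, -1.5136, -13.3984, 11.8848),
    (575, 0.0135, -1.4891, -13.3323, 11.8432),
    (600, 0.0135, -1.4818, -13.3376, 11.8558),
]

# Largest gap between the reported margin and (real - generated).
max_gap = max(abs((real - gen) - margin) for _, _, real, gen, margin in rows)

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(((s, loss) for s, loss, *_ in rows), key=lambda t: t[1])

print("max margin gap:", max_gap)
print("best step:", best_step, "validation loss:", best_loss)
```

On these rows the lowest validation loss (0.0134) is logged at step 475, marginally below the 0.0135 of the final checkpoint at step 600, so the headline evaluation numbers describe the last checkpoint rather than the best one.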

Framework versions

Transformers 4.37.0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2

Repository: davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0
