gpt-imdb-cdpo_0.15-beta_0.1

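This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. At the last logged evaluation (step 7000, epoch 2.92) it reaches a validation loss of 0.5181, a reward accuracy of 0.9271, and a reward margin of 1.3866 on the evaluation set; the full evaluation history is given in the table at the end of this card.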

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5541	0.21	500	0.5598	-0.1801	-1.1214	0.8417	0.9413	-274.8995	-237.0667	-33.1267	-34.0864
0.5399	0.42	1000	0.5555	-0.4075	-1.5309	0.8604	1.1234	-278.9942	-239.3399	-36.6366	-37.5032
0.5379	0.63	1500	0.5445	-0.5885	-1.8167	0.8750	1.2282	-281.8521	-241.1506	-34.0236	-34.9075
0.5224	0.83	2000	0.5347	-0.4581	-1.7693	0.8917	1.3112	-281.3783	-239.8462	-34.9412	-35.8186
0.4992	1.04	2500	0.5318	-0.5998	-1.9222	0.9000	1.3224	-282.9072	-241.2631	-34.8041	-35.6967
0.5654	1.25	3000	0.5308	-0.5502	-1.9299	0.9021	1.3797	-282.9844	-240.7672	-35.6718	-36.5937
0.5382	1.46	3500	0.5247	-0.4952	-1.8522	0.9125	1.3570	-282.2072	-240.2172	-35.7229	-36.6547
0.5409	1.67	4000	0.5220	-0.5742	-1.9755	0.9292	1.4013	-283.4403	-241.0072	-36.4780	-37.3339
0.4911	1.88	4500	0.5186	-0.6281	-2.0249	0.9271	1.3967	-283.9341	-241.5466	-36.1014	-36.8989
0.5007	2.08	5000	0.5170	-0.6115	-2.0085	0.9312	1.3969	-283.7699	-241.3805	-36.7092	-37.5360
0.4714	2.29	5500	0.5166	-0.5400	-1.9265	0.9229	1.3865	-282.9501	-240.6650	-36.1382	-36.9914
0.5159	2.50	6000	0.5168	-0.5925	-1.9754	0.9271	1.3829	-283.4395	-241.1906	-35.9587	-36.8156
0.5103	2.71	6500	0.5171	-0.6197	-2.0190	0.9333	1.3993	-283.8753	-241.4619	-36.0316	-36.8825
0.5049	2.92	7000	0.5181	-0.6104	-1.9969	0.9271	1.3866	-283.6544	-241.3688	-36.1797	-37.0193
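
The columns above can be cross-checked against each other. The sketch below is not part of the original training code; it copies the evaluation columns from the table and tests two relations. The first is that Rewards/margins equals Rewards/chosen minus Rewards/rejected. The second assumes the usual DPO implicit reward, reward = beta * (policy log-prob - reference log-prob), with beta = 0.1 taken from the "beta_0.1" suffix of the model name; both the formula and the beta value are assumptions, since the card does not state them. Under that assumption the implied reference log-probs should stay constant across steps, because the reference model and evaluation set do not change.

```python
# Rows copied from the table above (evaluation columns only):
# (step, val_loss, reward_chosen, reward_rejected, accuracy, margin,
#  logp_rejected, logp_chosen)
ROWS = [
    (500, 0.5598, -0.1801, -1.1214, 0.8417, 0.9413, -274.8995, -237.0667),
    (1000, 0.5555, -0.4075, -1.5309, 0.8604, 1.1234, -278.9942, -239.3399),
    (1500, 0.5445, -0.5885, -1.8167, 0.8750, 1.2282, -281.8521, -241.1506),
    (2000, 0.5347, -0.4581, -1.7693, 0.8917, 1.3112, -281.3783, -239.8462),
    (2500, 0.5318, -0.5998, -1.9222, 0.9000, 1.3224, -282.9072, -241.2631),
    (3000, 0.5308, -0.5502, -1.9299, 0.9021, 1.3797, -282.9844, -240.7672),
    (3500, 0.5247, -0.4952, -1.8522, 0.9125, 1.3570, -282.2072, -240.2172),
    (4000, 0.5220, -0.5742, -1.9755, 0.9292, 1.4013, -283.4403, -241.0072),
    (4500, 0.5186, -0.6281, -2.0249, 0.9271, 1.3967, -283.9341, -241.5466),
    (5000, 0.5170, -0.6115, -2.0085, 0.9312, 1.3969, -283.7699, -241.3805),
    (5500, 0.5166, -0.5400, -1.9265, 0.9229, 1.3865, -282.9501, -240.6650),
    (6000, 0.5168, -0.5925, -1.9754, 0.9271, 1.3829, -283.4395, -241.1906),
    (6500, 0.5171, -0.6197, -2.0190, 0.9333, 1.3993, -283.8753, -241.4619),
    (7000, 0.5181, -0.6104, -1.9969, 0.9271, 1.3866, -283.6544, -241.3688),
]

# Assumption: beta is read off the model name; the card does not list it.
BETA = 0.1


def margin_gap(row):
    """Absolute gap between the reported margin and chosen - rejected."""
    _, _, r_chosen, r_rejected, _, margin, _, _ = row
    return abs((r_chosen - r_rejected) - margin)


def implied_reference_logps(row, beta=BETA):
    """Back out (chosen, rejected) reference log-probs.

    Assumes reward = beta * (logp_policy - logp_ref), so
    logp_ref = logp_policy - reward / beta.
    """
    _, _, r_chosen, r_rejected, _, _, lp_rejected, lp_chosen = row
    return lp_chosen - r_chosen / beta, lp_rejected - r_rejected / beta


max_margin_gap = max(margin_gap(r) for r in ROWS)
ref_chosen = [implied_reference_logps(r)[0] for r in ROWS]
ref_rejected = [implied_reference_logps(r)[1] for r in ROWS]
chosen_spread = max(ref_chosen) - min(ref_chosen)
rejected_spread = max(ref_rejected) - min(ref_rejected)

# Checkpoint with the lowest validation loss in the table.
best_row = min(ROWS, key=lambda r: r[1])

print(f"largest margin mismatch: {max_margin_gap:.4f}")
print(f"spread of implied reference logps (chosen): {chosen_spread:.4f}")
print(f"spread of implied reference logps (rejected): {rejected_spread:.4f}")
print(f"lowest validation loss: {best_row[1]} at step {best_row[0]}")
```

Both spreads come out far smaller than the log-probs themselves, so the table is consistent with beta = 0.1, and the margin column matches the reward difference up to rounding in the last digit. Note also that the lowest validation loss occurs before the final step, while the final row is what the summary at the top of a card usually reflects.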