gpt-imdb-ipo-beta_0.5

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. Evaluation-set results for each logged checkpoint are listed in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

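Training procedure

Training hyperparameters

The model name suggests training with the IPO (Identity Preference Optimization) objective at beta = 0.5; the full list of training hyperparameters is not given.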

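The model name suggests the IPO objective with beta = 0.5. As a minimal sketch, assuming the standard IPO formulation (as implemented, for example, by TRL's DPOTrainer with loss_type="ipo"), the loss regresses the policy-versus-reference log-ratio margin of each preference pair toward 1/(2·beta) with a squared error. This is an illustration, not this model's training code; the function name ipo_loss is hypothetical.

```python
from statistics import mean


def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.5):
    """Mean IPO loss over a batch of preference pairs (illustrative sketch).

    Each argument is a list of per-sequence log-probabilities, one entry per
    (chosen, rejected) pair, from the policy or the frozen reference model.
    """
    target = 1.0 / (2.0 * beta)  # beta = 0.5 gives a target margin of 1.0
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        # How much more the policy prefers "chosen" over "rejected"
        # than the reference model does.
        h = (pc - pr) - (rc - rr)
        # Squared regression toward the target margin, rather than the
        # unbounded sigmoid objective used by DPO.
        losses.append((h - target) ** 2)
    return mean(losses)
```

With beta = 0.5, a pair whose log-ratio margin is exactly 1.0 contributes zero loss, for example ipo_loss([-10.0], [-12.0], [-10.0], [-11.0]). A larger beta moves the target margin closer to zero, which keeps the policy nearer to the reference model.
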
Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
10.732	0.21	500	21.6330	-0.2465	-0.4751	0.5792	0.2286	-264.6355	-235.7583	-34.3644	-34.6229
11.0252	0.42	1000	17.5281	0.3734	0.1008	0.5437	0.2726	-263.4837	-234.5185	-35.1543	-35.3784
17.5294	0.63	1500	18.4782	-0.4521	-0.6725	0.6208	0.2203	-265.0302	-236.1696	-33.9319	-34.0933
7.8398	0.83	2000	17.4130	-0.5472	-0.6406	0.6083	0.0933	-264.9664	-236.3597	-34.0128	-34.1803
6.2214	1.04	2500	9.4072	-0.5101	-0.8182	0.6292	0.3080	-265.3216	-236.2855	-33.2396	-33.3578
9.8652	1.25	3000	13.4878	-0.6413	-0.8801	0.6375	0.2388	-265.4454	-236.5479	-32.0018	-32.1655
11.4779	1.46	3500	7.5245	-0.0755	-0.3944	0.6750	0.3189	-264.4740	-235.4162	-32.8982	-33.0074
3.9833	1.67	4000	4.4888	-0.7021	-1.0680	0.6729	0.3659	-265.8214	-236.6695	-32.9502	-33.0304
3.389	1.88	4500	3.9317	-0.5045	-0.8887	0.7271	0.3841	-265.4626	-236.2743	-32.7817	-32.8828
3.2338	2.08	5000	2.4116	-0.5185	-0.8672	0.7146	0.3487	-265.4196	-236.3022	-32.5025	-32.5681
1.2381	2.29	5500	2.1558	-0.5066	-0.8815	0.7458	0.3749	-265.4483	-236.2784	-32.3108	-32.3902
1.6263	2.5	6000	1.1972	-0.5280	-0.8664	0.7396	0.3384	-265.4182	-236.3213	-32.5356	-32.6104
1.0882	2.71	6500	1.1163	-0.5303	-0.8584	0.7562	0.3281	-265.4022	-236.3259	-32.5615	-32.6406
1.0559	2.92	7000	0.9628	-0.4934	-0.8358	0.7812	0.3424	-265.3568	-236.2520	-32.5835	-32.6621
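The Rewards/* and Logps/* column names match the metrics logged by TRL's DPOTrainer. Assuming they were computed that way (the training code is not given here), the sketch below shows how they relate: each implicit reward is beta times the log-ratio of policy to reference probability, accuracy is the fraction of pairs where the chosen completion scores higher, and the margin is the mean of chosen minus rejected reward. The Logits/* columns (mean raw output logits) are not reproduced. The helper names are hypothetical; LOGGED copies four rows from the table above to check that Rewards/margins equals Rewards/chosen minus Rewards/rejected up to four-decimal rounding.

```python
from statistics import mean


def preference_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.5):
    """Batch metrics in the style of the table's columns (illustrative sketch)."""
    # Implicit rewards: beta-scaled log-ratio of policy to reference model.
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # Fraction of pairs where the chosen completion gets the higher reward.
        "rewards/accuracies": mean(1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)),
        "rewards/margins": mean(c - r for c, r in zip(chosen, rejected)),
        # Logps columns are plain means of the policy's sequence log-probabilities.
        "logps/chosen": mean(policy_chosen_logps),
        "logps/rejected": mean(policy_rejected_logps),
    }


# (Step, Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the table.
LOGGED = [
    (500, -0.2465, -0.4751, 0.2286),
    (1500, -0.4521, -0.6725, 0.2203),
    (4500, -0.5045, -0.8887, 0.3841),
    (7000, -0.4934, -0.8358, 0.3424),
]


def margins_consistent(rows, tol=2e-4):
    """True if every margin equals chosen minus rejected within rounding error."""
    return all(abs((c - r) - m) <= tol for _, c, r, m in rows)
```

Because the margin is a mean of differences, it always equals the difference of the two reward means, so margins_consistent(LOGGED) holds for every logged row.
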