mistral-sft-spin-ultrafeedback

This model is a fine-tuned version of AmberYifan/mistral-safe-sft-full; the fine-tuning dataset is not specified. Evaluation-set results logged during training are listed in the training results table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

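Training procedure

Training results

Training loss and evaluation-set metrics were logged every 1000 steps (about 0.12 epoch). The Rewards/*, Logps/* and Logits/* columns report values separately for the real responses and for the model-generated responses; Rewards/margins is the mean difference between the real and generated rewards.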

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.595	0.1203	1000	0.5124	11.3158	1.6608	0.9464	9.6550	-473.1739	-358.1132	-2.4781	-2.4648
0.6451	0.2405	2000	0.4696	17.0313	2.1793	0.9613	14.8520	-467.9886	-300.9576	-2.2454	-2.2894
0.6942	0.3608	3000	0.4032	18.1009	-4.0003	0.9732	22.1012	-529.7845	-290.2621	-2.2503	-2.3049
0.5971	0.4810	4000	0.4349	20.4856	-0.4965	0.9554	20.9821	-494.7470	-266.4150	-2.2408	-2.2874
0.418	0.6013	5000	0.4742	21.3899	-1.9856	0.9613	23.3755	-509.6375	-257.3721	-2.2078	-2.2568
0.4272	0.7215	6000	0.4182	21.5687	-2.6705	0.9583	24.2392	-516.4866	-255.5838	-2.0241	-2.0560
0.408	0.8418	7000	0.3871	21.3882	-9.6508	0.9732	31.0390	-586.2899	-257.3895	-1.9645	-2.0343
0.5954	0.9620	8000	0.3972	22.1059	-7.2907	0.9643	29.3967	-562.6890	-250.2120	-1.9480	-2.0005
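The reward columns can be read with a small sketch, assuming the DPO-style definitions used by TRL's DPOTrainer (which SPIN-style trainers adapt): each reward is beta times the log-probability ratio between the fine-tuned policy and the frozen reference model, the loss is the sigmoid (logistic) loss on the per-pair reward margin, and accuracy is the fraction of pairs whose real reward exceeds the generated one. The helper names (`spin_metrics`, `max_margin_error`) and the value beta=0.1 are hypothetical; the card does not state the trainer's settings. The sketch also checks each logged row: because the mean of per-pair differences equals the difference of the means, Rewards/margins should equal Rewards/real minus Rewards/generated.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (step, Rewards/real, Rewards/generated, Rewards/margins), copied from the table above.
LOGGED: List[Tuple[int, float, float, float]] = [
    (1000, 11.3158, 1.6608, 9.6550),
    (2000, 17.0313, 2.1793, 14.8520),
    (3000, 18.1009, -4.0003, 22.1012),
    (4000, 20.4856, -0.4965, 20.9821),
    (5000, 21.3899, -1.9856, 23.3755),
    (6000, 21.5687, -2.6705, 24.2392),
    (7000, 21.3882, -9.6508, 31.0390),
    (8000, 22.1059, -7.2907, 29.3967),
]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def spin_metrics(
    policy_real: Sequence[float],
    policy_gen: Sequence[float],
    ref_real: Sequence[float],
    ref_gen: Sequence[float],
    beta: float = 0.1,  # assumed value; the card does not state beta
) -> Dict[str, float]:
    """Hypothetical helper: batch metrics from summed sequence log-probs."""
    # Reward = beta * (policy log-prob - reference log-prob), per response.
    rewards_real = [beta * (p - r) for p, r in zip(policy_real, ref_real)]
    rewards_gen = [beta * (p - r) for p, r in zip(policy_gen, ref_gen)]
    margins = [a - b for a, b in zip(rewards_real, rewards_gen)]
    n = len(margins)
    return {
        "rewards/real": sum(rewards_real) / n,
        "rewards/generated": sum(rewards_gen) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the real response out-scores the generated one.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        # Sigmoid loss per pair: -log(sigmoid(m)) == softplus(-m).
        "loss": sum(softplus(-m) for m in margins) / n,
    }


def max_margin_error(rows: Sequence[Tuple[int, float, float, float]] = LOGGED) -> float:
    """Largest gap between a logged margin and (real reward - generated reward)."""
    return max(abs(margin - (real - gen)) for _, real, gen, margin in rows)


# Toy batch of two pairs (made-up log-probs, for illustration only).
m = spin_metrics([-10.0, -20.0], [-30.0, -15.0], [-12.0, -18.0], [-25.0, -16.0])
print(m["rewards/margins"], m["rewards/accuracies"])  # roughly 0.2 and 0.5
print(max_margin_error() < 2e-4)  # True: every logged row satisfies the identity
```

Under these assumptions, a rising Rewards/margins with Rewards/accuracies near 0.96 indicates the model increasingly prefers the real responses over its own generations, which is the behavior the self-play objective rewards.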