model_hh_shp4_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its evaluation-set results at each checkpoint are listed in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
8.0	100	1.4330	-0.1080	-1.3964	0.6200	1.2883	-230.1935	-224.4584	-0.7684	-0.7753
16.0	200	1.4371	-0.0911	-1.3887	0.6400	1.2976	-230.1849	-224.4396	-0.7692	-0.7762
24.0	300	1.4477	-0.1125	-1.3921	0.6200	1.2795	-230.1887	-224.4634	-0.7693	-0.7763
32.0	400	1.4521	-0.1143	-1.4167	0.6200	1.3024	-230.2161	-224.4653	-0.7696	-0.7763
40.0	500	1.4631	-0.1153	-1.3806	0.6200	1.2653	-230.1759	-224.4665	-0.7701	-0.7771
48.0	600	1.4455	-0.1180	-1.3970	0.6300	1.2791	-230.1942	-224.4695	-0.7698	-0.7769
56.0	700	1.4292	-0.0800	-1.3720	0.6100	1.2920	-230.1664	-224.4273	-0.7704	-0.7775
64.0	800	1.4434	-0.0943	-1.3739	0.6200	1.2796	-230.1686	-224.4432	-0.7703	-0.7773
72.0	900	1.4493	-0.1016	-1.4044	0.6100	1.3028	-230.2024	-224.4513	-0.7704	-0.7773
80.0	1000	1.4445	-0.1401	-1.3796	0.6300	1.2395	-230.1749	-224.4940	-0.7701	-0.7769
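The column names in this table match the metrics logged by TRL's DPOTrainer, but the card does not state the training method. The sketch below assumes a DPO run with the standard sigmoid loss and shows how each batch-averaged column relates to per-pair log-probabilities from the policy and a frozen reference model. `PreferencePair` and `dpo_metrics` are hypothetical names, plain Python stands in for TRL's tensor code, the Logits columns are not covered, and `beta=0.1` is a placeholder because the value used for this model is not given.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical container: summed log-probabilities for one chosen/rejected pair."""
    policy_chosen_logp: float
    policy_rejected_logp: float
    ref_chosen_logp: float      # frozen reference model (assumed, e.g. the base chat model)
    ref_rejected_logp: float


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(pairs: list[PreferencePair], beta: float = 0.1) -> dict[str, float]:
    """Batch-averaged metrics named like the table columns.

    Assumes the standard DPO sigmoid loss; beta=0.1 is a placeholder,
    not the value used for this model.
    """
    n = len(pairs)
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p.policy_chosen_logp - p.ref_chosen_logp) for p in pairs]
    rejected = [beta * (p.policy_rejected_logp - p.ref_rejected_logp) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Per-pair DPO loss: -log(sigmoid(margin)) == softplus(-margin).
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs in which the chosen completion gets the higher reward.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
        "logps/rejected": sum(p.policy_rejected_logp for p in pairs) / n,
        "logps/chosen": sum(p.policy_chosen_logp for p in pairs) / n,
    }


if __name__ == "__main__":
    # Two made-up pairs: the first is ranked correctly, the second is not.
    pairs = [
        PreferencePair(-224.0, -230.0, -224.5, -228.0),
        PreferencePair(-210.0, -205.0, -209.0, -206.0),
    ]
    for name, value in dpo_metrics(pairs, beta=0.1).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions Rewards/margins equals Rewards/chosen minus Rewards/rejected, and the table agrees to within rounding (at step 100, -0.1080 minus -1.3964 is 1.2884, against the logged 1.2883). Because the loss is averaged per pair, it cannot be recovered from the Rewards/margins column alone.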