model_hh_usp4_400

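This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. At the final logged evaluation (epoch 40, step 1000) it achieves the following results on the evaluation set:

Loss: 4.4266
Rewards/chosen: -7.2918
Rewards/rejected: -9.3870
Rewards/accuracies: 0.5500
Rewards/margins: 2.0952
Logps/rejected: -122.5611
Logps/chosen: -121.2051
Logits/rejected: -0.2787
Logits/chosen: -0.2572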

Model description

More information needed

Training hyperparameters

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.098	4.0	100	2.3307	-3.2289	-4.9359	0.5700	1.7070	-117.6155	-116.6908	-0.5136	-0.5011
0.2615	8.0	200	3.5637	-3.5399	-4.5546	0.5700	1.0147	-117.1918	-117.0363	-0.4837	-0.4844
0.0137	12.0	300	4.2146	-3.4955	-5.8321	0.5600	2.3366	-118.6113	-116.9870	-0.3503	-0.3327
0.0	16.0	400	4.4247	-7.2840	-9.3968	0.5500	2.1128	-122.5721	-121.1964	-0.2788	-0.2574
0.0	20.0	500	4.4045	-7.2800	-9.4193	0.5600	2.1393	-122.5971	-121.1920	-0.2793	-0.2578
0.0	24.0	600	4.4242	-7.2774	-9.3711	0.5600	2.0936	-122.5435	-121.1891	-0.2789	-0.2573
0.0	28.0	700	4.4048	-7.2951	-9.4062	0.5600	2.1110	-122.5825	-121.2088	-0.2785	-0.2570
0.0	32.0	800	4.4098	-7.2804	-9.3847	0.5500	2.1043	-122.5586	-121.1924	-0.2783	-0.2569
0.0	36.0	900	4.4251	-7.2849	-9.3768	0.5500	2.0918	-122.5498	-121.1974	-0.2792	-0.2575
0.0	40.0	1000	4.4266	-7.2918	-9.3870	0.5500	2.0952	-122.5611	-121.2051	-0.2787	-0.2572
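
The table can be sanity-checked with a few lines of Python. The sketch below assumes that Rewards/margins is the chosen reward minus the rejected reward, which is how preference-optimization trainers such as TRL's DPOTrainer report it. The card does not name the trainer, so that reading is an assumption. The names `ROWS`, `margin_gap`, and `best_step` are hypothetical helpers introduced for illustration, and the values are copied from the table above.

```python
# Rows copied from the training results table:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (100, 2.3307, -3.2289, -4.9359, 1.7070),
    (200, 3.5637, -3.5399, -4.5546, 1.0147),
    (300, 4.2146, -3.4955, -5.8321, 2.3366),
    (400, 4.4247, -7.2840, -9.3968, 2.1128),
    (500, 4.4045, -7.2800, -9.4193, 2.1393),
    (600, 4.4242, -7.2774, -9.3711, 2.0936),
    (700, 4.4048, -7.2951, -9.4062, 2.1110),
    (800, 4.4098, -7.2804, -9.3847, 2.1043),
    (900, 4.4251, -7.2849, -9.3768, 2.0918),
    (1000, 4.4266, -7.2918, -9.3870, 2.0952),
]


def margin_gap(row):
    """Absolute difference between the reported margin and chosen - rejected.

    Assumption: margin = rewards/chosen - rewards/rejected.
    """
    _step, _val_loss, chosen, rejected, margin = row
    return abs((chosen - rejected) - margin)


def best_step(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for row in ROWS:
        print(f"step {row[0]:>4}: margin gap = {margin_gap(row):.4f}")
    print("lowest validation loss at step", best_step(ROWS))
```

Under that assumption, every row agrees to within rounding of the four-decimal values. The lowest validation loss in the table is at step 100 (2.3307). From step 400 onward the training loss is reported as 0.0 while the validation loss stays near 4.4.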