sft-xcomet_xl_xxl-chosen-10lp-shuff-full-tiny

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on the Unbabel/TowerAligned-v0.1 dataset. Evaluation-set results for each logged checkpoint are listed in the table at the end of this card.

Model description

More information needed

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Nll Loss	Logps/best	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.8021	0.1063	100	0.7701	0.7701	-76.4054	2.6949	2.3664	0.6740	0.3284	-73.7585	-76.4054	-1.7763	-1.9055
0.7255	0.2127	200	0.7367	0.7367	-73.1546	3.0200	2.6460	0.6820	0.3740	-70.9634	-73.1546	-1.7637	-1.8923
0.6979	0.3190	300	0.7232	0.7232	-71.8372	3.1517	2.7499	0.6660	0.4018	-69.9242	-71.8372	-1.7452	-1.8727
0.7072	0.4254	400	0.7137	0.7137	-70.8879	3.2466	2.8103	0.6960	0.4363	-69.3198	-70.8879	-1.7467	-1.8743
0.6958	0.5317	500	0.7085	0.7085	-70.3945	3.2960	2.8412	0.6920	0.4548	-69.0110	-70.3945	-1.7476	-1.8756
0.7216	0.6381	600	0.7055	0.7055	-70.0888	3.3265	2.8702	0.6900	0.4564	-68.7212	-70.0888	-1.7377	-1.8651
0.7531	0.7444	700	0.7038	0.7038	-69.9193	3.3435	2.8863	0.6860	0.4572	-68.5603	-69.9193	-1.7392	-1.8670
0.6531	0.8508	800	0.7028	0.7028	-69.8163	3.3538	2.9020	0.6800	0.4518	-68.4026	-69.8163	-1.7410	-1.8690
0.6801	0.9571	900	0.7027	0.7027	-69.8057	3.3548	2.9021	0.6820	0.4527	-68.4018	-69.8057	-1.7405	-1.8685
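
The table above can be sanity-checked with a few lines of Python. The sketch below copies the Step, Validation Loss, Rewards/chosen, Rewards/rejected, and Rewards/margins columns by hand and assumes that Rewards/margins is meant to be Rewards/chosen minus Rewards/rejected; that relationship is an assumption, not something this card states. The helper names `margin_mismatches` and `best_step` are hypothetical, and the tolerance only accounts for the four-decimal rounding of the logged values.

```python
# Values copied from the training results table above (subset of columns).
# Tuple layout: (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (100, 0.7701, 2.6949, 2.3664, 0.3284),
    (200, 0.7367, 3.0200, 2.6460, 0.3740),
    (300, 0.7232, 3.1517, 2.7499, 0.4018),
    (400, 0.7137, 3.2466, 2.8103, 0.4363),
    (500, 0.7085, 3.2960, 2.8412, 0.4548),
    (600, 0.7055, 3.3265, 2.8702, 0.4564),
    (700, 0.7038, 3.3435, 2.8863, 0.4572),
    (800, 0.7028, 3.3538, 2.9020, 0.4518),
    (900, 0.7027, 3.3548, 2.9021, 0.4527),
]


def margin_mismatches(rows, tol=2e-4):
    """Return the steps where chosen - rejected differs from the logged margin.

    Assumption: the margin column is the difference of the two reward columns.
    Each logged value is rounded to 4 decimals, so a small tolerance is needed.
    """
    return [
        step
        for step, _, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def best_step(rows):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, loss, *_ = min(rows, key=lambda row: row[1])
    return step, loss


if __name__ == "__main__":
    print("steps with inconsistent margins:", margin_mismatches(ROWS))
    print("lowest validation loss at (step, loss):", best_step(ROWS))
```

Under that assumption every row is consistent to within rounding, and the lowest validation loss in the table is at the last logged checkpoint.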