Mistral-7B-v0.1-gen-dpo-10k

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. Its evaluation-set metrics at each logged step are reported in the table at the end of this card.

Model description

More information needed

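The hyperparameters used during training are not listed in this card. The table below reports the training and evaluation metrics logged every 62 steps over roughly three epochs: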

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.6898	0.1984	62	0.6845	0.1316	-1.2271	0.8269	1.3587	-246.0414	-211.0208	-2.6139	-2.5606
0.5637	0.3968	124	0.6091	1.9039	-1.0140	0.9231	2.9179	-243.9099	-193.2971	-2.9188	-2.9273
0.4765	0.5952	186	0.4901	1.6316	-2.9131	0.9615	4.5447	-262.9012	-196.0205	-2.6050	-2.6193
0.4421	0.7936	248	0.4296	0.8748	-3.7695	0.9423	4.6443	-271.4653	-203.5885	-2.5477	-2.5049
0.4329	0.992	310	0.3885	1.7310	-3.2873	0.9808	5.0183	-266.6432	-195.0263	-2.4849	-2.4779
0.192	1.1904	372	0.4325	4.2551	-0.5848	0.9231	4.8399	-239.6185	-169.7859	-2.6992	-2.7276
0.1832	1.3888	434	0.3965	4.0302	-1.0932	0.9038	5.1234	-244.7022	-172.0349	-2.6359	-2.6597
0.1759	1.5872	496	0.4029	4.6281	-1.2718	0.9038	5.8999	-246.4886	-166.0557	-2.5095	-2.5768
0.1911	1.7856	558	0.4281	4.7928	-0.9888	0.9231	5.7817	-243.6584	-164.4082	-2.7026	-2.8069
0.1719	1.984	620	0.4522	5.4290	0.0713	0.8654	5.3577	-233.0573	-158.0468	-2.6334	-2.6747
0.1363	2.1824	682	0.4649	6.2001	0.9000	0.8846	5.3001	-224.7699	-150.3351	-2.5111	-2.6322
0.1349	2.3808	744	0.4958	6.5905	1.3552	0.8846	5.2353	-220.2184	-146.4319	-2.5129	-2.6396
0.1316	2.5792	806	0.4796	6.6882	1.1784	0.9038	5.5098	-221.9857	-145.4545	-2.5378	-2.6846
0.1293	2.7776	868	0.4938	6.8678	1.4561	0.8846	5.4117	-219.2092	-143.6585	-2.4843	-2.6386
0.1244	2.976	930	0.4841	6.6281	0.7385	0.9038	5.8896	-226.3850	-146.0557	-2.4268	-2.5712
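
The table can be sanity-checked with a few lines of Python. The sketch below copies the Step, Validation Loss, Rewards/real, Rewards/generated, and Rewards/margins columns from the table and assumes, as is conventional in DPO-style logging but not stated in this card, that the margin is the real reward minus the generated reward. The helper names (`margin_mismatches`, `best_step`) are illustrative and not part of any library.

```python
# Columns copied from the results table above:
# (step, validation_loss, rewards_real, rewards_generated, rewards_margins)
ROWS = [
    (62, 0.6845, 0.1316, -1.2271, 1.3587),
    (124, 0.6091, 1.9039, -1.0140, 2.9179),
    (186, 0.4901, 1.6316, -2.9131, 4.5447),
    (248, 0.4296, 0.8748, -3.7695, 4.6443),
    (310, 0.3885, 1.7310, -3.2873, 5.0183),
    (372, 0.4325, 4.2551, -0.5848, 4.8399),
    (434, 0.3965, 4.0302, -1.0932, 5.1234),
    (496, 0.4029, 4.6281, -1.2718, 5.8999),
    (558, 0.4281, 4.7928, -0.9888, 5.7817),
    (620, 0.4522, 5.4290, 0.0713, 5.3577),
    (682, 0.4649, 6.2001, 0.9000, 5.3001),
    (744, 0.4958, 6.5905, 1.3552, 5.2353),
    (806, 0.4796, 6.6882, 1.1784, 5.5098),
    (868, 0.4938, 6.8678, 1.4561, 5.4117),
    (930, 0.4841, 6.6281, 0.7385, 5.8896),
]


def margin_mismatches(rows, tol=1e-3):
    """Return the steps where margin != real - generated (within tol).

    Assumption: Rewards/margins is defined as Rewards/real minus
    Rewards/generated; the tolerance absorbs 4-decimal rounding.
    """
    bad = []
    for step, _val_loss, real, generated, margin in rows:
        if abs((real - generated) - margin) > tol:
            bad.append(step)
    return bad


def best_step(rows):
    """Return the (step, validation_loss) pair with the lowest validation loss."""
    step, val_loss, *_ = min(rows, key=lambda r: r[1])
    return step, val_loss


mismatches = margin_mismatches(ROWS)
best = best_step(ROWS)
print("steps with inconsistent margins:", mismatches)
print("lowest validation loss at step", best[0], "with loss", best[1])
```

Under that assumption every row is consistent, and the lowest validation loss in the table is 0.3885 at step 310 (epoch 0.992). Validation loss is higher at every later step while training loss keeps falling, so the final step is not necessarily the best checkpoint by this metric.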