tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

Loss: 0.6851
Rewards/chosen: -0.0660
Rewards/rejected: -0.0839
Rewards/accuracies: 0.5978
Rewards/margins: 0.0179
Logps/rejected: -71.5685
Logps/chosen: -65.3140
Logits/rejected: -3.0328
Logits/chosen: -3.0386
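The metric names above match those logged by Hugging Face TRL's `DPOTrainer`, so the sketch below assumes the standard sigmoid DPO loss. The card does not confirm the trainer and does not report the DPO temperature `beta`, so `beta=0.1` here is an assumption. Each implicit reward is `beta` times the log-probability ratio between the policy and the reference model (typically the SFT starting point). The margin is the chosen reward minus the rejected reward, and the loss is `-log sigmoid(margin)`. The reported figures are averages of these per-example quantities. For example, the margin 0.0179 equals -0.0660 - (-0.0839).

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen_logp: float, policy_rejected_logp: float,
                ref_chosen_logp: float, ref_rejected_logp: float,
                beta: float = 0.1) -> dict:
    """Per-example sigmoid DPO loss and implicit rewards.

    beta=0.1 is an assumption; the model card does not report it.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return {
        "loss": -log_sigmoid(margin),
        "rewards/chosen": chosen_reward,
        "rewards/rejected": rejected_reward,
        "rewards/margins": margin,
        # 1.0 when the chosen summary gets the higher implicit reward.
        "rewards/accuracies": float(chosen_reward > rejected_reward),
    }


def batch_mean(per_example: list) -> dict:
    """Average per-example metrics (assumed to be how reported values are formed)."""
    keys = per_example[0].keys()
    return {k: sum(m[k] for m in per_example) / len(per_example) for k in keys}


if __name__ == "__main__":
    # Policy identical to reference: all rewards are zero.
    m = dpo_metrics(-58.7, -63.2, -58.7, -63.2)
    print(round(m["loss"], 4))  # 0.6931
```

When the policy still equals the reference, every reward is zero and the loss is ln 2 ≈ 0.6931. This matches the 0.6931 to 0.6932 values at the start of the training-results table. Because the reported loss is a mean of per-example losses, it need not equal `-log sigmoid` of the mean margin.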

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-08
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
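Here is a minimal pure-Python sketch of how these settings combine. It assumes the usual Hugging Face conventions. The total batch is per-device batch × accumulation steps × devices. The cosine scheduler warms up linearly for `ceil(warmup_ratio × total_steps)` steps, then decays to zero over the remaining steps.

Two values are assumptions. The first is the device count: 64 = 8 × 8 suggests a single device despite `distributed_type: multi-GPU`. The second is the total of 3 × 1451 optimizer steps. The figure of about 1451 steps per epoch is inferred from the Step and Epoch columns of the training-results table (for example, 4300 / 2.9635 ≈ 1451).

```python
import math

# Values from the hyperparameter list.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: 8 * 8 already equals the listed total of 64
PEAK_LR = 5e-8
WARMUP_RATIO = 0.1
# Assumption: ~1451 optimizer steps per epoch, inferred from the results table.
TOTAL_STEPS = 3 * 1451


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * accum * devices


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup followed by a single half-cosine decay to zero."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for s in (0, 218, 436, 2000, TOTAL_STEPS):
        print(s, lr_at(s))
```

Under these assumptions, the rate peaks at 5e-8 at step 436 and reaches zero at step 4353.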

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6931	0.0689	100	0.6932	-0.0000	0.0001	0.4809	-0.0001	-63.1742	-58.7157	-3.1575	-3.1631
0.6931	0.1378	200	0.6932	-0.0001	-0.0000	0.4735	-0.0001	-63.1804	-58.7190	-3.1577	-3.1633
0.693	0.2068	300	0.6931	0.0002	0.0002	0.5044	0.0000	-63.1651	-58.6934	-3.1573	-3.1630
0.6929	0.2757	400	0.6931	0.0004	0.0004	0.4928	0.0000	-63.1405	-58.6678	-3.1565	-3.1621
0.6925	0.3446	500	0.6930	0.0009	0.0005	0.5374	0.0004	-63.1296	-58.6253	-3.1548	-3.1605
0.6919	0.4135	600	0.6928	0.0012	0.0006	0.5644	0.0006	-63.1213	-58.5903	-3.1529	-3.1585
0.6917	0.4824	700	0.6926	0.0017	0.0006	0.5562	0.0011	-63.1193	-58.5436	-3.1505	-3.1562
0.6905	0.5513	800	0.6924	0.0019	0.0003	0.5681	0.0016	-63.1495	-58.5180	-3.1471	-3.1528
0.6898	0.6203	900	0.6920	0.0018	-0.0004	0.5839	0.0023	-63.2244	-58.5291	-3.1427	-3.1484
0.6894	0.6892	1000	0.6918	0.0013	-0.0015	0.5699	0.0028	-63.3282	-58.5803	-3.1380	-3.1437
0.6894	0.7581	1100	0.6915	0.0004	-0.0030	0.5718	0.0033	-63.4761	-58.6734	-3.1327	-3.1383
0.6886	0.8270	1200	0.6912	-0.0007	-0.0048	0.5704	0.0041	-63.6618	-58.7859	-3.1285	-3.1342
0.6878	0.8959	1300	0.6907	-0.0026	-0.0077	0.5802	0.0051	-63.9501	-58.9768	-3.1220	-3.1276
0.6872	0.9649	1400	0.6904	-0.0047	-0.0104	0.5869	0.0057	-64.2244	-59.1855	-3.1181	-3.1238
0.6865	1.0338	1500	0.6902	-0.0077	-0.0140	0.5869	0.0063	-64.5792	-59.4787	-3.1117	-3.1174
0.6855	1.1027	1600	0.6898	-0.0109	-0.0180	0.5839	0.0071	-64.9847	-59.8052	-3.1071	-3.1128
0.6842	1.1716	1700	0.6895	-0.0156	-0.0234	0.5827	0.0079	-65.5234	-60.2681	-3.1002	-3.1059
0.6842	1.2405	1800	0.6890	-0.0215	-0.0304	0.5876	0.0089	-66.2193	-60.8594	-3.0947	-3.1005
0.6804	1.3094	1900	0.6888	-0.0253	-0.0347	0.5911	0.0095	-66.6540	-61.2379	-3.0896	-3.0952
0.6827	1.3784	2000	0.6883	-0.0299	-0.0405	0.5971	0.0107	-67.2341	-61.6997	-3.0847	-3.0904
0.6805	1.4473	2100	0.6879	-0.0345	-0.0461	0.5980	0.0116	-67.7896	-62.1622	-3.0798	-3.0855
0.68	1.5162	2200	0.6876	-0.0374	-0.0495	0.5929	0.0121	-68.1323	-62.4511	-3.0751	-3.0808
0.6805	1.5851	2300	0.6873	-0.0420	-0.0550	0.5908	0.0130	-68.6762	-62.9119	-3.0705	-3.0763
0.6802	1.6540	2400	0.6870	-0.0440	-0.0575	0.5936	0.0135	-68.9288	-63.1075	-3.0657	-3.0714
0.6788	1.7229	2500	0.6868	-0.0465	-0.0604	0.5950	0.0140	-69.2231	-63.3570	-3.0616	-3.0674
0.6784	1.7919	2600	0.6865	-0.0493	-0.0639	0.5948	0.0146	-69.5742	-63.6419	-3.0568	-3.0626
0.6771	1.8608	2700	0.6863	-0.0524	-0.0676	0.5943	0.0152	-69.9422	-63.9527	-3.0530	-3.0588
0.676	1.9297	2800	0.6861	-0.0553	-0.0710	0.5892	0.0157	-70.2780	-64.2370	-3.0501	-3.0558
0.6793	1.9986	2900	0.6860	-0.0571	-0.0731	0.5922	0.0160	-70.4908	-64.4251	-3.0474	-3.0532
0.6755	2.0675	3000	0.6858	-0.0592	-0.0755	0.5929	0.0163	-70.7265	-64.6294	-3.0442	-3.0500
0.678	2.1365	3100	0.6856	-0.0600	-0.0768	0.5941	0.0168	-70.8605	-64.7164	-3.0422	-3.0480
0.6795	2.2054	3200	0.6855	-0.0611	-0.0781	0.5941	0.0170	-70.9855	-64.8209	-3.0400	-3.0457
0.6784	2.2743	3300	0.6854	-0.0619	-0.0791	0.5969	0.0172	-71.0930	-64.9018	-3.0382	-3.0440
0.6792	2.3432	3400	0.6853	-0.0627	-0.0801	0.5946	0.0175	-71.1919	-64.9777	-3.0366	-3.0423
0.6769	2.4121	3500	0.6853	-0.0636	-0.0811	0.5953	0.0175	-71.2883	-65.0695	-3.0356	-3.0414
0.6771	2.4810	3600	0.6852	-0.0645	-0.0822	0.5978	0.0177	-71.3953	-65.1583	-3.0346	-3.0404
0.6785	2.5500	3700	0.6851	-0.0650	-0.0829	0.5997	0.0179	-71.4696	-65.2152	-3.0340	-3.0397
0.6779	2.6189	3800	0.6851	-0.0655	-0.0833	0.5962	0.0179	-71.5138	-65.2594	-3.0332	-3.0390
0.6775	2.6878	3900	0.6851	-0.0657	-0.0836	0.5974	0.0179	-71.5451	-65.2842	-3.0331	-3.0389
0.6757	2.7567	4000	0.6851	-0.0658	-0.0837	0.5985	0.0179	-71.5477	-65.2925	-3.0326	-3.0384
0.6759	2.8256	4100	0.6850	-0.0658	-0.0839	0.6022	0.0181	-71.5705	-65.2951	-3.0324	-3.0382
0.6755	2.8946	4200	0.6852	-0.0659	-0.0838	0.5990	0.0178	-71.5600	-65.3068	-3.0326	-3.0384
0.6803	2.9635	4300	0.6852	-0.0659	-0.0838	0.6006	0.0179	-71.5612	-65.3069	-3.0327	-3.0385
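The table is easier to audit programmatically. The sketch below does three things:

- parses rows in the column order above, splitting on whitespace since no data value contains spaces;
- checks that `Rewards/margins` equals `Rewards/chosen - Rewards/rejected` up to four-decimal rounding;
- selects the row with the lowest validation loss.

Only four rows are copied in. The snake_case column keys are names chosen for this example.

```python
# Column order as in the training-results table (keys chosen for this sketch).
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rewards/chosen", "rewards/rejected", "rewards/accuracies", "rewards/margins",
    "logps/rejected", "logps/chosen", "logits/rejected", "logits/chosen",
]

# A few rows copied verbatim from the table; paste in all rows to check the full log.
ROWS = """
0.6931 0.0689 100 0.6932 -0.0000 0.0001 0.4809 -0.0001 -63.1742 -58.7157 -3.1575 -3.1631
0.6827 1.3784 2000 0.6883 -0.0299 -0.0405 0.5971 0.0107 -67.2341 -61.6997 -3.0847 -3.0904
0.6759 2.8256 4100 0.6850 -0.0658 -0.0839 0.6022 0.0181 -71.5705 -65.2951 -3.0324 -3.0382
0.6803 2.9635 4300 0.6852 -0.0659 -0.0838 0.6006 0.0179 -71.5612 -65.3069 -3.0327 -3.0385
"""


def parse(text: str) -> list:
    """Turn whitespace-separated rows into dicts keyed by COLUMNS."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        records.append(dict(zip(COLUMNS, values)))
    return records


def margins_consistent(records: list, tol: float = 1.5e-4) -> bool:
    """Margins should equal chosen - rejected, allowing for 4-decimal rounding."""
    return all(
        abs(r["rewards/margins"] - (r["rewards/chosen"] - r["rewards/rejected"])) <= tol
        for r in records
    )


def best_by_val_loss(records: list) -> dict:
    """Row with the lowest validation loss."""
    return min(records, key=lambda r: r["val_loss"])


if __name__ == "__main__":
    rows = parse(ROWS)
    print(margins_consistent(rows))
    best = best_by_val_loss(rows)
    print(int(best["step"]), best["val_loss"])
```

Run over the full table, the lowest validation loss is 0.6850 at step 4100. This is slightly below the 0.6851 reported for the final evaluation.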

Framework versions

Transformers 4.41.2
Pytorch 2.1.2
Datasets 2.20.0
Tokenizers 0.19.1
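To compare a local environment against these versions, the standard-library `importlib.metadata` is enough. PyTorch is installed under the distribution name `torch`. Its wheels may carry a local build tag such as `+cu121`, which this sketch ignores. Treating any other difference as a mismatch is a deliberately strict choice.

```python
from importlib import metadata

# Versions listed under "Framework versions"; PyTorch's distribution name is "torch".
EXPECTED = {
    "transformers": "4.41.2",
    "torch": "2.1.2",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def matches(installed: str, expected: str) -> bool:
    """Exact match, ignoring a local build tag such as '+cu121'."""
    return installed.split("+", 1)[0] == expected


def check_versions(expected: dict = EXPECTED) -> dict:
    """Map package -> (installed or None, expected, matches?)."""
    report = {}
    for package, want in expected.items():
        try:
            have = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, want, False)
            continue
        report[package] = (have, want, matches(have, want))
    return report


if __name__ == "__main__":
    for package, (have, want, ok) in check_versions().items():
        print(f"{package:12} installed={have} card={want} {'OK' if ok else 'MISMATCH'}")
```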

martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_3epochs_old