
# tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are defined after the list):

- Loss: 0.6851
- Rewards/chosen: -0.0660
- Rewards/rejected: -0.0839
- Rewards/accuracies: 0.5978
- Rewards/margins: 0.0179
- Logps/rejected: -71.5685
- Logps/chosen: -65.3140
- Logits/rejected: -3.0328
- Logits/chosen: -3.0386
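For context, the reward columns follow the standard DPO convention (an assumption; the card itself does not define them): each is derived from the policy's implicit reward relative to the frozen SFT reference model,

$$
r_\theta(x, y) = \beta \bigl( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr),
$$

so `Rewards/margins` is the mean of $r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})$ over evaluation pairs, and `Rewards/accuracies` is the fraction of pairs in which the chosen summary receives the higher implicit reward. The $\beta$ used for this run is not recorded on the card.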

## Model description

A 1.1B-parameter TinyLlama causal language model for summarization, produced by applying DPO (learning rate 5e-8, effective batch size 64, 3 epochs, per the model name and the hyperparameters below) to the SFT checkpoint martimfasantos/tinyllama-1.1b-sum-sft-full_old using preference pairs from openai/summarize_from_feedback. No further details have been provided.

## Intended uses & limitations

More information needed
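Given the training data, the model is presumably intended for TL;DR-style summarization. A minimal inference sketch (the prompt format is an assumption, since the template used during training is not documented here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed Reddit TL;DR-style prompt; adjust to match the actual training template.
post = "Your long post or article text goes here."
prompt = f"{post}\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```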

## Training and evaluation data

Training and evaluation used the openai/summarize_from_feedback preference dataset referenced above; split selection and preprocessing details have not been documented.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
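For reproduction, a minimal sketch of the equivalent `transformers.TrainingArguments` (precision and the DPO β are assumptions, as they are not recorded above; the run itself was presumably driven by a DPO trainer such as TRL's `DPOTrainer`):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above. Adam betas=(0.9, 0.999) and
# epsilon=1e-08 are the AdamW defaults, so no optimizer overrides are needed.
training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_3epochs_old",
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 per device x 8 steps = 64 total, as listed
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: the precision used is not stated on the card
)
```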

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0.6931 | 0.0689 | 100 | 0.6932 | -0.0000 | 0.0001 | 0.4809 | -0.0001 | -63.1742 | -58.7157 | -3.1575 | -3.1631 |
| 0.6931 | 0.1378 | 200 | 0.6932 | -0.0001 | -0.0000 | 0.4735 | -0.0001 | -63.1804 | -58.7190 | -3.1577 | -3.1633 |
| 0.693 | 0.2068 | 300 | 0.6931 | 0.0002 | 0.0002 | 0.5044 | 0.0000 | -63.1651 | -58.6934 | -3.1573 | -3.1630 |
| 0.6929 | 0.2757 | 400 | 0.6931 | 0.0004 | 0.0004 | 0.4928 | 0.0000 | -63.1405 | -58.6678 | -3.1565 | -3.1621 |
| 0.6925 | 0.3446 | 500 | 0.6930 | 0.0009 | 0.0005 | 0.5374 | 0.0004 | -63.1296 | -58.6253 | -3.1548 | -3.1605 |
| 0.6919 | 0.4135 | 600 | 0.6928 | 0.0012 | 0.0006 | 0.5644 | 0.0006 | -63.1213 | -58.5903 | -3.1529 | -3.1585 |
| 0.6917 | 0.4824 | 700 | 0.6926 | 0.0017 | 0.0006 | 0.5562 | 0.0011 | -63.1193 | -58.5436 | -3.1505 | -3.1562 |
| 0.6905 | 0.5513 | 800 | 0.6924 | 0.0019 | 0.0003 | 0.5681 | 0.0016 | -63.1495 | -58.5180 | -3.1471 | -3.1528 |
| 0.6898 | 0.6203 | 900 | 0.6920 | 0.0018 | -0.0004 | 0.5839 | 0.0023 | -63.2244 | -58.5291 | -3.1427 | -3.1484 |
| 0.6894 | 0.6892 | 1000 | 0.6918 | 0.0013 | -0.0015 | 0.5699 | 0.0028 | -63.3282 | -58.5803 | -3.1380 | -3.1437 |
| 0.6894 | 0.7581 | 1100 | 0.6915 | 0.0004 | -0.0030 | 0.5718 | 0.0033 | -63.4761 | -58.6734 | -3.1327 | -3.1383 |
| 0.6886 | 0.8270 | 1200 | 0.6912 | -0.0007 | -0.0048 | 0.5704 | 0.0041 | -63.6618 | -58.7859 | -3.1285 | -3.1342 |
| 0.6878 | 0.8959 | 1300 | 0.6907 | -0.0026 | -0.0077 | 0.5802 | 0.0051 | -63.9501 | -58.9768 | -3.1220 | -3.1276 |
| 0.6872 | 0.9649 | 1400 | 0.6904 | -0.0047 | -0.0104 | 0.5869 | 0.0057 | -64.2244 | -59.1855 | -3.1181 | -3.1238 |
| 0.6865 | 1.0338 | 1500 | 0.6902 | -0.0077 | -0.0140 | 0.5869 | 0.0063 | -64.5792 | -59.4787 | -3.1117 | -3.1174 |
| 0.6855 | 1.1027 | 1600 | 0.6898 | -0.0109 | -0.0180 | 0.5839 | 0.0071 | -64.9847 | -59.8052 | -3.1071 | -3.1128 |
| 0.6842 | 1.1716 | 1700 | 0.6895 | -0.0156 | -0.0234 | 0.5827 | 0.0079 | -65.5234 | -60.2681 | -3.1002 | -3.1059 |
| 0.6842 | 1.2405 | 1800 | 0.6890 | -0.0215 | -0.0304 | 0.5876 | 0.0089 | -66.2193 | -60.8594 | -3.0947 | -3.1005 |
| 0.6804 | 1.3094 | 1900 | 0.6888 | -0.0253 | -0.0347 | 0.5911 | 0.0095 | -66.6540 | -61.2379 | -3.0896 | -3.0952 |
| 0.6827 | 1.3784 | 2000 | 0.6883 | -0.0299 | -0.0405 | 0.5971 | 0.0107 | -67.2341 | -61.6997 | -3.0847 | -3.0904 |
| 0.6805 | 1.4473 | 2100 | 0.6879 | -0.0345 | -0.0461 | 0.5980 | 0.0116 | -67.7896 | -62.1622 | -3.0798 | -3.0855 |
| 0.68 | 1.5162 | 2200 | 0.6876 | -0.0374 | -0.0495 | 0.5929 | 0.0121 | -68.1323 | -62.4511 | -3.0751 | -3.0808 |
| 0.6805 | 1.5851 | 2300 | 0.6873 | -0.0420 | -0.0550 | 0.5908 | 0.0130 | -68.6762 | -62.9119 | -3.0705 | -3.0763 |
| 0.6802 | 1.6540 | 2400 | 0.6870 | -0.0440 | -0.0575 | 0.5936 | 0.0135 | -68.9288 | -63.1075 | -3.0657 | -3.0714 |
| 0.6788 | 1.7229 | 2500 | 0.6868 | -0.0465 | -0.0604 | 0.5950 | 0.0140 | -69.2231 | -63.3570 | -3.0616 | -3.0674 |
| 0.6784 | 1.7919 | 2600 | 0.6865 | -0.0493 | -0.0639 | 0.5948 | 0.0146 | -69.5742 | -63.6419 | -3.0568 | -3.0626 |
| 0.6771 | 1.8608 | 2700 | 0.6863 | -0.0524 | -0.0676 | 0.5943 | 0.0152 | -69.9422 | -63.9527 | -3.0530 | -3.0588 |
| 0.676 | 1.9297 | 2800 | 0.6861 | -0.0553 | -0.0710 | 0.5892 | 0.0157 | -70.2780 | -64.2370 | -3.0501 | -3.0558 |
| 0.6793 | 1.9986 | 2900 | 0.6860 | -0.0571 | -0.0731 | 0.5922 | 0.0160 | -70.4908 | -64.4251 | -3.0474 | -3.0532 |
| 0.6755 | 2.0675 | 3000 | 0.6858 | -0.0592 | -0.0755 | 0.5929 | 0.0163 | -70.7265 | -64.6294 | -3.0442 | -3.0500 |
| 0.678 | 2.1365 | 3100 | 0.6856 | -0.0600 | -0.0768 | 0.5941 | 0.0168 | -70.8605 | -64.7164 | -3.0422 | -3.0480 |
| 0.6795 | 2.2054 | 3200 | 0.6855 | -0.0611 | -0.0781 | 0.5941 | 0.0170 | -70.9855 | -64.8209 | -3.0400 | -3.0457 |
| 0.6784 | 2.2743 | 3300 | 0.6854 | -0.0619 | -0.0791 | 0.5969 | 0.0172 | -71.0930 | -64.9018 | -3.0382 | -3.0440 |
| 0.6792 | 2.3432 | 3400 | 0.6853 | -0.0627 | -0.0801 | 0.5946 | 0.0175 | -71.1919 | -64.9777 | -3.0366 | -3.0423 |
| 0.6769 | 2.4121 | 3500 | 0.6853 | -0.0636 | -0.0811 | 0.5953 | 0.0175 | -71.2883 | -65.0695 | -3.0356 | -3.0414 |
| 0.6771 | 2.4810 | 3600 | 0.6852 | -0.0645 | -0.0822 | 0.5978 | 0.0177 | -71.3953 | -65.1583 | -3.0346 | -3.0404 |
| 0.6785 | 2.5500 | 3700 | 0.6851 | -0.0650 | -0.0829 | 0.5997 | 0.0179 | -71.4696 | -65.2152 | -3.0340 | -3.0397 |
| 0.6779 | 2.6189 | 3800 | 0.6851 | -0.0655 | -0.0833 | 0.5962 | 0.0179 | -71.5138 | -65.2594 | -3.0332 | -3.0390 |
| 0.6775 | 2.6878 | 3900 | 0.6851 | -0.0657 | -0.0836 | 0.5974 | 0.0179 | -71.5451 | -65.2842 | -3.0331 | -3.0389 |
| 0.6757 | 2.7567 | 4000 | 0.6851 | -0.0658 | -0.0837 | 0.5985 | 0.0179 | -71.5477 | -65.2925 | -3.0326 | -3.0384 |
| 0.6759 | 2.8256 | 4100 | 0.6850 | -0.0658 | -0.0839 | 0.6022 | 0.0181 | -71.5705 | -65.2951 | -3.0324 | -3.0382 |
| 0.6755 | 2.8946 | 4200 | 0.6852 | -0.0659 | -0.0838 | 0.5990 | 0.0178 | -71.5600 | -65.3068 | -3.0326 | -3.0384 |
| 0.6803 | 2.9635 | 4300 | 0.6852 | -0.0659 | -0.0838 | 0.6006 | 0.0179 | -71.5612 | -65.3069 | -3.0327 | -3.0385 |

### Framework versions

- Transformers 4.41.2
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1