
tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old

This model is a version of martimfasantos/tinyllama-1.1b-sum-sft-full_old fine-tuned with Direct Preference Optimization (DPO) on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.6856
  • Rewards/chosen: -0.0618
  • Rewards/rejected: -0.0788
  • Rewards/accuracies: 0.5955
  • Rewards/margins: 0.0169
  • Logps/rejected: -71.0584
  • Logps/chosen: -64.8961
  • Logits/rejected: -3.0381
  • Logits/chosen: -3.0439
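
The card does not include a usage example, so here is a minimal, hedged loading sketch with transformers. The TL;DR-style prompt is an assumption (it is the usual convention for models trained on openai/summarize_from_feedback), not something this card specifies:

```python
# Minimal loading sketch. The prompt format below is an assumption;
# the card does not document how prompts were formatted for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "..."  # text to summarize
prompt = f"{post}\nTL;DR:"  # assumed TL;DR-style prompt

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)
print(summary)
```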

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged configuration sketch follows the list:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
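
The card does not name the training framework, but these hyperparameters and the metric names in the results table below match what TRL's DPOTrainer logs. The following is a hedged sketch of a matching setup; the DPO beta, the sequence-length limits, and the preference-pair preprocessing are assumptions the card does not state:

```python
# Hedged training sketch using TRL's DPOTrainer. Only the hyperparameters
# mirrored from the list above come from the card; beta, the max lengths,
# and the dataset preprocessing are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Build prompt/chosen/rejected pairs from the comparisons split.
raw = load_dataset("openai/summarize_from_feedback", "comparisons")

def to_pairs(ex):
    # Assumed TL;DR-style prompt; the exact format used for this run is unknown.
    prompt = (ex["info"]["post"] or "") + "\nTL;DR:"
    chosen = ex["summaries"][ex["choice"]]["text"]
    rejected = ex["summaries"][1 - ex["choice"]]["text"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

train_dataset = raw["train"].map(to_pairs, remove_columns=raw["train"].column_names)
eval_dataset = raw["validation"].map(to_pairs, remove_columns=raw["validation"].column_names)

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full",
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = total train batch size 32
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,               # assumption: beta is not reported in the card
    max_length=1024,        # assumption
    max_prompt_length=512,  # assumption
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # DPOTrainer snapshots the policy as the frozen reference
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

With no ref_model passed, DPOTrainer clones the initial policy as the frozen reference, which is the standard full-model DPO setup these metrics imply.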

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932 | 0.0345 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4805 | -0.0001 | -63.1716 | -58.7091 | -3.1575 | -3.1632 |
| 0.6931 | 0.0689 | 200 | 0.6932 | -0.0000 | 0.0000 | 0.4863 | -0.0000 | -63.1768 | -58.7119 | -3.1575 | -3.1632 |
| 0.6931 | 0.1034 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4756 | -0.0001 | -63.1627 | -58.7008 | -3.1575 | -3.1632 |
| 0.693 | 0.1378 | 400 | 0.6931 | 0.0002 | 0.0002 | 0.5007 | 0.0000 | -63.1637 | -58.6940 | -3.1572 | -3.1629 |
| 0.6931 | 0.1723 | 500 | 0.6931 | 0.0003 | 0.0002 | 0.4942 | 0.0001 | -63.1590 | -58.6825 | -3.1569 | -3.1625 |
| 0.6928 | 0.2068 | 600 | 0.6931 | 0.0006 | 0.0005 | 0.5023 | 0.0002 | -63.1320 | -58.6476 | -3.1556 | -3.1613 |
| 0.692 | 0.2412 | 700 | 0.6930 | 0.0010 | 0.0006 | 0.5414 | 0.0004 | -63.1153 | -58.6091 | -3.1543 | -3.1599 |
| 0.6923 | 0.2757 | 800 | 0.6928 | 0.0013 | 0.0006 | 0.5588 | 0.0007 | -63.1219 | -58.5861 | -3.1529 | -3.1585 |
| 0.6912 | 0.3101 | 900 | 0.6927 | 0.0017 | 0.0007 | 0.5660 | 0.0010 | -63.1103 | -58.5464 | -3.1501 | -3.1558 |
| 0.6909 | 0.3446 | 1000 | 0.6925 | 0.0018 | 0.0005 | 0.5646 | 0.0013 | -63.1285 | -58.5271 | -3.1481 | -3.1538 |
| 0.6907 | 0.3790 | 1100 | 0.6924 | 0.0020 | 0.0003 | 0.5604 | 0.0016 | -63.1469 | -58.5154 | -3.1457 | -3.1513 |
| 0.6898 | 0.4135 | 1200 | 0.6921 | 0.0018 | -0.0003 | 0.5743 | 0.0022 | -63.2143 | -58.5306 | -3.1424 | -3.1480 |
| 0.688 | 0.4480 | 1300 | 0.6919 | 0.0018 | -0.0008 | 0.5741 | 0.0026 | -63.2606 | -58.5351 | -3.1392 | -3.1448 |
| 0.6888 | 0.4824 | 1400 | 0.6917 | 0.0011 | -0.0019 | 0.5723 | 0.0030 | -63.3749 | -58.6054 | -3.1364 | -3.1420 |
| 0.6886 | 0.5169 | 1500 | 0.6915 | 0.0002 | -0.0033 | 0.5737 | 0.0035 | -63.5057 | -58.6878 | -3.1325 | -3.1382 |
| 0.6885 | 0.5513 | 1600 | 0.6912 | -0.0003 | -0.0043 | 0.5769 | 0.0040 | -63.6057 | -58.7407 | -3.1295 | -3.1351 |
| 0.6861 | 0.5858 | 1700 | 0.6910 | -0.0016 | -0.0062 | 0.5746 | 0.0046 | -63.8004 | -58.8729 | -3.1253 | -3.1310 |
| 0.6872 | 0.6203 | 1800 | 0.6908 | -0.0035 | -0.0085 | 0.5839 | 0.0050 | -64.0325 | -59.0604 | -3.1214 | -3.1270 |
| 0.6862 | 0.6547 | 1900 | 0.6905 | -0.0054 | -0.0110 | 0.5802 | 0.0057 | -64.2826 | -59.2489 | -3.1157 | -3.1214 |
| 0.6859 | 0.6892 | 2000 | 0.6903 | -0.0080 | -0.0142 | 0.5869 | 0.0062 | -64.5982 | -59.5137 | -3.1119 | -3.1176 |
| 0.6846 | 0.7236 | 2100 | 0.6899 | -0.0107 | -0.0176 | 0.5829 | 0.0069 | -64.9428 | -59.7842 | -3.1059 | -3.1116 |
| 0.6861 | 0.7581 | 2200 | 0.6897 | -0.0133 | -0.0207 | 0.5869 | 0.0074 | -65.2491 | -60.0455 | -3.1025 | -3.1081 |
| 0.6836 | 0.7926 | 2300 | 0.6895 | -0.0168 | -0.0247 | 0.5922 | 0.0079 | -65.6530 | -60.3904 | -3.0987 | -3.1044 |
| 0.6847 | 0.8270 | 2400 | 0.6892 | -0.0209 | -0.0296 | 0.5869 | 0.0087 | -66.1402 | -60.8069 | -3.0949 | -3.1007 |
| 0.6838 | 0.8615 | 2500 | 0.6889 | -0.0250 | -0.0343 | 0.5904 | 0.0093 | -66.6113 | -61.2157 | -3.0910 | -3.0968 |
| 0.6841 | 0.8959 | 2600 | 0.6886 | -0.0284 | -0.0384 | 0.5955 | 0.0100 | -67.0226 | -61.5496 | -3.0877 | -3.0933 |
| 0.6824 | 0.9304 | 2700 | 0.6883 | -0.0321 | -0.0428 | 0.5855 | 0.0107 | -67.4593 | -61.9186 | -3.0839 | -3.0897 |
| 0.6824 | 0.9649 | 2800 | 0.6880 | -0.0334 | -0.0447 | 0.5929 | 0.0113 | -67.6515 | -62.0566 | -3.0811 | -3.0868 |
| 0.6812 | 0.9993 | 2900 | 0.6878 | -0.0363 | -0.0481 | 0.5906 | 0.0118 | -67.9890 | -62.3425 | -3.0775 | -3.0832 |
| 0.6819 | 1.0338 | 3000 | 0.6877 | -0.0373 | -0.0494 | 0.5932 | 0.0120 | -68.1166 | -62.4440 | -3.0740 | -3.0797 |
| 0.6796 | 1.0682 | 3100 | 0.6874 | -0.0392 | -0.0518 | 0.5987 | 0.0126 | -68.3560 | -62.6296 | -3.0701 | -3.0759 |
| 0.6776 | 1.1027 | 3200 | 0.6872 | -0.0409 | -0.0540 | 0.5906 | 0.0131 | -68.5819 | -62.8043 | -3.0674 | -3.0732 |
| 0.6824 | 1.1371 | 3300 | 0.6870 | -0.0436 | -0.0571 | 0.5946 | 0.0135 | -68.8899 | -63.0750 | -3.0643 | -3.0701 |
| 0.6787 | 1.1716 | 3400 | 0.6869 | -0.0458 | -0.0596 | 0.5941 | 0.0138 | -69.1415 | -63.2913 | -3.0611 | -3.0668 |
| 0.6801 | 1.2061 | 3500 | 0.6867 | -0.0482 | -0.0624 | 0.5929 | 0.0142 | -69.4185 | -63.5317 | -3.0588 | -3.0646 |
| 0.6797 | 1.2405 | 3600 | 0.6866 | -0.0499 | -0.0644 | 0.5915 | 0.0145 | -69.6206 | -63.6998 | -3.0559 | -3.0616 |
| 0.6783 | 1.2750 | 3700 | 0.6864 | -0.0511 | -0.0659 | 0.5904 | 0.0149 | -69.7728 | -63.8172 | -3.0542 | -3.0599 |
| 0.6771 | 1.3094 | 3800 | 0.6864 | -0.0521 | -0.0672 | 0.5920 | 0.0151 | -69.8981 | -63.9235 | -3.0522 | -3.0580 |
| 0.6785 | 1.3439 | 3900 | 0.6862 | -0.0536 | -0.0690 | 0.5922 | 0.0154 | -70.0814 | -64.0693 | -3.0499 | -3.0556 |
| 0.6807 | 1.3784 | 4000 | 0.6861 | -0.0551 | -0.0708 | 0.5908 | 0.0157 | -70.2593 | -64.2214 | -3.0484 | -3.0541 |
| 0.6769 | 1.4128 | 4100 | 0.6860 | -0.0563 | -0.0722 | 0.5929 | 0.0159 | -70.3988 | -64.3376 | -3.0467 | -3.0525 |
| 0.6722 | 1.4473 | 4200 | 0.6859 | -0.0577 | -0.0738 | 0.5946 | 0.0161 | -70.5629 | -64.4845 | -3.0456 | -3.0513 |
| 0.6769 | 1.4817 | 4300 | 0.6858 | -0.0582 | -0.0745 | 0.5939 | 0.0163 | -70.6349 | -64.5350 | -3.0442 | -3.0499 |
| 0.6785 | 1.5162 | 4400 | 0.6858 | -0.0586 | -0.0750 | 0.5955 | 0.0164 | -70.6776 | -64.5703 | -3.0432 | -3.0490 |
| 0.6735 | 1.5507 | 4500 | 0.6858 | -0.0597 | -0.0762 | 0.5920 | 0.0164 | -70.7972 | -64.6853 | -3.0421 | -3.0479 |
| 0.6786 | 1.5851 | 4600 | 0.6857 | -0.0603 | -0.0769 | 0.5967 | 0.0166 | -70.8698 | -64.7462 | -3.0414 | -3.0471 |
| 0.6803 | 1.6196 | 4700 | 0.6857 | -0.0603 | -0.0770 | 0.5978 | 0.0167 | -70.8781 | -64.7435 | -3.0408 | -3.0466 |
| 0.6789 | 1.6540 | 4800 | 0.6856 | -0.0607 | -0.0775 | 0.5929 | 0.0168 | -70.9263 | -64.7804 | -3.0399 | -3.0457 |
| 0.6723 | 1.6885 | 4900 | 0.6856 | -0.0611 | -0.0779 | 0.5985 | 0.0168 | -70.9741 | -64.8213 | -3.0390 | -3.0448 |
| 0.6767 | 1.7229 | 5000 | 0.6856 | -0.0613 | -0.0781 | 0.5960 | 0.0169 | -70.9925 | -64.8377 | -3.0388 | -3.0446 |
| 0.6774 | 1.7574 | 5100 | 0.6856 | -0.0615 | -0.0784 | 0.5939 | 0.0168 | -71.0176 | -64.8661 | -3.0387 | -3.0445 |
| 0.6748 | 1.7919 | 5200 | 0.6855 | -0.0616 | -0.0786 | 0.5939 | 0.0170 | -71.0377 | -64.8736 | -3.0383 | -3.0441 |
| 0.6761 | 1.8263 | 5300 | 0.6855 | -0.0617 | -0.0787 | 0.5950 | 0.0170 | -71.0469 | -64.8778 | -3.0380 | -3.0439 |
| 0.6738 | 1.8608 | 5400 | 0.6855 | -0.0618 | -0.0788 | 0.5985 | 0.0171 | -71.0633 | -64.8885 | -3.0380 | -3.0438 |
| 0.6821 | 1.8952 | 5500 | 0.6855 | -0.0618 | -0.0788 | 0.5934 | 0.0170 | -71.0638 | -64.8919 | -3.0379 | -3.0437 |
| 0.6724 | 1.9297 | 5600 | 0.6855 | -0.0619 | -0.0788 | 0.5955 | 0.0170 | -71.0635 | -64.8979 | -3.0379 | -3.0437 |
| 0.6745 | 1.9642 | 5700 | 0.6855 | -0.0619 | -0.0790 | 0.5957 | 0.0171 | -71.0788 | -64.9037 | -3.0380 | -3.0438 |
| 0.6767 | 1.9986 | 5800 | 0.6856 | -0.0618 | -0.0788 | 0.5955 | 0.0169 | -71.0584 | -64.8961 | -3.0381 | -3.0439 |
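
The column names follow TRL's DPO logging convention: the "rewards" are the implicit DPO rewards, beta times the gap between the policy and reference log-probabilities of a completion, and rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one. Here is a minimal illustrative sketch of how the loss and these metrics relate (beta and all tensor names are assumptions; the card does not report beta):

```python
# Illustrative sketch of the DPO loss and the metrics logged above.
# Inputs are 1-D tensors of summed per-token log-probs for a batch of
# completions under the policy and the frozen reference model.
import torch.nn.functional as F

beta = 0.1  # assumption: not reported in the card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # rewards/margins
    accuracy = (margins > 0).float().mean()                                 # rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # validation loss
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```

The table is self-consistent under these definitions: at the final step, -0.0618 - (-0.0788) ≈ 0.0170 matches the reported margin up to rounding, and the initial loss of 0.6932 sits at the no-preference-signal value -log(0.5) ≈ 0.6931.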

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1