tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (these are the values logged at the final step, 11600, in the training results table):
- Loss: 0.6808
- Rewards/chosen: -0.1214
- Rewards/rejected: -0.1497
- Rewards/accuracies: 0.6090
- Rewards/margins: 0.0284
- Logps/rejected: -78.1532
- Logps/chosen: -70.8499
- Logits/rejected: -2.9566
- Logits/chosen: -2.9624
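The metric names above match those typically logged by DPO training, where the margin is the chosen reward minus the rejected reward and the per-pair loss is the negative log-sigmoid of that margin. The sketch below assumes the standard sigmoid DPO loss (the card does not state the loss type or the beta value), and the function name `dpo_sigmoid_loss` is illustrative only. It shows why the loss starts near ln 2 ≈ 0.6931 when the margin is zero. The reported loss is an average of per-pair losses, so it need not equal the loss evaluated at the mean margin.

```python
import math

def dpo_sigmoid_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-pair sigmoid DPO loss: -log(sigmoid(margin)).

    Assumption: rewards are the implicit DPO rewards (already scaled by beta).
    """
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form is numerically stable
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Mean evaluation rewards reported in this card
rewards_chosen = -0.1214
rewards_rejected = -0.1497

margin_of_means = rewards_chosen - rewards_rejected
loss_at_zero_margin = dpo_sigmoid_loss(0.0, 0.0)
loss_at_mean_margin = dpo_sigmoid_loss(rewards_chosen, rewards_rejected)

print(f"margin of the means:     {margin_of_means:.4f}")
print(f"loss at zero margin:     {loss_at_zero_margin:.4f}")
print(f"loss at the mean margin: {loss_at_mean_margin:.4f}")
```

The margin of the means (about 0.0283) agrees with the reported Rewards/margins of 0.0284 up to rounding of the inputs.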
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
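As a rough illustration of how these settings interact, the sketch below computes the effective batch size from the listed values and evaluates a cosine schedule with linear warmup. It is an approximation of the schedule shape, not the exact code used for training. The total of 11600 steps is an assumption taken from the last logged step in the results table, and the helper name `cosine_lr_with_warmup` is hypothetical.

```python
import math

def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = 5e-8,
                          warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Values listed in the hyperparameters above
train_batch_size = 8
gradient_accumulation_steps = 2
effective_batch = train_batch_size * gradient_accumulation_steps

# Assumption: total optimizer steps taken from the last logged step
total_steps = 11600

print(f"effective batch size: {effective_batch}")
for step in (0, 580, 1160, 5800, 11600):
    print(f"step {step:>5}: lr = {cosine_lr_with_warmup(step, total_steps):.3e}")
```

The product of the two listed values, 8 × 2 = 16, matches the reported total_train_batch_size.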
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.0172 | 100 | 0.6932 | 0.0001 | 0.0001 | 0.4830 | -0.0000 | -63.1707 | -58.7060 | -3.1577 | -3.1634 |
0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4763 | -0.0001 | -63.1661 | -58.7098 | -3.1576 | -3.1633 |
0.6931 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0000 | 0.4893 | -0.0001 | -63.1759 | -58.7129 | -3.1578 | -3.1635 |
0.6932 | 0.0689 | 400 | 0.6932 | 0.0001 | 0.0003 | 0.4631 | -0.0001 | -63.1539 | -58.6981 | -3.1577 | -3.1634 |
0.6931 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0002 | 0.4842 | -0.0001 | -63.1628 | -58.7064 | -3.1577 | -3.1633 |
0.6929 | 0.1034 | 600 | 0.6932 | 0.0001 | 0.0002 | 0.4870 | -0.0000 | -63.1628 | -58.6974 | -3.1574 | -3.1630 |
0.693 | 0.1206 | 700 | 0.6932 | 0.0002 | 0.0002 | 0.4865 | -0.0000 | -63.1602 | -58.6945 | -3.1573 | -3.1629 |
0.6928 | 0.1378 | 800 | 0.6931 | 0.0003 | 0.0003 | 0.5005 | 0.0000 | -63.1503 | -58.6786 | -3.1570 | -3.1626 |
0.6929 | 0.1551 | 900 | 0.6931 | 0.0006 | 0.0004 | 0.5114 | 0.0002 | -63.1377 | -58.6515 | -3.1564 | -3.1620 |
0.6929 | 0.1723 | 1000 | 0.6930 | 0.0007 | 0.0004 | 0.5163 | 0.0002 | -63.1368 | -58.6461 | -3.1554 | -3.1611 |
0.6927 | 0.1895 | 1100 | 0.6930 | 0.0008 | 0.0005 | 0.5353 | 0.0003 | -63.1281 | -58.6300 | -3.1546 | -3.1602 |
0.6926 | 0.2068 | 1200 | 0.6929 | 0.0011 | 0.0007 | 0.5332 | 0.0004 | -63.1063 | -58.5972 | -3.1533 | -3.1590 |
0.6925 | 0.2240 | 1300 | 0.6928 | 0.0014 | 0.0008 | 0.5551 | 0.0006 | -63.0993 | -58.5706 | -3.1521 | -3.1577 |
0.6911 | 0.2412 | 1400 | 0.6927 | 0.0016 | 0.0006 | 0.5537 | 0.0010 | -63.1157 | -58.5519 | -3.1503 | -3.1559 |
0.6906 | 0.2584 | 1500 | 0.6925 | 0.0018 | 0.0006 | 0.5644 | 0.0013 | -63.1246 | -58.5291 | -3.1489 | -3.1545 |
0.6915 | 0.2757 | 1600 | 0.6924 | 0.0019 | 0.0005 | 0.5660 | 0.0015 | -63.1345 | -58.5184 | -3.1472 | -3.1529 |
0.6912 | 0.2929 | 1700 | 0.6922 | 0.0021 | 0.0002 | 0.5634 | 0.0019 | -63.1578 | -58.5044 | -3.1446 | -3.1502 |
0.6889 | 0.3101 | 1800 | 0.6922 | 0.0019 | -0.0001 | 0.5653 | 0.0020 | -63.1906 | -58.5175 | -3.1424 | -3.1481 |
0.69 | 0.3274 | 1900 | 0.6919 | 0.0019 | -0.0006 | 0.5771 | 0.0025 | -63.2406 | -58.5210 | -3.1407 | -3.1464 |
0.6899 | 0.3446 | 2000 | 0.6919 | 0.0016 | -0.0011 | 0.5771 | 0.0027 | -63.2913 | -58.5564 | -3.1376 | -3.1433 |
0.6892 | 0.3618 | 2100 | 0.6917 | 0.0012 | -0.0017 | 0.5741 | 0.0030 | -63.3523 | -58.5873 | -3.1355 | -3.1412 |
0.6866 | 0.3790 | 2200 | 0.6916 | 0.0008 | -0.0025 | 0.5743 | 0.0033 | -63.4306 | -58.6304 | -3.1324 | -3.1381 |
0.6859 | 0.3963 | 2300 | 0.6914 | 0.0003 | -0.0035 | 0.5683 | 0.0037 | -63.5263 | -58.6859 | -3.1305 | -3.1361 |
0.6889 | 0.4135 | 2400 | 0.6912 | -0.0006 | -0.0047 | 0.5781 | 0.0041 | -63.6550 | -58.7736 | -3.1267 | -3.1324 |
0.6902 | 0.4307 | 2500 | 0.6910 | -0.0014 | -0.0060 | 0.5781 | 0.0045 | -63.7757 | -58.8557 | -3.1236 | -3.1293 |
0.685 | 0.4480 | 2600 | 0.6908 | -0.0029 | -0.0078 | 0.5825 | 0.0049 | -63.9588 | -58.9977 | -3.1216 | -3.1272 |
0.6852 | 0.4652 | 2700 | 0.6906 | -0.0048 | -0.0102 | 0.5834 | 0.0054 | -64.2020 | -59.1921 | -3.1189 | -3.1246 |
0.6857 | 0.4824 | 2800 | 0.6904 | -0.0062 | -0.0120 | 0.5860 | 0.0058 | -64.3761 | -59.3318 | -3.1154 | -3.1211 |
0.688 | 0.4997 | 2900 | 0.6902 | -0.0087 | -0.0149 | 0.5862 | 0.0062 | -64.6728 | -59.5807 | -3.1119 | -3.1176 |
0.6877 | 0.5169 | 3000 | 0.6901 | -0.0114 | -0.0180 | 0.5795 | 0.0066 | -64.9774 | -59.8506 | -3.1089 | -3.1146 |
0.6846 | 0.5341 | 3100 | 0.6899 | -0.0123 | -0.0192 | 0.5822 | 0.0070 | -65.1015 | -59.9371 | -3.1072 | -3.1128 |
0.6856 | 0.5513 | 3200 | 0.6897 | -0.0154 | -0.0230 | 0.5822 | 0.0075 | -65.4752 | -60.2526 | -3.1035 | -3.1092 |
0.6825 | 0.5686 | 3300 | 0.6894 | -0.0185 | -0.0266 | 0.5860 | 0.0081 | -65.8370 | -60.5571 | -3.0987 | -3.1044 |
0.6782 | 0.5858 | 3400 | 0.6891 | -0.0209 | -0.0296 | 0.5892 | 0.0087 | -66.1367 | -60.7975 | -3.0949 | -3.1006 |
0.6844 | 0.6030 | 3500 | 0.6890 | -0.0230 | -0.0321 | 0.5904 | 0.0091 | -66.3928 | -61.0109 | -3.0922 | -3.0980 |
0.6825 | 0.6203 | 3600 | 0.6887 | -0.0251 | -0.0347 | 0.5934 | 0.0097 | -66.6546 | -61.2199 | -3.0886 | -3.0944 |
0.6782 | 0.6375 | 3700 | 0.6885 | -0.0273 | -0.0374 | 0.5920 | 0.0101 | -66.9203 | -61.4445 | -3.0848 | -3.0906 |
0.6814 | 0.6547 | 3800 | 0.6882 | -0.0304 | -0.0412 | 0.5915 | 0.0107 | -67.2956 | -61.7525 | -3.0816 | -3.0874 |
0.6784 | 0.6720 | 3900 | 0.6880 | -0.0335 | -0.0449 | 0.5936 | 0.0114 | -67.6722 | -62.0628 | -3.0784 | -3.0841 |
0.6811 | 0.6892 | 4000 | 0.6877 | -0.0370 | -0.0491 | 0.5950 | 0.0121 | -68.0929 | -62.4165 | -3.0748 | -3.0805 |
0.6741 | 0.7064 | 4100 | 0.6875 | -0.0379 | -0.0503 | 0.5922 | 0.0124 | -68.2125 | -62.4995 | -3.0698 | -3.0755 |
0.6837 | 0.7236 | 4200 | 0.6874 | -0.0399 | -0.0526 | 0.5953 | 0.0127 | -68.4362 | -62.6979 | -3.0663 | -3.0720 |
0.6825 | 0.7409 | 4300 | 0.6871 | -0.0407 | -0.0540 | 0.5960 | 0.0133 | -68.5772 | -62.7839 | -3.0631 | -3.0689 |
0.681 | 0.7581 | 4400 | 0.6871 | -0.0428 | -0.0562 | 0.5939 | 0.0134 | -68.7993 | -62.9920 | -3.0603 | -3.0660 |
0.6826 | 0.7753 | 4500 | 0.6868 | -0.0463 | -0.0604 | 0.5932 | 0.0141 | -69.2207 | -63.3446 | -3.0565 | -3.0623 |
0.6744 | 0.7926 | 4600 | 0.6865 | -0.0489 | -0.0635 | 0.5943 | 0.0146 | -69.5328 | -63.5999 | -3.0541 | -3.0598 |
0.6826 | 0.8098 | 4700 | 0.6863 | -0.0524 | -0.0677 | 0.5990 | 0.0153 | -69.9523 | -63.9563 | -3.0511 | -3.0569 |
0.6821 | 0.8270 | 4800 | 0.6861 | -0.0559 | -0.0716 | 0.5934 | 0.0157 | -70.3441 | -64.3050 | -3.0487 | -3.0544 |
0.677 | 0.8442 | 4900 | 0.6858 | -0.0593 | -0.0757 | 0.5922 | 0.0164 | -70.7547 | -64.6435 | -3.0456 | -3.0514 |
0.6765 | 0.8615 | 5000 | 0.6857 | -0.0607 | -0.0774 | 0.5934 | 0.0167 | -70.9189 | -64.7823 | -3.0424 | -3.0482 |
0.6792 | 0.8787 | 5100 | 0.6854 | -0.0643 | -0.0817 | 0.5908 | 0.0174 | -71.3476 | -65.1395 | -3.0393 | -3.0451 |
0.6752 | 0.8959 | 5200 | 0.6852 | -0.0667 | -0.0845 | 0.5957 | 0.0177 | -71.6288 | -65.3858 | -3.0369 | -3.0428 |
0.6752 | 0.9132 | 5300 | 0.6851 | -0.0695 | -0.0876 | 0.5911 | 0.0181 | -71.9352 | -65.6583 | -3.0333 | -3.0390 |
0.6766 | 0.9304 | 5400 | 0.6848 | -0.0707 | -0.0893 | 0.5974 | 0.0186 | -72.1090 | -65.7783 | -3.0313 | -3.0370 |
0.6761 | 0.9476 | 5500 | 0.6848 | -0.0718 | -0.0904 | 0.5969 | 0.0187 | -72.2232 | -65.8871 | -3.0286 | -3.0344 |
0.68 | 0.9649 | 5600 | 0.6847 | -0.0716 | -0.0904 | 0.5992 | 0.0189 | -72.2249 | -65.8690 | -3.0267 | -3.0324 |
0.6744 | 0.9821 | 5700 | 0.6846 | -0.0735 | -0.0928 | 0.5983 | 0.0193 | -72.4612 | -66.0631 | -3.0237 | -3.0295 |
0.6709 | 0.9993 | 5800 | 0.6843 | -0.0764 | -0.0963 | 0.5999 | 0.0199 | -72.8088 | -66.3480 | -3.0203 | -3.0261 |
0.6738 | 1.0165 | 5900 | 0.6842 | -0.0770 | -0.0972 | 0.6018 | 0.0202 | -72.8978 | -66.4100 | -3.0168 | -3.0226 |
0.6755 | 1.0338 | 6000 | 0.6841 | -0.0774 | -0.0977 | 0.6050 | 0.0202 | -72.9485 | -66.4556 | -3.0150 | -3.0207 |
0.6727 | 1.0510 | 6100 | 0.6840 | -0.0790 | -0.0997 | 0.6043 | 0.0207 | -73.1473 | -66.6101 | -3.0124 | -3.0182 |
0.677 | 1.0682 | 6200 | 0.6838 | -0.0804 | -0.1014 | 0.6053 | 0.0210 | -73.3202 | -66.7547 | -3.0100 | -3.0157 |
0.6778 | 1.0855 | 6300 | 0.6838 | -0.0826 | -0.1037 | 0.6018 | 0.0211 | -73.5472 | -66.9698 | -3.0081 | -3.0139 |
0.6772 | 1.1027 | 6400 | 0.6835 | -0.0842 | -0.1060 | 0.6043 | 0.0218 | -73.7832 | -67.1349 | -3.0059 | -3.0117 |
0.6789 | 1.1199 | 6500 | 0.6834 | -0.0856 | -0.1077 | 0.6055 | 0.0221 | -73.9500 | -67.2763 | -3.0033 | -3.0090 |
0.6776 | 1.1371 | 6600 | 0.6833 | -0.0879 | -0.1102 | 0.6036 | 0.0223 | -74.2005 | -67.5068 | -3.0010 | -3.0068 |
0.6755 | 1.1544 | 6700 | 0.6831 | -0.0900 | -0.1127 | 0.6057 | 0.0227 | -74.4476 | -67.7115 | -2.9988 | -3.0045 |
0.6688 | 1.1716 | 6800 | 0.6829 | -0.0926 | -0.1159 | 0.6090 | 0.0233 | -74.7660 | -67.9706 | -2.9960 | -3.0017 |
0.6807 | 1.1888 | 6900 | 0.6828 | -0.0942 | -0.1176 | 0.6062 | 0.0234 | -74.9441 | -68.1345 | -2.9941 | -2.9999 |
0.6691 | 1.2061 | 7000 | 0.6827 | -0.0965 | -0.1202 | 0.6071 | 0.0238 | -75.2016 | -68.3571 | -2.9919 | -2.9977 |
0.6704 | 1.2233 | 7100 | 0.6827 | -0.0970 | -0.1208 | 0.6029 | 0.0238 | -75.2590 | -68.4095 | -2.9898 | -2.9956 |
0.6693 | 1.2405 | 7200 | 0.6825 | -0.0985 | -0.1226 | 0.6073 | 0.0242 | -75.4421 | -68.5575 | -2.9875 | -2.9932 |
0.6811 | 1.2578 | 7300 | 0.6825 | -0.0996 | -0.1238 | 0.6046 | 0.0243 | -75.5637 | -68.6693 | -2.9856 | -2.9914 |
0.6731 | 1.2750 | 7400 | 0.6823 | -0.1008 | -0.1253 | 0.6059 | 0.0245 | -75.7101 | -68.7873 | -2.9843 | -2.9901 |
0.6746 | 1.2922 | 7500 | 0.6823 | -0.1009 | -0.1257 | 0.6036 | 0.0247 | -75.7457 | -68.8045 | -2.9825 | -2.9883 |
0.6788 | 1.3094 | 7600 | 0.6823 | -0.1020 | -0.1267 | 0.6073 | 0.0247 | -75.8491 | -68.9100 | -2.9802 | -2.9860 |
0.6704 | 1.3267 | 7700 | 0.6820 | -0.1033 | -0.1286 | 0.6066 | 0.0253 | -76.0417 | -69.0466 | -2.9779 | -2.9837 |
0.6694 | 1.3439 | 7800 | 0.6820 | -0.1054 | -0.1309 | 0.6022 | 0.0255 | -76.2745 | -69.2565 | -2.9769 | -2.9827 |
0.6779 | 1.3611 | 7900 | 0.6819 | -0.1067 | -0.1323 | 0.6069 | 0.0256 | -76.4101 | -69.3778 | -2.9754 | -2.9812 |
0.6712 | 1.3784 | 8000 | 0.6817 | -0.1082 | -0.1342 | 0.6062 | 0.0260 | -76.5969 | -69.5304 | -2.9740 | -2.9798 |
0.6768 | 1.3956 | 8100 | 0.6817 | -0.1096 | -0.1359 | 0.6006 | 0.0262 | -76.7652 | -69.6763 | -2.9726 | -2.9784 |
0.6714 | 1.4128 | 8200 | 0.6815 | -0.1112 | -0.1378 | 0.6046 | 0.0266 | -76.9560 | -69.8316 | -2.9714 | -2.9772 |
0.6705 | 1.4300 | 8300 | 0.6815 | -0.1122 | -0.1387 | 0.6001 | 0.0265 | -77.0526 | -69.9333 | -2.9699 | -2.9758 |
0.6706 | 1.4473 | 8400 | 0.6814 | -0.1131 | -0.1399 | 0.6025 | 0.0268 | -77.1713 | -70.0219 | -2.9690 | -2.9748 |
0.6651 | 1.4645 | 8500 | 0.6814 | -0.1138 | -0.1407 | 0.6064 | 0.0269 | -77.2468 | -70.0874 | -2.9675 | -2.9733 |
0.676 | 1.4817 | 8600 | 0.6813 | -0.1143 | -0.1413 | 0.6032 | 0.0270 | -77.3085 | -70.1414 | -2.9664 | -2.9722 |
0.6682 | 1.4990 | 8700 | 0.6814 | -0.1141 | -0.1411 | 0.6050 | 0.0269 | -77.2885 | -70.1259 | -2.9660 | -2.9718 |
0.6732 | 1.5162 | 8800 | 0.6813 | -0.1147 | -0.1417 | 0.5997 | 0.0270 | -77.3463 | -70.1773 | -2.9650 | -2.9708 |
0.6706 | 1.5334 | 8900 | 0.6811 | -0.1160 | -0.1434 | 0.6108 | 0.0274 | -77.5247 | -70.3133 | -2.9641 | -2.9700 |
0.6589 | 1.5507 | 9000 | 0.6812 | -0.1169 | -0.1443 | 0.6053 | 0.0274 | -77.6094 | -70.3996 | -2.9631 | -2.9689 |
0.6694 | 1.5679 | 9100 | 0.6811 | -0.1172 | -0.1447 | 0.6043 | 0.0275 | -77.6490 | -70.4324 | -2.9621 | -2.9680 |
0.6691 | 1.5851 | 9200 | 0.6810 | -0.1179 | -0.1456 | 0.6011 | 0.0277 | -77.7365 | -70.4981 | -2.9617 | -2.9675 |
0.6701 | 1.6023 | 9300 | 0.6811 | -0.1179 | -0.1455 | 0.6027 | 0.0276 | -77.7288 | -70.5024 | -2.9611 | -2.9669 |
0.6705 | 1.6196 | 9400 | 0.6810 | -0.1182 | -0.1461 | 0.6078 | 0.0279 | -77.7879 | -70.5325 | -2.9603 | -2.9661 |
0.6699 | 1.6368 | 9500 | 0.6810 | -0.1186 | -0.1464 | 0.6073 | 0.0278 | -77.8179 | -70.5707 | -2.9596 | -2.9654 |
0.6699 | 1.6540 | 9600 | 0.6809 | -0.1191 | -0.1471 | 0.6092 | 0.0279 | -77.8869 | -70.6254 | -2.9591 | -2.9649 |
0.6675 | 1.6713 | 9700 | 0.6809 | -0.1196 | -0.1477 | 0.6015 | 0.0281 | -77.9472 | -70.6696 | -2.9584 | -2.9643 |
0.6639 | 1.6885 | 9800 | 0.6809 | -0.1198 | -0.1479 | 0.6083 | 0.0281 | -77.9676 | -70.6902 | -2.9585 | -2.9643 |
0.6578 | 1.7057 | 9900 | 0.6808 | -0.1200 | -0.1482 | 0.6043 | 0.0282 | -77.9982 | -70.7108 | -2.9583 | -2.9641 |
0.6647 | 1.7229 | 10000 | 0.6809 | -0.1204 | -0.1485 | 0.6048 | 0.0281 | -78.0275 | -70.7473 | -2.9578 | -2.9637 |
0.6655 | 1.7402 | 10100 | 0.6808 | -0.1204 | -0.1486 | 0.6071 | 0.0282 | -78.0394 | -70.7507 | -2.9579 | -2.9637 |
0.6671 | 1.7574 | 10200 | 0.6808 | -0.1206 | -0.1488 | 0.6059 | 0.0282 | -78.0608 | -70.7737 | -2.9574 | -2.9632 |
0.6774 | 1.7746 | 10300 | 0.6808 | -0.1207 | -0.1490 | 0.6055 | 0.0283 | -78.0839 | -70.7829 | -2.9569 | -2.9628 |
0.6629 | 1.7919 | 10400 | 0.6807 | -0.1208 | -0.1493 | 0.6076 | 0.0285 | -78.1098 | -70.7925 | -2.9568 | -2.9626 |
0.6648 | 1.8091 | 10500 | 0.6808 | -0.1211 | -0.1494 | 0.6092 | 0.0283 | -78.1209 | -70.8208 | -2.9567 | -2.9625 |
0.6745 | 1.8263 | 10600 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0284 | -78.1333 | -70.8279 | -2.9568 | -2.9627 |
0.6665 | 1.8436 | 10700 | 0.6808 | -0.1211 | -0.1495 | 0.6053 | 0.0283 | -78.1275 | -70.8257 | -2.9566 | -2.9624 |
0.6663 | 1.8608 | 10800 | 0.6808 | -0.1212 | -0.1496 | 0.6078 | 0.0284 | -78.1382 | -70.8324 | -2.9566 | -2.9624 |
0.6674 | 1.8780 | 10900 | 0.6807 | -0.1213 | -0.1497 | 0.6083 | 0.0284 | -78.1542 | -70.8423 | -2.9568 | -2.9626 |
0.6767 | 1.8952 | 11000 | 0.6808 | -0.1212 | -0.1495 | 0.6078 | 0.0283 | -78.1295 | -70.8295 | -2.9567 | -2.9626 |
0.6683 | 1.9125 | 11100 | 0.6808 | -0.1212 | -0.1496 | 0.6087 | 0.0284 | -78.1378 | -70.8316 | -2.9569 | -2.9628 |
0.6673 | 1.9297 | 11200 | 0.6807 | -0.1212 | -0.1496 | 0.6090 | 0.0284 | -78.1370 | -70.8290 | -2.9566 | -2.9624 |
0.6781 | 1.9469 | 11300 | 0.6807 | -0.1211 | -0.1496 | 0.6097 | 0.0285 | -78.1363 | -70.8190 | -2.9568 | -2.9626 |
0.6682 | 1.9642 | 11400 | 0.6807 | -0.1213 | -0.1498 | 0.6085 | 0.0285 | -78.1613 | -70.8446 | -2.9567 | -2.9626 |
0.6775 | 1.9814 | 11500 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0282 | -78.1266 | -70.8364 | -2.9566 | -2.9624 |
0.6688 | 1.9986 | 11600 | 0.6808 | -0.1214 | -0.1497 | 0.6090 | 0.0284 | -78.1532 | -70.8499 | -2.9566 | -2.9624 |
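The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, and from that roughly how many preference pairs are seen per epoch. The sketch below is back-of-the-envelope arithmetic using only values from this card; the result is an estimate, since the Epoch column is rounded to four decimals.

```python
# Values from the last row of the table and the hyperparameter list
last_step = 11600
last_epoch = 1.9986
total_train_batch_size = 16

# Optimizer steps per epoch, estimated from the final logged row
steps_per_epoch = last_step / last_epoch

# Approximate number of training pairs consumed per epoch
approx_pairs_per_epoch = steps_per_epoch * total_train_batch_size

# Change in validation loss between the first and last logged evaluations
first_val_loss = 0.6932
last_val_loss = 0.6808
val_loss_drop = first_val_loss - last_val_loss

print(f"steps per epoch (estimate): {steps_per_epoch:.1f}")
print(f"pairs per epoch (estimate): {approx_pairs_per_epoch:.0f}")
print(f"validation loss drop:       {val_loss_drop:.4f}")
```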
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1