---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs
  results: []
---
# tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6500
- Rewards/chosen: -1.0595
- Rewards/rejected: -1.2334
- Rewards/accuracies: 0.6046
- Rewards/margins: 0.1739
- Logps/rejected: -186.0905
- Logps/chosen: -164.9614
- Logits/rejected: -2.3429
- Logits/chosen: -2.3549
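
The reward columns follow TRL's standard DPO logging. As a brief orientation (assuming the default sigmoid DPO loss; the β used for this run is not recorded in the card), the implicit reward of a completion $y$ for a prompt $x$ is

$$
r_\theta(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

`Rewards/margins` is the mean of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over the evaluation pairs, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one. Because the policy is initialized from the reference model, the loss starts near $-\log \sigma(0) = \ln 2 \approx 0.6931$, which matches the first row of the training table below.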
## Model description

More information needed
## Intended uses & limitations

More information needed
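
Pending more detail, a minimal generation sketch with the standard `transformers` API is below. The repository id and the TL;DR-style prompt are assumptions: the card does not specify a prompt format, and the summarization SFT base only suggests one.

```python
# Minimal inference sketch. Assumptions (not from the card): the repo id below,
# and a Reddit-TL;DR prompt format inherited from the summarization SFT base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

prompt = "POST: <text to summarize>\n\nTL;DR:"  # hypothetical prompt format
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```

Greedy decoding (`do_sample=False`) keeps the example deterministic; sampling settings are a separate choice.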
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding TRL setup follows the list):
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
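
These settings map directly onto TRL's `DPOTrainer`. Below is a minimal sketch under stated assumptions: the preference dataset is a placeholder (the card says only "an unknown dataset"), the DPO β is not recorded so TRL's default is assumed, and `DPOConfig` presumes a TRL release contemporary with Transformers 4.41.2. Given the `multi-GPU` distributed type, the run was presumably launched with `accelerate launch`. The Adam betas and epsilon listed above are the `Trainer` defaults, so they need no extra arguments.

```python
# Sketch of a matching DPO run with TRL. Assumptions (not from the card): the
# preference dataset id, and that TRL's default beta was used.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)      # policy, initialized from the SFT model
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference for the DPO loss
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical dataset id; TRL expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

args = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = 16, the total train batch size above
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # beta is not recorded in the card; TRL's default (0.1) is assumed here.
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```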
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
0.693 | 0.0689 | 400 | 0.6931 | 0.0003 | 0.0002 | 0.5112 | 0.0001 | -62.7270 | -58.9858 | -2.9691 | -2.9727 |
0.6923 | 0.1378 | 800 | 0.6926 | 0.0024 | 0.0012 | 0.5493 | 0.0011 | -62.6258 | -58.7797 | -2.9667 | -2.9701 |
0.6901 | 0.2068 | 1200 | 0.6907 | -0.0080 | -0.0133 | 0.5697 | 0.0053 | -64.0827 | -59.8146 | -2.9579 | -2.9613 |
0.6835 | 0.2757 | 1600 | 0.6880 | -0.0321 | -0.0436 | 0.5764 | 0.0114 | -67.1050 | -62.2266 | -2.9410 | -2.9442 |
0.6865 | 0.3446 | 2000 | 0.6852 | -0.0690 | -0.0874 | 0.5713 | 0.0184 | -71.4878 | -65.9158 | -2.9158 | -2.9192 |
0.6767 | 0.4135 | 2400 | 0.6817 | -0.1086 | -0.1352 | 0.5816 | 0.0265 | -76.2651 | -69.8803 | -2.8906 | -2.8938 |
0.6726 | 0.4824 | 2800 | 0.6792 | -0.1614 | -0.1943 | 0.5767 | 0.0328 | -82.1753 | -75.1597 | -2.8617 | -2.8651 |
0.6643 | 0.5513 | 3200 | 0.6729 | -0.2581 | -0.3074 | 0.5948 | 0.0493 | -93.4915 | -84.8225 | -2.8387 | -2.8420 |
0.6614 | 0.6203 | 3600 | 0.6740 | -0.2589 | -0.3059 | 0.5904 | 0.0470 | -93.3416 | -84.9094 | -2.8113 | -2.8144 |
0.6609 | 0.6892 | 4000 | 0.6696 | -0.3009 | -0.3603 | 0.6053 | 0.0594 | -98.7785 | -89.1073 | -2.7879 | -2.7912 |
0.6562 | 0.7581 | 4400 | 0.6667 | -0.4072 | -0.4790 | 0.5983 | 0.0718 | -110.6499 | -99.7330 | -2.7515 | -2.7548 |
0.6569 | 0.8270 | 4800 | 0.6637 | -0.4951 | -0.5782 | 0.6059 | 0.0831 | -120.5742 | -108.5273 | -2.7283 | -2.7316 |
0.6383 | 0.8959 | 5200 | 0.6621 | -0.5180 | -0.6112 | 0.6055 | 0.0932 | -123.8654 | -110.8119 | -2.7112 | -2.7149 |
0.6411 | 0.9649 | 5600 | 0.6623 | -0.5228 | -0.6134 | 0.6055 | 0.0906 | -124.0929 | -111.2965 | -2.6869 | -2.6910 |
0.6293 | 1.0338 | 6000 | 0.6618 | -0.6210 | -0.7260 | 0.6064 | 0.1049 | -135.3463 | -121.1192 | -2.6526 | -2.6573 |
0.6247 | 1.1027 | 6400 | 0.6587 | -0.7088 | -0.8268 | 0.5990 | 0.1180 | -145.4310 | -129.8984 | -2.6201 | -2.6254 |
0.6194 | 1.1716 | 6800 | 0.6580 | -0.7955 | -0.9191 | 0.5980 | 0.1236 | -154.6599 | -138.5692 | -2.5858 | -2.5912 |
0.6127 | 1.2405 | 7200 | 0.6558 | -0.6612 | -0.7815 | 0.6039 | 0.1203 | -140.8955 | -125.1357 | -2.5822 | -2.5877 |
0.6531 | 1.3094 | 7600 | 0.6534 | -0.7460 | -0.8804 | 0.6041 | 0.1344 | -150.7862 | -133.6133 | -2.5502 | -2.5564 |
0.5995 | 1.3784 | 8000 | 0.6528 | -0.8128 | -0.9555 | 0.6006 | 0.1427 | -158.2948 | -140.2942 | -2.5195 | -2.5267 |
0.61 | 1.4473 | 8400 | 0.6540 | -0.7310 | -0.8603 | 0.5980 | 0.1293 | -148.7821 | -132.1185 | -2.5198 | -2.5268 |
0.6575 | 1.5162 | 8800 | 0.6527 | -0.8369 | -0.9764 | 0.5997 | 0.1395 | -160.3900 | -142.7025 | -2.4947 | -2.5022 |
0.5969 | 1.5851 | 9200 | 0.6516 | -0.8922 | -1.0366 | 0.6101 | 0.1444 | -166.4089 | -148.2315 | -2.4661 | -2.4746 |
0.6211 | 1.6540 | 9600 | 0.6526 | -0.7875 | -0.9248 | 0.6094 | 0.1373 | -155.2340 | -137.7698 | -2.4725 | -2.4804 |
0.6011 | 1.7229 | 10000 | 0.6517 | -0.8912 | -1.0379 | 0.6099 | 0.1467 | -166.5410 | -148.1359 | -2.4396 | -2.4489 |
0.571 | 1.7919 | 10400 | 0.6514 | -0.8234 | -0.9653 | 0.6122 | 0.1419 | -159.2782 | -141.3557 | -2.4401 | -2.4489 |
0.5889 | 1.8608 | 10800 | 0.6506 | -1.0172 | -1.1751 | 0.6055 | 0.1579 | -180.2568 | -160.7332 | -2.3932 | -2.4039 |
0.5685 | 1.9297 | 11200 | 0.6486 | -1.0256 | -1.1907 | 0.5992 | 0.1651 | -181.8200 | -161.5783 | -2.3887 | -2.3992 |
0.63 | 1.9986 | 11600 | 0.6502 | -0.8869 | -1.0380 | 0.6004 | 0.1511 | -166.5461 | -147.7054 | -2.4012 | -2.4108 |
0.5891 | 2.0675 | 12000 | 0.6490 | -1.0453 | -1.2122 | 0.6046 | 0.1670 | -183.9714 | -163.5418 | -2.3713 | -2.3825 |
0.5808 | 2.1365 | 12400 | 0.6490 | -1.1906 | -1.3718 | 0.6039 | 0.1811 | -199.9255 | -178.0778 | -2.3382 | -2.3508 |
0.6051 | 2.2054 | 12800 | 0.6496 | -1.0959 | -1.2648 | 0.6053 | 0.1689 | -189.2301 | -168.6040 | -2.3542 | -2.3658 |
0.6223 | 2.2743 | 13200 | 0.6502 | -1.0865 | -1.2588 | 0.6069 | 0.1723 | -188.6267 | -167.6660 | -2.3460 | -2.3579 |
0.6245 | 2.3432 | 13600 | 0.6506 | -1.0806 | -1.2530 | 0.5983 | 0.1724 | -188.0497 | -167.0715 | -2.3462 | -2.3583 |
0.5716 | 2.4121 | 14000 | 0.6511 | -1.0306 | -1.1979 | 0.5941 | 0.1672 | -182.5368 | -162.0786 | -2.3533 | -2.3651 |
0.6078 | 2.4810 | 14400 | 0.6506 | -1.0889 | -1.2642 | 0.6004 | 0.1753 | -189.1684 | -167.9059 | -2.3417 | -2.3540 |
0.6112 | 2.5500 | 14800 | 0.6500 | -1.1067 | -1.2865 | 0.5971 | 0.1798 | -191.4036 | -169.6898 | -2.3390 | -2.3514 |
0.5773 | 2.6189 | 15200 | 0.6508 | -1.0435 | -1.2146 | 0.6025 | 0.1712 | -184.2123 | -163.3605 | -2.3468 | -2.3588 |
0.5983 | 2.6878 | 15600 | 0.6505 | -1.0660 | -1.2397 | 0.6018 | 0.1737 | -186.7185 | -165.6157 | -2.3419 | -2.3540 |
0.5983 | 2.7567 | 16000 | 0.6501 | -1.0707 | -1.2465 | 0.6029 | 0.1758 | -187.3989 | -166.0839 | -2.3408 | -2.3530 |
0.5956 | 2.8256 | 16400 | 0.6500 | -1.0594 | -1.2333 | 0.6008 | 0.1739 | -186.0803 | -164.9520 | -2.3429 | -2.3550 |
0.6221 | 2.8946 | 16800 | 0.6499 | -1.0592 | -1.2333 | 0.6041 | 0.1742 | -186.0846 | -164.9336 | -2.3430 | -2.3551 |
0.6096 | 2.9635 | 17200 | 0.6500 | -1.0595 | -1.2334 | 0.6046 | 0.1739 | -186.0905 | -164.9614 | -2.3429 | -2.3549 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1