
tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs_old

This model is a DPO (Direct Preference Optimization) fine-tune of martimfasantos/tinyllama-1.1b-sum-sft-full_old, trained on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6803
  • Rewards/chosen: -0.1265
  • Rewards/rejected: -0.1560
  • Rewards/accuracies: 0.6036
  • Rewards/margins: 0.0295
  • Logps/rejected: -78.7771
  • Logps/chosen: -71.3634
  • Logits/rejected: -2.9512
  • Logits/chosen: -2.9570
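
Something like the following should load the checkpoint for summarization-style generation. This is a minimal sketch: the "TL;DR:" prompt format follows the summarize_from_feedback convention and is an assumption (the card does not document the expected prompt), and the generation settings are illustrative.

```python
# Minimal inference sketch. The "TL;DR:" prompt format and the generation
# settings are assumptions, not documented in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "Your text to summarize goes here."
prompt = f"{post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```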

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
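
The card's header does identify the corpus as openai/summarize_from_feedback. A minimal loading sketch, assuming the "comparisons" configuration (the human preference pairs DPO requires) and its published schema (`info.post`, `summaries[*].text`, `choice`); the "TL;DR:" prompt format is likewise an assumption:

```python
# Sketch: turn the comparisons split into prompt/chosen/rejected preference pairs.
from datasets import load_dataset

ds = load_dataset("openai/summarize_from_feedback", "comparisons")

def to_pairs(ex):
    # `choice` indexes the human-preferred summary of the two candidates.
    prompt = ex["info"]["post"] + "\n\nTL;DR:"  # assumed prompt format
    chosen = ex["summaries"][ex["choice"]]["text"]
    rejected = ex["summaries"][1 - ex["choice"]]["text"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

train_dataset = ds["train"].map(to_pairs)
eval_dataset = ds["validation"].map(to_pairs)
```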

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
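
These settings might map onto trl's DPOTrainer roughly as below. This is a sketch under stated assumptions, not the author's actual script: the trl version, the DPO beta, and the exact data pipeline are not given in the card. `train_dataset`/`eval_dataset` are the pairs built in the loading sketch above, and `beta=0.1` is only a commonly used default.

```python
# Sketch only: mirrors the listed hyperparameters; beta and the data pipeline
# are assumptions not stated in the card.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)      # policy being tuned
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs",
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 * 8 = total train batch size of 64
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are already the defaults.
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                     # assumed; the card does not report beta
    train_dataset=train_dataset,  # preference pairs from the loading sketch above
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```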

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0689 | 100 | 0.6932 | -0.0001 | 0.0001 | 0.4793 | -0.0001 | -63.1744 | -58.7172 | -3.1574 | -3.1630 |
| 0.6932 | 0.1378 | 200 | 0.6931 | 0.0001 | 0.0001 | 0.4956 | 0.0000 | -63.1716 | -58.7029 | -3.1576 | -3.1633 |
| 0.693 | 0.2068 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4724 | -0.0001 | -63.1577 | -58.7002 | -3.1575 | -3.1632 |
| 0.693 | 0.2757 | 400 | 0.6931 | 0.0003 | 0.0003 | 0.5007 | 0.0000 | -63.1547 | -58.6827 | -3.1569 | -3.1625 |
| 0.6927 | 0.3446 | 500 | 0.6931 | 0.0006 | 0.0004 | 0.5128 | 0.0002 | -63.1359 | -58.6518 | -3.1563 | -3.1619 |
| 0.6922 | 0.4135 | 600 | 0.6930 | 0.0009 | 0.0005 | 0.5358 | 0.0004 | -63.1295 | -58.6249 | -3.1544 | -3.1600 |
| 0.692 | 0.4824 | 700 | 0.6928 | 0.0015 | 0.0008 | 0.5516 | 0.0007 | -63.0973 | -58.5609 | -3.1522 | -3.1578 |
| 0.6911 | 0.5513 | 800 | 0.6926 | 0.0018 | 0.0006 | 0.5634 | 0.0012 | -63.1172 | -58.5317 | -3.1497 | -3.1553 |
| 0.6903 | 0.6203 | 900 | 0.6923 | 0.0019 | 0.0002 | 0.5641 | 0.0017 | -63.1634 | -58.5242 | -3.1456 | -3.1513 |
| 0.6899 | 0.6892 | 1000 | 0.6920 | 0.0016 | -0.0008 | 0.5676 | 0.0024 | -63.2556 | -58.5502 | -3.1411 | -3.1467 |
| 0.6898 | 0.7581 | 1100 | 0.6916 | 0.0011 | -0.0021 | 0.5802 | 0.0032 | -63.3925 | -58.6040 | -3.1359 | -3.1415 |
| 0.689 | 0.8270 | 1200 | 0.6913 | 0.0000 | -0.0038 | 0.5753 | 0.0038 | -63.5565 | -58.7099 | -3.1316 | -3.1371 |
| 0.6881 | 0.8959 | 1300 | 0.6910 | -0.0015 | -0.0061 | 0.5804 | 0.0046 | -63.7902 | -58.8624 | -3.1268 | -3.1325 |
| 0.6874 | 0.9649 | 1400 | 0.6907 | -0.0037 | -0.0088 | 0.5825 | 0.0051 | -64.0628 | -59.0799 | -3.1213 | -3.1269 |
| 0.6867 | 1.0338 | 1500 | 0.6903 | -0.0063 | -0.0124 | 0.5843 | 0.0061 | -64.4169 | -59.3381 | -3.1142 | -3.1198 |
| 0.6857 | 1.1027 | 1600 | 0.6899 | -0.0097 | -0.0166 | 0.5876 | 0.0069 | -64.8429 | -59.6860 | -3.1081 | -3.1137 |
| 0.6843 | 1.1716 | 1700 | 0.6895 | -0.0148 | -0.0227 | 0.5804 | 0.0078 | -65.4468 | -60.1953 | -3.1013 | -3.1070 |
| 0.6842 | 1.2405 | 1800 | 0.6890 | -0.0219 | -0.0309 | 0.5871 | 0.0089 | -66.2668 | -60.9047 | -3.0944 | -3.1001 |
| 0.6802 | 1.3094 | 1900 | 0.6886 | -0.0263 | -0.0362 | 0.5920 | 0.0098 | -66.7954 | -61.3438 | -3.0883 | -3.0940 |
| 0.6824 | 1.3784 | 2000 | 0.6881 | -0.0324 | -0.0436 | 0.5939 | 0.0112 | -67.5355 | -61.9519 | -3.0814 | -3.0871 |
| 0.6799 | 1.4473 | 2100 | 0.6875 | -0.0387 | -0.0510 | 0.5992 | 0.0123 | -68.2835 | -62.5824 | -3.0754 | -3.0811 |
| 0.6793 | 1.5162 | 2200 | 0.6872 | -0.0420 | -0.0551 | 0.5913 | 0.0131 | -68.6940 | -62.9161 | -3.0698 | -3.0755 |
| 0.6797 | 1.5851 | 2300 | 0.6868 | -0.0485 | -0.0626 | 0.5918 | 0.0141 | -69.4427 | -63.5627 | -3.0623 | -3.0680 |
| 0.6792 | 1.6540 | 2400 | 0.6863 | -0.0512 | -0.0663 | 0.5939 | 0.0151 | -69.8102 | -63.8365 | -3.0547 | -3.0604 |
| 0.6775 | 1.7229 | 2500 | 0.6860 | -0.0552 | -0.0710 | 0.5946 | 0.0158 | -70.2800 | -64.2325 | -3.0488 | -3.0546 |
| 0.6768 | 1.7919 | 2600 | 0.6856 | -0.0598 | -0.0766 | 0.5936 | 0.0169 | -70.8443 | -64.6883 | -3.0412 | -3.0469 |
| 0.675 | 1.8608 | 2700 | 0.6851 | -0.0654 | -0.0832 | 0.5948 | 0.0178 | -71.4996 | -65.2471 | -3.0345 | -3.0402 |
| 0.6736 | 1.9297 | 2800 | 0.6847 | -0.0707 | -0.0896 | 0.5983 | 0.0189 | -72.1448 | -65.7864 | -3.0286 | -3.0344 |
| 0.6773 | 1.9986 | 2900 | 0.6844 | -0.0746 | -0.0943 | 0.6020 | 0.0196 | -72.6052 | -66.1758 | -3.0225 | -3.0283 |
| 0.6724 | 2.0675 | 3000 | 0.6841 | -0.0793 | -0.0997 | 0.6029 | 0.0204 | -73.1465 | -66.6415 | -3.0158 | -3.0216 |
| 0.674 | 2.1365 | 3100 | 0.6837 | -0.0824 | -0.1036 | 0.6029 | 0.0212 | -73.5381 | -66.9540 | -3.0112 | -3.0169 |
| 0.6764 | 2.2054 | 3200 | 0.6834 | -0.0857 | -0.1076 | 0.6066 | 0.0219 | -73.9390 | -67.2856 | -3.0047 | -3.0105 |
| 0.6749 | 2.2743 | 3300 | 0.6831 | -0.0887 | -0.1113 | 0.6069 | 0.0226 | -74.3103 | -67.5846 | -2.9991 | -3.0049 |
| 0.6746 | 2.3432 | 3400 | 0.6828 | -0.0921 | -0.1154 | 0.6055 | 0.0233 | -74.7230 | -67.9247 | -2.9944 | -3.0002 |
| 0.6718 | 2.4121 | 3500 | 0.6824 | -0.0962 | -0.1204 | 0.6069 | 0.0242 | -75.2213 | -68.3350 | -2.9890 | -2.9948 |
| 0.672 | 2.4810 | 3600 | 0.6822 | -0.1013 | -0.1261 | 0.6048 | 0.0248 | -75.7936 | -68.8439 | -2.9844 | -2.9902 |
| 0.6733 | 2.5500 | 3700 | 0.6820 | -0.1048 | -0.1302 | 0.6032 | 0.0254 | -76.1958 | -69.1902 | -2.9800 | -2.9858 |
| 0.6715 | 2.6189 | 3800 | 0.6817 | -0.1077 | -0.1336 | 0.6046 | 0.0260 | -76.5409 | -69.4776 | -2.9765 | -2.9823 |
| 0.6709 | 2.6878 | 3900 | 0.6816 | -0.1102 | -0.1366 | 0.6020 | 0.0264 | -76.8374 | -69.7330 | -2.9729 | -2.9787 |
| 0.6696 | 2.7567 | 4000 | 0.6814 | -0.1132 | -0.1400 | 0.6032 | 0.0268 | -77.1831 | -70.0346 | -2.9698 | -2.9756 |
| 0.6687 | 2.8256 | 4100 | 0.6812 | -0.1154 | -0.1427 | 0.6048 | 0.0273 | -77.4501 | -70.2526 | -2.9670 | -2.9729 |
| 0.6692 | 2.8946 | 4200 | 0.6810 | -0.1166 | -0.1443 | 0.6073 | 0.0277 | -77.6081 | -70.3715 | -2.9649 | -2.9708 |
| 0.6742 | 2.9635 | 4300 | 0.6809 | -0.1184 | -0.1463 | 0.6027 | 0.0279 | -77.8100 | -70.5513 | -2.9629 | -2.9687 |
| 0.6652 | 3.0324 | 4400 | 0.6808 | -0.1191 | -0.1473 | 0.6090 | 0.0282 | -77.9141 | -70.6218 | -2.9606 | -2.9664 |
| 0.6659 | 3.1013 | 4500 | 0.6807 | -0.1206 | -0.1490 | 0.6046 | 0.0284 | -78.0785 | -70.7742 | -2.9587 | -2.9645 |
| 0.666 | 3.1702 | 4600 | 0.6805 | -0.1225 | -0.1512 | 0.6062 | 0.0288 | -78.3027 | -70.9582 | -2.9569 | -2.9628 |
| 0.6644 | 3.2391 | 4700 | 0.6805 | -0.1237 | -0.1527 | 0.6059 | 0.0290 | -78.4454 | -71.0785 | -2.9557 | -2.9615 |
| 0.6685 | 3.3081 | 4800 | 0.6804 | -0.1246 | -0.1536 | 0.6053 | 0.0291 | -78.5441 | -71.1674 | -2.9547 | -2.9605 |
| 0.6651 | 3.3770 | 4900 | 0.6803 | -0.1250 | -0.1542 | 0.6039 | 0.0293 | -78.6030 | -71.2072 | -2.9539 | -2.9598 |
| 0.6689 | 3.4459 | 5000 | 0.6803 | -0.1254 | -0.1547 | 0.6062 | 0.0293 | -78.6476 | -71.2503 | -2.9530 | -2.9588 |
| 0.6653 | 3.5148 | 5100 | 0.6802 | -0.1256 | -0.1552 | 0.6050 | 0.0296 | -78.6955 | -71.2721 | -2.9525 | -2.9583 |
| 0.6664 | 3.5837 | 5200 | 0.6803 | -0.1261 | -0.1556 | 0.6046 | 0.0295 | -78.7380 | -71.3226 | -2.9519 | -2.9577 |
| 0.6687 | 3.6527 | 5300 | 0.6803 | -0.1265 | -0.1559 | 0.6064 | 0.0294 | -78.7701 | -71.3572 | -2.9516 | -2.9574 |
| 0.6641 | 3.7216 | 5400 | 0.6803 | -0.1266 | -0.1560 | 0.6059 | 0.0294 | -78.7822 | -71.3690 | -2.9514 | -2.9573 |
| 0.6637 | 3.7905 | 5500 | 0.6803 | -0.1265 | -0.1559 | 0.6053 | 0.0295 | -78.7736 | -71.3579 | -2.9516 | -2.9575 |
| 0.6694 | 3.8594 | 5600 | 0.6802 | -0.1265 | -0.1561 | 0.6036 | 0.0296 | -78.7869 | -71.3611 | -2.9515 | -2.9574 |
| 0.6684 | 3.9283 | 5700 | 0.6803 | -0.1266 | -0.1560 | 0.6071 | 0.0294 | -78.7792 | -71.3707 | -2.9512 | -2.9571 |
| 0.6668 | 3.9972 | 5800 | 0.6803 | -0.1265 | -0.1560 | 0.6036 | 0.0295 | -78.7771 | -71.3634 | -2.9512 | -2.9570 |
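
As a rough consistency check on the table: DPO's per-example loss is -log σ(reward margin), where the reward columns already include the beta scaling, and Rewards/accuracies is the fraction of evaluation pairs whose chosen summary gets the higher implicit reward. Plugging the final mean margin into the loss formula lands near the reported validation loss (it will not match exactly, since the mean of per-example losses is not the loss at the mean margin):

```python
# Sanity check: DPO loss = -log(sigmoid(margin)) per example.
import math

margin = 0.0295  # final Rewards/margins from the table above
loss_at_mean_margin = -math.log(1 / (1 + math.exp(-margin)))
print(f"{loss_at_mean_margin:.4f}")  # ~0.6786, vs. the reported 0.6803
```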

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1