
mistral-7b-dpo-qlora-2ep

This model is a fine-tuned version of mimicheng/mistral-7b-sft-qlora-2ep on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6446
  • Rewards/chosen: -0.4217
  • Rewards/rejected: -0.5814
  • Rewards/accuracies: 0.6290
  • Rewards/margins: 0.1596
  • Logps/rejected: -1409.8003
  • Logps/chosen: -1604.7235
  • Logits/rejected: -2.6937
  • Logits/chosen: -2.7021
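
For reference, here is a minimal inference sketch (not part of the original card): it loads this repo as a PEFT adapter on top of its base checkpoint, assuming the adapter config on the Hub resolves the base model and that a tokenizer is available from the same repo (otherwise load it from the base model).

```python
# Minimal inference sketch for this QLoRA/DPO adapter (assumptions noted above).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "mimicheng/mistral-7b-dpo-qlora-2ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # adjust dtype/device_map to your hardware
    device_map="auto",
)

prompt = "Explain in one paragraph what DPO fine-tuning changes about a chat model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```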

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
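
These metric names and hyperparameters map naturally onto trl's DPOTrainer. The sketch below shows how they could be wired together; it is not the original training script. The DPO beta, the 4-bit/LoRA configuration, the sequence lengths, bf16, and the dataset preprocessing are assumptions not recorded in this card, and it assumes the SFT checkpoint loads directly as a causal LM.

```python
# Sketch of a DPO run matching the listed hyperparameters (assumptions noted above).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_id = "mimicheng/mistral-7b-sft-qlora-2ep"  # SFT starting point named above

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16  # assumption
    ),
    device_map="auto",
)

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_dpo_format(example):
    # Assumption: use the prompt string and the final assistant turns as plain text.
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

train_ds = raw["train_prefs"].map(to_dpo_format, remove_columns=raw["train_prefs"].column_names)
eval_ds = raw["test_prefs"].map(to_dpo_format, remove_columns=raw["test_prefs"].column_names)

training_args = TrainingArguments(
    output_dir="mistral-7b-dpo-qlora-2ep",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size; 4 GPUs -> total_train_batch_size 16
    per_device_eval_batch_size=8,    # eval_batch_size; 4 GPUs -> total_eval_batch_size 32
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with a PEFT adapter, the frozen base weights act as the reference
    args=training_args,
    beta=0.1,                        # assumption: common DPO default, not stated in this card
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=16),  # assumption
    max_length=1024,                 # assumption
    max_prompt_length=512,           # assumption
)
trainer.train()
```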

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.6932 0.03 100 0.6931 0.0001 0.0002 0.4940 -0.0001 -1351.6440 -1562.5353 -2.7909 -2.7984
0.6923 0.05 200 0.6925 0.0045 0.0029 0.5119 0.0016 -1351.3734 -1562.0991 -2.7899 -2.7974
0.6937 0.08 300 0.6909 0.0097 0.0052 0.5377 0.0045 -1351.1462 -1561.5815 -2.7872 -2.7945
0.6867 0.1 400 0.6893 0.0145 0.0060 0.5595 0.0085 -1351.0632 -1561.1024 -2.7853 -2.7923
0.6921 0.13 500 0.6867 0.0007 -0.0122 0.5734 0.0129 -1352.8849 -1562.4756 -2.7829 -2.7893
0.6895 0.16 600 0.6838 0.0046 -0.0162 0.5913 0.0208 -1353.2866 -1562.0875 -2.7740 -2.7806
0.6792 0.18 700 0.6819 -0.0194 -0.0440 0.5992 0.0246 -1356.0621 -1564.4910 -2.7592 -2.7657
0.6802 0.21 800 0.6791 -0.0527 -0.0819 0.5813 0.0293 -1359.8597 -1567.8170 -2.7551 -2.7611
0.6812 0.24 900 0.6772 -0.0403 -0.0826 0.5714 0.0423 -1359.9243 -1566.5771 -2.7588 -2.7655
0.6714 0.26 1000 0.6746 -0.0886 -0.1361 0.5714 0.0475 -1365.2759 -1571.4064 -2.7418 -2.7476
0.676 0.29 1100 0.6744 -0.1141 -0.1733 0.5893 0.0592 -1368.9943 -1573.9617 -2.7433 -2.7505
0.6779 0.31 1200 0.6703 -0.1056 -0.1703 0.5933 0.0647 -1368.6935 -1573.1090 -2.7431 -2.7511
0.6888 0.34 1300 0.6676 -0.1136 -0.1850 0.5972 0.0713 -1370.1599 -1573.9121 -2.7375 -2.7452
0.6664 0.37 1400 0.6669 -0.1425 -0.2165 0.6071 0.0739 -1373.3110 -1576.8027 -2.7302 -2.7375
0.6705 0.39 1500 0.6665 -0.1804 -0.2701 0.6071 0.0897 -1378.6722 -1580.5913 -2.7481 -2.7546
0.6411 0.42 1600 0.6653 -0.1924 -0.2728 0.6329 0.0804 -1378.9417 -1581.7911 -2.7249 -2.7317
0.665 0.44 1700 0.6644 -0.1967 -0.2789 0.6131 0.0823 -1379.5565 -1582.2147 -2.7355 -2.7422
0.6563 0.47 1800 0.6639 -0.2073 -0.2940 0.6210 0.0867 -1381.0635 -1583.2751 -2.7257 -2.7325
0.6668 0.5 1900 0.6620 -0.2260 -0.3252 0.6171 0.0992 -1384.1846 -1585.1470 -2.7350 -2.7426
0.6632 0.52 2000 0.6605 -0.1924 -0.2828 0.6329 0.0904 -1379.9453 -1581.7920 -2.7371 -2.7449
0.6427 0.55 2100 0.6597 -0.2106 -0.3114 0.6230 0.1007 -1382.8007 -1583.6138 -2.7260 -2.7333
0.6923 0.58 2200 0.6592 -0.2129 -0.3178 0.6230 0.1049 -1383.4486 -1583.8400 -2.7175 -2.7243
0.6496 0.6 2300 0.6581 -0.2352 -0.3443 0.6290 0.1091 -1386.0916 -1586.0706 -2.7159 -2.7235
0.6668 0.63 2400 0.6577 -0.2503 -0.3563 0.6290 0.1061 -1387.2981 -1587.5769 -2.7321 -2.7410
0.6477 0.65 2500 0.6560 -0.2661 -0.3858 0.6310 0.1196 -1390.2400 -1589.1620 -2.7287 -2.7370
0.6444 0.68 2600 0.6550 -0.2830 -0.3993 0.6270 0.1163 -1391.5975 -1590.8505 -2.7240 -2.7330
0.6594 0.71 2700 0.6566 -0.3546 -0.4862 0.6190 0.1316 -1400.2867 -1598.0084 -2.6748 -2.6818
0.6329 0.73 2800 0.6544 -0.2748 -0.3936 0.625 0.1189 -1391.0292 -1590.0247 -2.6985 -2.7063
0.6351 0.76 2900 0.6545 -0.2928 -0.4152 0.6270 0.1224 -1393.1847 -1591.8256 -2.7050 -2.7136
0.6724 0.79 3000 0.6528 -0.3067 -0.4418 0.6448 0.1351 -1395.8458 -1593.2202 -2.6986 -2.7069
0.6413 0.81 3100 0.6514 -0.3153 -0.4541 0.6548 0.1388 -1397.0781 -1594.0812 -2.6892 -2.6985
0.6242 0.84 3200 0.6523 -0.3197 -0.4618 0.6349 0.1421 -1397.8459 -1594.5162 -2.7123 -2.7206
0.6773 0.86 3300 0.6506 -0.3038 -0.4433 0.6508 0.1395 -1395.9939 -1592.9280 -2.7042 -2.7136
0.6531 0.89 3400 0.6505 -0.3036 -0.4426 0.6329 0.1390 -1395.9207 -1592.9099 -2.6620 -2.6712
0.6499 0.92 3500 0.6504 -0.3509 -0.4975 0.6448 0.1467 -1401.4177 -1597.6368 -2.6611 -2.6701
0.6439 0.94 3600 0.6509 -0.3522 -0.4975 0.6349 0.1453 -1401.4176 -1597.7729 -2.6758 -2.6841
0.6279 0.97 3700 0.6505 -0.4035 -0.5500 0.6310 0.1466 -1406.6675 -1602.8950 -2.6918 -2.7012
0.6443 0.99 3800 0.6497 -0.3970 -0.5441 0.6290 0.1471 -1406.0728 -1602.2509 -2.6876 -2.6965
0.6355 1.02 3900 0.6484 -0.3538 -0.4986 0.6349 0.1449 -1401.5294 -1597.9247 -2.6950 -2.7039
0.6683 1.05 4000 0.6482 -0.3608 -0.5119 0.6349 0.1511 -1402.8545 -1598.6262 -2.6992 -2.7080
0.6459 1.07 4100 0.6475 -0.3305 -0.4760 0.6448 0.1455 -1399.2634 -1595.5988 -2.6852 -2.6944
0.6451 1.1 4200 0.6471 -0.3471 -0.4991 0.6369 0.1519 -1401.5713 -1597.2633 -2.6954 -2.7042
0.6744 1.13 4300 0.6483 -0.3619 -0.5112 0.6429 0.1493 -1402.7870 -1598.7428 -2.7008 -2.7095
0.6355 1.15 4400 0.6477 -0.4040 -0.5558 0.6270 0.1518 -1407.2480 -1602.9531 -2.6916 -2.7001
0.6187 1.18 4500 0.6472 -0.4050 -0.5534 0.6349 0.1485 -1407.0084 -1603.0441 -2.6883 -2.6963
0.6555 1.2 4600 0.6472 -0.3883 -0.5354 0.6310 0.1471 -1405.2079 -1601.3826 -2.7075 -2.7168
0.6178 1.23 4700 0.6476 -0.3993 -0.5414 0.6190 0.1422 -1405.8092 -1602.4763 -2.6912 -2.7006
0.6242 1.26 4800 0.6477 -0.4302 -0.5746 0.625 0.1444 -1409.1267 -1605.5714 -2.6917 -2.7016
0.6221 1.28 4900 0.6464 -0.3848 -0.5302 0.6349 0.1454 -1404.6871 -1601.0272 -2.7073 -2.7167
0.6582 1.31 5000 0.6460 -0.3995 -0.5463 0.6310 0.1468 -1406.2927 -1602.5012 -2.7174 -2.7268
0.6276 1.33 5100 0.6458 -0.4048 -0.5543 0.6310 0.1495 -1407.0914 -1603.0245 -2.7192 -2.7281
0.6573 1.36 5200 0.6452 -0.4069 -0.5580 0.6290 0.1512 -1407.4680 -1603.2344 -2.7142 -2.7230
0.6672 1.39 5300 0.6458 -0.4020 -0.5504 0.6329 0.1485 -1406.7059 -1602.7441 -2.6997 -2.7080
0.6112 1.41 5400 0.6460 -0.4035 -0.5510 0.6290 0.1475 -1406.7632 -1602.8997 -2.6953 -2.7036
0.6421 1.44 5500 0.6449 -0.3915 -0.5414 0.6409 0.1499 -1405.8010 -1601.6963 -2.6991 -2.7081
0.658 1.47 5600 0.6451 -0.4023 -0.5553 0.6429 0.1530 -1407.1986 -1602.7803 -2.6938 -2.7027
0.6437 1.49 5700 0.6454 -0.4050 -0.5555 0.6389 0.1505 -1407.2163 -1603.0527 -2.6883 -2.6972
0.6289 1.52 5800 0.6443 -0.3986 -0.5520 0.6468 0.1534 -1406.8611 -1602.4105 -2.7007 -2.7094
0.6361 1.54 5900 0.6442 -0.4036 -0.5574 0.6409 0.1538 -1407.4087 -1602.9125 -2.6962 -2.7047
0.6374 1.57 6000 0.6446 -0.4164 -0.5717 0.6429 0.1553 -1408.8311 -1604.1853 -2.6963 -2.7048
0.6423 1.6 6100 0.6448 -0.4212 -0.5781 0.6349 0.1569 -1409.4735 -1604.6692 -2.6905 -2.6992
0.6611 1.62 6200 0.6453 -0.4344 -0.5916 0.625 0.1572 -1410.8239 -1605.9866 -2.6925 -2.7010
0.6355 1.65 6300 0.6451 -0.4325 -0.5909 0.625 0.1584 -1410.7570 -1605.8035 -2.6922 -2.7008
0.6555 1.67 6400 0.6451 -0.4326 -0.5912 0.6230 0.1586 -1410.7894 -1605.8125 -2.6935 -2.7021
0.6584 1.7 6500 0.6449 -0.4310 -0.5905 0.6270 0.1595 -1410.7151 -1605.6461 -2.6900 -2.6987
0.6371 1.73 6600 0.6448 -0.4266 -0.5864 0.6310 0.1598 -1410.3033 -1605.2112 -2.6897 -2.6985
0.6051 1.75 6700 0.6446 -0.4220 -0.5821 0.6329 0.1601 -1409.8746 -1604.7469 -2.6927 -2.7012
0.6136 1.78 6800 0.6446 -0.4219 -0.5822 0.6310 0.1603 -1409.8861 -1604.7394 -2.6940 -2.7024
0.6503 1.81 6900 0.6445 -0.4222 -0.5826 0.6349 0.1603 -1409.9208 -1604.7736 -2.6947 -2.7030
0.6318 1.83 7000 0.6445 -0.4216 -0.5817 0.6329 0.1601 -1409.8387 -1604.7111 -2.6925 -2.7010
0.6493 1.86 7100 0.6445 -0.4215 -0.5815 0.6329 0.1600 -1409.8179 -1604.7026 -2.6940 -2.7025
0.6292 1.88 7200 0.6446 -0.4217 -0.5816 0.6329 0.1599 -1409.8223 -1604.7195 -2.6943 -2.7027
0.625 1.91 7300 0.6445 -0.4215 -0.5816 0.6329 0.1600 -1409.8219 -1604.7013 -2.6937 -2.7022
0.6306 1.94 7400 0.6446 -0.4218 -0.5814 0.6290 0.1596 -1409.8014 -1604.7244 -2.6937 -2.7021
0.6446 1.96 7500 0.6446 -0.4217 -0.5814 0.6290 0.1596 -1409.8003 -1604.7235 -2.6937 -2.7021
0.6394 1.99 7600 0.6446 -0.4217 -0.5814 0.6290 0.1596 -1409.8003 -1604.7235 -2.6937 -2.7021
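
These column names match trl's DPOTrainer logging convention, in which Rewards/margins is simply Rewards/chosen minus Rewards/rejected. A quick check against the final evaluation row:

```python
# Sanity check: margins = chosen - rejected for the last evaluation row.
rewards_chosen, rewards_rejected = -0.4217, -0.5814
print(round(rewards_chosen - rewards_rejected, 4))  # 0.1597, matching the reported 0.1596 up to rounding
```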

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.0