---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO
    results: []
---

# UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (how the DPO reward metrics are defined is sketched after the list):

- Loss: 0.6861
- Rewards/chosen: 0.0038
- Rewards/rejected: -0.0115
- Rewards/accuracies: 0.5700
- Rewards/margins: 0.0153
- Logps/rejected: -43.2926
- Logps/chosen: -29.2173
- Logits/rejected: -1.1413
- Logits/chosen: -1.1366
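
For context (not stated in the original card, but following TRL's standard DPO logging): the "rewards" above are the implicit DPO rewards, i.e. the beta-scaled log-probability ratio between the trained policy and the frozen SFT reference,

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right), $$

so `Rewards/margins` is the mean difference between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of evaluation pairs for which the chosen completion receives the higher reward. The "05beta" in the model name suggests β = 0.5, though the beta value is not recorded in this card.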

## Model description

More information needed

## Intended uses & limitations

More information needed
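
No usage guidance is documented. Below is a minimal, hedged inference sketch, assuming the standard `transformers` causal-LM API, that this model is published as `tsavage68/UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO` (the name from this card's metadata), and that the tokenizer is bundled with the repository; the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, taken from the model-index name in this card's metadata.
repo_id = "tsavage68/UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# Illustrative prompt only; the card does not document an expected prompt format.
inputs = tokenizer("Example prompt.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```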

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL configuration sketch follows the list):

- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
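
The training script itself is not included in this card. A minimal sketch of how these hyperparameters might map onto TRL's `DPOTrainer` is below; exact argument names vary across TRL releases, `dpo_dataset` is a hypothetical preference dataset with `prompt`/`chosen`/`rejected` columns, and `beta=0.5` is inferred from the "05beta" in the model name rather than stated in the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.5,                        # assumption: inferred from "05beta" in the model name
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=config,
    train_dataset=dpo_dataset["train"],  # hypothetical preference dataset
    eval_dataset=dpo_dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```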

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.3333 | 25 | 0.6915 | -0.0002 | -0.0037 | 0.1400 | 0.0035 | -43.2769 | -29.2254 | -1.1409 | -1.1362 |
| 0.6961 | 0.6667 | 50 | 0.6900 | 0.0080 | 0.0004 | 0.5400 | 0.0076 | -43.2687 | -29.2089 | -1.1412 | -1.1366 |
| 0.6921 | 1.0 | 75 | 0.6942 | 0.0092 | 0.0102 | 0.4800 | -0.0009 | -43.2492 | -29.2065 | -1.1411 | -1.1364 |
| 0.704 | 1.3333 | 100 | 0.6899 | 0.0056 | -0.0021 | 0.5300 | 0.0077 | -43.2737 | -29.2137 | -1.1409 | -1.1362 |
| 0.6864 | 1.6667 | 125 | 0.6926 | 0.0045 | 0.0023 | 0.4900 | 0.0022 | -43.2650 | -29.2159 | -1.1412 | -1.1365 |
| 0.6943 | 2.0 | 150 | 0.6906 | 0.0065 | 0.0000 | 0.5100 | 0.0065 | -43.2695 | -29.2120 | -1.1408 | -1.1361 |
| 0.6937 | 2.3333 | 175 | 0.6933 | 0.0015 | 0.0005 | 0.4100 | 0.0010 | -43.2685 | -29.2218 | -1.1411 | -1.1364 |
| 0.6941 | 2.6667 | 200 | 0.6931 | -0.0088 | -0.0105 | 0.4800 | 0.0017 | -43.2904 | -29.2424 | -1.1409 | -1.1362 |
| 0.6989 | 3.0 | 225 | 0.6949 | -0.0114 | -0.0092 | 0.4600 | -0.0022 | -43.2879 | -29.2476 | -1.1413 | -1.1366 |
| 0.6963 | 3.3333 | 250 | 0.6911 | 0.0010 | -0.0042 | 0.5600 | 0.0052 | -43.2779 | -29.2229 | -1.1411 | -1.1364 |
| 0.6985 | 3.6667 | 275 | 0.6947 | -0.0007 | 0.0016 | 0.4600 | -0.0023 | -43.2662 | -29.2262 | -1.1412 | -1.1366 |
| 0.6913 | 4.0 | 300 | 0.6916 | 0.0052 | 0.0008 | 0.4600 | 0.0045 | -43.2680 | -29.2144 | -1.1411 | -1.1364 |
| 0.6947 | 4.3333 | 325 | 0.6874 | 0.0095 | -0.0032 | 0.6400 | 0.0127 | -43.2759 | -29.2059 | -1.1410 | -1.1363 |
| 0.6953 | 4.6667 | 350 | 0.6890 | 0.0021 | -0.0077 | 0.5900 | 0.0097 | -43.2849 | -29.2208 | -1.1411 | -1.1364 |
| 0.6909 | 5.0 | 375 | 0.6911 | 0.0011 | -0.0042 | 0.5400 | 0.0054 | -43.2780 | -29.2226 | -1.1411 | -1.1364 |
| 0.6978 | 5.3333 | 400 | 0.6909 | -0.0022 | -0.0078 | 0.5200 | 0.0056 | -43.2852 | -29.2293 | -1.1411 | -1.1364 |
| 0.6712 | 5.6667 | 425 | 0.6894 | 0.0095 | 0.0008 | 0.5200 | 0.0088 | -43.2679 | -29.2058 | -1.1411 | -1.1365 |
| 0.6964 | 6.0 | 450 | 0.6905 | -0.0019 | -0.0085 | 0.5300 | 0.0066 | -43.2864 | -29.2286 | -1.1409 | -1.1362 |
| 0.6885 | 6.3333 | 475 | 0.6906 | -0.0011 | -0.0072 | 0.5300 | 0.0061 | -43.2840 | -29.2272 | -1.1410 | -1.1363 |
| 0.6912 | 6.6667 | 500 | 0.6918 | 0.0055 | 0.0016 | 0.5100 | 0.0040 | -43.2664 | -29.2138 | -1.1413 | -1.1367 |
| 0.6905 | 7.0 | 525 | 0.6853 | 0.0074 | -0.0095 | 0.6100 | 0.0169 | -43.2885 | -29.2101 | -1.1410 | -1.1363 |
| 0.6963 | 7.3333 | 550 | 0.6884 | 0.0098 | -0.0009 | 0.5500 | 0.0108 | -43.2714 | -29.2052 | -1.1412 | -1.1365 |
| 0.691 | 7.6667 | 575 | 0.6884 | 0.0022 | -0.0085 | 0.5600 | 0.0107 | -43.2864 | -29.2205 | -1.1411 | -1.1363 |
| 0.688 | 8.0 | 600 | 0.6865 | 0.0118 | -0.0026 | 0.6100 | 0.0144 | -43.2748 | -29.2014 | -1.1412 | -1.1365 |
| 0.6795 | 8.3333 | 625 | 0.6862 | 0.0137 | -0.0012 | 0.5800 | 0.0149 | -43.2720 | -29.1975 | -1.1412 | -1.1365 |
| 0.701 | 8.6667 | 650 | 0.6906 | -0.0046 | -0.0108 | 0.5600 | 0.0061 | -43.2910 | -29.2342 | -1.1412 | -1.1365 |
| 0.7056 | 9.0 | 675 | 0.6882 | 0.0133 | 0.0020 | 0.5700 | 0.0113 | -43.2656 | -29.1983 | -1.1412 | -1.1365 |
| 0.7065 | 9.3333 | 700 | 0.6862 | 0.0042 | -0.0109 | 0.5500 | 0.0151 | -43.2912 | -29.2165 | -1.1412 | -1.1366 |
| 0.6944 | 9.6667 | 725 | 0.6907 | 0.0123 | 0.0063 | 0.5200 | 0.0060 | -43.2568 | -29.2003 | -1.1413 | -1.1366 |
| 0.6972 | 10.0 | 750 | 0.6900 | 0.0025 | -0.0048 | 0.5000 | 0.0073 | -43.2791 | -29.2199 | -1.1413 | -1.1366 |
| 0.6913 | 10.3333 | 775 | 0.6856 | 0.0048 | -0.0113 | 0.5800 | 0.0161 | -43.2921 | -29.2153 | -1.1413 | -1.1366 |
| 0.6961 | 10.6667 | 800 | 0.6860 | 0.0033 | -0.0122 | 0.5700 | 0.0154 | -43.2938 | -29.2184 | -1.1413 | -1.1366 |
| 0.6994 | 11.0 | 825 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.6964 | 11.3333 | 850 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.698 | 11.6667 | 875 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.692 | 12.0 | 900 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.6928 | 12.3333 | 925 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.6871 | 12.6667 | 950 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.6778 | 13.0 | 975 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
| 0.7052 | 13.3333 | 1000 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1