---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO
  results: []
---
UTI2_L3_1000steps_1e8rate_05beta_CSFTDPO
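This model is a fine-tuned version of tsavage68/UTI_L3_1000steps_1e5rate_SFT, trained with DPO (per the card's tags) on a dataset that this card does not identify. It achieves the following results on the evaluation set, which match the final checkpoint (step 1000) in the training results table: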
- Loss: 0.6861
- Rewards/chosen: 0.0038
- Rewards/rejected: -0.0115
- Rewards/accuracies: 0.5700
- Rewards/margins: 0.0153
- Logps/rejected: -43.2926
- Logps/chosen: -29.2173
- Logits/rejected: -1.1413
- Logits/chosen: -1.1366
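As a reading aid, the reward metrics above can be related to each other with a few lines of arithmetic. The sketch below assumes the standard DPO sigmoid loss, in which the per-example loss is `-log(sigmoid(margin))` and the margin is the chosen reward minus the rejected reward; the card does not state which loss variant was used, so treat this as an illustration and not as the exact training code. The helper name `dpo_sigmoid_loss` is hypothetical.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-example DPO sigmoid loss: -log(sigmoid(margin)).

    Uses the identity -log(sigmoid(m)) == log(1 + exp(-m)).
    """
    return math.log1p(math.exp(-margin))


# Values copied from the evaluation results listed above.
rewards_chosen = 0.0038
rewards_rejected = -0.0115

# Rewards/margins is the difference between the two reward columns.
margin = rewards_chosen - rewards_rejected

# A policy identical to the reference model has margin 0 and loss ln(2).
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

# Loss evaluated at the mean margin. The reported loss is a mean over
# per-example losses, so it need not equal this value exactly.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin               = {margin:.4f}")
print(f"loss at zero margin  = {loss_at_zero_margin:.4f}")
print(f"loss at mean margin  = {loss_at_mean_margin:.4f}")
```

Under this assumption, the reported loss of 0.6861 sits just below ln(2) ≈ 0.6931, which suggests the policy moved only slightly away from the reference model.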
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
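To make the schedule concrete, the following is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the values listed above (peak rate 1e-08, 100 warmup steps, 1000 total steps). It is a plain-Python approximation of what a cosine scheduler with warmup typically computes, assuming a half-cosine decay to zero; the function name `cosine_lr` is hypothetical and the exact scheduler used in training is not confirmed by this card.

```python
import math


def cosine_lr(step: int,
              base_lr: float = 1e-8,
              warmup_steps: int = 100,
              total_steps: int = 1000) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step):.3e}")
```

With these settings the rate peaks at step 100 and falls to about half its peak at step 550, the midpoint of the decay phase.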
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.3333 | 25 | 0.6915 | -0.0002 | -0.0037 | 0.1400 | 0.0035 | -43.2769 | -29.2254 | -1.1409 | -1.1362 |
0.6961 | 0.6667 | 50 | 0.6900 | 0.0080 | 0.0004 | 0.5400 | 0.0076 | -43.2687 | -29.2089 | -1.1412 | -1.1366 |
0.6921 | 1.0 | 75 | 0.6942 | 0.0092 | 0.0102 | 0.4800 | -0.0009 | -43.2492 | -29.2065 | -1.1411 | -1.1364 |
0.7040 | 1.3333 | 100 | 0.6899 | 0.0056 | -0.0021 | 0.5300 | 0.0077 | -43.2737 | -29.2137 | -1.1409 | -1.1362 |
0.6864 | 1.6667 | 125 | 0.6926 | 0.0045 | 0.0023 | 0.4900 | 0.0022 | -43.2650 | -29.2159 | -1.1412 | -1.1365 |
0.6943 | 2.0 | 150 | 0.6906 | 0.0065 | 0.0000 | 0.5100 | 0.0065 | -43.2695 | -29.2120 | -1.1408 | -1.1361 |
0.6937 | 2.3333 | 175 | 0.6933 | 0.0015 | 0.0005 | 0.4100 | 0.0010 | -43.2685 | -29.2218 | -1.1411 | -1.1364 |
0.6941 | 2.6667 | 200 | 0.6931 | -0.0088 | -0.0105 | 0.4800 | 0.0017 | -43.2904 | -29.2424 | -1.1409 | -1.1362 |
0.6989 | 3.0 | 225 | 0.6949 | -0.0114 | -0.0092 | 0.4600 | -0.0022 | -43.2879 | -29.2476 | -1.1413 | -1.1366 |
0.6963 | 3.3333 | 250 | 0.6911 | 0.0010 | -0.0042 | 0.5600 | 0.0052 | -43.2779 | -29.2229 | -1.1411 | -1.1364 |
0.6985 | 3.6667 | 275 | 0.6947 | -0.0007 | 0.0016 | 0.4600 | -0.0023 | -43.2662 | -29.2262 | -1.1412 | -1.1366 |
0.6913 | 4.0 | 300 | 0.6916 | 0.0052 | 0.0008 | 0.4600 | 0.0045 | -43.2680 | -29.2144 | -1.1411 | -1.1364 |
0.6947 | 4.3333 | 325 | 0.6874 | 0.0095 | -0.0032 | 0.6400 | 0.0127 | -43.2759 | -29.2059 | -1.1410 | -1.1363 |
0.6953 | 4.6667 | 350 | 0.6890 | 0.0021 | -0.0077 | 0.5900 | 0.0097 | -43.2849 | -29.2208 | -1.1411 | -1.1364 |
0.6909 | 5.0 | 375 | 0.6911 | 0.0011 | -0.0042 | 0.5400 | 0.0054 | -43.2780 | -29.2226 | -1.1411 | -1.1364 |
0.6978 | 5.3333 | 400 | 0.6909 | -0.0022 | -0.0078 | 0.5200 | 0.0056 | -43.2852 | -29.2293 | -1.1411 | -1.1364 |
0.6712 | 5.6667 | 425 | 0.6894 | 0.0095 | 0.0008 | 0.5200 | 0.0088 | -43.2679 | -29.2058 | -1.1411 | -1.1365 |
0.6964 | 6.0 | 450 | 0.6905 | -0.0019 | -0.0085 | 0.5300 | 0.0066 | -43.2864 | -29.2286 | -1.1409 | -1.1362 |
0.6885 | 6.3333 | 475 | 0.6906 | -0.0011 | -0.0072 | 0.5300 | 0.0061 | -43.2840 | -29.2272 | -1.1410 | -1.1363 |
0.6912 | 6.6667 | 500 | 0.6918 | 0.0055 | 0.0016 | 0.5100 | 0.0040 | -43.2664 | -29.2138 | -1.1413 | -1.1367 |
0.6905 | 7.0 | 525 | 0.6853 | 0.0074 | -0.0095 | 0.6100 | 0.0169 | -43.2885 | -29.2101 | -1.1410 | -1.1363 |
0.6963 | 7.3333 | 550 | 0.6884 | 0.0098 | -0.0009 | 0.5500 | 0.0108 | -43.2714 | -29.2052 | -1.1412 | -1.1365 |
0.6910 | 7.6667 | 575 | 0.6884 | 0.0022 | -0.0085 | 0.5600 | 0.0107 | -43.2864 | -29.2205 | -1.1411 | -1.1363 |
0.6880 | 8.0 | 600 | 0.6865 | 0.0118 | -0.0026 | 0.6100 | 0.0144 | -43.2748 | -29.2014 | -1.1412 | -1.1365 |
0.6795 | 8.3333 | 625 | 0.6862 | 0.0137 | -0.0012 | 0.5800 | 0.0149 | -43.2720 | -29.1975 | -1.1412 | -1.1365 |
0.7010 | 8.6667 | 650 | 0.6906 | -0.0046 | -0.0108 | 0.5600 | 0.0061 | -43.2910 | -29.2342 | -1.1412 | -1.1365 |
0.7056 | 9.0 | 675 | 0.6882 | 0.0133 | 0.0020 | 0.5700 | 0.0113 | -43.2656 | -29.1983 | -1.1412 | -1.1365 |
0.7065 | 9.3333 | 700 | 0.6862 | 0.0042 | -0.0109 | 0.5500 | 0.0151 | -43.2912 | -29.2165 | -1.1412 | -1.1366 |
0.6944 | 9.6667 | 725 | 0.6907 | 0.0123 | 0.0063 | 0.5200 | 0.0060 | -43.2568 | -29.2003 | -1.1413 | -1.1366 |
0.6972 | 10.0 | 750 | 0.6900 | 0.0025 | -0.0048 | 0.5000 | 0.0073 | -43.2791 | -29.2199 | -1.1413 | -1.1366 |
0.6913 | 10.3333 | 775 | 0.6856 | 0.0048 | -0.0113 | 0.5800 | 0.0161 | -43.2921 | -29.2153 | -1.1413 | -1.1366 |
0.6961 | 10.6667 | 800 | 0.6860 | 0.0033 | -0.0122 | 0.5700 | 0.0154 | -43.2938 | -29.2184 | -1.1413 | -1.1366 |
0.6994 | 11.0 | 825 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6964 | 11.3333 | 850 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6980 | 11.6667 | 875 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6920 | 12.0 | 900 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6928 | 12.3333 | 925 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6871 | 12.6667 | 950 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.6778 | 13.0 | 975 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
0.7052 | 13.3333 | 1000 | 0.6861 | 0.0038 | -0.0115 | 0.5700 | 0.0153 | -43.2926 | -29.2173 | -1.1413 | -1.1366 |
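The Epoch and Step columns, together with the batch settings listed under the training hyperparameters, allow a rough estimate of the training set size. The sketch below assumes a single training device (the card does not state the device count) and reads 75 optimizer steps per epoch off the table, since epoch 1.0 is reached at step 75. The resulting example count is an approximation, not a documented figure.

```python
# Values from the hyperparameter list.
train_batch_size = 2
gradient_accumulation_steps = 2
training_steps = 1000

# Assumption: one device. The card does not say how many were used.
num_devices = 1

# Examples consumed per optimizer step; matches total_train_batch_size: 4.
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Read from the table: epoch 1.0 is reached at step 75.
steps_per_epoch = 75

# Rough size of the training set implied by the figures above.
approx_train_examples = effective_batch * steps_per_epoch

# Number of passes over the data by the final step.
epochs_at_final_step = training_steps / steps_per_epoch

print(f"effective batch size      = {effective_batch}")
print(f"approx. training examples = {approx_train_examples}")
print(f"epochs at step {training_steps}    = {epochs_at_final_step:.4f}")
```

Under these assumptions the run makes about 13.3 passes over roughly 300 preference pairs, which agrees with the final Epoch value in the table.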
Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1