---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: UTI_L3_1000steps_1e5rate_01beta_CSFTDPO
  results: []
---
# UTI_L3_1000steps_1e5rate_01beta_CSFTDPO
This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set (the reward metrics are defined in the note after the list):
- Loss: 0.0069
- Rewards/chosen: -2.8206
- Rewards/rejected: -21.2510
- Rewards/accuracies: 0.9900
- Rewards/margins: 18.4304
- Logps/rejected: -275.7048
- Logps/chosen: -60.6847
- Logits/rejected: -2.0149
- Logits/chosen: -1.9810
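For reference, the reward metrics above follow the standard DPO definitions: each implicit reward is the β-scaled log-ratio of policy to reference likelihood, and the margin is the chosen-minus-rejected gap. A reference formulation (the β = 0.1 value is inferred from the `01beta` suffix in the model name, not documented in this card):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\text{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

Here $y_w$ and $y_l$ are the chosen and rejected completions, so `Rewards/margins` is $r_\theta(x, y_w) - r_\theta(x, y_l)$; a larger margin means the policy separates preferred from rejected answers more strongly relative to the frozen SFT reference.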
## Model description
More information needed
## Intended uses & limitations
More information needed
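Pending documentation from the author, here is a minimal inference sketch. The repo ID is assumed from the `model-index` name above, and the chat template is assumed to be inherited from the Llama 3 base; the prompt and generation settings are illustrative only:

```python
# Minimal sketch, not an official usage example. Assumes the checkpoint is
# published under the repo ID below and ships a Llama 3 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI_L3_1000steps_1e5rate_01beta_CSFTDPO"  # assumed from the model-index name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative prompt only; the intended task is undocumented in this card.
messages = [{"role": "user", "content": "What are common symptoms of a UTI?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```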
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged reconstruction of the trainer setup follows the list):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
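The hyperparameters above map onto a TRL DPO run roughly as follows. This is a reconstruction, not the author's script: the dataset is a placeholder (the card calls it unknown), `beta=0.1` is inferred from the `01beta` model name, and argument names vary slightly across TRL versions.

```python
# Reconstruction of the training setup from the hyperparameters above.
# Assumptions: TRL's DPOConfig/DPOTrainer API, beta=0.1 (from the model name),
# and a placeholder preference dataset with prompt/chosen/rejected columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder: the actual preference dataset is undocumented.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = DPOConfig(
    output_dir="UTI_L3_1000steps_1e5rate_01beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    beta=0.1,                        # inferred from "01beta" in the model name
)

trainer = DPOTrainer(
    model=model,                     # ref model defaults to a frozen copy
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,             # older TRL; newer versions use processing_class
)
trainer.train()
```

Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults, so they need no explicit setting.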
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
| 0.0002 | 0.6667 | 50 | 0.0072 | 0.7766 | -13.0912 | 0.9900 | 13.8678 | -194.1069 | -24.7133 | -1.5529 | -1.4952 |
| 0.0173 | 1.3333 | 100 | 0.0086 | -1.2523 | -14.7999 | 0.9900 | 13.5476 | -211.1941 | -45.0020 | -1.7169 | -1.6585 |
| 0.0371 | 2.0 | 150 | 0.0069 | -2.9050 | -20.9463 | 0.9900 | 18.0414 | -272.6581 | -61.5287 | -2.0084 | -1.9771 |
| 0.0 | 2.6667 | 200 | 0.0069 | -2.8291 | -21.1059 | 0.9900 | 18.2768 | -274.2534 | -60.7697 | -2.0121 | -1.9789 |
| 0.0173 | 3.3333 | 250 | 0.0069 | -2.8268 | -21.1156 | 0.9900 | 18.2889 | -274.3510 | -60.7466 | -2.0124 | -1.9791 |
| 0.0347 | 4.0 | 300 | 0.0069 | -2.8254 | -21.1309 | 0.9900 | 18.3055 | -274.5038 | -60.7333 | -2.0126 | -1.9792 |
| 0.0173 | 4.6667 | 350 | 0.0069 | -2.8156 | -21.1516 | 0.9900 | 18.3360 | -274.7103 | -60.6348 | -2.0131 | -1.9796 |
| 0.0173 | 5.3333 | 400 | 0.0069 | -2.8155 | -21.1665 | 0.9900 | 18.3511 | -274.8600 | -60.6336 | -2.0133 | -1.9797 |
| 0.0173 | 6.0 | 450 | 0.0069 | -2.8146 | -21.1758 | 0.9900 | 18.3612 | -274.9522 | -60.6250 | -2.0136 | -1.9799 |
| 0.0347 | 6.6667 | 500 | 0.0069 | -2.8128 | -21.1899 | 0.9900 | 18.3771 | -275.0935 | -60.6071 | -2.0140 | -1.9802 |
| 0.0 | 7.3333 | 550 | 0.0069 | -2.8143 | -21.2087 | 0.9900 | 18.3944 | -275.2815 | -60.6221 | -2.0143 | -1.9804 |
| 0.0347 | 8.0 | 600 | 0.0069 | -2.8161 | -21.2215 | 0.9900 | 18.4054 | -275.4096 | -60.6400 | -2.0144 | -1.9805 |
| 0.0 | 8.6667 | 650 | 0.0069 | -2.8197 | -21.2301 | 0.9900 | 18.4104 | -275.4954 | -60.6758 | -2.0147 | -1.9807 |
| 0.0173 | 9.3333 | 700 | 0.0069 | -2.8217 | -21.2410 | 0.9900 | 18.4193 | -275.6051 | -60.6962 | -2.0148 | -1.9809 |
| 0.0 | 10.0 | 750 | 0.0069 | -2.8204 | -21.2414 | 0.9900 | 18.4210 | -275.6092 | -60.6834 | -2.0148 | -1.9809 |
| 0.0173 | 10.6667 | 800 | 0.0069 | -2.8221 | -21.2513 | 0.9900 | 18.4292 | -275.7073 | -60.7001 | -2.0148 | -1.9808 |
| 0.0 | 11.3333 | 850 | 0.0069 | -2.8219 | -21.2497 | 0.9900 | 18.4278 | -275.6921 | -60.6985 | -2.0148 | -1.9808 |
| 0.0 | 12.0 | 900 | 0.0069 | -2.8223 | -21.2528 | 0.9900 | 18.4305 | -275.7229 | -60.7022 | -2.0151 | -1.9811 |
| 0.0173 | 12.6667 | 950 | 0.0069 | -2.8218 | -21.2512 | 0.9900 | 18.4295 | -275.7072 | -60.6970 | -2.0149 | -1.9810 |
| 0.0 | 13.3333 | 1000 | 0.0069 | -2.8206 | -21.2510 | 0.9900 | 18.4304 | -275.7048 | -60.6847 | -2.0149 | -1.9810 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1