---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: UTI_L3_1000steps_1e5rate_01beta_CSFTDPO
    results: []
---

# UTI_L3_1000steps_1e5rate_01beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (a brief note on how the reward metrics are computed follows the list):

- Loss: 0.0069
- Rewards/chosen: -2.8206
- Rewards/rejected: -21.2510
- Rewards/accuracies: 0.9900
- Rewards/margins: 18.4304
- Logps/rejected: -275.7048
- Logps/chosen: -60.6847
- Logits/rejected: -2.0149
- Logits/chosen: -1.9810
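These are the standard TRL DPO evaluation statistics. Assuming the usual DPO definitions (and reading the `01beta` in the model name as β = 0.1, an inference from the name rather than something documented in this card), each reward is the β-scaled log-probability ratio of the policy against the frozen SFT reference:

$$
r(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right]
$$

Rewards/margins is then Rewards/chosen minus Rewards/rejected: -2.8206 - (-21.2510) = 18.4304, consistent with the final row of the training results table below.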

## Model description

More information needed

## Intended uses & limitations

More information needed
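
In the absence of documented usage guidance, here is a minimal inference sketch using 🤗 Transformers. The prompt format is an assumption (this card does not specify one), and `device_map="auto"` requires the `accelerate` package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI_L3_1000steps_1e5rate_01beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype your hardware supports
    device_map="auto",           # requires the accelerate package
)

# Hypothetical prompt; the card does not document the expected format.
prompt = "Question: What are the common symptoms of a urinary tract infection?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```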

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `trl` configuration sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
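
For reference, here is a hedged sketch of how these hyperparameters map onto a `trl` DPO setup. The dataset, its column layout, and the exact `trl` version are assumptions (the card lists no `trl` version, and newer releases rename `tokenizer=` to `processing_class=`); this is a reconstruction under those assumptions, not the author's script:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference data with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.json")["train"]

args = DPOConfig(
    output_dir="UTI_L3_1000steps_1e5rate_01beta_CSFTDPO",
    beta=0.1,                       # assumption: read from "01beta" in the model name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # trl clones the model as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # `processing_class=` in newer trl releases
)
trainer.train()
```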

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0002        | 0.6667  | 50   | 0.0072          | 0.7766         | -13.0912         | 0.9900             | 13.8678         | -194.1069      | -24.7133     | -1.5529         | -1.4952       |
| 0.0173        | 1.3333  | 100  | 0.0086          | -1.2523        | -14.7999         | 0.9900             | 13.5476         | -211.1941      | -45.0020     | -1.7169         | -1.6585       |
| 0.0371        | 2.0     | 150  | 0.0069          | -2.9050        | -20.9463         | 0.9900             | 18.0414         | -272.6581      | -61.5287     | -2.0084         | -1.9771       |
| 0.0           | 2.6667  | 200  | 0.0069          | -2.8291        | -21.1059         | 0.9900             | 18.2768         | -274.2534      | -60.7697     | -2.0121         | -1.9789       |
| 0.0173        | 3.3333  | 250  | 0.0069          | -2.8268        | -21.1156         | 0.9900             | 18.2889         | -274.3510      | -60.7466     | -2.0124         | -1.9791       |
| 0.0347        | 4.0     | 300  | 0.0069          | -2.8254        | -21.1309         | 0.9900             | 18.3055         | -274.5038      | -60.7333     | -2.0126         | -1.9792       |
| 0.0173        | 4.6667  | 350  | 0.0069          | -2.8156        | -21.1516         | 0.9900             | 18.3360         | -274.7103      | -60.6348     | -2.0131         | -1.9796       |
| 0.0173        | 5.3333  | 400  | 0.0069          | -2.8155        | -21.1665         | 0.9900             | 18.3511         | -274.8600      | -60.6336     | -2.0133         | -1.9797       |
| 0.0173        | 6.0     | 450  | 0.0069          | -2.8146        | -21.1758         | 0.9900             | 18.3612         | -274.9522      | -60.6250     | -2.0136         | -1.9799       |
| 0.0347        | 6.6667  | 500  | 0.0069          | -2.8128        | -21.1899         | 0.9900             | 18.3771         | -275.0935      | -60.6071     | -2.0140         | -1.9802       |
| 0.0           | 7.3333  | 550  | 0.0069          | -2.8143        | -21.2087         | 0.9900             | 18.3944         | -275.2815      | -60.6221     | -2.0143         | -1.9804       |
| 0.0347        | 8.0     | 600  | 0.0069          | -2.8161        | -21.2215         | 0.9900             | 18.4054         | -275.4096      | -60.6400     | -2.0144         | -1.9805       |
| 0.0           | 8.6667  | 650  | 0.0069          | -2.8197        | -21.2301         | 0.9900             | 18.4104         | -275.4954      | -60.6758     | -2.0147         | -1.9807       |
| 0.0173        | 9.3333  | 700  | 0.0069          | -2.8217        | -21.2410         | 0.9900             | 18.4193         | -275.6051      | -60.6962     | -2.0148         | -1.9809       |
| 0.0           | 10.0    | 750  | 0.0069          | -2.8204        | -21.2414         | 0.9900             | 18.4210         | -275.6092      | -60.6834     | -2.0148         | -1.9809       |
| 0.0173        | 10.6667 | 800  | 0.0069          | -2.8221        | -21.2513         | 0.9900             | 18.4292         | -275.7073      | -60.7001     | -2.0148         | -1.9808       |
| 0.0           | 11.3333 | 850  | 0.0069          | -2.8219        | -21.2497         | 0.9900             | 18.4278         | -275.6921      | -60.6985     | -2.0148         | -1.9808       |
| 0.0           | 12.0    | 900  | 0.0069          | -2.8223        | -21.2528         | 0.9900             | 18.4305         | -275.7229      | -60.7022     | -2.0151         | -1.9811       |
| 0.0173        | 12.6667 | 950  | 0.0069          | -2.8218        | -21.2512         | 0.9900             | 18.4295         | -275.7072      | -60.6970     | -2.0149         | -1.9810       |
| 0.0           | 13.3333 | 1000 | 0.0069          | -2.8206        | -21.2510         | 0.9900             | 18.4304         | -275.7048      | -60.6847     | -2.0149         | -1.9810       |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1