---
library_name: transformers
license: apache-2.0
base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_M2_1000steps_1e8rate_05beta_cSFTDPO
  results: []
---

# IE_M2_1000steps_1e8rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.5967
- Rewards/chosen: 0.0013
- Rewards/rejected: -0.2212
- Rewards/accuracies: 0.4600
- Rewards/margins: 0.2225
- Logps/rejected: -41.4643
- Logps/chosen: -42.2029
- Logits/rejected: -2.9153
- Logits/chosen: -2.8540
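
For a quick smoke test, the checkpoint can be loaded with the standard `transformers` API. This is a minimal sketch; the prompt below is illustrative only, since the training data is not documented in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_M2_1000steps_1e8rate_05beta_cSFTDPO"

# Load the DPO-tuned checkpoint and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative prompt only; the actual task/dataset is not documented above.
prompt = "Extract the named entities from: 'Acme Corp. hired Jane Doe in 2021.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```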

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `trl` reproduction sketch follows the list):

- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
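
As a reference, here is a minimal sketch of how these hyperparameters map onto `trl`'s `DPOConfig`/`DPOTrainer` (assuming a trl release contemporary with Transformers 4.44.2). `beta=0.5` is an assumption inferred from the `05beta` suffix in the model name, and the preference dataset is a placeholder, since the actual data is not documented here.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "tsavage68/IE_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: the real preference dataset is not documented in this card.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="IE_M2_1000steps_1e8rate_05beta_cSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    beta=0.5,  # assumption: inferred from the "05beta" model-name suffix
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # older trl API; newer releases use processing_class
)
trainer.train()
```

Leaving `ref_model` unset makes `DPOTrainer` build the frozen reference policy from a copy of the model, which is trl's default for full fine-tuning.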

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6945        | 0.4   | 50   | 0.6933          | 0.0121         | 0.0098           | 0.2450             | 0.0023          | -41.0022       | -42.1813     | -2.9159         | -2.8545       |
| 0.6936        | 0.8   | 100  | 0.6888          | 0.0052         | -0.0069          | 0.2150             | 0.0121          | -41.0356       | -42.1952     | -2.9158         | -2.8545       |
| 0.6628        | 1.2   | 150  | 0.6642          | 0.0025         | -0.0598          | 0.3650             | 0.0623          | -41.1414       | -42.2005     | -2.9158         | -2.8545       |
| 0.6553        | 1.6   | 200  | 0.6439          | -0.0046        | -0.1128          | 0.4350             | 0.1083          | -41.2475       | -42.2147     | -2.9156         | -2.8543       |
| 0.6399        | 2.0   | 250  | 0.6211          | -0.0017        | -0.1629          | 0.4600             | 0.1612          | -41.3475       | -42.2089     | -2.9153         | -2.8541       |
| 0.622         | 2.4   | 300  | 0.6110          | -0.0080        | -0.1940          | 0.4600             | 0.1859          | -41.4097       | -42.2216     | -2.9155         | -2.8542       |
| 0.6063        | 2.8   | 350  | 0.6052          | -0.0027        | -0.2028          | 0.4550             | 0.2001          | -41.4274       | -42.2109     | -2.9153         | -2.8540       |
| 0.6243        | 3.2   | 400  | 0.6005          | -0.0031        | -0.2152          | 0.4600             | 0.2121          | -41.4523       | -42.2118     | -2.9154         | -2.8541       |
| 0.6262        | 3.6   | 450  | 0.6019          | -0.0015        | -0.2107          | 0.4600             | 0.2092          | -41.4433       | -42.2085     | -2.9154         | -2.8541       |
| 0.6281        | 4.0   | 500  | 0.5955          | -0.0090        | -0.2341          | 0.4600             | 0.2251          | -41.4900       | -42.2236     | -2.9151         | -2.8538       |
| 0.5897        | 4.4   | 550  | 0.5966          | -0.0012        | -0.2244          | 0.4600             | 0.2232          | -41.4706       | -42.2079     | -2.9151         | -2.8538       |
| 0.5987        | 4.8   | 600  | 0.5991          | -0.0063        | -0.2220          | 0.4600             | 0.2157          | -41.4659       | -42.2182     | -2.9153         | -2.8541       |
| 0.6188        | 5.2   | 650  | 0.5972          | -0.0053        | -0.2271          | 0.4550             | 0.2218          | -41.4759       | -42.2160     | -2.9154         | -2.8541       |
| 0.6165        | 5.6   | 700  | 0.6060          | -0.0050        | -0.2031          | 0.4550             | 0.1981          | -41.4281       | -42.2156     | -2.9153         | -2.8540       |
| 0.5861        | 6.0   | 750  | 0.6007          | -0.0033        | -0.2152          | 0.4600             | 0.2119          | -41.4523       | -42.2121     | -2.9154         | -2.8540       |
| 0.5445        | 6.4   | 800  | 0.5984          | -0.0069        | -0.2252          | 0.4600             | 0.2183          | -41.4722       | -42.2193     | -2.9153         | -2.8539       |
| 0.6228        | 6.8   | 850  | 0.5987          | -0.0027        | -0.2205          | 0.4600             | 0.2178          | -41.4628       | -42.2110     | -2.9153         | -2.8539       |
| 0.5741        | 7.2   | 900  | 0.5967          | 0.0013         | -0.2212          | 0.4600             | 0.2225          | -41.4643       | -42.2029     | -2.9153         | -2.8540       |
| 0.5819        | 7.6   | 950  | 0.5967          | 0.0013         | -0.2212          | 0.4600             | 0.2225          | -41.4643       | -42.2029     | -2.9153         | -2.8540       |
| 0.607         | 8.0   | 1000 | 0.5967          | 0.0013         | -0.2212          | 0.4600             | 0.2225          | -41.4643       | -42.2029     | -2.9153         | -2.8540       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1