library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-4-reward
    results: []

OpenELM-1_1B-DPO-full-max-4-reward

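This model card was generated automatically by the trainer, which recorded the model as trained from scratch on an unknown dataset; the trl and dpo tags indicate that training used DPO. It achieves the following results on the evaluation set, matching the final logged evaluation at step 2800: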

  • Loss: 1.6190
  • Rewards/chosen: -13.625
  • Rewards/rejected: -15.0625
  • Rewards/accuracies: 0.5996
  • Rewards/margins: 1.4688
  • Logps/rejected: -1800.0
  • Logps/chosen: -1680.0
  • Logits/rejected: 1.0625
  • Logits/chosen: -0.2695
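
The reward, margin, and accuracy figures above are the kind of quantities a DPO trainer logs. As a rough per-example sketch, assuming the standard sigmoid DPO loss and a placeholder beta of 0.1 (this card does not record the value used), they relate as shown below. The function name and the sample log-probabilities are illustrative, not taken from the training run.

```python
import math


def dpo_example_metrics(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO quantities for the sigmoid loss.

    beta=0.1 is an ASSUMED placeholder; the card does not state it.
    """
    # Implicit rewards: scaled log-probability gain of the policy over the reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "accuracy": 1.0 if margin > 0 else 0.0,
        "loss": loss,
    }


# Hypothetical sequence log-probabilities, not values from the evaluation set.
example = dpo_example_metrics(-120.0, -150.0, -118.0, -140.0)
print(example)
```

In this made-up example both rewards are negative while the margin is positive, the same pattern as the evaluation figures: the policy assigns lower log-probability than the reference to both responses, but lowers the rejected one more.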

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
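
The two total batch sizes follow from the per-device values, and the cosine scheduler with a 0.1 warmup ratio determines the learning rate at each step. The sketch below reproduces that arithmetic. The schedule function is a plain reimplementation of linear warmup followed by half-cosine decay, not the trainer's own code, and ASSUMED_TOTAL_STEPS is an estimate inferred from the results table (step 2800 at epoch 2.9319), not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies only to training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: roughly 955 optimizer steps per epoch over 3 epochs.
ASSUMED_TOTAL_STEPS = 2865

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 287, 1432, 2865):
    print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends and decays to zero by the final step.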

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6378 | 0.0838 | 80 | 0.6868 | -0.6758 | -0.7656 | 0.5684 | 0.0918 | -366.0 | -386.0 | -9.875 | -10.125 |
| 0.6219 | 0.1675 | 160 | 0.6949 | -0.9102 | -1.0547 | 0.5977 | 0.1406 | -394.0 | -410.0 | -10.125 | -10.5 |
| 0.6151 | 0.2513 | 240 | 0.7637 | -2.4531 | -2.6562 | 0.5566 | 0.2031 | -552.0 | -564.0 | -10.9375 | -11.25 |
| 0.6607 | 0.3351 | 320 | 0.7307 | -2.7344 | -2.9375 | 0.5742 | 0.1992 | -584.0 | -592.0 | -14.25 | -14.4375 |
| 0.6304 | 0.4188 | 400 | 0.7129 | -2.7344 | -3.0156 | 0.5898 | 0.2715 | -588.0 | -592.0 | -12.5 | -13.0 |
| 0.623 | 0.5026 | 480 | 0.7718 | -2.5469 | -2.9375 | 0.5859 | 0.3887 | -584.0 | -572.0 | -8.0625 | -9.0 |
| 0.6091 | 0.5864 | 560 | 0.7543 | -3.3281 | -3.6562 | 0.5957 | 0.3320 | -656.0 | -652.0 | -12.0 | -12.75 |
| 0.583 | 0.6702 | 640 | 0.7081 | -3.25 | -3.7031 | 0.6406 | 0.4648 | -660.0 | -644.0 | -9.0 | -10.0625 |
| 0.6183 | 0.7539 | 720 | 0.7397 | -3.7812 | -4.0938 | 0.5996 | 0.3242 | -700.0 | -696.0 | -8.5625 | -9.4375 |
| 0.5988 | 0.8377 | 800 | 0.7986 | -4.4688 | -4.9375 | 0.5898 | 0.4609 | -784.0 | -764.0 | -7.9062 | -8.9375 |
| 0.5882 | 0.9215 | 880 | 0.7997 | -3.2656 | -3.6562 | 0.5879 | 0.3906 | -656.0 | -644.0 | -8.3125 | -9.1875 |
| 0.4256 | 1.0052 | 960 | 0.7816 | -4.5312 | -5.1875 | 0.6172 | 0.6367 | -808.0 | -772.0 | -6.75 | -7.9062 |
| 0.2006 | 1.0890 | 1040 | 0.9734 | -5.9688 | -6.6875 | 0.6094 | 0.7383 | -960.0 | -916.0 | -4.7812 | -6.0625 |
| 0.1977 | 1.1728 | 1120 | 0.9420 | -6.25 | -7.0 | 0.6094 | 0.7578 | -988.0 | -944.0 | -5.0 | -6.25 |
| 0.1717 | 1.2565 | 1200 | 1.0548 | -7.4688 | -8.25 | 0.5918 | 0.7852 | -1112.0 | -1064.0 | -4.5 | -5.8125 |
| 0.1881 | 1.3403 | 1280 | 0.9567 | -6.9688 | -7.8125 | 0.6035 | 0.8672 | -1072.0 | -1012.0 | -3.2188 | -4.4688 |
| 0.1897 | 1.4241 | 1360 | 0.9563 | -6.9688 | -7.8438 | 0.6055 | 0.8867 | -1072.0 | -1016.0 | -4.2812 | -5.6875 |
| 0.1383 | 1.5079 | 1440 | 1.1196 | -8.5625 | -9.5 | 0.6055 | 0.9922 | -1240.0 | -1176.0 | -2.5938 | -3.9062 |
| 0.146 | 1.5916 | 1520 | 1.0767 | -9.5 | -10.5 | 0.6055 | 1.0078 | -1336.0 | -1264.0 | -1.6797 | -3.0312 |
| 0.1831 | 1.6754 | 1600 | 0.9776 | -8.0625 | -8.9375 | 0.6055 | 0.8516 | -1184.0 | -1128.0 | -2.2344 | -3.5938 |
| 0.1667 | 1.7592 | 1680 | 1.0210 | -7.75 | -8.625 | 0.5957 | 0.9023 | -1152.0 | -1088.0 | -1.7344 | -3.2344 |
| 0.1514 | 1.8429 | 1760 | 1.0214 | -8.6875 | -9.6875 | 0.6133 | 0.9805 | -1256.0 | -1184.0 | -1.1719 | -2.5312 |
| 0.1594 | 1.9267 | 1840 | 1.0633 | -8.8125 | -9.75 | 0.5977 | 0.9727 | -1264.0 | -1200.0 | -1.2344 | -2.625 |
| 0.0307 | 2.0105 | 1920 | 1.0948 | -8.75 | -9.75 | 0.6172 | 1.0312 | -1264.0 | -1192.0 | -1.4531 | -2.9844 |
| 0.0214 | 2.0942 | 2000 | 1.5354 | -12.25 | -13.3125 | 0.6094 | 1.1016 | -1624.0 | -1544.0 | 0.1973 | -1.2031 |
| 0.0186 | 2.1780 | 2080 | 1.5790 | -13.5625 | -14.9375 | 0.6055 | 1.3906 | -1784.0 | -1680.0 | 0.4902 | -0.9102 |
| 0.0395 | 2.2618 | 2160 | 1.5234 | -12.0625 | -13.1875 | 0.6035 | 1.1406 | -1608.0 | -1520.0 | 0.5391 | -0.7656 |
| 0.0217 | 2.3455 | 2240 | 1.5867 | -13.1875 | -14.5625 | 0.6035 | 1.375 | -1744.0 | -1632.0 | 0.8945 | -0.4141 |
| 0.0268 | 2.4293 | 2320 | 1.5888 | -13.0 | -14.375 | 0.6035 | 1.4219 | -1728.0 | -1616.0 | 0.6797 | -0.6758 |
| 0.0238 | 2.5131 | 2400 | 1.6647 | -13.625 | -15.0625 | 0.6055 | 1.4453 | -1792.0 | -1680.0 | 0.9648 | -0.3633 |
| 0.0227 | 2.5969 | 2480 | 1.5873 | -13.125 | -14.5625 | 0.6094 | 1.4375 | -1744.0 | -1632.0 | 0.9258 | -0.4199 |
| 0.0233 | 2.6806 | 2560 | 1.5836 | -13.1875 | -14.625 | 0.6035 | 1.4297 | -1752.0 | -1640.0 | 0.9297 | -0.4180 |
| 0.021 | 2.7644 | 2640 | 1.5917 | -13.4375 | -14.9375 | 0.6094 | 1.4609 | -1776.0 | -1664.0 | 1.0078 | -0.3223 |
| 0.0221 | 2.8482 | 2720 | 1.6077 | -13.5625 | -15.0 | 0.6035 | 1.4609 | -1792.0 | -1672.0 | 1.0469 | -0.2793 |
| 0.0182 | 2.9319 | 2800 | 1.6190 | -13.625 | -15.0625 | 0.5996 | 1.4688 | -1800.0 | -1680.0 | 1.0625 | -0.2695 |
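
Over the logged steps the training loss falls from 0.6378 to 0.0182 while the validation loss rises from 0.6868 to 1.6190, so the final checkpoint is not the one with the best evaluation numbers. A small sketch of how the table can be scanned for better candidates is given below. It uses only five rows copied from the table to stay short, and the tuple layout and variable names are illustrative. The steps-per-epoch figure is an estimate from the Epoch and Step columns, assuming Step counts optimizer steps.

```python
# (step, epoch, training_loss, validation_loss, reward_accuracy)
# A five-row SUBSET of the results table, copied verbatim.
ROWS = [
    (80, 0.0838, 0.6378, 0.6868, 0.5684),
    (640, 0.6702, 0.583, 0.7081, 0.6406),
    (960, 1.0052, 0.4256, 0.7816, 0.6172),
    (1920, 2.0105, 0.0307, 1.0948, 0.6172),
    (2800, 2.9319, 0.0182, 1.6190, 0.5996),
]

# Checkpoint with the lowest validation loss and with the highest reward accuracy.
best_val = min(ROWS, key=lambda row: row[3])
best_acc = max(ROWS, key=lambda row: row[4])

# Estimate optimizer steps per epoch from the last logged row.
last_step, last_epoch = ROWS[-1][0], ROWS[-1][1]
steps_per_epoch = round(last_step / last_epoch)

print("lowest validation loss at step", best_val[0])
print("highest reward accuracy at step", best_acc[0])
print("estimated steps per epoch:", steps_per_epoch)
```

The same two steps (80 for validation loss, 640 for reward accuracy) also come out on top when all 35 rows of the table are considered.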

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
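
To check a local environment against the versions listed above, a minimal sketch using only the standard library could look like this. The distribution names are assumptions about how the packages are published on PyPI (PyTorch as "torch"), and the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare_versions(expected=EXPECTED, lookup=installed_version):
    """Map each package to (wanted, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu121" before comparing.
        base = found.split("+")[0] if found else None
        report[name] = (wanted, found, base == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the environment recorded in this card.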