OpenELM-1_1B-DPO-full-1

This model is a fine-tuned version of data/OpenELM-1_1B-SFT-1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8127
  • Rewards/chosen: -7.4062
  • Rewards/rejected: -9.625
  • Rewards/accuracies: 0.7266
  • Rewards/margins: 2.2188
  • Logps/rejected: -1248.0
  • Logps/chosen: -1056.0
  • Logits/rejected: -1.5781
  • Logits/chosen: -4.0
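
As a quick consistency check on the figures above, the reward margin logged by DPO-style trainers is the chosen reward minus the rejected reward. The following is a minimal sketch that recomputes it from the two reported rewards; the variable names are illustrative and not taken from the training code.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = -7.4062
rewards_rejected = -9.625
reported_margin = 2.2188

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

# The card rounds to four decimals, so compare with a small tolerance.
matches_card = math.isclose(margin, reported_margin, abs_tol=1e-4)
print(f"recomputed margin = {margin:.4f}, matches card: {matches_card}")
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more than the chosen ones, not from the chosen responses gaining reward.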

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
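
The total batch sizes follow from the per-device values, and the cosine schedule with a 0.1 warmup ratio can be sketched in a few lines. The example below is a rough illustration, not the training script: `cosine_lr` is a hypothetical helper that mirrors the usual linear-warmup-then-half-cosine shape, and `TOTAL_STEPS` is an assumption estimated from the results table (about 2800 steps at epoch 2.93), not a value stated in this card.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 5e-05
train_batch_size = 8          # per device
eval_batch_size = 16          # per device
num_devices = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Effective batch sizes across devices (accumulation applies to training only).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: total optimizer steps for 3 epochs, estimated from the results table.
TOTAL_STEPS = 2866


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 287, 1000, 2000, TOTAL_STEPS):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends (roughly step 287) and decays towards zero by the final step.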

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6194 | 0.1047 | 100 | 0.6171 | -0.875 | -1.1797 | 0.6758 | 0.3008 | -406.0 | -406.0 | -10.75 | -11.0 |
| 0.5947 | 0.2093 | 200 | 0.6038 | -1.4531 | -1.8359 | 0.6680 | 0.3848 | -472.0 | -464.0 | -11.3125 | -11.75 |
| 0.6583 | 0.3140 | 300 | 0.6007 | -2.2344 | -2.7344 | 0.6758 | 0.4941 | -560.0 | -544.0 | -13.1875 | -13.5 |
| 0.6003 | 0.4186 | 400 | 0.5892 | -1.8359 | -2.3906 | 0.7012 | 0.5586 | -528.0 | -502.0 | -9.75 | -10.3125 |
| 0.5701 | 0.5233 | 500 | 0.5772 | -1.9688 | -2.5 | 0.6875 | 0.5391 | -540.0 | -516.0 | -10.5 | -11.0 |
| 0.55 | 0.6279 | 600 | 0.5671 | -2.6875 | -3.4219 | 0.7129 | 0.7266 | -632.0 | -588.0 | -9.5625 | -10.4375 |
| 0.554 | 0.7326 | 700 | 0.5667 | -2.625 | -3.375 | 0.7285 | 0.75 | -628.0 | -580.0 | -9.25 | -10.0625 |
| 0.5478 | 0.8373 | 800 | 0.5699 | -2.7188 | -3.3906 | 0.7070 | 0.6602 | -628.0 | -592.0 | -8.9375 | -9.875 |
| 0.5759 | 0.9419 | 900 | 0.5660 | -2.75 | -3.4375 | 0.7090 | 0.6914 | -632.0 | -592.0 | -10.25 | -11.1875 |
| 0.2284 | 1.0466 | 1000 | 0.5897 | -3.375 | -4.5625 | 0.7305 | 1.1797 | -744.0 | -656.0 | -6.8125 | -8.8125 |
| 0.1919 | 1.1512 | 1100 | 0.5994 | -3.7656 | -4.9375 | 0.7266 | 1.1797 | -784.0 | -696.0 | -8.375 | -10.125 |
| 0.1942 | 1.2559 | 1200 | 0.6058 | -4.5 | -5.6562 | 0.7188 | 1.1719 | -856.0 | -768.0 | -3.5469 | -5.5 |
| 0.2071 | 1.3605 | 1300 | 0.5985 | -4.3125 | -5.4688 | 0.7441 | 1.1484 | -836.0 | -752.0 | -6.1875 | -7.7812 |
| 0.1811 | 1.4652 | 1400 | 0.6045 | -5.375 | -6.5625 | 0.7363 | 1.2109 | -948.0 | -856.0 | -6.6562 | -8.0 |
| 0.1715 | 1.5699 | 1500 | 0.6054 | -4.7188 | -6.0312 | 0.7383 | 1.3047 | -892.0 | -792.0 | -7.1875 | -8.6875 |
| 0.186 | 1.6745 | 1600 | 0.6277 | -4.4688 | -5.7188 | 0.7285 | 1.2344 | -860.0 | -768.0 | -8.3125 | -9.6875 |
| 0.1763 | 1.7792 | 1700 | 0.6386 | -5.2188 | -6.625 | 0.7246 | 1.4062 | -952.0 | -840.0 | -5.5312 | -7.4375 |
| 0.1678 | 1.8838 | 1800 | 0.6220 | -4.5625 | -5.8125 | 0.7246 | 1.2266 | -868.0 | -776.0 | -6.8125 | -8.4375 |
| 0.1563 | 1.9885 | 1900 | 0.6274 | -5.5 | -6.8438 | 0.7266 | 1.3672 | -976.0 | -868.0 | -6.3438 | -7.875 |
| 0.0144 | 2.0931 | 2000 | 0.7311 | -6.4375 | -8.1875 | 0.7305 | 1.7656 | -1112.0 | -960.0 | -3.3281 | -5.5 |
| 0.029 | 2.1978 | 2100 | 0.8195 | -7.5312 | -9.6875 | 0.7285 | 2.1719 | -1256.0 | -1072.0 | -2.375 | -4.75 |
| 0.0228 | 2.3025 | 2200 | 0.8282 | -7.6875 | -9.875 | 0.7188 | 2.2031 | -1280.0 | -1088.0 | -1.9297 | -4.375 |
| 0.0159 | 2.4071 | 2300 | 0.8055 | -7.2188 | -9.375 | 0.7266 | 2.1562 | -1224.0 | -1040.0 | -2.0625 | -4.4688 |
| 0.0192 | 2.5118 | 2400 | 0.7881 | -6.9688 | -9.0625 | 0.7207 | 2.0938 | -1200.0 | -1016.0 | -2.3906 | -4.7812 |
| 0.0158 | 2.6164 | 2500 | 0.8027 | -7.3438 | -9.5 | 0.7266 | 2.1562 | -1240.0 | -1056.0 | -1.5312 | -3.9375 |
| 0.0193 | 2.7211 | 2600 | 0.8205 | -7.625 | -9.875 | 0.7383 | 2.25 | -1280.0 | -1080.0 | -1.1797 | -3.5938 |
| 0.0229 | 2.8257 | 2700 | 0.8136 | -7.4375 | -9.625 | 0.7266 | 2.2188 | -1256.0 | -1064.0 | -1.5391 | -3.9531 |
| 0.0213 | 2.9304 | 2800 | 0.8121 | -7.4062 | -9.625 | 0.7285 | 2.2188 | -1248.0 | -1056.0 | -1.5781 | -4.0 |
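
The table can be scanned programmatically to see where validation loss and reward accuracy peak. The sketch below copies four columns from the rows above into a hypothetical `rows` list and picks out the best checkpoints. It is only an aid for reading the table and says nothing about which checkpoint was actually published.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (100, 0.6194, 0.6171, 0.6758), (200, 0.5947, 0.6038, 0.6680),
    (300, 0.6583, 0.6007, 0.6758), (400, 0.6003, 0.5892, 0.7012),
    (500, 0.5701, 0.5772, 0.6875), (600, 0.55, 0.5671, 0.7129),
    (700, 0.554, 0.5667, 0.7285), (800, 0.5478, 0.5699, 0.7070),
    (900, 0.5759, 0.5660, 0.7090), (1000, 0.2284, 0.5897, 0.7305),
    (1100, 0.1919, 0.5994, 0.7266), (1200, 0.1942, 0.6058, 0.7188),
    (1300, 0.2071, 0.5985, 0.7441), (1400, 0.1811, 0.6045, 0.7363),
    (1500, 0.1715, 0.6054, 0.7383), (1600, 0.186, 0.6277, 0.7285),
    (1700, 0.1763, 0.6386, 0.7246), (1800, 0.1678, 0.6220, 0.7246),
    (1900, 0.1563, 0.6274, 0.7266), (2000, 0.0144, 0.7311, 0.7305),
    (2100, 0.029, 0.8195, 0.7285), (2200, 0.0228, 0.8282, 0.7188),
    (2300, 0.0159, 0.8055, 0.7266), (2400, 0.0192, 0.7881, 0.7207),
    (2500, 0.0158, 0.8027, 0.7266), (2600, 0.0193, 0.8205, 0.7383),
    (2700, 0.0229, 0.8136, 0.7266), (2800, 0.0213, 0.8121, 0.7285),
]

# Checkpoint with the lowest validation loss and the one with the highest accuracy.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_accuracy = max(rows, key=lambda r: r[3])

print("lowest validation loss:", best_by_loss)
print("highest rewards accuracy:", best_by_accuracy)
```

Validation loss is lowest at step 900 (0.5660), near the end of the first epoch, and rises in later epochs while training loss falls below 0.03. That pattern may indicate overfitting to the preference data, even though reward accuracy stays around 0.72 to 0.74.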

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.0
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.08B params
  • Tensor type: BF16

Dataset used to train CharlesLi/OpenELM-1_1B-DPO-full-1: HuggingFaceH4/ultrafeedback_binarized