---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_l21_entropy_0_01
    results: []
---

# qwen_l21_entropy_0_01

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6901
- Sft Loss: 2.1331
- Rewards/chosen: -2.1707
- Rewards/rejected: -3.2270
- Rewards/accuracies: 0.6914
- Rewards/margins: 1.0563
- Logps/rejected: -3.2270
- Logps/chosen: -2.1707
- Logits/rejected: 0.2151
- Logits/chosen: 0.1185
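
As a quick sanity check, the model can be loaded with the standard `transformers` API. The sketch below assumes the checkpoint is published on the Hub under the model-index name, `yakazimir/qwen_l21_entropy_0_01` — adjust the repo id if the weights live elsewhere — and that the tokenizer ships a chat template, as the Qwen1.5 SFT base does.

```python
# Minimal usage sketch. The repo id is an assumption taken from the
# model-index name above; change it if the checkpoint is hosted elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_l21_entropy_0_01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The base model is an SFT'd chat model, so format the prompt
# with the tokenizer's chat template before generating.
messages = [{"role": "user", "content": "Explain SimPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```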

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
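
For reference, the hyperparameters above map onto `transformers.TrainingArguments` roughly as shown below. This is a sketch only — the actual run used an alignment-handbook/TRL SimPO recipe whose trainer config wraps these fields and adds the SimPO loss settings; `output_dir` is a placeholder.

```python
# Sketch of the listed hyperparameters as TrainingArguments fields.
# Not the exact training script: the SimPO recipe layers its own
# preference-loss configuration on top of these base arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_l21_entropy_0_01",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,       # "train_batch_size" above
    per_device_eval_batch_size=4,        # "eval_batch_size" above
    gradient_accumulation_steps=16,      # per-device batch x accumulation
    seed=42,                             #   x world size = 32 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```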

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7149        | 0.2141 | 400  | 0.7232          | 2.1337   | -3.3125        | -3.5682          | 0.5200             | 0.2557          | -3.5682        | -3.3125      | 0.5534          | 0.4407        |
| 0.7105        | 0.4282 | 800  | 0.7055          | 2.1066   | -2.2353        | -2.7243          | 0.6447             | 0.4890          | -2.7243        | -2.2353      | 0.3870          | 0.2857        |
| 0.7071        | 0.6422 | 1200 | 0.6988          | 2.0445   | -2.1363        | -2.7640          | 0.6691             | 0.6278          | -2.7640        | -2.1363      | 0.6763          | 0.5552        |
| 0.6909        | 0.8563 | 1600 | 0.6951          | 2.2316   | -2.3067        | -3.0785          | 0.6825             | 0.7718          | -3.0785        | -2.3067      | 0.0414          | -0.0345       |
| 0.6992        | 1.0704 | 2000 | 0.6927          | 2.0672   | -2.1384        | -2.9634          | 0.6766             | 0.8250          | -2.9634        | -2.1384      | 0.1253          | 0.0374        |
| 0.6894        | 1.2845 | 2400 | 0.6908          | 2.1132   | -2.1527        | -3.0987          | 0.6810             | 0.9460          | -3.0987        | -2.1527      | 0.3470          | 0.2424        |
| 0.6881        | 1.4986 | 2800 | 0.6908          | 2.1384   | -2.2307        | -3.1888          | 0.6862             | 0.9581          | -3.1888        | -2.2307      | 0.5238          | 0.4064        |
| 0.6998        | 1.7127 | 3200 | 0.6900          | 2.1093   | -2.1719        | -3.1258          | 0.6936             | 0.9539          | -3.1258        | -2.1719      | 0.2688          | 0.1694        |
| 0.6837        | 1.9267 | 3600 | 0.6898          | 2.1422   | -2.2075        | -3.2094          | 0.6966             | 1.0019          | -3.2094        | -2.2075      | 0.3036          | 0.1996        |
| 0.6446        | 2.1408 | 4000 | 0.6902          | 2.1614   | -2.1867        | -3.2140          | 0.6855             | 1.0273          | -3.2140        | -2.1867      | 0.2205          | 0.1222        |
| 0.6694        | 2.3549 | 4400 | 0.6887          | 2.1145   | -2.1590        | -3.1865          | 0.6921             | 1.0275          | -3.1865        | -2.1590      | 0.2474          | 0.1483        |
| 0.6722        | 2.5690 | 4800 | 0.6902          | 2.1289   | -2.1610        | -3.2026          | 0.6907             | 1.0415          | -3.2026        | -2.1610      | 0.2232          | 0.1258        |
| 0.6701        | 2.7831 | 5200 | 0.6904          | 2.1329   | -2.1699        | -3.2263          | 0.6929             | 1.0564          | -3.2263        | -2.1699      | 0.2407          | 0.1420        |
| 0.659         | 2.9972 | 5600 | 0.6901          | 2.1331   | -2.1707        | -3.2271          | 0.6914             | 1.0563          | -3.2271        | -2.1707      | 0.2151          | 0.1185        |
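
A note on reading the table: under SimPO the implicit reward is the length-normalized average log-probability of a sequence, with no reference model, which is why the Rewards/* and Logps/* columns coincide. For reference, the SimPO objective (Meng et al., 2024) is shown below; the "entropy_0_01" in the model name suggests this run adds an entropy regularizer with coefficient 0.01 on top of it, though the card does not spell that out.

```latex
% SimPO objective: the implicit reward is the length-normalized
% log-likelihood of a response, so no reference model is required.
\mathcal{L}_{\text{SimPO}}(\pi_\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\left(
      \frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
      \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
      \;-\; \gamma
    \right)
  \right]
```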

## Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1