---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_cpo_entropy_0_01
    results: []
---

qwen_cpo_entropy_0_01

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5583
  • Sft Loss: 3.4705
  • Rewards/chosen: -3.3285
  • Rewards/rejected: -4.3810
  • Rewards/accuracies: 0.7226
  • Rewards/margins: 1.0525
  • Logps/rejected: -4.3810
  • Logps/chosen: -3.3285
  • Logits/rejected: 0.2811
  • Logits/chosen: 0.1563
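
The checkpoint can be inspected with the standard transformers auto classes. The following is a minimal sketch, not part of the original card: the repository id is an assumption inferred from the model name and author, and it assumes the tokenizer ships a chat template inherited from the SFT base model.

```python
# Minimal sketch: load the fine-tuned model for generation.
# The repo id below is an assumption based on the card's model name; adjust if needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_cpo_entropy_0_01"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain what preference optimization does."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```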

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
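
The card's metadata and summary name yakazimir/ultrafeedback_binarized as the preference dataset. A small sketch for inspecting it with the datasets library follows; the split name is an assumption and may differ on the dataset card.

```python
# Sketch: peek at the preference dataset named in this card's metadata.
# The "train" split name is an assumption; check the dataset card for the actual splits.
from datasets import load_dataset

ds = load_dataset("yakazimir/ultrafeedback_binarized", split="train")
print(ds.column_names)  # expect prompt / chosen / rejected style preference fields
print(ds[0])
```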

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
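
For reference, the listed values map onto transformers TrainingArguments as sketched below. This is only an illustration of the hyperparameters shown in the card; the actual run likely used a TRL / alignment-handbook preference-training configuration (the card's tags mention simpo), whose trainer-specific arguments are not documented here.

```python
# Sketch: the card's hyperparameters expressed as transformers TrainingArguments.
# Only values listed in the card are reproduced; trainer-specific (SimPO/CPO) options are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_cpo_entropy_0_01",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,  # 2 per device x 16 accumulation steps -> total batch size 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```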

Training results

| Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------|:------|:-----|:----------------|:---------|:---------------|:-----------------|:-------------------|:----------------|:---------------|:-------------|:----------------|:--------------|
| 0.7019 | 0.2141 | 400  | 0.6977 | 1.4219 | -1.4375 | -1.6032 | 0.5631 | 0.1657 | -1.6032 | -1.4375 | 0.2993 | 0.2138 |
| 0.6225 | 0.4282 | 800  | 0.6192 | 2.0573 | -2.0770 | -2.5396 | 0.6669 | 0.4626 | -2.5396 | -2.0770 | 0.3429 | 0.2570 |
| 0.6242 | 0.6422 | 1200 | 0.5882 | 2.6279 | -2.4850 | -3.1039 | 0.6973 | 0.6190 | -3.1039 | -2.4850 | 0.5237 | 0.4102 |
| 0.5405 | 0.8563 | 1600 | 0.5781 | 2.5442 | -2.4160 | -3.0202 | 0.7092 | 0.6042 | -3.0202 | -2.4160 | 0.4122 | 0.3042 |
| 0.6195 | 1.0704 | 2000 | 0.5673 | 2.7121 | -2.5451 | -3.2527 | 0.7129 | 0.7076 | -3.2527 | -2.5451 | 0.4573 | 0.3371 |
| 0.5895 | 1.2845 | 2400 | 0.5590 | 3.0631 | -2.8962 | -3.7486 | 0.7322 | 0.8524 | -3.7486 | -2.8962 | 0.3362 | 0.2174 |
| 0.5512 | 1.4986 | 2800 | 0.5563 | 2.9053 | -2.7513 | -3.5751 | 0.7203 | 0.8238 | -3.5751 | -2.7513 | 0.2892 | 0.1750 |
| 0.5766 | 1.7127 | 3200 | 0.5520 | 2.9643 | -2.8134 | -3.6655 | 0.7263 | 0.8522 | -3.6655 | -2.8134 | 0.2677 | 0.1562 |
| 0.5625 | 1.9267 | 3600 | 0.5478 | 3.0563 | -2.8597 | -3.7385 | 0.7255 | 0.8788 | -3.7385 | -2.8597 | 0.3670 | 0.2441 |
| 0.4702 | 2.1408 | 4000 | 0.5592 | 3.5119 | -3.3071 | -4.3285 | 0.7240 | 1.0214 | -4.3285 | -3.3071 | 0.2395 | 0.1198 |
| 0.4882 | 2.3549 | 4400 | 0.5601 | 3.5201 | -3.3795 | -4.4355 | 0.7270 | 1.0560 | -4.4355 | -3.3795 | 0.2852 | 0.1603 |
| 0.4952 | 2.5690 | 4800 | 0.5580 | 3.4402 | -3.3065 | -4.3570 | 0.7233 | 1.0505 | -4.3570 | -3.3065 | 0.3210 | 0.1936 |
| 0.4272 | 2.7831 | 5200 | 0.5579 | 3.4523 | -3.3138 | -4.3619 | 0.7233 | 1.0481 | -4.3619 | -3.3138 | 0.3592 | 0.2281 |
| 0.459  | 2.9972 | 5600 | 0.5583 | 3.4705 | -3.3285 | -4.3810 | 0.7226 | 1.0525 | -4.3810 | -3.3285 | 0.2811 | 0.1563 |
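
Note on reading the table: Rewards/margins is simply Rewards/chosen minus Rewards/rejected; for the final checkpoint, -3.3285 - (-4.3810) = 1.0525. The Rewards/* columns coincide exactly with the Logps/* columns, which is consistent with a reference-free, length-normalized log-probability reward of the kind used in SimPO/CPO-style training, though the exact reward definition is not stated in the card.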

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1