---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_qfUNL_entropy
    results: []
---

# qwen_qfUNL_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6510
- Rewards/chosen: -1.7989
- Rewards/rejected: -2.5830
- Rewards/accuracies: 0.6736
- Rewards/margins: 0.7841
- Logps/rejected: -2.5830
- Logps/chosen: -1.7989
- Logits/rejected: 0.0192
- Logits/chosen: -0.0604
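
The card does not include a usage snippet; below is a minimal sketch that loads this checkpoint with the standard `transformers` API. The repo id `yakazimir/qwen_qfUNL_entropy` is inferred from the card title and is an assumption.

```python
# Minimal usage sketch (not part of the original card).
# Assumption: the checkpoint is published as "yakazimir/qwen_qfUNL_entropy".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_qfUNL_entropy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```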

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
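
As a sanity check on the settings above, the effective batch size follows from the per-device batch size and gradient accumulation. A sketch assuming a single training process, since the card does not state the GPU count:

```python
# Hypothetical arithmetic check (not from the card): how total_train_batch_size = 32
# follows from the listed settings.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_processes = 1  # assumption: the card does not list the number of GPUs

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)
assert total_train_batch_size == 32
```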

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6781        | 0.2141 | 400  | 0.6873          | -1.6444        | -1.8233          | 0.5475             | 0.1789          | -1.8233        | -1.6444      | 0.2857          | 0.1996        |
| 0.6757        | 0.4282 | 800  | 0.6641          | -1.6348        | -1.9815          | 0.6239             | 0.3467          | -1.9815        | -1.6348      | 0.3665          | 0.2730        |
| 0.6569        | 0.6422 | 1200 | 0.6602          | -1.7060        | -2.1644          | 0.6424             | 0.4584          | -2.1644        | -1.7060      | 0.2601          | 0.1749        |
| 0.6562        | 0.8563 | 1600 | 0.6584          | -1.8368        | -2.3836          | 0.6513             | 0.5468          | -2.3836        | -1.8368      | 0.1796          | 0.0944        |
| 0.6883        | 1.0704 | 2000 | 0.6545          | -1.7098        | -2.2986          | 0.6639             | 0.5888          | -2.2986        | -1.7098      | 0.2146          | 0.1248        |
| 0.6581        | 1.2845 | 2400 | 0.6533          | -1.7444        | -2.3861          | 0.6691             | 0.6417          | -2.3861        | -1.7444      | 0.1530          | 0.0644        |
| 0.6444        | 1.4986 | 2800 | 0.6537          | -1.7815        | -2.4833          | 0.6684             | 0.7018          | -2.4833        | -1.7815      | 0.0665          | -0.0145       |
| 0.6575        | 1.7127 | 3200 | 0.6520          | -1.7922        | -2.5114          | 0.6654             | 0.7192          | -2.5114        | -1.7922      | 0.1107          | 0.0260        |
| 0.6481        | 1.9267 | 3600 | 0.6507          | -1.7358        | -2.4632          | 0.6736             | 0.7275          | -2.4632        | -1.7358      | 0.0939          | 0.0113        |
| 0.607         | 2.1408 | 4000 | 0.6506          | -1.7686        | -2.5161          | 0.6751             | 0.7475          | -2.5161        | -1.7686      | 0.0842          | 0.0005        |
| 0.6294        | 2.3549 | 4400 | 0.6514          | -1.8215        | -2.5986          | 0.6714             | 0.7771          | -2.5986        | -1.8215      | 0.0008          | -0.0778       |
| 0.6098        | 2.5690 | 4800 | 0.6507          | -1.7918        | -2.5693          | 0.6766             | 0.7775          | -2.5693        | -1.7918      | 0.0735          | -0.0103       |
| 0.6302        | 2.7831 | 5200 | 0.6507          | -1.7943        | -2.5780          | 0.6751             | 0.7837          | -2.5780        | -1.7943      | 0.0395          | -0.0418       |
| 0.6181        | 2.9972 | 5600 | 0.6510          | -1.7989        | -2.5830          | 0.6736             | 0.7841          | -2.5830        | -1.7989      | 0.0192          | -0.0604       |
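
In these logs, `Rewards/margins` is the gap between the chosen and rejected rewards. A quick illustrative check against the final row (not part of the original card):

```python
# Illustrative check: Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = -1.7989
rewards_rejected = -2.5830
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.7841, matching the logged Rewards/margins
```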

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1