PEFT
Safetensors
qwen2
alignment-handbook
trl
dpo
Generated from Trainer

Qwen2-7B-Instruct-SPPO-Function-call-v2.6

This model is a fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.5, trained with DPO on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set (a loading sketch follows the metrics below):

  • Loss: 0.3005
  • Rewards/chosen: 1.6737
  • Rewards/rejected: -0.4932
  • Rewards/accuracies: 0.8699
  • Rewards/margins: 2.1670
  • Logps/rejected: -276.8380
  • Logps/chosen: -200.9362
  • Logits/rejected: -0.6568
  • Logits/chosen: -0.6408
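
Because this repo is a PEFT adapter (per the tags above) on top of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.5, it needs to be loaded together with the base model. The snippet below is a minimal loading sketch, not code shipped with this repo; the dtype, device placement, and prompt are illustrative assumptions.

```python
# Sketch only: load the base model and attach this repo as a PEFT adapter.
# Repo ids are taken from this card; dtype/device/prompt are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.5"
adapter_id = "khongtrunght/Qwen2-7B-Instruct-SPPO-Function-call-v2.6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Qwen2-Instruct uses a chat template; build a prompt and generate.
messages = [{"role": "user", "content": "What is the weather in Hanoi today?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```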

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
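
The card does not include the training script. As a rough guide, the sketch below shows how these hyperparameters could map onto trl's DPOConfig/DPOTrainer (consistent with the trl, dpo, and alignment-handbook tags). The DPO beta, dataset preprocessing, PEFT adapter settings, and reference-model handling are not stated in the card and are assumptions.

```python
# Sketch only: maps the listed hyperparameters onto trl's DPOConfig/DPOTrainer.
# beta, dataset columns, and the PEFT/LoRA config are assumptions, not from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.5"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# One of the three preference datasets named in the card; the others would be
# concatenated after being put into prompt/chosen/rejected format.
train_dataset = load_dataset("argilla/dpo-mix-7k", split="train")

args = DPOConfig(
    output_dir="qwen2-7b-dpo",
    learning_rate=1e-6,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=4,   # 8 GPUs x 1 x 4 -> total_train_batch_size 32
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    beta=0.1,                        # assumption: the DPO beta is not given in the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,             # processing_class= in newer trl releases
    # peft_config=LoraConfig(...),   # the card lists PEFT, but the adapter config is not shown
)
trainer.train()
```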

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6437 | 0.0916 | 100 | 0.6128 | 0.3050 | 0.0739 | 0.7254 | 0.2311 | -265.4963 | -228.3116 | -0.7319 | -0.7206 |
| 0.5175 | 0.1832 | 200 | 0.4987 | 1.1265 | 0.2914 | 0.8237 | 0.8351 | -261.1460 | -211.8815 | -0.7134 | -0.7068 |
| 0.3903 | 0.2749 | 300 | 0.4279 | 1.7297 | 0.4889 | 0.8468 | 1.2408 | -257.1960 | -199.8173 | -0.6700 | -0.6642 |
| 0.3712 | 0.3665 | 400 | 0.3781 | 1.7272 | 0.2255 | 0.8468 | 1.5017 | -262.4645 | -199.8672 | -0.6756 | -0.6691 |
| 0.3064 | 0.4581 | 500 | 0.3477 | 1.7220 | -0.0183 | 0.8613 | 1.7403 | -267.3389 | -199.9704 | -0.6642 | -0.6488 |
| 0.3054 | 0.5497 | 600 | 0.3271 | 1.6469 | -0.1977 | 0.8671 | 1.8447 | -270.9281 | -201.4723 | -0.6576 | -0.6407 |
| 0.2919 | 0.6413 | 700 | 0.3144 | 1.7376 | -0.3034 | 0.8642 | 2.0410 | -273.0414 | -199.6590 | -0.6753 | -0.6672 |
| 0.314 | 0.7329 | 800 | 0.3056 | 1.7037 | -0.4229 | 0.8671 | 2.1266 | -275.4323 | -200.3379 | -0.6685 | -0.6574 |
| 0.3014 | 0.8246 | 900 | 0.3020 | 1.6807 | -0.4632 | 0.8699 | 2.1439 | -276.2374 | -200.7971 | -0.6702 | -0.6641 |
| 0.268 | 0.9162 | 1000 | 0.2999 | 1.6798 | -0.4929 | 0.8844 | 2.1726 | -276.8312 | -200.8157 | -0.6690 | -0.6635 |
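
As a reading aid: in trl's DPO setup, Rewards/chosen and Rewards/rejected are the implicit rewards on the chosen and rejected responses, and Rewards/margins is their difference; at step 1000, 1.6798 − (−0.4929) ≈ 2.1726 up to rounding. Here β is the DPO temperature, which is not listed in the card.

$$
r_\theta(x, y) = \beta \left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}).
$$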

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
