
sft

This model is a fine-tuned version of Qwen/Qwen2.5-32B-Instruct on the eedi dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8951
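
A minimal usage sketch, assuming the adapter is loaded on top of Qwen/Qwen2.5-32B-Instruct with PEFT (the adapter repository id comes from the model tree below; the prompt and generation settings are purely illustrative):

```python
# Sketch only: load the base model and attach this LoRA adapter with PEFT.
# Prompt content and generation settings are illustrative, not taken from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-32B-Instruct"
adapter_id = "improved-barnacle/abdullah-qwen25-32b-it-lora-sft-v1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain the misconception behind 3/4 + 1/4 = 4/8."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```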

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
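
The training script itself is not included in the card; the following is a minimal sketch of a transformers.TrainingArguments configuration that mirrors the hyperparameters listed above (the output directory, precision, and evaluation/logging cadence are assumptions):

```python
# Sketch only: mirrors the listed hyperparameters with transformers.TrainingArguments.
# Per-device batch size 2 x 4 GPUs x 4 accumulation steps = total train batch size 32.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen25-32b-it-lora-sft",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,               # assumed; precision is not stated in the card
    eval_strategy="steps",   # assumed; matches the 20-step evaluation log below
    eval_steps=20,
    logging_steps=20,
)
```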

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.2332        | 0.1626 | 20   | 1.2680          |
| 1.0315        | 0.3252 | 40   | 0.9880          |
| 0.8677        | 0.4878 | 60   | 0.9157          |
| 0.8893        | 0.6504 | 80   | 0.8641          |
| 0.8072        | 0.8130 | 100  | 0.8326          |
| 0.7722        | 0.9756 | 120  | 0.7950          |
| 0.5838        | 1.1382 | 140  | 0.8270          |
| 0.6009        | 1.3008 | 160  | 0.7669          |
| 0.5373        | 1.4634 | 180  | 0.7591          |
| 0.5617        | 1.6260 | 200  | 0.7382          |
| 0.5768        | 1.7886 | 220  | 0.7313          |
| 0.5072        | 1.9512 | 240  | 0.7281          |
| 0.3148        | 2.1138 | 260  | 0.7919          |
| 0.2612        | 2.2764 | 280  | 0.9314          |
| 0.2222        | 2.4390 | 300  | 0.9256          |
| 0.2427        | 2.6016 | 320  | 0.8956          |
| 0.2289        | 2.7642 | 340  | 0.8932          |
| 0.1885        | 2.9268 | 360  | 0.8958          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
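
The card does not state the LoRA configuration (rank, alpha, dropout, or target modules). A hedged PEFT setup sketch compatible with the versions above might look like the following, where every adapter-specific value is an assumption:

```python
# Sketch only: example PEFT LoRA setup. Rank, alpha, dropout, and target modules
# are NOT taken from the card and are shown purely as placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
lora_config = LoraConfig(
    r=16,                                                      # assumed
    lora_alpha=32,                                             # assumed
    lora_dropout=0.05,                                         # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```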

Model tree for improved-barnacle/abdullah-qwen25-32b-it-lora-sft-v1

  • Base model: Qwen/Qwen2.5-32B
  • Adapter: this model