
pancho-v1-qw25-3B-UNAMGS

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct. It achieves the following results on the evaluation set:

  • Loss: 0.6555

Built with Axolotl
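
A minimal usage sketch, assuming the checkpoint loads as a standard PEFT adapter on top of Qwen/Qwen2.5-3B-Instruct via the peft/transformers APIs listed under Framework versions:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the adapter; the base model is resolved from the adapter config,
# so only the adapter repo id is needed here.
model = AutoPeftModelForCausalLM.from_pretrained(
    "fblgit/pancho-v1-qw25-3B-UNAMGS",
    torch_dtype=torch.bfloat16,
)
# Assumption: the base Instruct tokenizer is the right one for this adapter.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```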

Model description

Trained with the Magpie datasets:

  • Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
  • Magpie-Align/Magpie-Pro-MT-300K-v0.1

UNA applied to MLP layers 4, 10, 16, 22, and 28.

MGS applied on 3 scales.

Following the findings of https://arxiv.org/abs/2410.21228.
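
UNA's internals are not published as a reusable API, so the following is only an illustrative sketch of how a PEFT config can restrict adapter updates to the same five MLP blocks named above; the rank and alpha values are placeholders, not taken from this card:

```python
from peft import LoraConfig

# Illustrative only: UNA itself is not a public peft feature. This config
# merely shows adapter updates limited to decoder layers 4, 10, 16, 22, 28,
# targeting the Qwen2 MLP projections inside those blocks.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["gate_proj", "up_proj", "down_proj"],  # Qwen2 MLP modules
    layers_to_transform=[4, 10, 16, 22, 28],               # layers named above
    r=16,           # placeholder rank, not from this card
    lora_alpha=32,  # placeholder scaling, not from this card
)
```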

License & Derivatives

Any derivative (SFT, merges, etc.) using ANY layer from this model MUST include UNA, MGS, or PANCHO in its model name in order to obtain a license for derivatives of this model.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • num_epochs: 1
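
The same settings expressed as transformers TrainingArguments, as a sketch; the per-device batch size and gradient accumulation split is an assumption chosen so that 8 devices reach the reported totals (8 * 4 * 8 = 256 for training, 8 * 2 = 16 for eval):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="pancho-v1-qw25-3B-UNAMGS",
    learning_rate=2e-5,
    seed=42,
    per_device_train_batch_size=4,   # assumption: 8 GPUs * 4 * 8 accum = 256
    gradient_accumulation_steps=8,   # assumption, see above
    per_device_eval_batch_size=2,    # 8 GPUs * 2 = total_eval_batch_size 16
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: checkpoint tensors are BF16
)
```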

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2127        | 0.0015 | 1    | 0.8711          |
| 0.9905        | 0.0509 | 35   | 0.7338          |
| 0.9685        | 0.1019 | 70   | 0.7114          |
| 0.9554        | 0.1528 | 105  | 0.6994          |
| 0.9077        | 0.2037 | 140  | 0.6915          |
| 0.9149        | 0.2547 | 175  | 0.6859          |
| 0.9363        | 0.3056 | 210  | 0.6795          |
| 0.8975        | 0.3566 | 245  | 0.6745          |
| 0.9095        | 0.4075 | 280  | 0.6709          |
| 0.9216        | 0.4584 | 315  | 0.6681          |
| 0.9143        | 0.5094 | 350  | 0.6666          |
| 0.8879        | 0.5603 | 385  | 0.6645          |
| 0.9194        | 0.6112 | 420  | 0.6625          |
| 0.9123        | 0.6622 | 455  | 0.6615          |
| 0.9056        | 0.7131 | 490  | 0.6591          |
| 0.9172        | 0.7641 | 525  | 0.6578          |
| 0.886         | 0.8150 | 560  | 0.6566          |
| 0.9155        | 0.8659 | 595  | 0.6568          |
| 0.9029        | 0.9169 | 630  | 0.6560          |
| 0.8942        | 0.9678 | 665  | 0.6555          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • Pytorch 2.3.0+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1