metadata

license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
  - Magpie-Align/Magpie-Qwen2.5-Pro-300K-Filtered
base_model:
  - Qwen/Qwen2.5-7B-Instruct
library_name: transformers
tags:
  - generated_from_trainer
language:
  - en

cybertron-v4-qw7B-MGS

Introducing: cybertron-v4 based on Qwen2.5 7B SFT over Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1

Training procedure

1 Epoch as usual.

Training hyperparameters

The following hyperparameters were used during training:

seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 128
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss
0.7405	0.0007	1	0.5760
0.6146	0.0502	71	0.5045
0.5908	0.1003	142	0.4930
0.5669	0.1505	213	0.4854
0.5575	0.2007	284	0.4811
0.535	0.2508	355	0.4765
0.5161	0.3010	426	0.4736
0.5268	0.3511	497	0.4726
0.5119	0.4013	568	0.4701
0.5329	0.4515	639	0.4687
0.5167	0.5016	710	0.4673
0.5105	0.5518	781	0.4660
0.5203	0.6020	852	0.4653
0.5035	0.6521	923	0.4646
0.4903	0.7023	994	0.4641
0.5031	0.7525	1065	0.4628
0.5147	0.8026	1136	0.4629
0.5037	0.8528	1207	0.4620
0.5029	0.9029	1278	0.4620
0.492	0.9531	1349	0.4621

Framework versions

PEFT 0.13.2
Transformers 4.45.2
Pytorch 2.3.0+cu121
Datasets 3.0.1
Tokenizers 0.20.1