
sft

This model is a fine-tuned version of Qwen/Qwen2.5-32B-Instruct on the eedi dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8951
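
A minimal usage sketch, assuming the adapter is loaded on top of Qwen/Qwen2.5-32B-Instruct with PEFT (the adapter repository id comes from the model tree below; the prompt and generation settings are purely illustrative):

```python
# Sketch only: load the base model and attach this LoRA adapter with PEFT.
# Prompt content and generation settings are illustrative, not taken from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-32B-Instruct"
adapter_id = "improved-barnacle/abdullah-qwen25-32b-it-lora-sft-v1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain the misconception behind 3/4 + 1/4 = 4/8."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```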

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
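
The training script itself is not included in the card; the following is a minimal sketch of a transformers.TrainingArguments configuration that mirrors the hyperparameters listed above (the output directory, precision, and evaluation/logging cadence are assumptions):

```python
# Sketch only: mirrors the listed hyperparameters with transformers.TrainingArguments.
# Per-device batch size 2 x 4 GPUs x 4 accumulation steps = total train batch size 32.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen25-32b-it-lora-sft",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,               # assumed; precision is not stated in the card
    eval_strategy="steps",   # assumed; matches the 20-step evaluation log below
    eval_steps=20,
    logging_steps=20,
)
```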

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.2332        | 0.1626 | 20   | 1.2680          |
| 1.0315        | 0.3252 | 40   | 0.9880          |
| 0.8677        | 0.4878 | 60   | 0.9157          |
| 0.8893        | 0.6504 | 80   | 0.8641          |
| 0.8072        | 0.8130 | 100  | 0.8326          |
| 0.7722        | 0.9756 | 120  | 0.7950          |
| 0.5838        | 1.1382 | 140  | 0.8270          |
| 0.6009        | 1.3008 | 160  | 0.7669          |
| 0.5373        | 1.4634 | 180  | 0.7591          |
| 0.5617        | 1.6260 | 200  | 0.7382          |
| 0.5768        | 1.7886 | 220  | 0.7313          |
| 0.5072        | 1.9512 | 240  | 0.7281          |
| 0.3148        | 2.1138 | 260  | 0.7919          |
| 0.2612        | 2.2764 | 280  | 0.9314          |
| 0.2222        | 2.4390 | 300  | 0.9256          |
| 0.2427        | 2.6016 | 320  | 0.8956          |
| 0.2289        | 2.7642 | 340  | 0.8932          |
| 0.1885        | 2.9268 | 360  | 0.8958          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
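
The card does not state the LoRA configuration (rank, alpha, dropout, or target modules). A hedged PEFT setup sketch compatible with the versions above might look like the following, where every adapter-specific value is an assumption:

```python
# Sketch only: example PEFT LoRA setup. Rank, alpha, dropout, and target modules
# are NOT taken from the card and are shown purely as placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
lora_config = LoraConfig(
    r=16,                                                      # assumed
    lora_alpha=32,                                             # assumed
    lora_dropout=0.05,                                         # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```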

Model tree for improved-barnacle/abdullah-qwen25-32b-it-lora-sft-v1

  • Base model: Qwen/Qwen2.5-32B
  • Adapter: this model