llama-gsm-real-and-synthetic-sftsd2

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.9739
Num Input Tokens Seen: 3582416
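Causal language-model losses from the transformers Trainer are normally the mean per-token cross-entropy in nats. If that holds here (the card does not say), the loss can be read as a perplexity by exponentiating it. The sketch below assumes this and also converts the step-0 validation loss from the results table, which was recorded before any update.

```python
import math

# Assumption: losses are mean per-token cross-entropy (natural log), as is
# typical for causal LM training with the transformers Trainer.
FINAL_EVAL_LOSS = 0.9739   # "Loss" reported above
STEP0_VAL_LOSS = 1.5881    # validation loss at step 0 in the results table

def perplexity(mean_nll: float) -> float:
    """Convert a mean negative log-likelihood in nats to perplexity."""
    return math.exp(mean_nll)

print(f"before fine-tuning: ~{perplexity(STEP0_VAL_LOSS):.3f}")   # ~4.894
print(f"after fine-tuning:  ~{perplexity(FINAL_EVAL_LOSS):.3f}")  # ~2.648
```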

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
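Two of these values are derived from the others. total_train_batch_size is the per-device batch size times the gradient-accumulation steps (8 × 16 = 128, which is consistent with a single device, although the device count is not stated). The constant_with_warmup scheduler ramps the learning rate linearly from 0 during warmup and then holds it fixed. The sketch below reproduces both in plain Python, following our understanding of the transformers behaviour: the warmup length is ceil(total_steps × warmup_ratio). TOTAL_STEPS = 116 is an assumed value inferred from the Epoch and Step columns of the results table; the exact step count is not reported.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 8e-6
TRAIN_BATCH_SIZE = 8      # per device
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accum * num_devices

def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(total_steps * ratio)

def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr

# Assumption: about 116 optimizer steps in the single epoch (illustrative).
TOTAL_STEPS = 116
n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)

print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 128
print(n_warmup)  # 6
for step in (0, 3, n_warmup, 100):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

A warmup of only a few steps means nearly the whole epoch runs at the constant rate of 8e-06.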

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.5881	0
1.2436	0.0429	5	1.4398	152424
1.0386	0.0857	10	1.2453	312080
0.9477	0.1286	15	1.1613	467744
0.9171	0.1715	20	1.1153	624840
0.9103	0.2144	25	1.0952	780224
0.8238	0.2572	30	1.0748	936608
0.8472	0.3001	35	1.0563	1092128
0.8196	0.3430	40	1.0417	1250736
0.7769	0.3859	45	1.0217	1400344
0.7825	0.4287	50	1.0084	1552728
0.768	0.4716	55	1.0008	1700648
0.7492	0.5145	60	0.9968	1850360
0.8147	0.5573	65	0.9917	2002688
0.766	0.6002	70	0.9894	2161608
0.7926	0.6431	75	0.9865	2318744
0.7766	0.6860	80	0.9862	2477088
0.7827	0.7288	85	0.9799	2632344
0.7605	0.7717	90	0.9819	2784768
0.7443	0.8146	95	0.9775	2938072
0.7146	0.8574	100	0.9778	3095408
0.7503	0.9003	105	0.9770	3250064
0.7265	0.9432	110	0.9759	3400968
0.8001	0.9861	115	0.9747	3553016
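The table can be summarised programmatically. The sketch below copies the Step, Validation Loss and Input Tokens Seen columns verbatim. It finds the lowest validation loss and computes the average number of input tokens per optimizer step. Depending on how the counter was implemented, Input Tokens Seen may include padding tokens, so the per-step figure is only an approximation of real content tokens.

```python
# (step, validation_loss, input_tokens_seen), copied from the results table.
ROWS = [
    (0, 1.5881, 0), (5, 1.4398, 152424), (10, 1.2453, 312080),
    (15, 1.1613, 467744), (20, 1.1153, 624840), (25, 1.0952, 780224),
    (30, 1.0748, 936608), (35, 1.0563, 1092128), (40, 1.0417, 1250736),
    (45, 1.0217, 1400344), (50, 1.0084, 1552728), (55, 1.0008, 1700648),
    (60, 0.9968, 1850360), (65, 0.9917, 2002688), (70, 0.9894, 2161608),
    (75, 0.9865, 2318744), (80, 0.9862, 2477088), (85, 0.9799, 2632344),
    (90, 0.9819, 2784768), (95, 0.9775, 2938072), (100, 0.9778, 3095408),
    (105, 0.9770, 3250064), (110, 0.9759, 3400968), (115, 0.9747, 3553016),
]

# Checkpoint with the lowest validation loss.
best_step, best_loss, _ = min(ROWS, key=lambda row: row[1])

# Average input tokens consumed per optimizer step over the logged run.
last_step, _, last_tokens = ROWS[-1]
tokens_per_step = last_tokens / last_step

# Validation-loss reduction in the first and second halves of the epoch.
loss_at = {step: loss for step, loss, _ in ROWS}
first_half_gain = loss_at[0] - loss_at[60]
second_half_gain = loss_at[60] - loss_at[115]

print(f"best validation loss {best_loss} at step {best_step}")
print(f"~{tokens_per_step:.0f} input tokens per optimizer step")
print(f"gain steps 0-60: {first_half_gain:.4f}, steps 60-115: {second_half_gain:.4f}")
```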

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
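Reproducing the run is easiest with the same library versions. The following sketch uses only the standard library (importlib.metadata) to compare the installed distributions against the versions listed above. It assumes PyTorch is installed under its PyPI distribution name, torch. Because the card lists a CUDA build tag (+cu121), a CPU-only or different CUDA wheel will be reported as differing even when the base version matches.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed on this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each distribution to (version on card, installed version or None)."""
    report = {}
    for dist, want in expected.items():
        try:
            have: Optional[str] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            have = None
        report[dist] = (want, have)
    return report

report = compare_versions(EXPECTED)
for dist, (want, have) in report.items():
    status = "missing" if have is None else ("ok" if have == want else "differs")
    print(f"{dist}: card={want} installed={have} ({status})")
```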

Model repository: jkazdan/llama3b-real-and-synthetic-sftsd2