
llama-gsm-real-and-synthetic-sftsd2

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a brief inference sketch follows the results):

  • Loss: 0.9739
  • Num Input Tokens Seen: 3582416
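
As a rough usage sketch, the snippet below loads the checkpoint with the standard transformers chat API. The repository id and the example prompt are illustrative assumptions; they may not match the actual hub path or intended use.

```python
# Minimal inference sketch; the repository id below is an assumption and may
# not match the actual hub path for this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama3b-real-and-synthetic-sftsd2"  # assumed hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate
)

# Llama-3.2 Instruct checkpoints ship a chat template, so format prompts through it.
messages = [
    {"role": "user", "content": "A farmer has 12 hens and each lays 3 eggs a day. "
                                 "How many eggs does he collect in a week?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```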

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
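
For reference, here is a hedged sketch of how these values map onto transformers TrainingArguments. The output directory, bf16 flag, and logging/eval cadence are illustrative assumptions; the original training script is not part of this card.

```python
# Illustrative mapping of the hyperparameters listed above onto TrainingArguments;
# a sketch, not the original training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-gsm-real-and-synthetic-sftsd2",  # assumed output path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size on one device
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,          # assumption, consistent with the released BF16 weights
    logging_steps=5,    # assumption, matches the 5-step cadence in the results table
    eval_strategy="steps",
    eval_steps=5,
)
```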

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.5881          | 0                 |
| 1.2436        | 0.0429 | 5    | 1.4398          | 152424            |
| 1.0386        | 0.0857 | 10   | 1.2453          | 312080            |
| 0.9477        | 0.1286 | 15   | 1.1613          | 467744            |
| 0.9171        | 0.1715 | 20   | 1.1153          | 624840            |
| 0.9103        | 0.2144 | 25   | 1.0952          | 780224            |
| 0.8238        | 0.2572 | 30   | 1.0748          | 936608            |
| 0.8472        | 0.3001 | 35   | 1.0563          | 1092128           |
| 0.8196        | 0.3430 | 40   | 1.0417          | 1250736           |
| 0.7769        | 0.3859 | 45   | 1.0217          | 1400344           |
| 0.7825        | 0.4287 | 50   | 1.0084          | 1552728           |
| 0.768         | 0.4716 | 55   | 1.0008          | 1700648           |
| 0.7492        | 0.5145 | 60   | 0.9968          | 1850360           |
| 0.8147        | 0.5573 | 65   | 0.9917          | 2002688           |
| 0.766         | 0.6002 | 70   | 0.9894          | 2161608           |
| 0.7926        | 0.6431 | 75   | 0.9865          | 2318744           |
| 0.7766        | 0.6860 | 80   | 0.9862          | 2477088           |
| 0.7827        | 0.7288 | 85   | 0.9799          | 2632344           |
| 0.7605        | 0.7717 | 90   | 0.9819          | 2784768           |
| 0.7443        | 0.8146 | 95   | 0.9775          | 2938072           |
| 0.7146        | 0.8574 | 100  | 0.9778          | 3095408           |
| 0.7503        | 0.9003 | 105  | 0.9770          | 3250064           |
| 0.7265        | 0.9432 | 110  | 0.9759          | 3400968           |
| 0.8001        | 0.9861 | 115  | 0.9747          | 3553016           |
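
Assuming the reported losses are mean per-token cross-entropy in nats (the usual convention for causal-LM fine-tuning in transformers), the losses translate to perplexities as in the short calculation below; the numbers are derived from the table above, not separately measured.

```python
# Convert reported cross-entropy losses to perplexities (assumes losses are in nats).
import math

initial_eval_loss = 1.5881  # step-0 validation loss from the table
final_eval_loss = 0.9739    # final evaluation loss reported at the top of the card

print(f"initial perplexity ≈ {math.exp(initial_eval_loss):.2f}")  # ≈ 4.89
print(f"final perplexity   ≈ {math.exp(final_eval_loss):.2f}")    # ≈ 2.65
```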

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
