
llama-gsm-real-and-synthetic-sftsd1

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 0.9747
  • Num input tokens seen: 3,594,944
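
For reference, a minimal loading sketch, not an official usage example: the repository id jkazdan/llama3b-real-and-synthetic-sftsd1 is taken from the original page, and the chat-template call and example prompt are assumptions based on the model's apparent GSM-style fine-tuning domain.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as it appears on the original page; assumed to be the published checkpoint.
model_id = "jkazdan/llama3b-real-and-synthetic-sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A GSM-style word problem, matching the apparent fine-tuning domain (an assumption).
messages = [
    {"role": "user", "content": "A farmer has 12 cows and buys 7 more. How many cows does he have now?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```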

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
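
The original training script is not included in the card. The following is a minimal sketch of how the hyperparameters above map onto Hugging Face TrainingArguments; the output_dir and the bf16 flag are assumptions, everything else mirrors the list.

```python
from transformers import TrainingArguments

# A sketch under stated assumptions, not the original training script.
training_args = TrainingArguments(
    output_dir="llama-gsm-real-and-synthetic-sftsd1",  # assumed name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: precision setting is not reported in the card
)
```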

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.5937          | 0                 |
| 1.2701        | 0.0429 | 5    | 1.4359          | 158456            |
| 1.0045        | 0.0857 | 10   | 1.2327          | 313520            |
| 1.0011        | 0.1286 | 15   | 1.1670          | 463960            |
| 0.9687        | 0.1715 | 20   | 1.1227          | 616216            |
| 0.8884        | 0.2144 | 25   | 1.1006          | 768608            |
| 0.879         | 0.2572 | 30   | 1.0785          | 933288            |
| 0.8624        | 0.3001 | 35   | 1.0626          | 1083272           |
| 0.8277        | 0.3430 | 40   | 1.0467          | 1244152           |
| 0.8307        | 0.3859 | 45   | 1.0267          | 1396808           |
| 0.791         | 0.4287 | 50   | 1.0107          | 1555096           |
| 0.7929        | 0.4716 | 55   | 1.0002          | 1710680           |
| 0.7695        | 0.5145 | 60   | 0.9954          | 1864128           |
| 0.7651        | 0.5573 | 65   | 0.9924          | 2018480           |
| 0.7788        | 0.6002 | 70   | 0.9886          | 2173056           |
| 0.7423        | 0.6431 | 75   | 0.9863          | 2326744           |
| 0.7635        | 0.6860 | 80   | 0.9835          | 2483616           |
| 0.7709        | 0.7288 | 85   | 0.9826          | 2640104           |
| 0.7663        | 0.7717 | 90   | 0.9797          | 2796664           |
| 0.7859        | 0.8146 | 95   | 0.9783          | 2950688           |
| 0.7699        | 0.8574 | 100  | 0.9772          | 3107872           |
| 0.7484        | 0.9003 | 105  | 0.9769          | 3258376           |
| 0.7532        | 0.9432 | 110  | 0.9740          | 3411448           |
| 0.7386        | 0.9861 | 115  | 0.9756          | 3567688           |
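
As a quick sanity check, a cross-entropy loss converts to perplexity via exp(loss); applied to the final reported evaluation loss of 0.9747:

```python
import math

# Final evaluation loss reported for this model (see the summary above).
final_eval_loss = 0.9747
print(f"perplexity = {math.exp(final_eval_loss):.2f}")  # ~2.65
```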

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1