amdchess-v4

This model is a fine-tuned version of amd/AMD-Llama-135m; the training dataset is not documented in this card. It achieves the following result on the evaluation set (the validation loss at the last logged step, 420):

  • Loss: 0.7971
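
To make the loss figure easier to interpret, it can be converted to a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card.

```python
import math

# Final validation loss reported in this card.
eval_loss = 0.7971

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss  = {eval_loss:.4f}")
print(f"perplexity = {perplexity:.3f}")
```

Under that assumption the perplexity comes out a little above 2.2, measured over the model's own tokenizer vocabulary rather than over chess moves.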

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: grokadamw with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 0.25
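
The learning rate and scheduler listed above can be illustrated with a small standalone sketch of a half-cycle cosine decay. `TOTAL_STEPS` and `WARMUP_STEPS` are assumptions: the card states neither, and the value 424 is only estimated from the step and epoch columns of the training log.

```python
import math

BASE_LR = 3e-05     # learning_rate from the card
TOTAL_STEPS = 424   # assumption: estimated from the log, not stated in the card
WARMUP_STEPS = 0    # assumption: the card lists no warmup setting


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Half-cycle cosine decay from base_lr to 0, with optional linear warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # clamp so steps past the end stay at 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 106, 212, 318, 424):
    print(f"step {step:>3}: lr = {cosine_lr(step):.2e}")
```

With these assumed values the rate starts at 3e-05, passes through half of that at the midpoint, and approaches zero at the final step, which is consistent with the flattening validation loss near the end of the log.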

Training results

Training Loss  Epoch   Step  Validation Loss
9.9629         0.0030  5     5.6096
3.7446         0.0059  10    3.3680
2.524          0.0089  15    2.3223
1.9286         0.0118  20    1.7446
1.5475         0.0148  25    2.0681
1.2838         0.0177  30    1.4096
1.3152         0.0207  35    1.2730
1.2488         0.0236  40    1.2203
1.088          0.0266  45    1.1461
1.0479         0.0295  50    1.1139
1.0758         0.0325  55    1.0844
1.1275         0.0354  60    1.0443
1.1378         0.0384  65    1.0260
1.0147         0.0413  70    0.9939
0.993          0.0443  75    1.0074
1.0132         0.0472  80    0.9866
0.9155         0.0502  85    0.9697
0.9656         0.0531  90    0.9757
1.0402         0.0561  95    0.9633
0.9759         0.0590  100   0.9528
0.9505         0.0620  105   0.9501
1.0114         0.0649  110   0.9405
1.0182         0.0679  115   0.9212
0.9396         0.0708  120   0.9284
0.902          0.0738  125   0.9262
0.9533         0.0767  130   0.9121
0.8755         0.0797  135   0.9160
0.9349         0.0826  140   0.9083
0.9585         0.0856  145   0.8993
0.8349         0.0885  150   0.9000
0.9541         0.0915  155   0.8887
0.9108         0.0945  160   0.8837
0.9196         0.0974  165   0.8806
0.9094         0.1004  170   0.8776
0.8514         0.1033  175   0.8759
0.7515         0.1063  180   0.8684
0.8031         0.1092  185   0.8676
0.8639         0.1122  190   0.8661
0.8002         0.1151  195   0.8556
0.7812         0.1181  200   0.8574
0.9163         0.1210  205   0.8582
0.8824         0.1240  210   0.8515
0.8759         0.1269  215   0.8502
0.8384         0.1299  220   0.8467
0.8436         0.1328  225   0.8427
0.8329         0.1358  230   0.8398
0.87           0.1387  235   0.8393
0.8405         0.1417  240   0.8356
0.8634         0.1446  245   0.8339
0.8298         0.1476  250   0.8315
0.7582         0.1505  255   0.8278
0.7912         0.1535  260   0.8257
0.8878         0.1564  265   0.8247
0.8443         0.1594  270   0.8229
0.8965         0.1623  275   0.8206
0.8298         0.1653  280   0.8178
0.7496         0.1682  285   0.8177
0.7794         0.1712  290   0.8148
0.8354         0.1741  295   0.8137
0.8861         0.1771  300   0.8124
0.7683         0.1800  305   0.8118
0.8414         0.1830  310   0.8106
0.8624         0.1860  315   0.8083
0.7753         0.1889  320   0.8076
0.778          0.1919  325   0.8060
0.8171         0.1948  330   0.8051
0.7006         0.1978  335   0.8049
0.8365         0.2007  340   0.8032
0.8057         0.2037  345   0.8021
0.7914         0.2066  350   0.8015
0.9043         0.2096  355   0.8008
0.8317         0.2125  360   0.8001
0.7631         0.2155  365   0.7997
0.8301         0.2184  370   0.7993
0.8701         0.2214  375   0.7988
0.7469         0.2243  380   0.7985
0.7643         0.2273  385   0.7981
0.8388         0.2302  390   0.7978
0.8808         0.2332  395   0.7975
0.7441         0.2361  400   0.7974
0.7641         0.2391  405   0.7972
0.727          0.2420  410   0.7971
0.771          0.2450  415   0.7971
0.7442         0.2479  420   0.7971
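
The step and epoch columns also allow a rough estimate of the training set size, which the card does not state. The sketch below uses a handful of rows copied from the table above and assumes a single device with no gradient accumulation, so that one step consumes exactly train_batch_size examples; the epoch values are rounded to four decimals, so the result is approximate.

```python
# (step, epoch, validation_loss) rows copied from the training results table.
rows = [
    (5, 0.0030, 5.6096),
    (100, 0.0590, 0.9528),
    (200, 0.1181, 0.8574),
    (300, 0.1771, 0.8124),
    (420, 0.2479, 0.7971),
]
TRAIN_BATCH_SIZE = 8  # train_batch_size from the card

first_step, first_epoch, first_val = rows[0]
last_step, last_epoch, last_val = rows[-1]

# Assumption: single device, no gradient accumulation.
steps_per_epoch = last_step / last_epoch
examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"steps per epoch    ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
print(f"validation loss    {first_val:.4f} -> {last_val:.4f}")
```

Under these assumptions one epoch is roughly 1,700 steps, or on the order of 13,500 training examples, of which this run saw about a quarter.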

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.5.0+cu121
  • Datasets 3.0.2
  • Tokenizers 0.20.1
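
When trying to reproduce the run, it may help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and it treats a missing package as a mismatch rather than an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.46.0",
    "torch": "2.5.0+cu121",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, matches) in check_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package:<13} expected {EXPECTED[package]:<12} "
          f"found {installed} ({status})")
```

An exact match is not necessarily required to load the model; the check only flags where the local setup departs from the one used for training.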

Safetensors

  • Model size: 134M params
  • Tensor type: F32
