
SequentialFinetuningFromFolder

This model was trained from scratch on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2211

Model description

More information needed

Intended uses & limitations

More information needed
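
Since no usage details are documented, here is a minimal, hypothetical loading sketch: the local checkpoint path and the generic AutoModel class are assumptions, and the task-specific Auto class should be substituted once the architecture is known.

```python
# Hypothetical usage sketch -- the checkpoint path and the generic AutoModel
# class are assumptions; this card does not document the architecture or task.
from transformers import AutoModel, AutoTokenizer

checkpoint = "./SequentialFinetuningFromFolder"  # assumed local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)  # substitute the task-specific Auto class if known

inputs = tokenizer("example input", return_tensors="pt")
outputs = model(**inputs)
print(outputs.keys())  # output fields depend on the underlying architecture
```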

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Trainer sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
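
The listed values map directly onto the Transformers Trainer API. A minimal sketch follows; the output directory is an assumption, and the model and datasets are left as commented placeholders since neither is documented on this card.

```python
# Sketch reproducing the hyperparameters above with TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="SequentialFinetuningFromFolder",  # assumed output directory
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,               # "Native AMP" mixed precision; requires a CUDA device
    eval_strategy="epoch",   # the table below reports validation loss once per epoch
)

# model, train_dataset, and eval_dataset are undocumented placeholders:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```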

Training results

The training loss is logged every 500 optimizer steps (the Trainer default), so epochs completed before the first logging step show "No log", and each logged value repeats until the next logging step. The full curve can be re-plotted from trainer_state.json (see the sketch after the table).

| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| No log | 1.0 | 21 | 1.0511 |
| No log | 2.0 | 42 | 1.0333 |
| No log | 3.0 | 63 | 1.0322 |
| No log | 4.0 | 84 | 1.0051 |
| No log | 5.0 | 105 | 0.9618 |
| No log | 6.0 | 126 | 0.9609 |
| No log | 7.0 | 147 | 0.9380 |
| No log | 8.0 | 168 | 0.9073 |
| No log | 9.0 | 189 | 0.9030 |
| No log | 10.0 | 210 | 0.8811 |
| No log | 11.0 | 231 | 0.8631 |
| No log | 12.0 | 252 | 0.8400 |
| No log | 13.0 | 273 | 0.8242 |
| No log | 14.0 | 294 | 0.8133 |
| No log | 15.0 | 315 | 0.7857 |
| No log | 16.0 | 336 | 0.7744 |
| No log | 17.0 | 357 | 0.7548 |
| No log | 18.0 | 378 | 0.7549 |
| No log | 19.0 | 399 | 0.7296 |
| No log | 20.0 | 420 | 0.7169 |
| No log | 21.0 | 441 | 0.7140 |
| No log | 22.0 | 462 | 0.7026 |
| No log | 23.0 | 483 | 0.7175 |
| 1.0179 | 24.0 | 504 | 0.6831 |
| 1.0179 | 25.0 | 525 | 0.6882 |
| 1.0179 | 26.0 | 546 | 0.6455 |
| 1.0179 | 27.0 | 567 | 0.6317 |
| 1.0179 | 28.0 | 588 | 0.6396 |
| 1.0179 | 29.0 | 609 | 0.6132 |
| 1.0179 | 30.0 | 630 | 0.5885 |
| 1.0179 | 31.0 | 651 | 0.5800 |
| 1.0179 | 32.0 | 672 | 0.5700 |
| 1.0179 | 33.0 | 693 | 0.5673 |
| 1.0179 | 34.0 | 714 | 0.5524 |
| 1.0179 | 35.0 | 735 | 0.5310 |
| 1.0179 | 36.0 | 756 | 0.5249 |
| 1.0179 | 37.0 | 777 | 0.5148 |
| 1.0179 | 38.0 | 798 | 0.5246 |
| 1.0179 | 39.0 | 819 | 0.4967 |
| 1.0179 | 40.0 | 840 | 0.4841 |
| 1.0179 | 41.0 | 861 | 0.4822 |
| 1.0179 | 42.0 | 882 | 0.4694 |
| 1.0179 | 43.0 | 903 | 0.4598 |
| 1.0179 | 44.0 | 924 | 0.4503 |
| 1.0179 | 45.0 | 945 | 0.4428 |
| 1.0179 | 46.0 | 966 | 0.4243 |
| 1.0179 | 47.0 | 987 | 0.4163 |
| 0.7797 | 48.0 | 1008 | 0.4187 |
| 0.7797 | 49.0 | 1029 | 0.4110 |
| 0.7797 | 50.0 | 1050 | 0.4013 |
| 0.7797 | 51.0 | 1071 | 0.4099 |
| 0.7797 | 52.0 | 1092 | 0.3870 |
| 0.7797 | 53.0 | 1113 | 0.3818 |
| 0.7797 | 54.0 | 1134 | 0.3783 |
| 0.7797 | 55.0 | 1155 | 0.3621 |
| 0.7797 | 56.0 | 1176 | 0.3591 |
| 0.7797 | 57.0 | 1197 | 0.3608 |
| 0.7797 | 58.0 | 1218 | 0.3447 |
| 0.7797 | 59.0 | 1239 | 0.3444 |
| 0.7797 | 60.0 | 1260 | 0.3390 |
| 0.7797 | 61.0 | 1281 | 0.3310 |
| 0.7797 | 62.0 | 1302 | 0.3201 |
| 0.7797 | 63.0 | 1323 | 0.3250 |
| 0.7797 | 64.0 | 1344 | 0.3115 |
| 0.7797 | 65.0 | 1365 | 0.3015 |
| 0.7797 | 66.0 | 1386 | 0.3014 |
| 0.7797 | 67.0 | 1407 | 0.3081 |
| 0.7797 | 68.0 | 1428 | 0.2892 |
| 0.7797 | 69.0 | 1449 | 0.3034 |
| 0.7797 | 70.0 | 1470 | 0.2828 |
| 0.7797 | 71.0 | 1491 | 0.2790 |
| 0.6123 | 72.0 | 1512 | 0.2727 |
| 0.6123 | 73.0 | 1533 | 0.2809 |
| 0.6123 | 74.0 | 1554 | 0.2694 |
| 0.6123 | 75.0 | 1575 | 0.2636 |
| 0.6123 | 76.0 | 1596 | 0.2613 |
| 0.6123 | 77.0 | 1617 | 0.2557 |
| 0.6123 | 78.0 | 1638 | 0.2529 |
| 0.6123 | 79.0 | 1659 | 0.2575 |
| 0.6123 | 80.0 | 1680 | 0.2539 |
| 0.6123 | 81.0 | 1701 | 0.2540 |
| 0.6123 | 82.0 | 1722 | 0.2423 |
| 0.6123 | 83.0 | 1743 | 0.2406 |
| 0.6123 | 84.0 | 1764 | 0.2383 |
| 0.6123 | 85.0 | 1785 | 0.2358 |
| 0.6123 | 86.0 | 1806 | 0.2371 |
| 0.6123 | 87.0 | 1827 | 0.2352 |
| 0.6123 | 88.0 | 1848 | 0.2335 |
| 0.6123 | 89.0 | 1869 | 0.2297 |
| 0.6123 | 90.0 | 1890 | 0.2305 |
| 0.6123 | 91.0 | 1911 | 0.2264 |
| 0.6123 | 92.0 | 1932 | 0.2255 |
| 0.6123 | 93.0 | 1953 | 0.2273 |
| 0.6123 | 94.0 | 1974 | 0.2220 |
| 0.6123 | 95.0 | 1995 | 0.2240 |
| 0.5063 | 96.0 | 2016 | 0.2214 |
| 0.5063 | 97.0 | 2037 | 0.2219 |
| 0.5063 | 98.0 | 2058 | 0.2202 |
| 0.5063 | 99.0 | 2079 | 0.2211 |
| 0.5063 | 100.0 | 2100 | 0.2211 |
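
Trainer checkpoints include a trainer_state.json whose log_history holds the records behind this table. A minimal sketch for re-plotting the validation-loss curve; the checkpoint path is an assumption.

```python
# Sketch: re-plot the validation-loss curve from the Trainer's saved state.
import json
import matplotlib.pyplot as plt

with open("SequentialFinetuningFromFolder/trainer_state.json") as f:  # assumed path
    state = json.load(f)

# log_history mixes training and evaluation records; keep the evaluation ones.
evals = [rec for rec in state["log_history"] if "eval_loss" in rec]
epochs = [rec["epoch"] for rec in evals]
losses = [rec["eval_loss"] for rec in evals]

plt.plot(epochs, losses)
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("SequentialFinetuningFromFolder validation loss")
plt.show()
```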

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
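
To check a local environment against these versions, a small sketch; requiring exact matches is an assumption, and nearby versions may work equally well.

```python
# Compare installed package versions against the ones listed above.
from importlib.metadata import version

expected = {
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}
for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else f"mismatch (expected {want})"
    print(f"{pkg} {have}: {status}")
```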

Model files

  • Format: Safetensors
  • Parameters: 12M
  • Tensor type: F32