---
license: apache-2.0
base_model: google/flan-t5-small
tags:
  - generated_from_trainer
model-index:
  - name: flant5-tuned-15-warmup
    results: []
---

flant5-tuned-15-warmup

This model is a fine-tuned version of google/flan-t5-small on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3516
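
For context, if the reported loss is the mean token-level cross-entropy (the Trainer default for seq2seq fine-tuning), a loss of 1.3516 corresponds to a perplexity of roughly exp(1.3516) ≈ 3.86. Below is a minimal inference sketch; the repo id `mixtralyanis/flant5-tuned-15-warmup` and the prompt are assumptions, not taken from the card:

```python
# Minimal inference sketch; the repo id below is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "mixtralyanis/flant5-tuned-15-warmup"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# FLAN-T5 models expect instruction-style prompts; this one is illustrative only.
inputs = tokenizer("Summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```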

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 15
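
These values map directly onto the library's training arguments. A minimal sketch, assuming `Seq2SeqTrainingArguments` was used (the `output_dir` is an assumption; the Adam betas and epsilon listed above are the `TrainingArguments` defaults, so they need no explicit setting):

```python
# Sketch of training arguments matching the hyperparameters above;
# output_dir is assumed, and adam_beta1/adam_beta2/adam_epsilon are
# left at their defaults (0.9, 0.999, 1e-8), matching the card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flant5-tuned-15-warmup",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=20,
    num_train_epochs=15,
)
```

The results table below implies 24 optimizer steps per epoch (step 24 falls at epoch 1.0), so training runs for 360 steps in total and the 20 warmup steps cover roughly the first 0.83 epoch.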

Training results

Training Loss | Epoch | Step | Validation Loss
2.1664 0.04 1 1.8055
2.3197 0.08 2 1.7996
1.8347 0.12 3 1.7885
1.945 0.17 4 1.7732
2.2942 0.21 5 1.7553
1.9688 0.25 6 1.7373
1.7152 0.29 7 1.7200
1.7729 0.33 8 1.7044
2.0102 0.38 9 1.6898
2.0015 0.42 10 1.6755
1.6719 0.46 11 1.6612
1.68 0.5 12 1.6452
1.9285 0.54 13 1.6288
1.8603 0.58 14 1.6130
2.2619 0.62 15 1.5958
1.6485 0.67 16 1.5784
1.845 0.71 17 1.5625
1.7137 0.75 18 1.5464
1.6553 0.79 19 1.5312
1.981 0.83 20 1.5161
1.3079 0.88 21 1.5038
1.5341 0.92 22 1.4930
1.6369 0.96 23 1.4844
2.5189 1.0 24 1.4767
1.3177 1.04 25 1.4702
1.4357 1.08 26 1.4646
1.7583 1.12 27 1.4581
1.6517 1.17 28 1.4533
1.7093 1.21 29 1.4485
1.3521 1.25 30 1.4442
1.4714 1.29 31 1.4399
1.7541 1.33 32 1.4364
1.5828 1.38 33 1.4329
1.64 1.42 34 1.4292
1.5735 1.46 35 1.4257
1.5155 1.5 36 1.4230
1.6881 1.54 37 1.4199
1.4118 1.58 38 1.4183
1.5981 1.62 39 1.4166
1.6888 1.67 40 1.4149
1.4802 1.71 41 1.4139
1.501 1.75 42 1.4119
1.4882 1.79 43 1.4111
1.7281 1.83 44 1.4096
1.5792 1.88 45 1.4080
1.5964 1.92 46 1.4055
1.1762 1.96 47 1.4035
1.2127 2.0 48 1.4023
1.5406 2.04 49 1.4004
1.6261 2.08 50 1.3995
1.6105 2.12 51 1.3981
1.3115 2.17 52 1.3966
1.4817 2.21 53 1.3953
1.3679 2.25 54 1.3942
1.3511 2.29 55 1.3923
1.4458 2.33 56 1.3910
1.1303 2.38 57 1.3908
1.3293 2.42 58 1.3906
1.248 2.46 59 1.3915
1.4307 2.5 60 1.3905
1.6081 2.54 61 1.3880
1.5352 2.58 62 1.3854
1.1685 2.62 63 1.3832
1.5049 2.67 64 1.3812
1.2554 2.71 65 1.3794
1.4978 2.75 66 1.3778
1.3643 2.79 67 1.3761
1.4122 2.83 68 1.3742
1.6356 2.88 69 1.3716
1.5886 2.92 70 1.3688
1.3843 2.96 71 1.3665
1.8691 3.0 72 1.3638
1.33 3.04 73 1.3620
1.2651 3.08 74 1.3605
1.5486 3.12 75 1.3590
1.347 3.17 76 1.3577
1.5603 3.21 77 1.3562
1.6223 3.25 78 1.3542
1.4045 3.29 79 1.3515
1.4092 3.33 80 1.3499
1.1476 3.38 81 1.3496
1.3087 3.42 82 1.3499
1.4861 3.46 83 1.3496
1.4168 3.5 84 1.3487
1.0794 3.54 85 1.3485
1.2572 3.58 86 1.3490
1.5438 3.62 87 1.3490
1.3175 3.67 88 1.3487
1.1355 3.71 89 1.3486
1.4005 3.75 90 1.3486
1.363 3.79 91 1.3487
1.2214 3.83 92 1.3497
1.0884 3.88 93 1.3507
1.3816 3.92 94 1.3517
1.3544 3.96 95 1.3529
1.0685 4.0 96 1.3544
1.4339 4.04 97 1.3563
1.2282 4.08 98 1.3575
1.1425 4.12 99 1.3585
1.4971 4.17 100 1.3585
1.1353 4.21 101 1.3589
1.3451 4.25 102 1.3596
1.2957 4.29 103 1.3600
1.2654 4.33 104 1.3598
1.2395 4.38 105 1.3592
1.1531 4.42 106 1.3587
1.1824 4.46 107 1.3580
1.3588 4.5 108 1.3569
1.3703 4.54 109 1.3548
1.5438 4.58 110 1.3519
1.1184 4.62 111 1.3497
1.3506 4.67 112 1.3466
1.5091 4.71 113 1.3437
1.3477 4.75 114 1.3413
1.3114 4.79 115 1.3395
1.1235 4.83 116 1.3385
1.189 4.88 117 1.3376
1.1811 4.92 118 1.3371
1.2506 4.96 119 1.3374
1.2154 5.0 120 1.3384
1.3547 5.04 121 1.3395
1.4633 5.08 122 1.3398
1.4047 5.12 123 1.3404
1.0155 5.17 124 1.3412
1.2121 5.21 125 1.3427
1.1646 5.25 126 1.3437
1.2765 5.29 127 1.3450
1.1937 5.33 128 1.3455
1.3145 5.38 129 1.3464
1.0305 5.42 130 1.3476
1.4225 5.46 131 1.3486
1.1455 5.5 132 1.3486
1.3314 5.54 133 1.3480
1.4563 5.58 134 1.3470
1.2709 5.62 135 1.3462
1.0006 5.67 136 1.3458
1.2831 5.71 137 1.3456
1.2246 5.75 138 1.3449
1.0799 5.79 139 1.3452
1.2161 5.83 140 1.3445
1.1016 5.88 141 1.3439
1.2136 5.92 142 1.3431
1.0087 5.96 143 1.3431
0.8238 6.0 144 1.3434
1.0138 6.04 145 1.3441
1.2912 6.08 146 1.3443
1.234 6.12 147 1.3444
1.1389 6.17 148 1.3437
1.3006 6.21 149 1.3426
0.978 6.25 150 1.3418
1.0744 6.29 151 1.3413
1.213 6.33 152 1.3418
1.0914 6.38 153 1.3429
1.2845 6.42 154 1.3437
1.1967 6.46 155 1.3445
0.9909 6.5 156 1.3452
1.1752 6.54 157 1.3458
1.3885 6.58 158 1.3461
1.1556 6.62 159 1.3464
0.994 6.67 160 1.3464
1.2133 6.71 161 1.3455
1.1922 6.75 162 1.3441
1.2964 6.79 163 1.3427
1.0437 6.83 164 1.3417
1.1666 6.88 165 1.3409
1.3587 6.92 166 1.3397
1.3096 6.96 167 1.3385
1.1133 7.0 168 1.3378
1.0738 7.04 169 1.3374
1.2147 7.08 170 1.3368
1.135 7.12 171 1.3363
1.2445 7.17 172 1.3357
1.1927 7.21 173 1.3348
1.1672 7.25 174 1.3336
1.0623 7.29 175 1.3332
1.1242 7.33 176 1.3329
1.2888 7.38 177 1.3328
1.196 7.42 178 1.3328
1.2507 7.46 179 1.3330
1.0763 7.5 180 1.3338
0.9774 7.54 181 1.3351
1.1876 7.58 182 1.3369
1.2101 7.62 183 1.3382
1.1968 7.67 184 1.3397
0.8876 7.71 185 1.3416
1.0407 7.75 186 1.3430
1.1468 7.79 187 1.3445
0.981 7.83 188 1.3458
1.1389 7.88 189 1.3465
1.2701 7.92 190 1.3470
1.2079 7.96 191 1.3471
1.4571 8.0 192 1.3471
1.3467 8.04 193 1.3473
1.1879 8.08 194 1.3466
1.0661 8.12 195 1.3459
1.0822 8.17 196 1.3451
0.7801 8.21 197 1.3453
1.1876 8.25 198 1.3451
1.1006 8.29 199 1.3446
1.0083 8.33 200 1.3442
1.1796 8.38 201 1.3436
1.2475 8.42 202 1.3431
0.9513 8.46 203 1.3428
1.1191 8.5 204 1.3422
1.0786 8.54 205 1.3420
1.138 8.58 206 1.3423
1.0057 8.62 207 1.3423
1.2386 8.67 208 1.3423
0.9629 8.71 209 1.3429
1.2914 8.75 210 1.3428
0.938 8.79 211 1.3428
1.1721 8.83 212 1.3429
1.2278 8.88 213 1.3429
0.9463 8.92 214 1.3431
0.9662 8.96 215 1.3433
1.3535 9.0 216 1.3433
0.8468 9.04 217 1.3435
1.1178 9.08 218 1.3438
1.0344 9.12 219 1.3445
1.2105 9.17 220 1.3450
1.0636 9.21 221 1.3449
0.8061 9.25 222 1.3453
1.1739 9.29 223 1.3456
1.1879 9.33 224 1.3459
0.9653 9.38 225 1.3460
0.9331 9.42 226 1.3464
0.998 9.46 227 1.3469
1.2129 9.5 228 1.3471
1.2902 9.54 229 1.3468
0.888 9.58 230 1.3469
0.9717 9.62 231 1.3472
1.2792 9.67 232 1.3475
1.0243 9.71 233 1.3477
1.3012 9.75 234 1.3475
1.0606 9.79 235 1.3470
1.0991 9.83 236 1.3467
1.2828 9.88 237 1.3457
1.2449 9.92 238 1.3449
0.9969 9.96 239 1.3446
1.1315 10.0 240 1.3443
0.9608 10.04 241 1.3445
1.163 10.08 242 1.3448
1.029 10.12 243 1.3450
1.1781 10.17 244 1.3450
1.0766 10.21 245 1.3458
0.804 10.25 246 1.3468
0.881 10.29 247 1.3482
1.1738 10.33 248 1.3492
1.1217 10.38 249 1.3497
0.9642 10.42 250 1.3504
1.0833 10.46 251 1.3509
1.0573 10.5 252 1.3514
1.2313 10.54 253 1.3515
1.007 10.58 254 1.3512
0.8919 10.62 255 1.3509
1.1255 10.67 256 1.3504
0.8156 10.71 257 1.3502
1.1596 10.75 258 1.3503
1.0573 10.79 259 1.3508
0.9606 10.83 260 1.3513
1.1967 10.88 261 1.3511
1.2035 10.92 262 1.3508
1.0998 10.96 263 1.3504
1.0149 11.0 264 1.3501
1.106 11.04 265 1.3498
0.9227 11.08 266 1.3497
1.105 11.12 267 1.3495
1.079 11.17 268 1.3492
1.1853 11.21 269 1.3493
0.9819 11.25 270 1.3496
0.9681 11.29 271 1.3500
1.1715 11.33 272 1.3502
1.1711 11.38 273 1.3504
1.0301 11.42 274 1.3504
1.0097 11.46 275 1.3502
0.9109 11.5 276 1.3501
1.1929 11.54 277 1.3498
1.1418 11.58 278 1.3494
1.2005 11.62 279 1.3488
1.1507 11.67 280 1.3484
1.007 11.71 281 1.3480
0.8808 11.75 282 1.3477
0.7668 11.79 283 1.3479
1.0597 11.83 284 1.3480
1.0563 11.88 285 1.3483
0.7806 11.92 286 1.3487
1.233 11.96 287 1.3490
0.9242 12.0 288 1.3493
1.1043 12.04 289 1.3491
0.9379 12.08 290 1.3489
0.8592 12.12 291 1.3487
1.0302 12.17 292 1.3485
1.1544 12.21 293 1.3483
1.0905 12.25 294 1.3480
0.9576 12.29 295 1.3480
0.8627 12.33 296 1.3480
0.8748 12.38 297 1.3482
1.2431 12.42 298 1.3485
0.9514 12.46 299 1.3487
0.9526 12.5 300 1.3487
0.9222 12.54 301 1.3489
0.9418 12.58 302 1.3491
1.0765 12.62 303 1.3492
1.007 12.67 304 1.3493
1.1301 12.71 305 1.3494
1.0612 12.75 306 1.3495
0.7988 12.79 307 1.3495
1.2483 12.83 308 1.3493
0.9587 12.88 309 1.3492
1.0277 12.92 310 1.3490
1.085 12.96 311 1.3490
0.9661 13.0 312 1.3489
0.9396 13.04 313 1.3490
0.8657 13.08 314 1.3492
1.0302 13.12 315 1.3495
0.877 13.17 316 1.3499
1.0629 13.21 317 1.3503
1.1157 13.25 318 1.3505
0.9327 13.29 319 1.3506
0.8881 13.33 320 1.3509
0.8696 13.38 321 1.3512
0.9604 13.42 322 1.3514
1.1611 13.46 323 1.3515
0.9612 13.5 324 1.3516
1.0779 13.54 325 1.3515
1.0823 13.58 326 1.3514
1.0548 13.62 327 1.3514
1.099 13.67 328 1.3513
1.0892 13.71 329 1.3511
1.1729 13.75 330 1.3510
0.9449 13.79 331 1.3509
1.1423 13.83 332 1.3507
1.0322 13.88 333 1.3507
0.8021 13.92 334 1.3508
1.2308 13.96 335 1.3508
0.9415 14.0 336 1.3508
0.9796 14.04 337 1.3508
0.9764 14.08 338 1.3509
0.8994 14.12 339 1.3509
1.0552 14.17 340 1.3509
1.1901 14.21 341 1.3509
0.9142 14.25 342 1.3509
0.8429 14.29 343 1.3510
1.1567 14.33 344 1.3510
0.7431 14.38 345 1.3511
1.0394 14.42 346 1.3511
0.9096 14.46 347 1.3512
1.2756 14.5 348 1.3512
0.9605 14.54 349 1.3512
1.1898 14.58 350 1.3513
1.1416 14.62 351 1.3513
0.9723 14.67 352 1.3513
0.8913 14.71 353 1.3514
0.9704 14.75 354 1.3514
1.1285 14.79 355 1.3515
1.0252 14.83 356 1.3515
1.0035 14.88 357 1.3515
0.8794 14.92 358 1.3516
1.0658 14.96 359 1.3516
0.8244 15.0 360 1.3516
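
Validation loss reaches its minimum of 1.3328 around steps 177-178 (epoch ≈ 7.4) and drifts slightly upward afterwards, ending at 1.3516. If the run were repeated, early stopping could keep that best checkpoint; a hedged sketch using the library's `EarlyStoppingCallback` follows (this is not the card's recorded setup, and the eval/save cadence is an assumption):

```python
# Hedged early-stopping sketch, not the card's actual configuration.
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

early_stop_args = Seq2SeqTrainingArguments(
    output_dir="flant5-tuned-15-warmup",  # assumed
    evaluation_strategy="steps",
    eval_steps=1,                  # the table logs an eval after every optimizer step
    save_strategy="steps",         # must match evaluation_strategy
    save_steps=1,
    load_best_model_at_end=True,   # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    num_train_epochs=15,
)

# Pass via Seq2SeqTrainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=5)])
```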

Framework versions

  • Transformers 4.38.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2