metadata
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
model-index:
- name: flant5-tuned-15-warmup
  results: []
flant5-tuned-15-warmup
This model is a fine-tuned version of google/flan-t5-small on an unspecified dataset (the dataset name was recorded as None). It achieves the following results on the evaluation set:
- Loss: 1.3516
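
For readers who prefer perplexity to raw loss, the conversion can be sketched as below. This assumes the reported loss is a mean token-level cross-entropy in nats, which is the usual convention for seq2seq training with the Trainer but is not stated in this card. The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported in this card.
final_eval_loss = 1.3516
print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

If the loss were averaged differently (for example per sequence rather than per token), this number would not be a true perplexity.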
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 20
- num_epochs: 15
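
To make the combination of `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 20` concrete, here is a minimal pure-Python sketch of a linear warmup followed by linear decay. It assumes 360 total optimizer steps (the last step in the results table, i.e. 24 steps per epoch over 15 epochs) and the usual multiplier formula for a linear schedule with warmup. The function name `linear_warmup_lr` is hypothetical and not part of any library.

```python
def linear_warmup_lr(
    step: int,
    base_lr: float = 1e-4,     # learning_rate from the list above
    warmup_steps: int = 20,    # lr_scheduler_warmup_steps from the list above
    total_steps: int = 360,    # assumption: final step in the results table
) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 10, 20, 190, 360):
    print(s, linear_warmup_lr(s))
```

Under these assumptions the rate peaks at 0.0001 on step 20, then falls in a straight line to zero at step 360.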
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.1664 | 0.04 | 1 | 1.8055 |
2.3197 | 0.08 | 2 | 1.7996 |
1.8347 | 0.12 | 3 | 1.7885 |
1.945 | 0.17 | 4 | 1.7732 |
2.2942 | 0.21 | 5 | 1.7553 |
1.9688 | 0.25 | 6 | 1.7373 |
1.7152 | 0.29 | 7 | 1.7200 |
1.7729 | 0.33 | 8 | 1.7044 |
2.0102 | 0.38 | 9 | 1.6898 |
2.0015 | 0.42 | 10 | 1.6755 |
1.6719 | 0.46 | 11 | 1.6612 |
1.68 | 0.5 | 12 | 1.6452 |
1.9285 | 0.54 | 13 | 1.6288 |
1.8603 | 0.58 | 14 | 1.6130 |
2.2619 | 0.62 | 15 | 1.5958 |
1.6485 | 0.67 | 16 | 1.5784 |
1.845 | 0.71 | 17 | 1.5625 |
1.7137 | 0.75 | 18 | 1.5464 |
1.6553 | 0.79 | 19 | 1.5312 |
1.981 | 0.83 | 20 | 1.5161 |
1.3079 | 0.88 | 21 | 1.5038 |
1.5341 | 0.92 | 22 | 1.4930 |
1.6369 | 0.96 | 23 | 1.4844 |
2.5189 | 1.0 | 24 | 1.4767 |
1.3177 | 1.04 | 25 | 1.4702 |
1.4357 | 1.08 | 26 | 1.4646 |
1.7583 | 1.12 | 27 | 1.4581 |
1.6517 | 1.17 | 28 | 1.4533 |
1.7093 | 1.21 | 29 | 1.4485 |
1.3521 | 1.25 | 30 | 1.4442 |
1.4714 | 1.29 | 31 | 1.4399 |
1.7541 | 1.33 | 32 | 1.4364 |
1.5828 | 1.38 | 33 | 1.4329 |
1.64 | 1.42 | 34 | 1.4292 |
1.5735 | 1.46 | 35 | 1.4257 |
1.5155 | 1.5 | 36 | 1.4230 |
1.6881 | 1.54 | 37 | 1.4199 |
1.4118 | 1.58 | 38 | 1.4183 |
1.5981 | 1.62 | 39 | 1.4166 |
1.6888 | 1.67 | 40 | 1.4149 |
1.4802 | 1.71 | 41 | 1.4139 |
1.501 | 1.75 | 42 | 1.4119 |
1.4882 | 1.79 | 43 | 1.4111 |
1.7281 | 1.83 | 44 | 1.4096 |
1.5792 | 1.88 | 45 | 1.4080 |
1.5964 | 1.92 | 46 | 1.4055 |
1.1762 | 1.96 | 47 | 1.4035 |
1.2127 | 2.0 | 48 | 1.4023 |
1.5406 | 2.04 | 49 | 1.4004 |
1.6261 | 2.08 | 50 | 1.3995 |
1.6105 | 2.12 | 51 | 1.3981 |
1.3115 | 2.17 | 52 | 1.3966 |
1.4817 | 2.21 | 53 | 1.3953 |
1.3679 | 2.25 | 54 | 1.3942 |
1.3511 | 2.29 | 55 | 1.3923 |
1.4458 | 2.33 | 56 | 1.3910 |
1.1303 | 2.38 | 57 | 1.3908 |
1.3293 | 2.42 | 58 | 1.3906 |
1.248 | 2.46 | 59 | 1.3915 |
1.4307 | 2.5 | 60 | 1.3905 |
1.6081 | 2.54 | 61 | 1.3880 |
1.5352 | 2.58 | 62 | 1.3854 |
1.1685 | 2.62 | 63 | 1.3832 |
1.5049 | 2.67 | 64 | 1.3812 |
1.2554 | 2.71 | 65 | 1.3794 |
1.4978 | 2.75 | 66 | 1.3778 |
1.3643 | 2.79 | 67 | 1.3761 |
1.4122 | 2.83 | 68 | 1.3742 |
1.6356 | 2.88 | 69 | 1.3716 |
1.5886 | 2.92 | 70 | 1.3688 |
1.3843 | 2.96 | 71 | 1.3665 |
1.8691 | 3.0 | 72 | 1.3638 |
1.33 | 3.04 | 73 | 1.3620 |
1.2651 | 3.08 | 74 | 1.3605 |
1.5486 | 3.12 | 75 | 1.3590 |
1.347 | 3.17 | 76 | 1.3577 |
1.5603 | 3.21 | 77 | 1.3562 |
1.6223 | 3.25 | 78 | 1.3542 |
1.4045 | 3.29 | 79 | 1.3515 |
1.4092 | 3.33 | 80 | 1.3499 |
1.1476 | 3.38 | 81 | 1.3496 |
1.3087 | 3.42 | 82 | 1.3499 |
1.4861 | 3.46 | 83 | 1.3496 |
1.4168 | 3.5 | 84 | 1.3487 |
1.0794 | 3.54 | 85 | 1.3485 |
1.2572 | 3.58 | 86 | 1.3490 |
1.5438 | 3.62 | 87 | 1.3490 |
1.3175 | 3.67 | 88 | 1.3487 |
1.1355 | 3.71 | 89 | 1.3486 |
1.4005 | 3.75 | 90 | 1.3486 |
1.363 | 3.79 | 91 | 1.3487 |
1.2214 | 3.83 | 92 | 1.3497 |
1.0884 | 3.88 | 93 | 1.3507 |
1.3816 | 3.92 | 94 | 1.3517 |
1.3544 | 3.96 | 95 | 1.3529 |
1.0685 | 4.0 | 96 | 1.3544 |
1.4339 | 4.04 | 97 | 1.3563 |
1.2282 | 4.08 | 98 | 1.3575 |
1.1425 | 4.12 | 99 | 1.3585 |
1.4971 | 4.17 | 100 | 1.3585 |
1.1353 | 4.21 | 101 | 1.3589 |
1.3451 | 4.25 | 102 | 1.3596 |
1.2957 | 4.29 | 103 | 1.3600 |
1.2654 | 4.33 | 104 | 1.3598 |
1.2395 | 4.38 | 105 | 1.3592 |
1.1531 | 4.42 | 106 | 1.3587 |
1.1824 | 4.46 | 107 | 1.3580 |
1.3588 | 4.5 | 108 | 1.3569 |
1.3703 | 4.54 | 109 | 1.3548 |
1.5438 | 4.58 | 110 | 1.3519 |
1.1184 | 4.62 | 111 | 1.3497 |
1.3506 | 4.67 | 112 | 1.3466 |
1.5091 | 4.71 | 113 | 1.3437 |
1.3477 | 4.75 | 114 | 1.3413 |
1.3114 | 4.79 | 115 | 1.3395 |
1.1235 | 4.83 | 116 | 1.3385 |
1.189 | 4.88 | 117 | 1.3376 |
1.1811 | 4.92 | 118 | 1.3371 |
1.2506 | 4.96 | 119 | 1.3374 |
1.2154 | 5.0 | 120 | 1.3384 |
1.3547 | 5.04 | 121 | 1.3395 |
1.4633 | 5.08 | 122 | 1.3398 |
1.4047 | 5.12 | 123 | 1.3404 |
1.0155 | 5.17 | 124 | 1.3412 |
1.2121 | 5.21 | 125 | 1.3427 |
1.1646 | 5.25 | 126 | 1.3437 |
1.2765 | 5.29 | 127 | 1.3450 |
1.1937 | 5.33 | 128 | 1.3455 |
1.3145 | 5.38 | 129 | 1.3464 |
1.0305 | 5.42 | 130 | 1.3476 |
1.4225 | 5.46 | 131 | 1.3486 |
1.1455 | 5.5 | 132 | 1.3486 |
1.3314 | 5.54 | 133 | 1.3480 |
1.4563 | 5.58 | 134 | 1.3470 |
1.2709 | 5.62 | 135 | 1.3462 |
1.0006 | 5.67 | 136 | 1.3458 |
1.2831 | 5.71 | 137 | 1.3456 |
1.2246 | 5.75 | 138 | 1.3449 |
1.0799 | 5.79 | 139 | 1.3452 |
1.2161 | 5.83 | 140 | 1.3445 |
1.1016 | 5.88 | 141 | 1.3439 |
1.2136 | 5.92 | 142 | 1.3431 |
1.0087 | 5.96 | 143 | 1.3431 |
0.8238 | 6.0 | 144 | 1.3434 |
1.0138 | 6.04 | 145 | 1.3441 |
1.2912 | 6.08 | 146 | 1.3443 |
1.234 | 6.12 | 147 | 1.3444 |
1.1389 | 6.17 | 148 | 1.3437 |
1.3006 | 6.21 | 149 | 1.3426 |
0.978 | 6.25 | 150 | 1.3418 |
1.0744 | 6.29 | 151 | 1.3413 |
1.213 | 6.33 | 152 | 1.3418 |
1.0914 | 6.38 | 153 | 1.3429 |
1.2845 | 6.42 | 154 | 1.3437 |
1.1967 | 6.46 | 155 | 1.3445 |
0.9909 | 6.5 | 156 | 1.3452 |
1.1752 | 6.54 | 157 | 1.3458 |
1.3885 | 6.58 | 158 | 1.3461 |
1.1556 | 6.62 | 159 | 1.3464 |
0.994 | 6.67 | 160 | 1.3464 |
1.2133 | 6.71 | 161 | 1.3455 |
1.1922 | 6.75 | 162 | 1.3441 |
1.2964 | 6.79 | 163 | 1.3427 |
1.0437 | 6.83 | 164 | 1.3417 |
1.1666 | 6.88 | 165 | 1.3409 |
1.3587 | 6.92 | 166 | 1.3397 |
1.3096 | 6.96 | 167 | 1.3385 |
1.1133 | 7.0 | 168 | 1.3378 |
1.0738 | 7.04 | 169 | 1.3374 |
1.2147 | 7.08 | 170 | 1.3368 |
1.135 | 7.12 | 171 | 1.3363 |
1.2445 | 7.17 | 172 | 1.3357 |
1.1927 | 7.21 | 173 | 1.3348 |
1.1672 | 7.25 | 174 | 1.3336 |
1.0623 | 7.29 | 175 | 1.3332 |
1.1242 | 7.33 | 176 | 1.3329 |
1.2888 | 7.38 | 177 | 1.3328 |
1.196 | 7.42 | 178 | 1.3328 |
1.2507 | 7.46 | 179 | 1.3330 |
1.0763 | 7.5 | 180 | 1.3338 |
0.9774 | 7.54 | 181 | 1.3351 |
1.1876 | 7.58 | 182 | 1.3369 |
1.2101 | 7.62 | 183 | 1.3382 |
1.1968 | 7.67 | 184 | 1.3397 |
0.8876 | 7.71 | 185 | 1.3416 |
1.0407 | 7.75 | 186 | 1.3430 |
1.1468 | 7.79 | 187 | 1.3445 |
0.981 | 7.83 | 188 | 1.3458 |
1.1389 | 7.88 | 189 | 1.3465 |
1.2701 | 7.92 | 190 | 1.3470 |
1.2079 | 7.96 | 191 | 1.3471 |
1.4571 | 8.0 | 192 | 1.3471 |
1.3467 | 8.04 | 193 | 1.3473 |
1.1879 | 8.08 | 194 | 1.3466 |
1.0661 | 8.12 | 195 | 1.3459 |
1.0822 | 8.17 | 196 | 1.3451 |
0.7801 | 8.21 | 197 | 1.3453 |
1.1876 | 8.25 | 198 | 1.3451 |
1.1006 | 8.29 | 199 | 1.3446 |
1.0083 | 8.33 | 200 | 1.3442 |
1.1796 | 8.38 | 201 | 1.3436 |
1.2475 | 8.42 | 202 | 1.3431 |
0.9513 | 8.46 | 203 | 1.3428 |
1.1191 | 8.5 | 204 | 1.3422 |
1.0786 | 8.54 | 205 | 1.3420 |
1.138 | 8.58 | 206 | 1.3423 |
1.0057 | 8.62 | 207 | 1.3423 |
1.2386 | 8.67 | 208 | 1.3423 |
0.9629 | 8.71 | 209 | 1.3429 |
1.2914 | 8.75 | 210 | 1.3428 |
0.938 | 8.79 | 211 | 1.3428 |
1.1721 | 8.83 | 212 | 1.3429 |
1.2278 | 8.88 | 213 | 1.3429 |
0.9463 | 8.92 | 214 | 1.3431 |
0.9662 | 8.96 | 215 | 1.3433 |
1.3535 | 9.0 | 216 | 1.3433 |
0.8468 | 9.04 | 217 | 1.3435 |
1.1178 | 9.08 | 218 | 1.3438 |
1.0344 | 9.12 | 219 | 1.3445 |
1.2105 | 9.17 | 220 | 1.3450 |
1.0636 | 9.21 | 221 | 1.3449 |
0.8061 | 9.25 | 222 | 1.3453 |
1.1739 | 9.29 | 223 | 1.3456 |
1.1879 | 9.33 | 224 | 1.3459 |
0.9653 | 9.38 | 225 | 1.3460 |
0.9331 | 9.42 | 226 | 1.3464 |
0.998 | 9.46 | 227 | 1.3469 |
1.2129 | 9.5 | 228 | 1.3471 |
1.2902 | 9.54 | 229 | 1.3468 |
0.888 | 9.58 | 230 | 1.3469 |
0.9717 | 9.62 | 231 | 1.3472 |
1.2792 | 9.67 | 232 | 1.3475 |
1.0243 | 9.71 | 233 | 1.3477 |
1.3012 | 9.75 | 234 | 1.3475 |
1.0606 | 9.79 | 235 | 1.3470 |
1.0991 | 9.83 | 236 | 1.3467 |
1.2828 | 9.88 | 237 | 1.3457 |
1.2449 | 9.92 | 238 | 1.3449 |
0.9969 | 9.96 | 239 | 1.3446 |
1.1315 | 10.0 | 240 | 1.3443 |
0.9608 | 10.04 | 241 | 1.3445 |
1.163 | 10.08 | 242 | 1.3448 |
1.029 | 10.12 | 243 | 1.3450 |
1.1781 | 10.17 | 244 | 1.3450 |
1.0766 | 10.21 | 245 | 1.3458 |
0.804 | 10.25 | 246 | 1.3468 |
0.881 | 10.29 | 247 | 1.3482 |
1.1738 | 10.33 | 248 | 1.3492 |
1.1217 | 10.38 | 249 | 1.3497 |
0.9642 | 10.42 | 250 | 1.3504 |
1.0833 | 10.46 | 251 | 1.3509 |
1.0573 | 10.5 | 252 | 1.3514 |
1.2313 | 10.54 | 253 | 1.3515 |
1.007 | 10.58 | 254 | 1.3512 |
0.8919 | 10.62 | 255 | 1.3509 |
1.1255 | 10.67 | 256 | 1.3504 |
0.8156 | 10.71 | 257 | 1.3502 |
1.1596 | 10.75 | 258 | 1.3503 |
1.0573 | 10.79 | 259 | 1.3508 |
0.9606 | 10.83 | 260 | 1.3513 |
1.1967 | 10.88 | 261 | 1.3511 |
1.2035 | 10.92 | 262 | 1.3508 |
1.0998 | 10.96 | 263 | 1.3504 |
1.0149 | 11.0 | 264 | 1.3501 |
1.106 | 11.04 | 265 | 1.3498 |
0.9227 | 11.08 | 266 | 1.3497 |
1.105 | 11.12 | 267 | 1.3495 |
1.079 | 11.17 | 268 | 1.3492 |
1.1853 | 11.21 | 269 | 1.3493 |
0.9819 | 11.25 | 270 | 1.3496 |
0.9681 | 11.29 | 271 | 1.3500 |
1.1715 | 11.33 | 272 | 1.3502 |
1.1711 | 11.38 | 273 | 1.3504 |
1.0301 | 11.42 | 274 | 1.3504 |
1.0097 | 11.46 | 275 | 1.3502 |
0.9109 | 11.5 | 276 | 1.3501 |
1.1929 | 11.54 | 277 | 1.3498 |
1.1418 | 11.58 | 278 | 1.3494 |
1.2005 | 11.62 | 279 | 1.3488 |
1.1507 | 11.67 | 280 | 1.3484 |
1.007 | 11.71 | 281 | 1.3480 |
0.8808 | 11.75 | 282 | 1.3477 |
0.7668 | 11.79 | 283 | 1.3479 |
1.0597 | 11.83 | 284 | 1.3480 |
1.0563 | 11.88 | 285 | 1.3483 |
0.7806 | 11.92 | 286 | 1.3487 |
1.233 | 11.96 | 287 | 1.3490 |
0.9242 | 12.0 | 288 | 1.3493 |
1.1043 | 12.04 | 289 | 1.3491 |
0.9379 | 12.08 | 290 | 1.3489 |
0.8592 | 12.12 | 291 | 1.3487 |
1.0302 | 12.17 | 292 | 1.3485 |
1.1544 | 12.21 | 293 | 1.3483 |
1.0905 | 12.25 | 294 | 1.3480 |
0.9576 | 12.29 | 295 | 1.3480 |
0.8627 | 12.33 | 296 | 1.3480 |
0.8748 | 12.38 | 297 | 1.3482 |
1.2431 | 12.42 | 298 | 1.3485 |
0.9514 | 12.46 | 299 | 1.3487 |
0.9526 | 12.5 | 300 | 1.3487 |
0.9222 | 12.54 | 301 | 1.3489 |
0.9418 | 12.58 | 302 | 1.3491 |
1.0765 | 12.62 | 303 | 1.3492 |
1.007 | 12.67 | 304 | 1.3493 |
1.1301 | 12.71 | 305 | 1.3494 |
1.0612 | 12.75 | 306 | 1.3495 |
0.7988 | 12.79 | 307 | 1.3495 |
1.2483 | 12.83 | 308 | 1.3493 |
0.9587 | 12.88 | 309 | 1.3492 |
1.0277 | 12.92 | 310 | 1.3490 |
1.085 | 12.96 | 311 | 1.3490 |
0.9661 | 13.0 | 312 | 1.3489 |
0.9396 | 13.04 | 313 | 1.3490 |
0.8657 | 13.08 | 314 | 1.3492 |
1.0302 | 13.12 | 315 | 1.3495 |
0.877 | 13.17 | 316 | 1.3499 |
1.0629 | 13.21 | 317 | 1.3503 |
1.1157 | 13.25 | 318 | 1.3505 |
0.9327 | 13.29 | 319 | 1.3506 |
0.8881 | 13.33 | 320 | 1.3509 |
0.8696 | 13.38 | 321 | 1.3512 |
0.9604 | 13.42 | 322 | 1.3514 |
1.1611 | 13.46 | 323 | 1.3515 |
0.9612 | 13.5 | 324 | 1.3516 |
1.0779 | 13.54 | 325 | 1.3515 |
1.0823 | 13.58 | 326 | 1.3514 |
1.0548 | 13.62 | 327 | 1.3514 |
1.099 | 13.67 | 328 | 1.3513 |
1.0892 | 13.71 | 329 | 1.3511 |
1.1729 | 13.75 | 330 | 1.3510 |
0.9449 | 13.79 | 331 | 1.3509 |
1.1423 | 13.83 | 332 | 1.3507 |
1.0322 | 13.88 | 333 | 1.3507 |
0.8021 | 13.92 | 334 | 1.3508 |
1.2308 | 13.96 | 335 | 1.3508 |
0.9415 | 14.0 | 336 | 1.3508 |
0.9796 | 14.04 | 337 | 1.3508 |
0.9764 | 14.08 | 338 | 1.3509 |
0.8994 | 14.12 | 339 | 1.3509 |
1.0552 | 14.17 | 340 | 1.3509 |
1.1901 | 14.21 | 341 | 1.3509 |
0.9142 | 14.25 | 342 | 1.3509 |
0.8429 | 14.29 | 343 | 1.3510 |
1.1567 | 14.33 | 344 | 1.3510 |
0.7431 | 14.38 | 345 | 1.3511 |
1.0394 | 14.42 | 346 | 1.3511 |
0.9096 | 14.46 | 347 | 1.3512 |
1.2756 | 14.5 | 348 | 1.3512 |
0.9605 | 14.54 | 349 | 1.3512 |
1.1898 | 14.58 | 350 | 1.3513 |
1.1416 | 14.62 | 351 | 1.3513 |
0.9723 | 14.67 | 352 | 1.3513 |
0.8913 | 14.71 | 353 | 1.3514 |
0.9704 | 14.75 | 354 | 1.3514 |
1.1285 | 14.79 | 355 | 1.3515 |
1.0252 | 14.83 | 356 | 1.3515 |
1.0035 | 14.88 | 357 | 1.3515 |
0.8794 | 14.92 | 358 | 1.3516 |
1.0658 | 14.96 | 359 | 1.3516 |
0.8244 | 15.0 | 360 | 1.3516 |
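
The validation loss in the table does not fall monotonically: it bottoms out at 1.3328 around steps 177 and 178, then ends at 1.3516 on step 360. A small sketch of how one might locate the lowest-loss row from the pipe-delimited rows is shown below. The excerpt holds only four rows copied from the table, and the helper names `parse_rows` and `best_row` are illustrative, not part of any library.

```python
# Excerpt of rows copied from the results table (not the full log).
ROWS_TEXT = """\
1.2888 | 7.38 | 177 | 1.3328 |
1.196 | 7.42 | 178 | 1.3328 |
1.2507 | 7.46 | 179 | 1.3330 |
0.8244 | 15.0 | 360 | 1.3516 |
"""


def parse_rows(text: str) -> list[dict]:
    """Parse 'train_loss | epoch | step | val_loss |' lines into dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss = cells
        rows.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return rows


def best_row(rows: list[dict]) -> dict:
    """Return the row with the lowest validation loss (first one on ties)."""
    return min(rows, key=lambda r: r["val_loss"])


best = best_row(parse_rows(ROWS_TEXT))
print(best["step"], best["val_loss"])
```

Whether a checkpoint from around that step was saved is not stated in this card; the loss reported at the top corresponds to the final step.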
Framework versions
- Transformers 4.38.1
- Pytorch 2.1.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2