# mt5_small_lg_inf_en_v1

This model is a fine-tuned version of [MubarakB/mt5_small_lg_en](https://huggingface.co/MubarakB/mt5_small_lg_en) on an unspecified dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):
- Loss: 0.4187
- Bleu: 0.2171
- Gen Len: 9.0204
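
The checkpoint can be loaded with the standard `transformers` sequence-to-sequence classes. The sketch below is a minimal example, not confirmed by the card: the repo ID `MubarakB/mt5_small_lg_inf_en_v1` is inferred from the card title and the base model's namespace, and the Luganda-to-English direction is inferred from the `lg`/`en` in the name.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: repo ID inferred from the card title and base model namespace.
model_id = "MubarakB/mt5_small_lg_inf_en_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical Luganda input; the language pair is inferred from the model name.
text = "Oli otya?"
inputs = tokenizer(text, return_tensors="pt")

# Short outputs are expected: the evaluation generation length averages ~9 tokens.
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```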
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged sketch of the matching `Seq2SeqTrainingArguments` follows the list:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
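
As a rough illustration only, the hyperparameters above correspond to a `Seq2SeqTrainingArguments` configuration along these lines. The output directory and evaluation cadence are placeholders, not taken from this card; the Adam betas/epsilon and linear schedule match the `transformers` defaults listed above.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5_small_lg_inf_en_v1",  # placeholder path, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    eval_strategy="epoch",        # assumption, based on the per-epoch rows below
    predict_with_generate=True,   # required for generation metrics (Bleu, Gen Len)
)
```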
### Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
No log | 1.0 | 138 | 0.4669 | 0.0658 | 9.3837 |
No log | 2.0 | 276 | 0.4559 | 0.132 | 8.0245 |
No log | 3.0 | 414 | 0.4507 | 0.2112 | 8.1592 |
0.4726 | 4.0 | 552 | 0.4472 | 0.2144 | 8.0367 |
0.4726 | 5.0 | 690 | 0.4445 | 0.2134 | 8.0082 |
0.4726 | 6.0 | 828 | 0.4425 | 0.3274 | 7.8612 |
0.4726 | 7.0 | 966 | 0.4405 | 0.3378 | 7.5959 |
0.447 | 8.0 | 1104 | 0.4390 | 0.3304 | 7.3918 |
0.447 | 9.0 | 1242 | 0.4378 | 0.3285 | 7.3673 |
0.447 | 10.0 | 1380 | 0.4362 | 0.3147 | 7.6694 |
0.4398 | 11.0 | 1518 | 0.4350 | 0.3181 | 7.4163 |
0.4398 | 12.0 | 1656 | 0.4341 | 0.3166 | 7.5224 |
0.4398 | 13.0 | 1794 | 0.4330 | 0.3178 | 7.5592 |
0.4398 | 14.0 | 1932 | 0.4318 | 0.2157 | 7.8204 |
0.4313 | 15.0 | 2070 | 0.4312 | 0.3169 | 8.1388 |
0.4313 | 16.0 | 2208 | 0.4307 | 0.3169 | 7.9633 |
0.4313 | 17.0 | 2346 | 0.4297 | 0.3064 | 8.2245 |
0.4313 | 18.0 | 2484 | 0.4293 | 0.2045 | 8.2776 |
0.4262 | 19.0 | 2622 | 0.4286 | 0.3027 | 8.4367 |
0.4262 | 20.0 | 2760 | 0.4280 | 0.2042 | 8.5061 |
0.4262 | 21.0 | 2898 | 0.4274 | 0.3033 | 8.5633 |
0.4214 | 22.0 | 3036 | 0.4272 | 0.3019 | 8.7714 |
0.4214 | 23.0 | 3174 | 0.4264 | 0.3051 | 8.649 |
0.4214 | 24.0 | 3312 | 0.4263 | 0.3021 | 8.8367 |
0.4214 | 25.0 | 3450 | 0.4254 | 0.2981 | 8.8204 |
0.4161 | 26.0 | 3588 | 0.4251 | 0.2992 | 8.8776 |
0.4161 | 27.0 | 3726 | 0.4248 | 0.3044 | 8.8571 |
0.4161 | 28.0 | 3864 | 0.4246 | 0.3 | 8.8776 |
0.4124 | 29.0 | 4002 | 0.4246 | 0.2998 | 8.8163 |
0.4124 | 30.0 | 4140 | 0.4239 | 0.2983 | 9.0857 |
0.4124 | 31.0 | 4278 | 0.4234 | 0.2988 | 9.0163 |
0.4124 | 32.0 | 4416 | 0.4233 | 0.2996 | 8.8816 |
0.4087 | 33.0 | 4554 | 0.4232 | 0.298 | 8.9714 |
0.4087 | 34.0 | 4692 | 0.4226 | 0.3003 | 8.9796 |
0.4087 | 35.0 | 4830 | 0.4224 | 0.2992 | 9.1796 |
0.4087 | 36.0 | 4968 | 0.4225 | 0.3005 | 9.0571 |
0.4053 | 37.0 | 5106 | 0.4224 | 0.2994 | 8.8571 |
0.4053 | 38.0 | 5244 | 0.4220 | 0.3 | 9.1143 |
0.4053 | 39.0 | 5382 | 0.4216 | 0.3019 | 9.102 |
0.4006 | 40.0 | 5520 | 0.4215 | 0.3016 | 8.9714 |
0.4006 | 41.0 | 5658 | 0.4212 | 0.3011 | 8.9224 |
0.4006 | 42.0 | 5796 | 0.4211 | 0.2982 | 9.2816 |
0.4006 | 43.0 | 5934 | 0.4210 | 0.2985 | 9.1633 |
0.3986 | 44.0 | 6072 | 0.4210 | 0.2994 | 9.0776 |
0.3986 | 45.0 | 6210 | 0.4209 | 0.308 | 9.3265 |
0.3986 | 46.0 | 6348 | 0.4208 | 0.2963 | 9.1714 |
0.3986 | 47.0 | 6486 | 0.4205 | 0.3093 | 9.0531 |
0.3953 | 48.0 | 6624 | 0.4205 | 0.3068 | 9.4449 |
0.3953 | 49.0 | 6762 | 0.4202 | 0.3075 | 8.9918 |
0.3953 | 50.0 | 6900 | 0.4203 | 0.3071 | 9.1306 |
0.3929 | 51.0 | 7038 | 0.4200 | 0.3052 | 9.3143 |
0.3929 | 52.0 | 7176 | 0.4200 | 0.306 | 9.1796 |
0.3929 | 53.0 | 7314 | 0.4200 | 0.3058 | 9.2204 |
0.3929 | 54.0 | 7452 | 0.4200 | 0.3076 | 8.8367 |
0.391 | 55.0 | 7590 | 0.4196 | 0.3078 | 8.8776 |
0.391 | 56.0 | 7728 | 0.4197 | 0.3041 | 9.0449 |
0.391 | 57.0 | 7866 | 0.4198 | 0.3041 | 8.8776 |
0.3887 | 58.0 | 8004 | 0.4201 | 0.3171 | 8.9224 |
0.3887 | 59.0 | 8142 | 0.4192 | 0.3074 | 9.0449 |
0.3887 | 60.0 | 8280 | 0.4197 | 0.318 | 8.8571 |
0.3887 | 61.0 | 8418 | 0.4194 | 0.3167 | 9.1469 |
0.3871 | 62.0 | 8556 | 0.4194 | 0.3186 | 8.8612 |
0.3871 | 63.0 | 8694 | 0.4192 | 0.3181 | 8.8245 |
0.3871 | 64.0 | 8832 | 0.4192 | 0.3178 | 9.0449 |
0.3871 | 65.0 | 8970 | 0.4194 | 0.3168 | 8.9673 |
0.3849 | 66.0 | 9108 | 0.4191 | 0.3159 | 8.9184 |
0.3849 | 67.0 | 9246 | 0.4192 | 0.3191 | 8.7347 |
0.3849 | 68.0 | 9384 | 0.4189 | 0.3173 | 8.8367 |
0.3841 | 69.0 | 9522 | 0.4189 | 0.3198 | 8.7633 |
0.3841 | 70.0 | 9660 | 0.4189 | 0.3168 | 8.9306 |
0.3841 | 71.0 | 9798 | 0.4187 | 0.3182 | 8.9837 |
0.3841 | 72.0 | 9936 | 0.4191 | 0.3179 | 8.9918 |
0.3823 | 73.0 | 10074 | 0.4189 | 0.3173 | 8.951 |
0.3823 | 74.0 | 10212 | 0.4188 | 0.3158 | 8.9551 |
0.3823 | 75.0 | 10350 | 0.4188 | 0.3184 | 8.9061 |
0.3823 | 76.0 | 10488 | 0.4187 | 0.3174 | 8.9347 |
0.3809 | 77.0 | 10626 | 0.4186 | 0.2163 | 9.1061 |
0.3809 | 78.0 | 10764 | 0.4189 | 0.2173 | 8.8531 |
0.3809 | 79.0 | 10902 | 0.4187 | 0.3156 | 9.0776 |
0.3798 | 80.0 | 11040 | 0.4187 | 0.3166 | 8.9796 |
0.3798 | 81.0 | 11178 | 0.4187 | 0.3172 | 8.9796 |
0.3798 | 82.0 | 11316 | 0.4187 | 0.3177 | 9.0 |
0.3798 | 83.0 | 11454 | 0.4187 | 0.3167 | 9.0204 |
0.3799 | 84.0 | 11592 | 0.4187 | 0.3166 | 8.9837 |
0.3799 | 85.0 | 11730 | 0.4187 | 0.3174 | 9.0776 |
0.3799 | 86.0 | 11868 | 0.4187 | 0.2174 | 9.1469 |
0.3789 | 87.0 | 12006 | 0.4188 | 0.2167 | 8.9143 |
0.3789 | 88.0 | 12144 | 0.4187 | 0.2171 | 9.0327 |
0.3789 | 89.0 | 12282 | 0.4187 | 0.217 | 9.0531 |
0.3789 | 90.0 | 12420 | 0.4186 | 0.3176 | 9.1102 |
0.378 | 91.0 | 12558 | 0.4186 | 0.3182 | 9.0531 |
0.378 | 92.0 | 12696 | 0.4186 | 0.3186 | 9.1102 |
0.378 | 93.0 | 12834 | 0.4187 | 0.2177 | 9.0163 |
0.378 | 94.0 | 12972 | 0.4187 | 0.2172 | 9.0204 |
0.3768 | 95.0 | 13110 | 0.4186 | 0.2171 | 9.0204 |
0.3768 | 96.0 | 13248 | 0.4186 | 0.2171 | 9.0367 |
0.3768 | 97.0 | 13386 | 0.4187 | 0.2173 | 8.9959 |
0.3769 | 98.0 | 13524 | 0.4187 | 0.2172 | 8.9959 |
0.3769 | 99.0 | 13662 | 0.4187 | 0.2172 | 9.0 |
0.3769 | 100.0 | 13800 | 0.4187 | 0.2171 | 9.0204 |
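
The `Bleu` and `Gen Len` columns above follow the pattern of the standard `transformers` translation examples, where a `compute_metrics` callback scores generated predictions and records the mean generation length. The sketch below is a plausible reconstruction, not the card author's confirmed code; the 0-to-1 scale of the `Bleu` column suggests the `bleu` metric from the `evaluate` library rather than SacreBLEU's 0-100 score.

```python
import numpy as np
import evaluate

# A sketch only: assumes `tokenizer` is this model's tokenizer and the metric
# choice matches the 0-1 Bleu scale reported in the table above.
bleu = evaluate.load("bleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Restore padding: the data collator replaces pad tokens in labels with -100.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(
        predictions=[p.strip() for p in decoded_preds],
        references=[[l.strip()] for l in decoded_labels],
    )
    # Gen Len: mean count of non-pad tokens in the generated sequences.
    gen_len = np.mean(
        [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    )
    return {"bleu": result["bleu"], "gen_len": gen_len}
```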
### Framework versions
- Transformers 4.45.1
- Pytorch 2.4.0
- Datasets 3.0.1
- Tokenizers 0.20.0