---
license: mit
base_model: facebook/m2m100_1.2B
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: cs_m2m_0.001_50_v0.2
  results: []
---
# cs_m2m_0.001_50_v0.2
This model is a fine-tuned version of [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):
- Loss: 8.4343
- Bleu: 0.0488
- Gen Len: 93.2857
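
Below is a minimal inference sketch for loading this checkpoint with the standard M2M100 classes from Transformers. The repository id and the source/target languages (`cs` -> `en`, guessed from the model name) are assumptions; the card does not document the dataset or language pair.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "cs_m2m_0.001_50_v0.2"  # placeholder: local path or Hub repo id of this checkpoint

tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

tokenizer.src_lang = "cs"  # assumed source language
inputs = tokenizer("Ahoj, jak se máš?", return_tensors="pt")

# M2M100 selects the target language by forcing its language token as the
# first generated token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),  # assumed target language
    max_new_tokens=200,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```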
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of the equivalent `Seq2SeqTrainingArguments` follows the list:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
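
Below is a sketch of `Seq2SeqTrainingArguments` mirroring the hyperparameters above, assuming a standard `Seq2SeqTrainer` setup; `output_dir`, `evaluation_strategy`, and `predict_with_generate` are assumptions rather than values taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="cs_m2m_0.001_50_v0.2",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",        # assumed; matches the per-epoch results table
    predict_with_generate=True,         # assumed; needed to report Bleu and Gen Len
)
# The Adam settings listed above (betas=(0.9, 0.999), epsilon=1e-08) are the
# Trainer defaults, so no explicit optimizer arguments are required.
```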
### Training results

Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
5.0853 | 1.0 | 6 | 6.9325 | 0.0 | 5.0 |
4.3538 | 2.0 | 12 | 7.0396 | 0.1923 | 7.5714 |
4.6426 | 3.0 | 18 | 7.0321 | 0.1563 | 42.1429 |
5.1737 | 4.0 | 24 | 7.0390 | 0.0335 | 103.5238 |
3.9214 | 5.0 | 30 | 7.0585 | 0.0 | 5.0 |
4.7309 | 6.0 | 36 | 7.1597 | 0.1313 | 7.7619 |
4.3458 | 7.0 | 42 | 7.1875 | 0.0 | 5.0 |
4.1409 | 8.0 | 48 | 7.1934 | 0.308 | 18.1429 |
3.8187 | 9.0 | 54 | 7.1696 | 0.0 | 5.0 |
3.9459 | 10.0 | 60 | 7.1153 | 0.0 | 5.0 |
4.3563 | 11.0 | 66 | 7.2286 | 0.3581 | 8.619 |
4.4193 | 12.0 | 72 | 7.3526 | 0.0 | 5.0 |
4.4508 | 13.0 | 78 | 7.4000 | 0.0 | 5.0 |
4.115 | 14.0 | 84 | 7.4140 | 0.0 | 5.0 |
4.1807 | 15.0 | 90 | 7.4866 | 0.0 | 5.0 |
3.8422 | 16.0 | 96 | 7.6149 | 0.3839 | 9.0 |
4.1567 | 17.0 | 102 | 7.5413 | 0.2035 | 8.8095 |
4.3236 | 18.0 | 108 | 7.5256 | 0.2104 | 9.0 |
4.3343 | 19.0 | 114 | 7.5449 | 0.149 | 8.4286 |
4.3139 | 20.0 | 120 | 7.4758 | 0.0 | 5.0 |
3.1706 | 21.0 | 126 | 7.5896 | 0.0274 | 130.9048 |
3.0241 | 22.0 | 132 | 7.8300 | 0.2142 | 7.9524 |
4.5364 | 23.0 | 138 | 7.8698 | 0.0515 | 5.2857 |
5.4824 | 24.0 | 144 | 7.8732 | 0.0364 | 192.0952 |
3.8072 | 25.0 | 150 | 7.7993 | 0.0 | 5.0 |
3.9879 | 26.0 | 156 | 7.7222 | 0.0746 | 200.0 |
4.0397 | 27.0 | 162 | 7.6906 | 0.0436 | 146.0476 |
3.7429 | 28.0 | 168 | 7.7814 | 0.0 | 6.8095 |
3.7498 | 29.0 | 174 | 7.8873 | 0.2861 | 8.0 |
4.1991 | 30.0 | 180 | 8.0400 | 0.3032 | 13.5714 |
5.4424 | 31.0 | 186 | 7.9368 | 0.2537 | 15.1905 |
3.6523 | 32.0 | 192 | 7.8529 | 0.3288 | 7.1905 |
5.5908 | 33.0 | 198 | 7.8531 | 0.087 | 5.8571 |
3.8218 | 34.0 | 204 | 7.7538 | 0.2073 | 7.8571 |
3.8408 | 35.0 | 210 | 7.6796 | 0.1027 | 7.381 |
3.2347 | 36.0 | 216 | 7.8281 | 0.1662 | 8.9524 |
4.0158 | 37.0 | 222 | 7.8108 | 0.1907 | 23.9524 |
4.2395 | 38.0 | 228 | 7.7778 | 0.4592 | 19.4286 |
3.1863 | 39.0 | 234 | 7.8962 | 0.3148 | 16.1429 |
3.5706 | 40.0 | 240 | 8.2310 | 0.2962 | 33.7619 |
3.8174 | 41.0 | 246 | 8.0290 | 0.2864 | 14.1429 |
3.6144 | 42.0 | 252 | 7.9235 | 0.2737 | 11.8095 |
3.914 | 43.0 | 258 | 7.9920 | 0.286 | 15.5714 |
3.9245 | 44.0 | 264 | 7.9770 | 0.1251 | 35.8571 |
3.223 | 45.0 | 270 | 8.1701 | 0.1428 | 32.1429 |
3.5751 | 46.0 | 276 | 8.2573 | 0.2497 | 19.9048 |
3.7939 | 47.0 | 282 | 8.2825 | 0.0571 | 110.9524 |
3.8968 | 48.0 | 288 | 8.4263 | 0.0702 | 200.0 |
2.2186 | 49.0 | 294 | 8.3673 | 0.2356 | 107.5714 |
3.1794 | 50.0 | 300 | 8.2041 | 0.2142 | 38.5238 |
3.3098 | 51.0 | 306 | 8.2863 | 0.0349 | 113.3333 |
3.7869 | 52.0 | 312 | 8.3350 | 0.0655 | 95.2857 |
3.7239 | 53.0 | 318 | 8.2509 | 0.025 | 179.7143 |
3.5206 | 54.0 | 324 | 8.2301 | 0.074 | 75.9524 |
3.2225 | 55.0 | 330 | 8.1540 | 0.0242 | 173.5238 |
2.6646 | 56.0 | 336 | 8.1574 | 0.3081 | 91.2381 |
3.3487 | 57.0 | 342 | 8.1095 | 0.0597 | 115.6667 |
3.2801 | 58.0 | 348 | 8.1534 | 0.1796 | 39.8095 |
2.7653 | 59.0 | 354 | 8.2800 | 0.0423 | 82.0476 |
3.3158 | 60.0 | 360 | 8.2560 | 0.0437 | 116.4762 |
2.5549 | 61.0 | 366 | 8.2070 | 0.0348 | 164.2857 |
2.9411 | 62.0 | 372 | 8.2850 | 0.3249 | 12.381 |
2.965 | 63.0 | 378 | 8.3497 | 0.0352 | 117.1429 |
3.4553 | 64.0 | 384 | 8.3532 | 0.0739 | 145.9524 |
3.1656 | 65.0 | 390 | 8.3229 | 0.1993 | 102.5714 |
3.3285 | 66.0 | 396 | 8.3454 | 0.2297 | 46.9524 |
2.7365 | 67.0 | 402 | 8.4989 | 0.2246 | 39.381 |
3.1372 | 68.0 | 408 | 8.4935 | 0.0444 | 115.2381 |
2.3018 | 69.0 | 414 | 8.4543 | 0.0552 | 113.8571 |
2.5972 | 70.0 | 420 | 8.4092 | 0.245 | 15.3333 |
5.2476 | 71.0 | 426 | 8.3573 | 0.2629 | 32.0476 |
2.4894 | 72.0 | 432 | 8.3228 | 0.2863 | 42.5238 |
3.9303 | 73.0 | 438 | 8.3295 | 0.5382 | 36.7619 |
3.8135 | 74.0 | 444 | 8.3803 | 0.2421 | 41.8095 |
2.36 | 75.0 | 450 | 8.4558 | 0.1325 | 58.381 |
2.7095 | 76.0 | 456 | 8.5280 | 0.2592 | 68.9524 |
2.0011 | 77.0 | 462 | 8.4020 | 0.2997 | 58.2381 |
1.9209 | 78.0 | 468 | 8.4449 | 0.1838 | 43.7143 |
3.3766 | 79.0 | 474 | 8.5564 | 0.2789 | 24.9048 |
3.4283 | 80.0 | 480 | 8.5476 | 0.264 | 35.7143 |
2.8935 | 81.0 | 486 | 8.5057 | 0.0633 | 79.8095 |
2.5961 | 82.0 | 492 | 8.4756 | 0.0648 | 92.9524 |
3.999 | 83.0 | 498 | 8.4273 | 0.1558 | 68.4286 |
3.612 | 84.0 | 504 | 8.3825 | 0.1379 | 52.9524 |
2.5813 | 85.0 | 510 | 8.3289 | 0.1275 | 42.0 |
2.8265 | 86.0 | 516 | 8.3150 | 0.2806 | 22.9048 |
3.1955 | 87.0 | 522 | 8.3218 | 0.2976 | 17.4762 |
2.7654 | 88.0 | 528 | 8.3135 | 0.2878 | 35.619 |
3.7539 | 89.0 | 534 | 8.3157 | 0.0896 | 48.4762 |
1.8882 | 90.0 | 540 | 8.3397 | 0.0897 | 57.7619 |
2.5795 | 91.0 | 546 | 8.3700 | 0.069 | 79.1905 |
1.9473 | 92.0 | 552 | 8.4195 | 0.1347 | 152.4762 |
2.349 | 93.0 | 558 | 8.4513 | 0.0239 | 183.619 |
3.1561 | 94.0 | 564 | 8.4664 | 0.0234 | 192.4286 |
2.9355 | 95.0 | 570 | 8.4679 | 0.1186 | 167.8571 |
2.5661 | 96.0 | 576 | 8.4588 | 0.1833 | 110.9524 |
3.1005 | 97.0 | 582 | 8.4478 | 0.0432 | 124.8571 |
2.7184 | 98.0 | 588 | 8.4399 | 0.0589 | 84.9048 |
2.8431 | 99.0 | 594 | 8.4340 | 0.1961 | 103.9524 |
2.9269 | 100.0 | 600 | 8.4343 | 0.0488 | 93.2857 |
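
The Bleu and Gen Len columns are the sacrebleu score and mean generated length reported at each evaluation. The exact metric code used for this run is not documented; the sketch below follows the standard translation `compute_metrics` recipe from the Transformers examples and is only an approximation of what produced these numbers.

```python
from functools import partial

import evaluate
import numpy as np

bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds, tokenizer):
    """Return the sacrebleu score and mean generated length for one evaluation pass."""
    preds, labels = eval_preds
    # Labels use -100 for padding; swap it back to the pad token before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean([np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds])
    return {"bleu": result["score"], "gen_len": gen_len}

# When constructing the trainer, bind the tokenizer first, e.g.:
# Seq2SeqTrainer(..., compute_metrics=partial(compute_metrics, tokenizer=tokenizer))
```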
### Framework versions

- Transformers 4.35.2
- Pytorch 1.13.1+cu117
- Datasets 2.16.1
- Tokenizers 0.15.0