---
license: mit
base_model: facebook/m2m100_1.2B
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: cs_m2m_2e-5_100_v0.2
    results: []
---

# cs_m2m_2e-5_100_v0.2

This model is a fine-tuned version of [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.3369
- Bleu: 48.2659
- Gen Len: 20.4286
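
Since the language pair and dataset are not documented, usage can only be sketched under assumptions. The snippet below loads the checkpoint with the standard M2M100 classes; the Hub repo ID `kmok1/cs_m2m_2e-5_100_v0.2` and the `cs` → `en` direction are guesses inferred from the model name, not confirmed by this card.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Assumed repo ID, inferred from the model name on this card.
repo_id = "kmok1/cs_m2m_2e-5_100_v0.2"
model = M2M100ForConditionalGeneration.from_pretrained(repo_id)
tokenizer = M2M100Tokenizer.from_pretrained(repo_id)

# Assumed direction: Czech ("cs") -> English ("en"); adjust to the real pair.
tokenizer.src_lang = "cs"
inputs = tokenizer("Ahoj, jak se máš?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),  # force the target language token
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```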

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
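
For reference, a minimal sketch of how these values map onto `Seq2SeqTrainingArguments` in Transformers 4.38; the `output_dir` and the two evaluation-related flags are assumptions (the per-epoch Bleu/Gen Len columns in the table below suggest per-epoch evaluation with generation), not values taken from this card:

```python
from transformers import Seq2SeqTrainingArguments

# Documented hyperparameters; the optimizer settings listed above are the
# Trainer defaults, so they need no explicit argument here.
training_args = Seq2SeqTrainingArguments(
    output_dir="cs_m2m_2e-5_100_v0.2",  # assumption, not documented
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumed: the table below logs metrics once per epoch
    predict_with_generate=True,   # assumed: required to report Bleu and Gen Len
)
```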

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
| 0.0 | 1.0 | 6 | 2.3719 | 49.7748 | 20.619 |
| 0.0001 | 2.0 | 12 | 2.4619 | 52.8954 | 20.0952 |
| 0.0242 | 3.0 | 18 | 2.5621 | 49.1697 | 19.9524 |
| 0.0 | 4.0 | 24 | 2.4757 | 48.7512 | 20.2857 |
| 0.0001 | 5.0 | 30 | 2.5652 | 43.0006 | 22.619 |
| 0.0001 | 6.0 | 36 | 2.5258 | 40.1532 | 22.2381 |
| 0.0 | 7.0 | 42 | 2.4040 | 49.8751 | 20.1905 |
| 0.0002 | 8.0 | 48 | 2.4212 | 49.541 | 19.4286 |
| 0.0001 | 9.0 | 54 | 2.3373 | 50.7267 | 21.619 |
| 0.0002 | 10.0 | 60 | 2.3222 | 49.2808 | 20.9524 |
| 0.0002 | 11.0 | 66 | 2.3240 | 50.4615 | 20.0 |
| 0.0008 | 12.0 | 72 | 2.3064 | 49.0688 | 20.1429 |
| 0.0006 | 13.0 | 78 | 2.2857 | 47.8241 | 19.7619 |
| 0.0001 | 14.0 | 84 | 2.2707 | 48.1756 | 19.8095 |
| 0.0001 | 15.0 | 90 | 2.2770 | 47.4155 | 20.0 |
| 0.0002 | 16.0 | 96 | 2.3248 | 46.9435 | 20.5238 |
| 0.0001 | 17.0 | 102 | 2.3505 | 47.3096 | 20.9048 |
| 0.0001 | 18.0 | 108 | 2.3525 | 48.5449 | 20.7619 |
| 0.0 | 19.0 | 114 | 2.3462 | 48.5449 | 20.7619 |
| 0.0001 | 20.0 | 120 | 2.3439 | 48.6822 | 20.7143 |
| 0.0001 | 21.0 | 126 | 2.3570 | 49.1326 | 20.5238 |
| 0.0 | 22.0 | 132 | 2.3656 | 48.5247 | 20.7143 |
| 0.0 | 23.0 | 138 | 2.3684 | 48.5247 | 20.7143 |
| 0.0001 | 24.0 | 144 | 2.3738 | 49.3527 | 20.5714 |
| 0.0 | 25.0 | 150 | 2.3793 | 48.2079 | 20.8571 |
| 0.0001 | 26.0 | 156 | 2.3854 | 47.8381 | 21.0476 |
| 0.0 | 27.0 | 162 | 2.3897 | 48.0223 | 21.0476 |
| 0.0001 | 28.0 | 168 | 2.3947 | 47.8029 | 21.0 |
| 0.0 | 29.0 | 174 | 2.3994 | 48.1359 | 20.8571 |
| 0.0002 | 30.0 | 180 | 2.3992 | 48.7452 | 20.8095 |
| 0.0001 | 31.0 | 186 | 2.3984 | 48.0307 | 20.5714 |
| 0.0001 | 32.0 | 192 | 2.3991 | 48.2877 | 20.5238 |
| 0.0 | 33.0 | 198 | 2.3979 | 49.5262 | 20.619 |
| 0.0001 | 34.0 | 204 | 2.3998 | 49.7465 | 20.6667 |
| 0.0001 | 35.0 | 210 | 2.4019 | 49.5488 | 20.619 |
| 0.0001 | 36.0 | 216 | 2.4056 | 49.7465 | 20.6667 |
| 0.0001 | 37.0 | 222 | 2.4108 | 50.1467 | 20.4762 |
| 0.0001 | 38.0 | 228 | 2.4150 | 50.1467 | 20.4762 |
| 0.0 | 39.0 | 234 | 2.4182 | 50.707 | 20.5714 |
| 0.0 | 40.0 | 240 | 2.4189 | 50.503 | 20.5238 |
| 0.0 | 41.0 | 246 | 2.4151 | 48.2877 | 20.5238 |
| 0.0001 | 42.0 | 252 | 2.4511 | 48.7331 | 20.4762 |
| 0.0 | 43.0 | 258 | 2.4614 | 49.1268 | 20.2857 |
| 0.0 | 44.0 | 264 | 2.4134 | 48.6628 | 20.381 |
| 0.0 | 45.0 | 270 | 2.4117 | 48.6628 | 20.381 |
| 0.0001 | 46.0 | 276 | 2.4130 | 48.2745 | 20.4286 |
| 0.0002 | 47.0 | 282 | 2.3939 | 47.9864 | 20.4286 |
| 0.0 | 48.0 | 288 | 2.3937 | 48.6253 | 20.2857 |
| 0.0 | 49.0 | 294 | 2.4062 | 49.3153 | 20.0476 |
| 0.0 | 50.0 | 300 | 2.4131 | 49.9443 | 20.0476 |
| 0.0001 | 51.0 | 306 | 2.4164 | 50.9445 | 20.0476 |
| 0.0 | 52.0 | 312 | 2.4129 | 50.7412 | 20.1905 |
| 0.0001 | 53.0 | 318 | 2.4178 | 50.693 | 20.1905 |
| 0.0 | 54.0 | 324 | 2.4051 | 49.2945 | 20.381 |
| 0.0002 | 55.0 | 330 | 2.4062 | 49.3592 | 20.381 |
| 0.0001 | 56.0 | 336 | 2.3274 | 49.7531 | 20.3333 |
| 0.0002 | 57.0 | 342 | 2.2969 | 50.5601 | 20.2857 |
| 0.0 | 58.0 | 348 | 2.2919 | 50.9648 | 20.1429 |
| 0.0 | 59.0 | 354 | 2.2730 | 50.1805 | 20.2381 |
| 0.0 | 60.0 | 360 | 2.2660 | 50.1805 | 20.2857 |
| 0.0001 | 61.0 | 366 | 2.2664 | 50.1805 | 20.2857 |
| 0.0001 | 62.0 | 372 | 2.2620 | 49.064 | 20.1905 |
| 0.0001 | 63.0 | 378 | 2.2576 | 50.8497 | 20.2857 |
| 0.0002 | 64.0 | 384 | 2.2808 | 50.5765 | 20.0476 |
| 0.0 | 65.0 | 390 | 2.2962 | 46.8674 | 20.381 |
| 0.0 | 66.0 | 396 | 2.3097 | 46.4579 | 20.3333 |
| 0.0001 | 67.0 | 402 | 2.3109 | 50.015 | 20.0 |
| 0.0 | 68.0 | 408 | 2.3189 | 49.5925 | 20.0 |
| 0.0 | 69.0 | 414 | 2.3080 | 49.5925 | 20.0 |
| 0.0 | 70.0 | 420 | 2.3065 | 49.5925 | 20.0 |
| 0.0 | 71.0 | 426 | 2.3102 | 50.3721 | 19.9048 |
| 0.0 | 72.0 | 432 | 2.3129 | 50.3721 | 19.9048 |
| 0.0 | 73.0 | 438 | 2.3154 | 48.9649 | 19.8571 |
| 0.0 | 74.0 | 444 | 2.3178 | 48.3266 | 20.0476 |
| 0.0001 | 75.0 | 450 | 2.3205 | 49.9671 | 20.1905 |
| 0.0 | 76.0 | 456 | 2.3218 | 49.746 | 20.0952 |
| 0.0 | 77.0 | 462 | 2.3216 | 49.746 | 20.2381 |
| 0.0 | 78.0 | 468 | 2.3218 | 49.746 | 20.2381 |
| 0.0 | 79.0 | 474 | 2.3174 | 50.1689 | 20.0952 |
| 0.0 | 80.0 | 480 | 2.3154 | 50.5016 | 20.1905 |
| 0.0001 | 81.0 | 486 | 2.3215 | 48.6113 | 20.2381 |
| 0.0001 | 82.0 | 492 | 2.3330 | 48.6113 | 20.2381 |
| 0.0003 | 83.0 | 498 | 2.3391 | 48.6113 | 20.2381 |
| 0.0 | 84.0 | 504 | 2.3418 | 48.4616 | 20.2857 |
| 0.0 | 85.0 | 510 | 2.3408 | 48.53 | 20.1429 |
| 0.0031 | 86.0 | 516 | 2.3392 | 48.4848 | 20.2857 |
| 0.0 | 87.0 | 522 | 2.3401 | 48.4848 | 20.2857 |
| 0.0001 | 88.0 | 528 | 2.3410 | 48.4848 | 20.2857 |
| 0.0 | 89.0 | 534 | 2.3416 | 48.8708 | 20.2381 |
| 0.0002 | 90.0 | 540 | 2.3417 | 48.8244 | 20.381 |
| 0.0 | 91.0 | 546 | 2.3407 | 48.2659 | 20.4286 |
| 0.0 | 92.0 | 552 | 2.3394 | 48.2659 | 20.4286 |
| 0.0001 | 93.0 | 558 | 2.3388 | 48.2659 | 20.4286 |
| 0.0001 | 94.0 | 564 | 2.3386 | 48.2659 | 20.4286 |
| 0.0 | 95.0 | 570 | 2.3385 | 48.2659 | 20.4286 |
| 0.0 | 96.0 | 576 | 2.3383 | 48.2659 | 20.4286 |
| 0.0001 | 97.0 | 582 | 2.3376 | 48.2659 | 20.4286 |
| 0.0001 | 98.0 | 588 | 2.3371 | 48.2659 | 20.4286 |
| 0.0 | 99.0 | 594 | 2.3369 | 48.2659 | 20.4286 |
| 0.0001 | 100.0 | 600 | 2.3369 | 48.2659 | 20.4286 |
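
The Bleu and Gen Len columns are the typical outputs of a sacreBLEU-based `compute_metrics` hook passed to `Seq2SeqTrainer`. A minimal sketch, assuming the standard translation-example setup (the actual hook used for this run is not documented):

```python
import evaluate
import numpy as np

sacrebleu = evaluate.load("sacrebleu")

def make_compute_metrics(tokenizer):
    """Build a hook of the kind that likely produced the columns above."""
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        # Labels are padded with -100 for the loss; restore pad tokens before decoding.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = sacrebleu.compute(
            predictions=decoded_preds,
            references=[[label] for label in decoded_labels],
        )
        # "Gen Len" is the mean count of non-pad tokens in the generated outputs.
        gen_len = np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
        return {"bleu": result["score"], "gen_len": gen_len}
    return compute_metrics
```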

## Framework versions

- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2