moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.25-cbl1e-04-ncs1e-02

This model is a fine-tuned version of google/switch-base-32 on the tr-en (Turkish-English) configuration of the wmt16 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7365
  • Bleu: 18.5199
  • Gen Len: 23.9251
  • Num Effective Experts: 30.333
  • Num Experts Activated: 4.993

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 40.0
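
The relationship between the batch-size entries and the shape of the constant_with_warmup schedule can be sketched in a few lines of plain Python. This is an illustration only, not the training code: the helper names are hypothetical, a single device is assumed (which is what 32 × 4 = 128 implies), and the warmup is assumed to ramp linearly from zero, as the constant-with-warmup schedule in Transformers does.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def lr_at_step(step: int, base_lr: float = 1e-4, warmup_steps: int = 200) -> float:
    """Linear warmup from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    # train_batch_size=32 and gradient_accumulation_steps=4 from the list above
    print("total_train_batch_size:", effective_batch_size(32, 4))
    for step in (0, 100, 200, 1000):
        print(step, lr_at_step(step))
```

Under these assumptions the learning rate reaches 0.0001 at step 200 and stays there for the rest of the 40 epochs.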

Training results

Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len | Num Effective Experts | Num Experts Activated
No log | 0 | 0 | 3.4154 | 8.2378 | 28.5195 | 30.667 | 2.911
2.8479 | 0.3110 | 500 | 3.0730 | 14.1072 | 22.3107 | 24.667 | 3.322
2.6451 | 0.6221 | 1000 | 3.0686 | 14.2929 | 22.4066 | 25.0 | 3.799
2.5862 | 0.9331 | 1500 | 3.0439 | 14.6207 | 22.4436 | 26.333 | 4.221
2.5054 | 1.2442 | 2000 | 3.0426 | 14.2785 | 22.6494 | 27.0 | 4.24
2.4584 | 1.5552 | 2500 | 3.0154 | 14.7523 | 22.5275 | 27.333 | 4.519
2.4138 | 1.8663 | 3000 | 3.0164 | 14.81 | 22.5914 | 26.667 | 4.693
2.3793 | 2.1773 | 3500 | 3.0143 | 14.853 | 22.6703 | 27.333 | 4.837
2.3473 | 2.4883 | 4000 | 2.9970 | 14.7708 | 22.5425 | 26.667 | 4.686
2.3361 | 2.7994 | 4500 | 2.9877 | 14.9613 | 23.032 | 27.333 | 4.938
2.3039 | 3.1104 | 5000 | 2.9886 | 15.1493 | 22.6963 | 27.667 | 5.036
2.2854 | 3.4215 | 5500 | 2.9847 | 15.1378 | 23.022 | 28.0 | 5.079
2.2748 | 3.7325 | 6000 | 2.9653 | 15.1679 | 22.8561 | 29.333 | 5.07
2.2496 | 4.0435 | 6500 | 2.9634 | 15.2058 | 23.1968 | 28.667 | 5.035
2.2477 | 4.3546 | 7000 | 2.9777 | 15.1306 | 23.1638 | 29.667 | 5.279
2.2309 | 4.6656 | 7500 | 2.9624 | 15.2592 | 23.2388 | 29.667 | 5.179
2.2022 | 4.9767 | 8000 | 2.9510 | 15.1499 | 23.012 | 29.667 | 4.948
2.1813 | 5.2877 | 8500 | 2.9470 | 15.6414 | 23.1139 | 30.0 | 5.384
2.1842 | 5.5988 | 9000 | 2.9692 | 15.2317 | 22.9371 | 29.333 | 5.365
2.1533 | 5.9098 | 9500 | 2.9626 | 15.5085 | 23.028 | 30.333 | 5.105
2.151 | 6.2208 | 10000 | 2.9343 | 15.5229 | 23.029 | 29.667 | 5.243
2.1538 | 6.5319 | 10500 | 2.9360 | 15.4529 | 22.992 | 29.667 | 5.3
2.1335 | 6.8429 | 11000 | 2.9354 | 15.6448 | 23.1868 | 29.667 | 5.25
2.1096 | 7.1540 | 11500 | 2.9262 | 15.7469 | 23.3656 | 30.0 | 5.129
2.1085 | 7.4650 | 12000 | 2.9244 | 15.9215 | 23.2687 | 30.0 | 5.235
2.1147 | 7.7760 | 12500 | 2.9228 | 15.7136 | 23.2557 | 30.667 | 5.513
2.0895 | 8.0871 | 13000 | 2.9172 | 15.7284 | 23.1209 | 29.0 | 5.274
2.0884 | 8.3981 | 13500 | 2.9237 | 16.1724 | 23.3576 | 29.333 | 5.157
2.0895 | 8.7092 | 14000 | 2.9076 | 16.2392 | 23.4026 | 30.667 | 5.244
2.0699 | 9.0202 | 14500 | 2.9323 | 15.715 | 22.998 | 30.333 | 5.288
2.0527 | 9.3313 | 15000 | 2.9141 | 15.975 | 23.2288 | 29.333 | 5.041
2.0537 | 9.6423 | 15500 | 2.9181 | 15.8598 | 23.2188 | 29.333 | 5.126
2.0686 | 9.9533 | 16000 | 2.8982 | 16.0522 | 23.2278 | 29.667 | 5.104
2.0291 | 10.2644 | 16500 | 2.9157 | 15.946 | 23.2857 | 30.0 | 5.042
2.0462 | 10.5754 | 17000 | 2.9202 | 15.752 | 23.0829 | 30.0 | 5.194
2.031 | 10.8865 | 17500 | 2.9284 | 16.0618 | 23.3876 | 29.667 | 5.131
2.0148 | 11.1975 | 18000 | 2.9079 | 16.1963 | 23.4206 | 28.333 | 5.045
2.0322 | 11.5086 | 18500 | 2.8918 | 16.1004 | 23.3986 | 30.0 | 4.984
2.0189 | 11.8196 | 19000 | 2.8864 | 16.3315 | 23.5385 | 30.0 | 5.32
2.0081 | 12.1306 | 19500 | 2.8959 | 16.1941 | 23.2857 | 29.333 | 5.184
1.9797 | 12.4417 | 20000 | 2.8860 | 16.4102 | 23.4885 | 30.333 | 5.066
2.0115 | 12.7527 | 20500 | 2.8843 | 16.3521 | 23.2318 | 30.0 | 5.387
1.9816 | 13.0638 | 21000 | 2.8950 | 16.2667 | 23.4925 | 28.0 | 5.16
1.979 | 13.3748 | 21500 | 2.8804 | 16.3399 | 23.3606 | 29.667 | 5.232
1.9937 | 13.6858 | 22000 | 2.8774 | 16.7289 | 23.5824 | 29.667 | 5.289
1.9771 | 13.9969 | 22500 | 2.8911 | 16.3132 | 23.2617 | 29.0 | 4.985
1.955 | 14.3079 | 23000 | 2.8790 | 16.3578 | 23.5085 | 30.333 | 5.2
1.9582 | 14.6190 | 23500 | 2.8677 | 16.3462 | 23.4006 | 29.333 | 5.185
1.9595 | 14.9300 | 24000 | 2.8824 | 16.599 | 23.4935 | 29.333 | 5.034
1.9532 | 15.2411 | 24500 | 2.8866 | 16.5684 | 23.5185 | 28.667 | 4.923
1.9387 | 15.5521 | 25000 | 2.8832 | 16.5632 | 23.3716 | 30.333 | 5.083
1.9543 | 15.8631 | 25500 | 2.8689 | 16.4792 | 23.4875 | 30.667 | 5.195
1.9415 | 16.1742 | 26000 | 2.8836 | 16.6852 | 23.4416 | 30.667 | 5.147
1.9135 | 16.4852 | 26500 | 2.8615 | 16.8874 | 23.5964 | 29.667 | 5.154
1.9384 | 16.7963 | 27000 | 2.8801 | 16.9029 | 23.4675 | 30.333 | 5.007
1.9244 | 17.1073 | 27500 | 2.8592 | 16.8369 | 23.4895 | 29.333 | 5.038
1.9168 | 17.4184 | 28000 | 2.8519 | 16.6897 | 23.5864 | 29.333 | 5.152
1.9162 | 17.7294 | 28500 | 2.8757 | 16.7875 | 23.4645 | 29.667 | 5.16
1.9233 | 18.0404 | 29000 | 2.8350 | 17.2552 | 23.7463 | 29.667 | 5.24
1.9001 | 18.3515 | 29500 | 2.8502 | 17.0544 | 23.3946 | 30.0 | 5.284
1.9086 | 18.6625 | 30000 | 2.8485 | 16.9477 | 23.4266 | 29.667 | 5.204
1.909 | 18.9736 | 30500 | 2.8407 | 17.1568 | 23.4965 | 29.333 | 5.166
1.8849 | 19.2846 | 31000 | 2.8562 | 17.5149 | 23.6104 | 30.333 | 5.218
1.9012 | 19.5956 | 31500 | 2.8393 | 16.7434 | 23.5774 | 29.333 | 4.9
1.8739 | 19.9067 | 32000 | 2.8426 | 17.2393 | 23.6903 | 29.667 | 5.331
1.8686 | 20.2177 | 32500 | 2.8522 | 17.0798 | 23.5684 | 29.0 | 4.883
1.8794 | 20.5288 | 33000 | 2.8316 | 17.382 | 23.8711 | 30.0 | 4.991
1.8873 | 20.8398 | 33500 | 2.8475 | 17.4889 | 23.7253 | 30.333 | 5.219
1.8779 | 21.1509 | 34000 | 2.8296 | 17.4302 | 23.8082 | 30.0 | 5.206
1.8663 | 21.4619 | 34500 | 2.8442 | 17.5479 | 23.4865 | 29.667 | 4.982
1.8638 | 21.7729 | 35000 | 2.8150 | 17.4714 | 23.6773 | 29.667 | 5.219
1.8539 | 22.0840 | 35500 | 2.8202 | 17.6184 | 23.8232 | 30.333 | 5.236
1.8589 | 22.3950 | 36000 | 2.8311 | 17.6722 | 23.8052 | 30.0 | 5.325
1.8648 | 22.7061 | 36500 | 2.8243 | 17.3194 | 23.6064 | 30.0 | 5.237
1.8493 | 23.0171 | 37000 | 2.8199 | 17.6041 | 23.8062 | 28.333 | 5.198
1.8314 | 23.3281 | 37500 | 2.8188 | 17.8015 | 23.5644 | 31.0 | 5.344
1.8396 | 23.6392 | 38000 | 2.8255 | 17.456 | 23.6983 | 30.667 | 5.471
1.8403 | 23.9502 | 38500 | 2.8204 | 17.5146 | 23.6653 | 29.667 | 5.181
1.8294 | 24.2613 | 39000 | 2.8034 | 17.7815 | 24.023 | 29.667 | 4.961
1.8451 | 24.5723 | 39500 | 2.8020 | 17.7122 | 23.7463 | 31.0 | 5.167
1.836 | 24.8834 | 40000 | 2.8049 | 17.7888 | 23.5604 | 31.0 | 5.347
1.8273 | 25.1944 | 40500 | 2.8008 | 17.9521 | 23.4785 | 29.667 | 5.009
1.8258 | 25.5054 | 41000 | 2.7999 | 17.8782 | 23.7323 | 31.333 | 5.209
1.8265 | 25.8165 | 41500 | 2.8089 | 17.7571 | 23.5684 | 30.0 | 4.953
1.8077 | 26.1275 | 42000 | 2.8038 | 17.4878 | 23.8052 | 31.0 | 5.35
1.8163 | 26.4386 | 42500 | 2.8044 | 17.8833 | 23.8631 | 31.0 | 5.202
1.8287 | 26.7496 | 43000 | 2.7864 | 17.8188 | 23.7053 | 31.0 | 5.392
1.8122 | 27.0607 | 43500 | 2.7997 | 17.8302 | 23.6474 | 31.333 | 5.396
1.8169 | 27.3717 | 44000 | 2.7905 | 18.0336 | 23.6364 | 30.333 | 5.068
1.7994 | 27.6827 | 44500 | 2.7886 | 17.9767 | 23.7113 | 30.0 | 5.26
1.8139 | 27.9938 | 45000 | 2.8025 | 18.0798 | 23.7622 | 30.667 | 5.456
1.793 | 28.3048 | 45500 | 2.8010 | 17.7851 | 23.3586 | 31.0 | 5.294
1.7857 | 28.6159 | 46000 | 2.8050 | 17.6537 | 23.5015 | 30.667 | 5.31
1.8001 | 28.9269 | 46500 | 2.7894 | 17.7159 | 23.6274 | 31.0 | 5.203
1.8004 | 29.2379 | 47000 | 2.8059 | 17.7587 | 23.6693 | 30.0 | 5.118
1.7788 | 29.5490 | 47500 | 2.7883 | 18.0221 | 23.7532 | 29.333 | 5.161
1.7873 | 29.8600 | 48000 | 2.7848 | 18.0927 | 23.5704 | 30.333 | 5.009
1.7693 | 30.1711 | 48500 | 2.7864 | 17.8142 | 23.6753 | 30.0 | 5.068
1.7817 | 30.4821 | 49000 | 2.7972 | 18.1257 | 23.7393 | 29.667 | 5.311
1.7867 | 30.7932 | 49500 | 2.7791 | 18.0242 | 23.6144 | 31.667 | 5.419
1.7751 | 31.1042 | 50000 | 2.7734 | 18.2008 | 23.8012 | 30.0 | 5.081
1.7627 | 31.4152 | 50500 | 2.7707 | 18.1825 | 23.6563 | 30.667 | 5.103
1.784 | 31.7263 | 51000 | 2.7776 | 18.2353 | 23.8561 | 30.667 | 5.221
1.7679 | 32.0373 | 51500 | 2.7779 | 18.2023 | 23.7572 | 30.667 | 5.124
1.7653 | 32.3484 | 52000 | 2.8029 | 18.1777 | 23.7193 | 29.667 | 5.329
1.7687 | 32.6594 | 52500 | 2.7755 | 18.3571 | 23.5774 | 30.333 | 5.273
1.7767 | 32.9705 | 53000 | 2.7805 | 18.2234 | 23.7942 | 30.333 | 5.572
1.7506 | 33.2815 | 53500 | 2.7670 | 18.1877 | 23.6434 | 30.0 | 5.187
1.7595 | 33.5925 | 54000 | 2.7883 | 18.4783 | 23.7313 | 29.667 | 5.051
1.7303 | 33.9036 | 54500 | 2.7782 | 18.3414 | 23.6883 | 31.0 | 5.081
1.7489 | 34.2146 | 55000 | 2.7792 | 18.24 | 23.8841 | 31.333 | 5.266
1.7368 | 34.5257 | 55500 | 2.7779 | 18.289 | 23.8801 | 30.667 | 5.443
1.7519 | 34.8367 | 56000 | 2.7524 | 18.6989 | 23.7153 | 31.0 | 5.129
1.7279 | 35.1477 | 56500 | 2.7832 | 18.4093 | 23.6743 | 30.333 | 5.359
1.7438 | 35.4588 | 57000 | 2.7767 | 18.4513 | 23.7572 | 30.333 | 5.263
1.7382 | 35.7698 | 57500 | 2.7621 | 18.5617 | 23.7522 | 30.0 | 5.144
1.7325 | 36.0809 | 58000 | 2.7782 | 18.1986 | 23.6044 | 30.0 | 5.198
1.7206 | 36.3919 | 58500 | 2.7754 | 18.3978 | 23.7572 | 30.667 | 5.101
1.7551 | 36.7030 | 59000 | 2.7666 | 18.181 | 23.8202 | 30.0 | 5.148
1.7195 | 37.0140 | 59500 | 2.7790 | 18.1844 | 23.7572 | 30.333 | 5.347
1.7336 | 37.3250 | 60000 | 2.7590 | 18.1549 | 23.7762 | 31.333 | 5.186
1.7448 | 37.6361 | 60500 | 2.7695 | 18.0343 | 23.5794 | 30.333 | 5.052
1.7295 | 37.9471 | 61000 | 2.7522 | 18.5933 | 23.8242 | 30.667 | 5.066
1.7187 | 38.2582 | 61500 | 2.7631 | 18.3494 | 23.5574 | 30.667 | 5.232
1.7245 | 38.5692 | 62000 | 2.7743 | 18.3502 | 23.7632 | 30.667 | 4.992
1.7248 | 38.8802 | 62500 | 2.7415 | 18.2357 | 23.7862 | 31.0 | 5.055
1.7125 | 39.1913 | 63000 | 2.7606 | 18.4132 | 23.6993 | 30.333 | 5.153
1.7298 | 39.5023 | 63500 | 2.7516 | 18.2747 | 23.5904 | 30.333 | 5.087
1.7298 | 39.8134 | 64000 | 2.7470 | 18.546 | 23.9281 | 31.333 | 5.351
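
The Epoch and Step columns of the table above also allow a rough back-of-the-envelope estimate of how many optimizer steps make up one epoch, and from that how many training sequences are seen per epoch. The sketch below uses three (epoch, step) pairs copied from the table and the total_train_batch_size of 128; the function names are hypothetical, and the estimate assumes every optimizer step consumes a full batch.

```python
# (epoch, step) pairs copied from the training results table
LOGGED = [(0.3110, 500), (18.0404, 29000), (39.8134, 64000)]
TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list


def steps_per_epoch(epoch: float, step: int) -> float:
    """Optimizer steps per epoch implied by one logged row."""
    return step / epoch


def sequences_per_epoch(epoch: float, step: int, batch: int = TOTAL_TRAIN_BATCH_SIZE) -> float:
    """Approximate training sequences per epoch, assuming full batches."""
    return steps_per_epoch(epoch, step) * batch


if __name__ == "__main__":
    for epoch, step in LOGGED:
        print(
            f"step {step}: ~{steps_per_epoch(epoch, step):.1f} steps/epoch, "
            f"~{sequences_per_epoch(epoch, step):.0f} sequences/epoch"
        )
```

All three rows give a figure just under 1,608 steps per epoch, so the 40 configured epochs correspond to a little over 64,000 optimizer steps, consistent with the last logged step.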

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
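
To check whether a local environment matches the versions listed above, a small standard-library sketch like the following may help. The function name `check_versions` is hypothetical, the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names, and whether the installed PyTorch build reports the `+cu121` suffix depends on how it was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.1",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the checkpoint will fail to load; it only flags a difference from the environment recorded on this card.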

Model size: 1B params (Safetensors), tensor type F32

Repository: taehyunzzz/moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.25-cbl1e-04-ncs1e-02