
moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth1.0-cbl1e-04-ncs1e-02

This model is a fine-tuned version of google/switch-base-32 on the wmt16 tr-en dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8158
  • Bleu: 17.9697
  • Gen Len: 23.4585
  • Num Effective Experts: 1.0
  • Num Experts Activated: 1.0
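If the checkpoint is published on the Hub under the repository name above, it can be loaded like any other Switch Transformers seq2seq checkpoint. The sketch below is illustrative only: the task prefix and generation settings are assumptions, not part of this card.

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

MODEL_ID = "taehyunzzz/moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth1.0-cbl1e-04-ncs1e-02"

def translate(text, model_id=MODEL_ID):
    """Translate a Turkish sentence to English with the fine-tuned checkpoint.

    Note: whether a T5-style task prefix (e.g. "translate Turkish to English: ")
    is needed depends on how the model was fine-tuned; adjust as required.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = SwitchTransformersForConditionalGeneration.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Loading the full checkpoint downloads roughly 1B parameters, so expect a sizeable first-run download.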

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 40.0
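The total train batch size listed above is simply the per-device batch size multiplied by the gradient accumulation steps; a quick sanity check using only the numbers from the list:

```python
# Hyperparameters from the training configuration above
train_batch_size = 32
gradient_accumulation_steps = 4

# Effective (total) train batch size seen by the optimizer per update
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching total_train_batch_size above
```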

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Bleu    | Gen Len | Num Effective Experts | Num Experts Activated |
|:-------------:|:-------:|:-----:|:---------------:|:-------:|:-------:|:---------------------:|:---------------------:|
| No log        | 0       | 0     | 3.4882          | 6.5723  | 31.0779 | 1.0                   | 1.0                   |
| 2.8827        | 0.3110  | 500   | 3.1980          | 13.7656 | 22.1908 | 1.0                   | 1.0                   |
| 2.6705        | 0.6221  | 1000  | 3.1761          | 13.8218 | 22.044  | 1.0                   | 1.0                   |
| 2.6131        | 0.9331  | 1500  | 3.1757          | 14.1353 | 22.0509 | 1.0                   | 1.0                   |
| 2.532         | 1.2442  | 2000  | 3.1722          | 14.6205 | 22.1938 | 1.0                   | 1.0                   |
| 2.4799        | 1.5552  | 2500  | 3.1606          | 14.1852 | 21.9211 | 1.0                   | 1.0                   |
| 2.4403        | 1.8663  | 3000  | 3.1326          | 14.7625 | 22.1788 | 1.0                   | 1.0                   |
| 2.405         | 2.1773  | 3500  | 3.1424          | 14.395  | 22.4186 | 1.0                   | 1.0                   |
| 2.3671        | 2.4883  | 4000  | 3.1295          | 14.5631 | 22.1089 | 1.0                   | 1.0                   |
| 2.3672        | 2.7994  | 4500  | 3.1100          | 14.8208 | 22.3467 | 1.0                   | 1.0                   |
| 2.3233        | 3.1104  | 5000  | 3.1279          | 14.5771 | 22.2488 | 1.0                   | 1.0                   |
| 2.3117        | 3.4215  | 5500  | 3.1230          | 14.8875 | 22.2827 | 1.0                   | 1.0                   |
| 2.2961        | 3.7325  | 6000  | 3.1072          | 14.9112 | 22.4655 | 1.0                   | 1.0                   |
| 2.2791        | 4.0435  | 6500  | 3.0837          | 14.9004 | 22.4935 | 1.0                   | 1.0                   |
| 2.2677        | 4.3546  | 7000  | 3.0973          | 14.9838 | 22.5724 | 1.0                   | 1.0                   |
| 2.25          | 4.6656  | 7500  | 3.0762          | 15.0671 | 22.5265 | 1.0                   | 1.0                   |
| 2.2196        | 4.9767  | 8000  | 3.0667          | 15.3542 | 22.6883 | 1.0                   | 1.0                   |
| 2.2104        | 5.2877  | 8500  | 3.0642          | 14.9499 | 22.5025 | 1.0                   | 1.0                   |
| 2.2078        | 5.5988  | 9000  | 3.0641          | 15.2073 | 22.6583 | 1.0                   | 1.0                   |
| 2.1742        | 5.9098  | 9500  | 3.0536          | 15.5605 | 22.4555 | 1.0                   | 1.0                   |
| 2.1663        | 6.2208  | 10000 | 3.0454          | 15.3846 | 22.7423 | 1.0                   | 1.0                   |
| 2.1706        | 6.5319  | 10500 | 3.0503          | 15.4938 | 22.6803 | 1.0                   | 1.0                   |
| 2.1599        | 6.8429  | 11000 | 3.0283          | 15.5693 | 22.7712 | 1.0                   | 1.0                   |
| 2.1333        | 7.1540  | 11500 | 3.0282          | 15.4237 | 22.6593 | 1.0                   | 1.0                   |
| 2.1346        | 7.4650  | 12000 | 3.0225          | 15.7185 | 22.9251 | 1.0                   | 1.0                   |
| 2.1391        | 7.7760  | 12500 | 3.0253          | 15.8025 | 22.8102 | 1.0                   | 1.0                   |
| 2.1061        | 8.0871  | 13000 | 3.0294          | 15.8164 | 22.7263 | 1.0                   | 1.0                   |
| 2.1034        | 8.3981  | 13500 | 3.0155          | 16.0624 | 22.953  | 1.0                   | 1.0                   |
| 2.1083        | 8.7092  | 14000 | 3.0003          | 16.0519 | 23.1948 | 1.0                   | 1.0                   |
| 2.0951        | 9.0202  | 14500 | 3.0086          | 15.876  | 22.9071 | 1.0                   | 1.0                   |
| 2.0686        | 9.3313  | 15000 | 3.0056          | 15.9467 | 23.0639 | 1.0                   | 1.0                   |
| 2.0756        | 9.6423  | 15500 | 3.0084          | 15.9649 | 23.0619 | 1.0                   | 1.0                   |
| 2.093         | 9.9533  | 16000 | 2.9907          | 16.1523 | 23.1738 | 1.0                   | 1.0                   |
| 2.0505        | 10.2644 | 16500 | 2.9956          | 16.0086 | 23.05   | 1.0                   | 1.0                   |
| 2.0669        | 10.5754 | 17000 | 3.0066          | 16.0278 | 22.9121 | 1.0                   | 1.0                   |
| 2.0578        | 10.8865 | 17500 | 2.9970          | 16.0734 | 22.981  | 1.0                   | 1.0                   |
| 2.0344        | 11.1975 | 18000 | 2.9924          | 16.2015 | 23.007  | 1.0                   | 1.0                   |
| 2.0468        | 11.5086 | 18500 | 2.9852          | 16.2568 | 23.029  | 1.0                   | 1.0                   |
| 2.0355        | 11.8196 | 19000 | 2.9727          | 16.3392 | 23.1658 | 1.0                   | 1.0                   |
| 2.0265        | 12.1306 | 19500 | 2.9666          | 16.2718 | 22.9021 | 1.0                   | 1.0                   |
| 1.9991        | 12.4417 | 20000 | 2.9773          | 16.4887 | 23.1768 | 1.0                   | 1.0                   |
| 2.0307        | 12.7527 | 20500 | 2.9632          | 16.3963 | 23.1888 | 1.0                   | 1.0                   |
| 2.0034        | 13.0638 | 21000 | 2.9750          | 16.285  | 23.1109 | 1.0                   | 1.0                   |
| 1.9987        | 13.3748 | 21500 | 2.9614          | 16.2877 | 23.1379 | 1.0                   | 1.0                   |
| 2.0169        | 13.6858 | 22000 | 2.9667          | 16.4026 | 23.3776 | 1.0                   | 1.0                   |
| 2.0004        | 13.9969 | 22500 | 2.9640          | 16.353  | 23.1229 | 1.0                   | 1.0                   |
| 1.9763        | 14.3079 | 23000 | 2.9707          | 16.2877 | 22.7912 | 1.0                   | 1.0                   |
| 1.9777        | 14.6190 | 23500 | 2.9613          | 16.4306 | 23.1139 | 1.0                   | 1.0                   |
| 1.9777        | 14.9300 | 24000 | 2.9546          | 16.5177 | 23.1329 | 1.0                   | 1.0                   |
| 1.9698        | 15.2411 | 24500 | 2.9568          | 16.4457 | 23.1718 | 1.0                   | 1.0                   |
| 1.9528        | 15.5521 | 25000 | 2.9439          | 16.4265 | 23.03   | 1.0                   | 1.0                   |
| 1.9712        | 15.8631 | 25500 | 2.9592          | 16.4107 | 22.9481 | 1.0                   | 1.0                   |
| 1.9648        | 16.1742 | 26000 | 2.9436          | 16.7914 | 23.3027 | 1.0                   | 1.0                   |
| 1.9409        | 16.4852 | 26500 | 2.9242          | 16.6053 | 23.2328 | 1.0                   | 1.0                   |
| 1.9589        | 16.7963 | 27000 | 2.9364          | 16.6904 | 23.1419 | 1.0                   | 1.0                   |
| 1.9441        | 17.1073 | 27500 | 2.9384          | 16.6006 | 23.3786 | 1.0                   | 1.0                   |
| 1.9389        | 17.4184 | 28000 | 2.9259          | 16.5851 | 23.1249 | 1.0                   | 1.0                   |
| 1.9402        | 17.7294 | 28500 | 2.9365          | 16.7892 | 23.3037 | 1.0                   | 1.0                   |
| 1.9391        | 18.0404 | 29000 | 2.9174          | 16.8765 | 23.3007 | 1.0                   | 1.0                   |
| 1.9202        | 18.3515 | 29500 | 2.9283          | 16.8139 | 23.2278 | 1.0                   | 1.0                   |
| 1.9258        | 18.6625 | 30000 | 2.9103          | 16.7764 | 23.3626 | 1.0                   | 1.0                   |
| 1.9289        | 18.9736 | 30500 | 2.9025          | 16.9497 | 23.4216 | 1.0                   | 1.0                   |
| 1.9054        | 19.2846 | 31000 | 2.9183          | 16.8306 | 23.1538 | 1.0                   | 1.0                   |
| 1.9248        | 19.5956 | 31500 | 2.9174          | 16.6121 | 23.2557 | 1.0                   | 1.0                   |
| 1.8915        | 19.9067 | 32000 | 2.9188          | 16.8099 | 23.2707 | 1.0                   | 1.0                   |
| 1.8897        | 20.2177 | 32500 | 2.9161          | 17.1379 | 23.3337 | 1.0                   | 1.0                   |
| 1.9033        | 20.5288 | 33000 | 2.8964          | 17.3044 | 23.3377 | 1.0                   | 1.0                   |
| 1.9092        | 20.8398 | 33500 | 2.8851          | 17.2853 | 23.5245 | 1.0                   | 1.0                   |
| 1.892         | 21.1509 | 34000 | 2.8927          | 17.3724 | 23.6663 | 1.0                   | 1.0                   |
| 1.8814        | 21.4619 | 34500 | 2.9085          | 17.7419 | 23.5804 | 1.0                   | 1.0                   |
| 1.882         | 21.7729 | 35000 | 2.8999          | 17.4058 | 23.3866 | 1.0                   | 1.0                   |
| 1.8704        | 22.0840 | 35500 | 2.8943          | 17.3501 | 23.4126 | 1.0                   | 1.0                   |
| 1.8786        | 22.3950 | 36000 | 2.8861          | 16.9294 | 23.2408 | 1.0                   | 1.0                   |
| 1.8864        | 22.7061 | 36500 | 2.8948          | 17.602  | 23.3367 | 1.0                   | 1.0                   |
| 1.8705        | 23.0171 | 37000 | 2.9012          | 16.978  | 23.3187 | 1.0                   | 1.0                   |
| 1.8506        | 23.3281 | 37500 | 2.8966          | 17.0945 | 23.2807 | 1.0                   | 1.0                   |
| 1.8602        | 23.6392 | 38000 | 2.8981          | 17.4144 | 23.3067 | 1.0                   | 1.0                   |
| 1.8609        | 23.9502 | 38500 | 2.8913          | 17.2312 | 23.3966 | 1.0                   | 1.0                   |
| 1.8456        | 24.2613 | 39000 | 2.8868          | 17.3542 | 23.5315 | 1.0                   | 1.0                   |
| 1.8624        | 24.5723 | 39500 | 2.8816          | 17.5182 | 23.4625 | 1.0                   | 1.0                   |
| 1.8549        | 24.8834 | 40000 | 2.8679          | 17.6249 | 23.3147 | 1.0                   | 1.0                   |
| 1.8482        | 25.1944 | 40500 | 2.8696          | 17.0777 | 23.2488 | 1.0                   | 1.0                   |
| 1.8508        | 25.5054 | 41000 | 2.8802          | 17.5002 | 23.3926 | 1.0                   | 1.0                   |
| 1.8478        | 25.8165 | 41500 | 2.8835          | 17.4787 | 23.2408 | 1.0                   | 1.0                   |
| 1.8285        | 26.1275 | 42000 | 2.8708          | 17.593  | 23.4815 | 1.0                   | 1.0                   |
| 1.8405        | 26.4386 | 42500 | 2.8660          | 17.6444 | 23.5215 | 1.0                   | 1.0                   |
| 1.8478        | 26.7496 | 43000 | 2.8591          | 17.2991 | 23.4975 | 1.0                   | 1.0                   |
| 1.8333        | 27.0607 | 43500 | 2.8684          | 17.1266 | 23.2717 | 1.0                   | 1.0                   |
| 1.8414        | 27.3717 | 44000 | 2.8626          | 17.6693 | 23.3946 | 1.0                   | 1.0                   |
| 1.8179        | 27.6827 | 44500 | 2.8631          | 17.496  | 23.3087 | 1.0                   | 1.0                   |
| 1.8373        | 27.9938 | 45000 | 2.8615          | 17.2557 | 23.4905 | 1.0                   | 1.0                   |
| 1.8125        | 28.3048 | 45500 | 2.8634          | 17.5983 | 23.2837 | 1.0                   | 1.0                   |
| 1.8083        | 28.6159 | 46000 | 2.8739          | 17.4523 | 23.4196 | 1.0                   | 1.0                   |
| 1.8198        | 28.9269 | 46500 | 2.8648          | 17.4243 | 23.1239 | 1.0                   | 1.0                   |
| 1.8176        | 29.2379 | 47000 | 2.8561          | 17.663  | 23.5075 | 1.0                   | 1.0                   |
| 1.7978        | 29.5490 | 47500 | 2.8633          | 17.3527 | 23.2817 | 1.0                   | 1.0                   |
| 1.8006        | 29.8600 | 48000 | 2.8673          | 17.5728 | 23.2607 | 1.0                   | 1.0                   |
| 1.7864        | 30.1711 | 48500 | 2.8652          | 17.4747 | 23.3596 | 1.0                   | 1.0                   |
| 1.8005        | 30.4821 | 49000 | 2.8419          | 17.2911 | 23.2967 | 1.0                   | 1.0                   |
| 1.8019        | 30.7932 | 49500 | 2.8508          | 17.5193 | 23.4166 | 1.0                   | 1.0                   |
| 1.799         | 31.1042 | 50000 | 2.8583          | 17.8199 | 23.4146 | 1.0                   | 1.0                   |
| 1.7793        | 31.4152 | 50500 | 2.8638          | 17.6801 | 23.2248 | 1.0                   | 1.0                   |
| 1.8058        | 31.7263 | 51000 | 2.8558          | 17.8915 | 23.4436 | 1.0                   | 1.0                   |
| 1.7813        | 32.0373 | 51500 | 2.8543          | 17.7754 | 23.4875 | 1.0                   | 1.0                   |
| 1.7797        | 32.3484 | 52000 | 2.8473          | 17.8121 | 23.4116 | 1.0                   | 1.0                   |
| 1.7899        | 32.6594 | 52500 | 2.8375          | 17.93   | 23.5185 | 1.0                   | 1.0                   |
| 1.7933        | 32.9705 | 53000 | 2.8415          | 17.7522 | 23.4525 | 1.0                   | 1.0                   |
| 1.7688        | 33.2815 | 53500 | 2.8382          | 17.7477 | 23.4276 | 1.0                   | 1.0                   |
| 1.7744        | 33.5925 | 54000 | 2.8387          | 17.7408 | 23.3167 | 1.0                   | 1.0                   |
| 1.7471        | 33.9036 | 54500 | 2.8381          | 17.877  | 23.2008 | 1.0                   | 1.0                   |
| 1.7634        | 34.2146 | 55000 | 2.8337          | 17.89   | 23.7752 | 1.0                   | 1.0                   |
| 1.7575        | 34.5257 | 55500 | 2.8345          | 17.9517 | 23.5095 | 1.0                   | 1.0                   |
| 1.7714        | 34.8367 | 56000 | 2.8359          | 18.0543 | 23.3107 | 1.0                   | 1.0                   |
| 1.7433        | 35.1477 | 56500 | 2.8411          | 17.7165 | 23.4705 | 1.0                   | 1.0                   |
| 1.7606        | 35.4588 | 57000 | 2.8445          | 17.7763 | 23.2967 | 1.0                   | 1.0                   |
| 1.756         | 35.7698 | 57500 | 2.8265          | 18.063  | 23.3756 | 1.0                   | 1.0                   |
| 1.7563        | 36.0809 | 58000 | 2.8317          | 18.0996 | 23.5814 | 1.0                   | 1.0                   |
| 1.7395        | 36.3919 | 58500 | 2.8379          | 17.7001 | 23.3387 | 1.0                   | 1.0                   |
| 1.7761        | 36.7030 | 59000 | 2.8318          | 18.1463 | 23.5554 | 1.0                   | 1.0                   |
| 1.7363        | 37.0140 | 59500 | 2.8464          | 18.0277 | 23.4266 | 1.0                   | 1.0                   |
| 1.7502        | 37.3250 | 60000 | 2.8201          | 18.0244 | 23.4775 | 1.0                   | 1.0                   |
| 1.7577        | 37.6361 | 60500 | 2.8100          | 18.2631 | 23.6773 | 1.0                   | 1.0                   |
| 1.7443        | 37.9471 | 61000 | 2.8229          | 18.0188 | 23.3806 | 1.0                   | 1.0                   |
| 1.7385        | 38.2582 | 61500 | 2.8347          | 18.1092 | 23.1698 | 1.0                   | 1.0                   |
| 1.7392        | 38.5692 | 62000 | 2.8096          | 18.295  | 23.4166 | 1.0                   | 1.0                   |
| 1.7424        | 38.8802 | 62500 | 2.8257          | 18.0568 | 23.3427 | 1.0                   | 1.0                   |
| 1.7297        | 39.1913 | 63000 | 2.8203          | 17.989  | 23.5455 | 1.0                   | 1.0                   |
| 1.7461        | 39.5023 | 63500 | 2.8248          | 17.9612 | 23.4466 | 1.0                   | 1.0                   |
| 1.7442        | 39.8134 | 64000 | 2.8192          | 18.0649 | 23.3087 | 1.0                   | 1.0                   |
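Reading the table, validation BLEU peaks before the final step rather than at step 64000. A small sketch of picking the best evaluation by BLEU, using a handful of late rows copied from the table above:

```python
# (step, validation_loss, bleu) for a few late evaluations, copied from the table above
rows = [
    (60500, 2.8100, 18.2631),
    (61000, 2.8229, 18.0188),
    (62000, 2.8096, 18.2950),
    (64000, 2.8192, 18.0649),
]

# Select the evaluation with the highest BLEU score
best = max(rows, key=lambda r: r[2])
print(best)  # (62000, 2.8096, 18.295)
```

Over the full table, the highest BLEU (18.295) occurs at step 62000, not at the last checkpoint.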

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
Model size

  • 1B params (Safetensors, F32)
