moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.75-cbl1e-04-ncs1e-02
This model is a fine-tuned version of google/switch-base-32 on the WMT16 tr-en (Turkish-English) dataset. It achieves the following results on the evaluation set:
- Loss: 2.7773
- Bleu: 18.228
- Gen Len: 23.6693
- Num Effective Experts: 24.333
- Num Experts Activated: 2.259
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 200
- num_epochs: 40.0
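
As a rough illustration of how the values above relate, the sketch below recomputes the total train batch size and evaluates a `constant_with_warmup` learning-rate schedule in plain Python. It assumes a single training device (the card does not state the device count) and a linear ramp from zero during warmup. The function names are hypothetical helpers written for this example, not part of the training code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 200

# Assumption: one device, so the batch size is not multiplied further.
NUM_DEVICES = 1


def total_train_batch_size(per_device, accum_steps, num_devices=NUM_DEVICES):
    """Number of examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def lr_constant_with_warmup(step, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    print(total_train_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 100, 200, 64000):
        print(step, lr_constant_with_warmup(step))
```

Under these assumptions the product 32 × 4 reproduces the listed total_train_batch_size of 128, and the learning rate stays at 0.0001 for every optimizer step after step 200.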
Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len | Num Effective Experts | Num Experts Activated |
---|---|---|---|---|---|---|---|
No log | 0 | 0 | 3.4641 | 7.0396 | 30.2687 | 25.0 | 1.36 |
2.8476 | 0.3110 | 500 | 3.1034 | 13.9731 | 22.045 | 16.0 | 1.613 |
2.6574 | 0.6221 | 1000 | 3.0850 | 14.1585 | 22.0719 | 19.0 | 1.947 |
2.595 | 0.9331 | 1500 | 3.0816 | 14.2217 | 22.1878 | 21.667 | 2.049 |
2.5157 | 1.2442 | 2000 | 3.0821 | 14.3701 | 22.4505 | 23.333 | 2.135 |
2.4631 | 1.5552 | 2500 | 3.0577 | 14.3031 | 22.1119 | 23.667 | 2.174 |
2.4304 | 1.8663 | 3000 | 3.0451 | 14.9221 | 22.4975 | 23.667 | 2.129 |
2.3938 | 2.1773 | 3500 | 3.0466 | 14.6766 | 22.5694 | 23.333 | 2.141 |
2.3619 | 2.4883 | 4000 | 3.0580 | 14.6777 | 22.4296 | 22.667 | 2.124 |
2.3518 | 2.7994 | 4500 | 3.0400 | 14.9204 | 22.4835 | 22.667 | 2.167 |
2.3149 | 3.1104 | 5000 | 3.0427 | 15.0572 | 22.5684 | 23.0 | 2.164 |
2.2997 | 3.4215 | 5500 | 3.0233 | 15.1565 | 22.6434 | 23.0 | 2.158 |
2.2833 | 3.7325 | 6000 | 3.0361 | 15.0865 | 22.4505 | 23.333 | 2.211 |
2.2684 | 4.0435 | 6500 | 3.0082 | 15.0727 | 22.7283 | 23.333 | 2.218 |
2.2573 | 4.3546 | 7000 | 3.0196 | 15.1365 | 22.6763 | 23.667 | 2.226 |
2.2428 | 4.6656 | 7500 | 3.0066 | 15.1963 | 22.8581 | 24.333 | 2.215 |
2.216 | 4.9767 | 8000 | 2.9991 | 15.1412 | 22.6344 | 23.333 | 2.196 |
2.1934 | 5.2877 | 8500 | 2.9896 | 15.3506 | 22.7532 | 25.333 | 2.248 |
2.1987 | 5.5988 | 9000 | 2.9837 | 15.3846 | 22.8072 | 24.0 | 2.241 |
2.1615 | 5.9098 | 9500 | 2.9952 | 15.6195 | 22.7972 | 24.0 | 2.238 |
2.1599 | 6.2208 | 10000 | 2.9797 | 15.8393 | 22.9491 | 25.667 | 2.27 |
2.1647 | 6.5319 | 10500 | 2.9705 | 15.6556 | 22.8182 | 25.333 | 2.274 |
2.1441 | 6.8429 | 11000 | 2.9618 | 15.644 | 22.975 | 24.667 | 2.256 |
2.1191 | 7.1540 | 11500 | 2.9723 | 15.3876 | 22.8172 | 24.667 | 2.256 |
2.1249 | 7.4650 | 12000 | 2.9505 | 15.5752 | 22.8751 | 25.333 | 2.258 |
2.1302 | 7.7760 | 12500 | 2.9474 | 15.3248 | 23.037 | 25.333 | 2.298 |
2.1044 | 8.0871 | 13000 | 2.9539 | 15.4191 | 22.8242 | 24.333 | 2.279 |
2.0993 | 8.3981 | 13500 | 2.9420 | 15.9856 | 23.2537 | 25.333 | 2.294 |
2.1002 | 8.7092 | 14000 | 2.9409 | 15.5383 | 23.018 | 25.333 | 2.312 |
2.0892 | 9.0202 | 14500 | 2.9431 | 15.7347 | 22.965 | 26.0 | 2.317 |
2.0602 | 9.3313 | 15000 | 2.9369 | 15.7065 | 23.2907 | 25.333 | 2.297 |
2.0661 | 9.6423 | 15500 | 2.9426 | 15.5456 | 23.1429 | 26.0 | 2.333 |
2.077 | 9.9533 | 16000 | 2.9398 | 15.7638 | 23.024 | 25.667 | 2.345 |
2.045 | 10.2644 | 16500 | 2.9482 | 15.6033 | 22.8651 | 25.333 | 2.302 |
2.0559 | 10.5754 | 17000 | 2.9469 | 15.852 | 22.971 | 25.0 | 2.346 |
2.045 | 10.8865 | 17500 | 2.9362 | 15.7729 | 22.8661 | 25.333 | 2.323 |
2.0293 | 11.1975 | 18000 | 2.9130 | 16.2118 | 23.1898 | 25.333 | 2.362 |
2.0403 | 11.5086 | 18500 | 2.9211 | 16.3007 | 23.2837 | 25.667 | 2.374 |
2.0301 | 11.8196 | 19000 | 2.9190 | 16.5118 | 23.3187 | 25.333 | 2.385 |
2.0197 | 12.1306 | 19500 | 2.9139 | 16.4862 | 23.1838 | 25.667 | 2.366 |
1.9903 | 12.4417 | 20000 | 2.9099 | 16.6206 | 23.3397 | 26.333 | 2.394 |
2.0236 | 12.7527 | 20500 | 2.9099 | 16.3161 | 23.1908 | 25.0 | 2.364 |
1.9977 | 13.0638 | 21000 | 2.9182 | 16.4259 | 23.2088 | 23.333 | 2.383 |
1.9883 | 13.3748 | 21500 | 2.9033 | 16.5355 | 23.0579 | 25.667 | 2.351 |
2.0018 | 13.6858 | 22000 | 2.9034 | 16.5309 | 23.1039 | 25.667 | 2.38 |
1.9869 | 13.9969 | 22500 | 2.8894 | 16.1481 | 23.3407 | 24.667 | 2.382 |
1.9649 | 14.3079 | 23000 | 2.9003 | 16.4894 | 23.3756 | 25.333 | 2.391 |
1.9665 | 14.6190 | 23500 | 2.8996 | 16.6273 | 23.2757 | 25.333 | 2.39 |
1.9664 | 14.9300 | 24000 | 2.8856 | 16.5667 | 23.2997 | 24.333 | 2.366 |
1.964 | 15.2411 | 24500 | 2.8963 | 16.5836 | 22.986 | 24.0 | 2.373 |
1.9455 | 15.5521 | 25000 | 2.8765 | 16.5123 | 23.4565 | 24.667 | 2.395 |
1.9616 | 15.8631 | 25500 | 2.8802 | 16.1826 | 23.0619 | 25.0 | 2.381 |
1.9526 | 16.1742 | 26000 | 2.8844 | 16.4702 | 23.4845 | 24.0 | 2.369 |
1.9282 | 16.4852 | 26500 | 2.8865 | 16.8511 | 23.2947 | 25.0 | 2.383 |
1.9518 | 16.7963 | 27000 | 2.8779 | 16.5658 | 23.3437 | 24.0 | 2.374 |
1.9342 | 17.1073 | 27500 | 2.8745 | 16.4818 | 23.3477 | 25.0 | 2.367 |
1.933 | 17.4184 | 28000 | 2.8753 | 16.5922 | 23.0749 | 25.0 | 2.376 |
1.924 | 17.7294 | 28500 | 2.8700 | 16.6258 | 23.1059 | 24.667 | 2.395 |
1.9317 | 18.0404 | 29000 | 2.8742 | 17.0658 | 23.4775 | 24.0 | 2.37 |
1.9069 | 18.3515 | 29500 | 2.8713 | 16.8956 | 23.2737 | 24.333 | 2.428 |
1.9172 | 18.6625 | 30000 | 2.8586 | 16.9523 | 23.3197 | 25.0 | 2.363 |
1.9174 | 18.9736 | 30500 | 2.8368 | 17.1978 | 23.5864 | 24.667 | 2.398 |
1.897 | 19.2846 | 31000 | 2.8538 | 17.3551 | 23.4456 | 25.0 | 2.388 |
1.9115 | 19.5956 | 31500 | 2.8569 | 16.7605 | 23.3187 | 24.667 | 2.4 |
1.8882 | 19.9067 | 32000 | 2.8524 | 17.0631 | 23.3137 | 24.667 | 2.423 |
1.8862 | 20.2177 | 32500 | 2.8410 | 17.3506 | 23.4525 | 25.0 | 2.4 |
1.8974 | 20.5288 | 33000 | 2.8373 | 17.1207 | 23.3756 | 23.333 | 2.421 |
1.899 | 20.8398 | 33500 | 2.8498 | 17.2751 | 23.4176 | 24.333 | 2.419 |
1.8887 | 21.1509 | 34000 | 2.8350 | 17.1577 | 23.6074 | 25.333 | 2.421 |
1.8772 | 21.4619 | 34500 | 2.8412 | 16.9793 | 23.2977 | 25.667 | 2.396 |
1.8709 | 21.7729 | 35000 | 2.8457 | 17.0508 | 23.5754 | 24.333 | 2.42 |
1.8666 | 22.0840 | 35500 | 2.8340 | 17.0809 | 23.3616 | 24.0 | 2.422 |
1.8729 | 22.3950 | 36000 | 2.8410 | 17.0623 | 23.4835 | 24.667 | 2.383 |
1.8826 | 22.7061 | 36500 | 2.8572 | 17.1967 | 23.2857 | 24.333 | 2.399 |
1.8643 | 23.0171 | 37000 | 2.8530 | 16.8548 | 23.4236 | 24.333 | 2.394 |
1.845 | 23.3281 | 37500 | 2.8371 | 17.2153 | 23.4346 | 24.333 | 2.381 |
1.859 | 23.6392 | 38000 | 2.8478 | 17.1054 | 23.2208 | 25.0 | 2.379 |
1.854 | 23.9502 | 38500 | 2.8382 | 17.4409 | 23.6643 | 24.667 | 2.347 |
1.8375 | 24.2613 | 39000 | 2.8333 | 17.6097 | 23.6973 | 24.0 | 2.395 |
1.8552 | 24.5723 | 39500 | 2.8248 | 17.7811 | 23.6823 | 24.333 | 2.408 |
1.8443 | 24.8834 | 40000 | 2.8306 | 17.1533 | 23.2138 | 23.333 | 2.38 |
1.836 | 25.1944 | 40500 | 2.8330 | 17.2345 | 23.5155 | 23.667 | 2.366 |
1.8437 | 25.5054 | 41000 | 2.8244 | 17.6586 | 23.4086 | 24.333 | 2.392 |
1.8402 | 25.8165 | 41500 | 2.8249 | 17.3532 | 23.6424 | 24.667 | 2.344 |
1.8182 | 26.1275 | 42000 | 2.8298 | 17.5388 | 23.5704 | 22.667 | 2.324 |
1.8293 | 26.4386 | 42500 | 2.8203 | 17.3617 | 23.3726 | 24.0 | 2.328 |
1.8408 | 26.7496 | 43000 | 2.8146 | 17.294 | 23.3946 | 24.333 | 2.34 |
1.8228 | 27.0607 | 43500 | 2.8102 | 17.5626 | 23.4156 | 24.667 | 2.308 |
1.8321 | 27.3717 | 44000 | 2.8370 | 17.5569 | 23.4196 | 25.0 | 2.36 |
1.8123 | 27.6827 | 44500 | 2.8256 | 17.3386 | 23.6464 | 24.667 | 2.343 |
1.8277 | 27.9938 | 45000 | 2.8216 | 17.6806 | 23.7652 | 23.667 | 2.325 |
1.8002 | 28.3048 | 45500 | 2.8217 | 17.3995 | 23.5834 | 23.0 | 2.319 |
1.8018 | 28.6159 | 46000 | 2.8145 | 17.2068 | 23.4466 | 23.667 | 2.336 |
1.8145 | 28.9269 | 46500 | 2.8261 | 17.3871 | 23.4675 | 22.333 | 2.362 |
1.8109 | 29.2379 | 47000 | 2.8277 | 17.4646 | 23.4515 | 23.333 | 2.318 |
1.7886 | 29.5490 | 47500 | 2.8132 | 17.3751 | 23.5714 | 21.667 | 2.335 |
1.795 | 29.8600 | 48000 | 2.8155 | 17.6465 | 23.5534 | 23.333 | 2.322 |
1.7832 | 30.1711 | 48500 | 2.8150 | 17.6549 | 23.4615 | 23.667 | 2.306 |
1.7974 | 30.4821 | 49000 | 2.8102 | 17.3266 | 23.4336 | 25.667 | 2.314 |
1.7962 | 30.7932 | 49500 | 2.7995 | 17.8548 | 23.5145 | 22.333 | 2.293 |
1.7842 | 31.1042 | 50000 | 2.8146 | 17.745 | 23.5235 | 21.667 | 2.308 |
1.7769 | 31.4152 | 50500 | 2.8023 | 17.8384 | 23.3616 | 23.333 | 2.316 |
1.7971 | 31.7263 | 51000 | 2.8043 | 18.0606 | 23.6454 | 24.333 | 2.32 |
1.7746 | 32.0373 | 51500 | 2.8176 | 17.7628 | 23.4745 | 25.0 | 2.33 |
1.7735 | 32.3484 | 52000 | 2.8082 | 17.931 | 23.5115 | 24.333 | 2.314 |
1.7825 | 32.6594 | 52500 | 2.7981 | 17.6958 | 23.5844 | 25.333 | 2.311 |
1.7872 | 32.9705 | 53000 | 2.8009 | 17.8739 | 23.6484 | 24.667 | 2.31 |
1.7652 | 33.2815 | 53500 | 2.8003 | 17.9319 | 23.6533 | 23.667 | 2.315 |
1.7699 | 33.5925 | 54000 | 2.8008 | 17.8419 | 23.6793 | 23.333 | 2.316 |
1.7431 | 33.9036 | 54500 | 2.7819 | 17.7766 | 23.6354 | 24.333 | 2.348 |
1.7523 | 34.2146 | 55000 | 2.8095 | 18.3728 | 23.8182 | 23.667 | 2.319 |
1.7466 | 34.5257 | 55500 | 2.7906 | 18.2066 | 23.6823 | 24.333 | 2.319 |
1.7626 | 34.8367 | 56000 | 2.7825 | 17.9383 | 23.4605 | 24.333 | 2.311 |
1.7403 | 35.1477 | 56500 | 2.8030 | 17.7189 | 23.5035 | 23.667 | 2.295 |
1.7531 | 35.4588 | 57000 | 2.8020 | 17.7917 | 23.6184 | 22.667 | 2.293 |
1.7525 | 35.7698 | 57500 | 2.7739 | 17.8031 | 23.4805 | 23.667 | 2.269 |
1.7511 | 36.0809 | 58000 | 2.7933 | 18.267 | 23.4895 | 23.333 | 2.297 |
1.7316 | 36.3919 | 58500 | 2.7853 | 18.3131 | 23.6014 | 24.667 | 2.322 |
1.766 | 36.7030 | 59000 | 2.7798 | 17.9374 | 23.6703 | 24.0 | 2.307 |
1.7258 | 37.0140 | 59500 | 2.7869 | 18.055 | 23.7522 | 25.333 | 2.275 |
1.7434 | 37.3250 | 60000 | 2.7764 | 18.1332 | 23.8412 | 23.667 | 2.264 |
1.7489 | 37.6361 | 60500 | 2.7782 | 18.0713 | 23.5355 | 23.0 | 2.292 |
1.7386 | 37.9471 | 61000 | 2.7788 | 18.3528 | 23.6953 | 23.667 | 2.26 |
1.727 | 38.2582 | 61500 | 2.7905 | 17.9954 | 23.5934 | 23.333 | 2.297 |
1.7318 | 38.5692 | 62000 | 2.7829 | 18.3854 | 23.7882 | 24.0 | 2.289 |
1.734 | 38.8802 | 62500 | 2.7808 | 17.945 | 23.4905 | 23.667 | 2.277 |
1.725 | 39.1913 | 63000 | 2.7804 | 17.8555 | 23.5964 | 23.667 | 2.285 |
1.7396 | 39.5023 | 63500 | 2.7892 | 17.9859 | 23.5415 | 24.0 | 2.237 |
1.7396 | 39.8134 | 64000 | 2.7794 | 18.1683 | 23.4446 | 23.333 | 2.269 |
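
The table can be scanned for the strongest checkpoints with a few lines of Python. The sketch below uses only a handful of rows copied from the table (step, validation loss, BLEU) and shows that the row with the lowest validation loss is not the row with the highest BLEU. The helper names are hypothetical, and the row selection is a sample rather than the full log.

```python
# A sample of rows copied from the table above: (step, validation_loss, bleu).
ROWS = [
    (0, 3.4641, 7.0396),
    (500, 3.1034, 13.9731),
    (55000, 2.8095, 18.3728),
    (57500, 2.7739, 17.8031),
    (61000, 2.7788, 18.3528),
    (62000, 2.7829, 18.3854),
    (64000, 2.7794, 18.1683),
]


def best_by_bleu(rows):
    """Row with the highest BLEU score."""
    return max(rows, key=lambda row: row[2])


def best_by_loss(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    print("best BLEU:", best_by_bleu(ROWS))
    print("lowest validation loss:", best_by_loss(ROWS))
```

Among these rows, BLEU peaks at step 62000 (18.3854) while validation loss is lowest at step 57500 (2.7739), so the choice of checkpoint depends on which metric matters for the intended use.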
Framework versions
- Transformers 4.44.1
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1