moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.75-cbl1e-04-ncs1e-02

This model is a fine-tuned version of google/switch-base-32 on the tr-en (Turkish-English) configuration of the wmt16 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7773
  • Bleu: 18.228
  • Gen Len: 23.6693
  • Num Effective Experts: 24.333
  • Num Experts Activated: 2.259

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 40.0
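
As a rough illustration of how some of these settings relate, the sketch below recomputes the total batch size and the learning rate under a constant-with-warmup schedule. It is plain Python, not the training script: the helper names are hypothetical, `NUM_DEVICES = 1` is an assumption (the card does not state the device count, although 32 × 4 = 128 is consistent with a single device), and the linear ramp from zero is the usual form of this schedule rather than something the card confirms.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 200

# Assumption: the card does not state how many devices were used.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * accum_steps * num_devices


def constant_with_warmup_lr(step: int,
                            base_lr: float = LEARNING_RATE,
                            warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print("total_train_batch_size:", total_batch)

for step in (0, 100, 200, 500):
    print(f"step {step:>3}: lr = {constant_with_warmup_lr(step):.2e}")
```

Under these assumptions, the learning rate reaches its full value at step 200 and stays there for the remainder of the 40 epochs.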

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len | Num Effective Experts | Num Experts Activated |
|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 3.4641 | 7.0396 | 30.2687 | 25.0 | 1.36 |
| 2.8476 | 0.3110 | 500 | 3.1034 | 13.9731 | 22.045 | 16.0 | 1.613 |
| 2.6574 | 0.6221 | 1000 | 3.0850 | 14.1585 | 22.0719 | 19.0 | 1.947 |
| 2.595 | 0.9331 | 1500 | 3.0816 | 14.2217 | 22.1878 | 21.667 | 2.049 |
| 2.5157 | 1.2442 | 2000 | 3.0821 | 14.3701 | 22.4505 | 23.333 | 2.135 |
| 2.4631 | 1.5552 | 2500 | 3.0577 | 14.3031 | 22.1119 | 23.667 | 2.174 |
| 2.4304 | 1.8663 | 3000 | 3.0451 | 14.9221 | 22.4975 | 23.667 | 2.129 |
| 2.3938 | 2.1773 | 3500 | 3.0466 | 14.6766 | 22.5694 | 23.333 | 2.141 |
| 2.3619 | 2.4883 | 4000 | 3.0580 | 14.6777 | 22.4296 | 22.667 | 2.124 |
| 2.3518 | 2.7994 | 4500 | 3.0400 | 14.9204 | 22.4835 | 22.667 | 2.167 |
| 2.3149 | 3.1104 | 5000 | 3.0427 | 15.0572 | 22.5684 | 23.0 | 2.164 |
| 2.2997 | 3.4215 | 5500 | 3.0233 | 15.1565 | 22.6434 | 23.0 | 2.158 |
| 2.2833 | 3.7325 | 6000 | 3.0361 | 15.0865 | 22.4505 | 23.333 | 2.211 |
| 2.2684 | 4.0435 | 6500 | 3.0082 | 15.0727 | 22.7283 | 23.333 | 2.218 |
| 2.2573 | 4.3546 | 7000 | 3.0196 | 15.1365 | 22.6763 | 23.667 | 2.226 |
| 2.2428 | 4.6656 | 7500 | 3.0066 | 15.1963 | 22.8581 | 24.333 | 2.215 |
| 2.216 | 4.9767 | 8000 | 2.9991 | 15.1412 | 22.6344 | 23.333 | 2.196 |
| 2.1934 | 5.2877 | 8500 | 2.9896 | 15.3506 | 22.7532 | 25.333 | 2.248 |
| 2.1987 | 5.5988 | 9000 | 2.9837 | 15.3846 | 22.8072 | 24.0 | 2.241 |
| 2.1615 | 5.9098 | 9500 | 2.9952 | 15.6195 | 22.7972 | 24.0 | 2.238 |
| 2.1599 | 6.2208 | 10000 | 2.9797 | 15.8393 | 22.9491 | 25.667 | 2.27 |
| 2.1647 | 6.5319 | 10500 | 2.9705 | 15.6556 | 22.8182 | 25.333 | 2.274 |
| 2.1441 | 6.8429 | 11000 | 2.9618 | 15.644 | 22.975 | 24.667 | 2.256 |
| 2.1191 | 7.1540 | 11500 | 2.9723 | 15.3876 | 22.8172 | 24.667 | 2.256 |
| 2.1249 | 7.4650 | 12000 | 2.9505 | 15.5752 | 22.8751 | 25.333 | 2.258 |
| 2.1302 | 7.7760 | 12500 | 2.9474 | 15.3248 | 23.037 | 25.333 | 2.298 |
| 2.1044 | 8.0871 | 13000 | 2.9539 | 15.4191 | 22.8242 | 24.333 | 2.279 |
| 2.0993 | 8.3981 | 13500 | 2.9420 | 15.9856 | 23.2537 | 25.333 | 2.294 |
| 2.1002 | 8.7092 | 14000 | 2.9409 | 15.5383 | 23.018 | 25.333 | 2.312 |
| 2.0892 | 9.0202 | 14500 | 2.9431 | 15.7347 | 22.965 | 26.0 | 2.317 |
| 2.0602 | 9.3313 | 15000 | 2.9369 | 15.7065 | 23.2907 | 25.333 | 2.297 |
| 2.0661 | 9.6423 | 15500 | 2.9426 | 15.5456 | 23.1429 | 26.0 | 2.333 |
| 2.077 | 9.9533 | 16000 | 2.9398 | 15.7638 | 23.024 | 25.667 | 2.345 |
| 2.045 | 10.2644 | 16500 | 2.9482 | 15.6033 | 22.8651 | 25.333 | 2.302 |
| 2.0559 | 10.5754 | 17000 | 2.9469 | 15.852 | 22.971 | 25.0 | 2.346 |
| 2.045 | 10.8865 | 17500 | 2.9362 | 15.7729 | 22.8661 | 25.333 | 2.323 |
| 2.0293 | 11.1975 | 18000 | 2.9130 | 16.2118 | 23.1898 | 25.333 | 2.362 |
| 2.0403 | 11.5086 | 18500 | 2.9211 | 16.3007 | 23.2837 | 25.667 | 2.374 |
| 2.0301 | 11.8196 | 19000 | 2.9190 | 16.5118 | 23.3187 | 25.333 | 2.385 |
| 2.0197 | 12.1306 | 19500 | 2.9139 | 16.4862 | 23.1838 | 25.667 | 2.366 |
| 1.9903 | 12.4417 | 20000 | 2.9099 | 16.6206 | 23.3397 | 26.333 | 2.394 |
| 2.0236 | 12.7527 | 20500 | 2.9099 | 16.3161 | 23.1908 | 25.0 | 2.364 |
| 1.9977 | 13.0638 | 21000 | 2.9182 | 16.4259 | 23.2088 | 23.333 | 2.383 |
| 1.9883 | 13.3748 | 21500 | 2.9033 | 16.5355 | 23.0579 | 25.667 | 2.351 |
| 2.0018 | 13.6858 | 22000 | 2.9034 | 16.5309 | 23.1039 | 25.667 | 2.38 |
| 1.9869 | 13.9969 | 22500 | 2.8894 | 16.1481 | 23.3407 | 24.667 | 2.382 |
| 1.9649 | 14.3079 | 23000 | 2.9003 | 16.4894 | 23.3756 | 25.333 | 2.391 |
| 1.9665 | 14.6190 | 23500 | 2.8996 | 16.6273 | 23.2757 | 25.333 | 2.39 |
| 1.9664 | 14.9300 | 24000 | 2.8856 | 16.5667 | 23.2997 | 24.333 | 2.366 |
| 1.964 | 15.2411 | 24500 | 2.8963 | 16.5836 | 22.986 | 24.0 | 2.373 |
| 1.9455 | 15.5521 | 25000 | 2.8765 | 16.5123 | 23.4565 | 24.667 | 2.395 |
| 1.9616 | 15.8631 | 25500 | 2.8802 | 16.1826 | 23.0619 | 25.0 | 2.381 |
| 1.9526 | 16.1742 | 26000 | 2.8844 | 16.4702 | 23.4845 | 24.0 | 2.369 |
| 1.9282 | 16.4852 | 26500 | 2.8865 | 16.8511 | 23.2947 | 25.0 | 2.383 |
| 1.9518 | 16.7963 | 27000 | 2.8779 | 16.5658 | 23.3437 | 24.0 | 2.374 |
| 1.9342 | 17.1073 | 27500 | 2.8745 | 16.4818 | 23.3477 | 25.0 | 2.367 |
| 1.933 | 17.4184 | 28000 | 2.8753 | 16.5922 | 23.0749 | 25.0 | 2.376 |
| 1.924 | 17.7294 | 28500 | 2.8700 | 16.6258 | 23.1059 | 24.667 | 2.395 |
| 1.9317 | 18.0404 | 29000 | 2.8742 | 17.0658 | 23.4775 | 24.0 | 2.37 |
| 1.9069 | 18.3515 | 29500 | 2.8713 | 16.8956 | 23.2737 | 24.333 | 2.428 |
| 1.9172 | 18.6625 | 30000 | 2.8586 | 16.9523 | 23.3197 | 25.0 | 2.363 |
| 1.9174 | 18.9736 | 30500 | 2.8368 | 17.1978 | 23.5864 | 24.667 | 2.398 |
| 1.897 | 19.2846 | 31000 | 2.8538 | 17.3551 | 23.4456 | 25.0 | 2.388 |
| 1.9115 | 19.5956 | 31500 | 2.8569 | 16.7605 | 23.3187 | 24.667 | 2.4 |
| 1.8882 | 19.9067 | 32000 | 2.8524 | 17.0631 | 23.3137 | 24.667 | 2.423 |
| 1.8862 | 20.2177 | 32500 | 2.8410 | 17.3506 | 23.4525 | 25.0 | 2.4 |
| 1.8974 | 20.5288 | 33000 | 2.8373 | 17.1207 | 23.3756 | 23.333 | 2.421 |
| 1.899 | 20.8398 | 33500 | 2.8498 | 17.2751 | 23.4176 | 24.333 | 2.419 |
| 1.8887 | 21.1509 | 34000 | 2.8350 | 17.1577 | 23.6074 | 25.333 | 2.421 |
| 1.8772 | 21.4619 | 34500 | 2.8412 | 16.9793 | 23.2977 | 25.667 | 2.396 |
| 1.8709 | 21.7729 | 35000 | 2.8457 | 17.0508 | 23.5754 | 24.333 | 2.42 |
| 1.8666 | 22.0840 | 35500 | 2.8340 | 17.0809 | 23.3616 | 24.0 | 2.422 |
| 1.8729 | 22.3950 | 36000 | 2.8410 | 17.0623 | 23.4835 | 24.667 | 2.383 |
| 1.8826 | 22.7061 | 36500 | 2.8572 | 17.1967 | 23.2857 | 24.333 | 2.399 |
| 1.8643 | 23.0171 | 37000 | 2.8530 | 16.8548 | 23.4236 | 24.333 | 2.394 |
| 1.845 | 23.3281 | 37500 | 2.8371 | 17.2153 | 23.4346 | 24.333 | 2.381 |
| 1.859 | 23.6392 | 38000 | 2.8478 | 17.1054 | 23.2208 | 25.0 | 2.379 |
| 1.854 | 23.9502 | 38500 | 2.8382 | 17.4409 | 23.6643 | 24.667 | 2.347 |
| 1.8375 | 24.2613 | 39000 | 2.8333 | 17.6097 | 23.6973 | 24.0 | 2.395 |
| 1.8552 | 24.5723 | 39500 | 2.8248 | 17.7811 | 23.6823 | 24.333 | 2.408 |
| 1.8443 | 24.8834 | 40000 | 2.8306 | 17.1533 | 23.2138 | 23.333 | 2.38 |
| 1.836 | 25.1944 | 40500 | 2.8330 | 17.2345 | 23.5155 | 23.667 | 2.366 |
| 1.8437 | 25.5054 | 41000 | 2.8244 | 17.6586 | 23.4086 | 24.333 | 2.392 |
| 1.8402 | 25.8165 | 41500 | 2.8249 | 17.3532 | 23.6424 | 24.667 | 2.344 |
| 1.8182 | 26.1275 | 42000 | 2.8298 | 17.5388 | 23.5704 | 22.667 | 2.324 |
| 1.8293 | 26.4386 | 42500 | 2.8203 | 17.3617 | 23.3726 | 24.0 | 2.328 |
| 1.8408 | 26.7496 | 43000 | 2.8146 | 17.294 | 23.3946 | 24.333 | 2.34 |
| 1.8228 | 27.0607 | 43500 | 2.8102 | 17.5626 | 23.4156 | 24.667 | 2.308 |
| 1.8321 | 27.3717 | 44000 | 2.8370 | 17.5569 | 23.4196 | 25.0 | 2.36 |
| 1.8123 | 27.6827 | 44500 | 2.8256 | 17.3386 | 23.6464 | 24.667 | 2.343 |
| 1.8277 | 27.9938 | 45000 | 2.8216 | 17.6806 | 23.7652 | 23.667 | 2.325 |
| 1.8002 | 28.3048 | 45500 | 2.8217 | 17.3995 | 23.5834 | 23.0 | 2.319 |
| 1.8018 | 28.6159 | 46000 | 2.8145 | 17.2068 | 23.4466 | 23.667 | 2.336 |
| 1.8145 | 28.9269 | 46500 | 2.8261 | 17.3871 | 23.4675 | 22.333 | 2.362 |
| 1.8109 | 29.2379 | 47000 | 2.8277 | 17.4646 | 23.4515 | 23.333 | 2.318 |
| 1.7886 | 29.5490 | 47500 | 2.8132 | 17.3751 | 23.5714 | 21.667 | 2.335 |
| 1.795 | 29.8600 | 48000 | 2.8155 | 17.6465 | 23.5534 | 23.333 | 2.322 |
| 1.7832 | 30.1711 | 48500 | 2.8150 | 17.6549 | 23.4615 | 23.667 | 2.306 |
| 1.7974 | 30.4821 | 49000 | 2.8102 | 17.3266 | 23.4336 | 25.667 | 2.314 |
| 1.7962 | 30.7932 | 49500 | 2.7995 | 17.8548 | 23.5145 | 22.333 | 2.293 |
| 1.7842 | 31.1042 | 50000 | 2.8146 | 17.745 | 23.5235 | 21.667 | 2.308 |
| 1.7769 | 31.4152 | 50500 | 2.8023 | 17.8384 | 23.3616 | 23.333 | 2.316 |
| 1.7971 | 31.7263 | 51000 | 2.8043 | 18.0606 | 23.6454 | 24.333 | 2.32 |
| 1.7746 | 32.0373 | 51500 | 2.8176 | 17.7628 | 23.4745 | 25.0 | 2.33 |
| 1.7735 | 32.3484 | 52000 | 2.8082 | 17.931 | 23.5115 | 24.333 | 2.314 |
| 1.7825 | 32.6594 | 52500 | 2.7981 | 17.6958 | 23.5844 | 25.333 | 2.311 |
| 1.7872 | 32.9705 | 53000 | 2.8009 | 17.8739 | 23.6484 | 24.667 | 2.31 |
| 1.7652 | 33.2815 | 53500 | 2.8003 | 17.9319 | 23.6533 | 23.667 | 2.315 |
| 1.7699 | 33.5925 | 54000 | 2.8008 | 17.8419 | 23.6793 | 23.333 | 2.316 |
| 1.7431 | 33.9036 | 54500 | 2.7819 | 17.7766 | 23.6354 | 24.333 | 2.348 |
| 1.7523 | 34.2146 | 55000 | 2.8095 | 18.3728 | 23.8182 | 23.667 | 2.319 |
| 1.7466 | 34.5257 | 55500 | 2.7906 | 18.2066 | 23.6823 | 24.333 | 2.319 |
| 1.7626 | 34.8367 | 56000 | 2.7825 | 17.9383 | 23.4605 | 24.333 | 2.311 |
| 1.7403 | 35.1477 | 56500 | 2.8030 | 17.7189 | 23.5035 | 23.667 | 2.295 |
| 1.7531 | 35.4588 | 57000 | 2.8020 | 17.7917 | 23.6184 | 22.667 | 2.293 |
| 1.7525 | 35.7698 | 57500 | 2.7739 | 17.8031 | 23.4805 | 23.667 | 2.269 |
| 1.7511 | 36.0809 | 58000 | 2.7933 | 18.267 | 23.4895 | 23.333 | 2.297 |
| 1.7316 | 36.3919 | 58500 | 2.7853 | 18.3131 | 23.6014 | 24.667 | 2.322 |
| 1.766 | 36.7030 | 59000 | 2.7798 | 17.9374 | 23.6703 | 24.0 | 2.307 |
| 1.7258 | 37.0140 | 59500 | 2.7869 | 18.055 | 23.7522 | 25.333 | 2.275 |
| 1.7434 | 37.3250 | 60000 | 2.7764 | 18.1332 | 23.8412 | 23.667 | 2.264 |
| 1.7489 | 37.6361 | 60500 | 2.7782 | 18.0713 | 23.5355 | 23.0 | 2.292 |
| 1.7386 | 37.9471 | 61000 | 2.7788 | 18.3528 | 23.6953 | 23.667 | 2.26 |
| 1.727 | 38.2582 | 61500 | 2.7905 | 17.9954 | 23.5934 | 23.333 | 2.297 |
| 1.7318 | 38.5692 | 62000 | 2.7829 | 18.3854 | 23.7882 | 24.0 | 2.289 |
| 1.734 | 38.8802 | 62500 | 2.7808 | 17.945 | 23.4905 | 23.667 | 2.277 |
| 1.725 | 39.1913 | 63000 | 2.7804 | 17.8555 | 23.5964 | 23.667 | 2.285 |
| 1.7396 | 39.5023 | 63500 | 2.7892 | 17.9859 | 23.5415 | 24.0 | 2.237 |
| 1.7396 | 39.8134 | 64000 | 2.7794 | 18.1683 | 23.4446 | 23.333 | 2.269 |
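
To compare checkpoints from a log like the one above, the rows can be parsed and ranked by metric. The following is an illustrative sketch only: `EvalRow` and `parse_row` are hypothetical names, and the snippet embeds just four rows copied from the table rather than the full log. The parser accepts rows separated by whitespace or by pipes.

```python
from typing import NamedTuple, Optional


class EvalRow(NamedTuple):
    train_loss: Optional[float]  # None when the log shows "No log"
    epoch: float
    step: int
    val_loss: float
    bleu: float
    gen_len: float
    num_effective_experts: float
    num_experts_activated: float


def parse_row(line: str) -> EvalRow:
    """Parse one results row; the last seven fields are always numeric."""
    fields = line.replace("|", " ").split()
    *loss_tokens, epoch, step, val_loss, bleu, gen_len, n_eff, n_act = fields
    loss_text = " ".join(loss_tokens)
    train_loss = None if loss_text == "No log" else float(loss_text)
    return EvalRow(train_loss, float(epoch), int(step), float(val_loss),
                   float(bleu), float(gen_len), float(n_eff), float(n_act))


# Four rows copied verbatim from the training results table.
SAMPLE_ROWS = """\
No log 0 0 3.4641 7.0396 30.2687 25.0 1.36
1.7525 35.7698 57500 2.7739 17.8031 23.4805 23.667 2.269
1.7318 38.5692 62000 2.7829 18.3854 23.7882 24.0 2.289
1.7396 39.8134 64000 2.7794 18.1683 23.4446 23.333 2.269
"""

rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

best_bleu = max(rows, key=lambda r: r.bleu)
lowest_loss = min(rows, key=lambda r: r.val_loss)

print(f"highest Bleu: {best_bleu.bleu} at step {best_bleu.step}")
print(f"lowest validation loss: {lowest_loss.val_loss} at step {lowest_loss.step}")
```

In the table, the highest Bleu (18.3854, step 62000) and the lowest validation loss (2.7739, step 57500) fall on different checkpoints, so the choice of "best" checkpoint depends on which metric is used.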

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model size

  • Parameters: 1B
  • Tensor type: F32
  • Format: Safetensors