
moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.5-cbl1e-04-ncs1e-02

This model is a fine-tuned version of google/switch-base-32 on the wmt16 tr-en dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7536
  • Bleu: 18.4585
  • Gen Len: 23.6054
  • Num Effective Experts: 29.333
  • Num Experts Activated: 3.383
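
A minimal inference sketch is shown below. It assumes the checkpoint exposes the standard Switch Transformers seq2seq interface in Transformers and that a T5-style translation prefix is used; both are assumptions, not documented facts about this model.

```python
# Hedged sketch: load the checkpoint and translate one Turkish sentence.
# The translation prefix and generation settings are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "taehyunzzz/moe-32-wmt16-tr-en-route-pooling-ba128-lr1e-04-cth0.5-cbl1e-04-ncs1e-02"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate Turkish to English: Bu bir deneme cümlesidir."  # assumed prompt format
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```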

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
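
The summary above names the wmt16 tr-en dataset. A minimal loading sketch, assuming the standard Hugging Face Datasets configuration of that name:

```python
# Hedged sketch: load WMT16 Turkish-English with the `datasets` library.
from datasets import load_dataset

raw = load_dataset("wmt16", "tr-en")
print(raw)              # DatasetDict with train / validation / test splits
print(raw["train"][0])  # {"translation": {"tr": "...", "en": "..."}}
```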

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 40.0
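
A rough reconstruction of these settings as Seq2SeqTrainingArguments is sketched below; the output directory and the predict_with_generate flag are placeholders/assumptions, not values taken from this card. Adam betas and epsilon match the optimizer defaults listed above.

```python
# Hedged sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./moe-32-wmt16-tr-en",        # placeholder, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,            # 32 * 4 = total train batch size of 128
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=200,
    num_train_epochs=40.0,
    predict_with_generate=True,               # assumption: needed for BLEU / gen-len eval
)
```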

Training results

Training Loss Epoch Step Validation Loss Bleu Gen Len Num Effective Experts Num Experts Activated
No log 0 0 3.4501 8.0546 28.7393 29.667 1.926
2.8497 0.3110 500 3.0874 14.0285 22.2517 17.667 2.13
2.6592 0.6221 1000 3.0750 14.2469 22.0679 23.0 2.442
2.5976 0.9331 1500 3.0424 14.535 22.2837 23.0 2.646
2.5142 1.2442 2000 3.0336 14.3825 22.3497 24.0 2.807
2.4613 1.5552 2500 3.0406 14.631 22.3576 25.0 2.893
2.4171 1.8663 3000 3.0402 14.5814 22.5285 25.333 2.995
2.4059 2.0 3215 3.0356 14.7799 22.5255 25.667 3.105
2.3908 2.1779 3500 3.0245 14.6185 22.6484 26.0 3.086
2.3519 2.4890 4000 3.0352 14.6469 22.3227 27.0 3.18
2.3428 2.8 4500 3.0238 14.7843 22.5045 27.333 3.163
2.3126 3.1110 5000 3.0022 15.2789 22.8242 27.333 3.224
2.2948 3.4221 5500 3.0188 15.0425 22.8142 27.333 3.277
2.2811 3.7331 6000 3.0076 14.8579 22.7123 28.667 3.247
2.2606 4.0442 6500 2.9833 15.3554 22.7183 28.667 3.334
2.2591 4.3552 7000 2.9885 14.954 22.7632 28.333 3.258
2.2363 4.6663 7500 2.9742 15.2251 22.8711 29.333 3.285
2.2115 4.9773 8000 2.9904 15.4151 22.8052 28.667 3.348
2.1871 5.2883 8500 2.9504 15.248 22.8312 29.667 3.365
2.1874 5.5994 9000 2.9649 15.3555 22.8402 28.667 3.356
2.1614 5.9104 9500 2.9720 15.6983 22.8941 28.667 3.34
2.1576 6.2215 10000 2.9532 15.542 22.8511 29.333 3.402
2.1597 6.5325 10500 2.9582 16.0016 23.0799 28.667 3.357
2.1467 6.8435 11000 2.9514 15.6646 23.012 27.667 3.259
2.1208 7.1546 11500 2.9469 15.7848 22.8621 27.667 3.323
2.1223 7.4656 12000 2.9436 15.8058 23.0799 28.667 3.515
2.126 7.7767 12500 2.9229 15.9607 23.1678 26.667 3.336
2.1006 8.0877 13000 2.9303 15.9553 22.8052 28.667 3.392
2.0946 8.3988 13500 2.9237 16.0432 23.2068 28.667 3.326
2.1015 8.7098 14000 2.9190 16.2694 23.3087 27.0 3.354
2.0846 9.0208 14500 2.9171 16.0638 22.974 27.667 3.292
2.0549 9.3319 15000 2.9350 15.9644 23.2038 27.333 3.354
2.062 9.6429 15500 2.9211 16.2404 23.029 28.667 3.454
2.0727 9.9540 16000 2.9132 16.7013 23.0619 27.667 3.438
2.0406 10.2650 16500 2.9238 16.3424 22.8501 27.333 3.456
2.0545 10.5760 17000 2.9277 16.317 23.1209 27.333 3.441
2.0419 10.8871 17500 2.9143 16.5313 23.3756 28.333 3.555
2.023 11.1981 18000 2.9177 16.2833 23.0989 29.0 3.58
2.0392 11.5092 18500 2.8999 16.095 23.047 27.333 3.441
2.0266 11.8202 19000 2.9097 16.2376 23.2787 28.667 3.526
2.02 12.1313 19500 2.8855 16.4612 23.3257 27.667 3.498
1.9934 12.4423 20000 2.8914 16.6401 23.2827 28.0 3.473
2.0178 12.7533 20500 2.8991 16.6445 23.1888 28.0 3.494
1.994 13.0644 21000 2.9001 16.3065 23.2597 28.667 3.516
1.9857 13.3754 21500 2.8889 16.3074 23.2617 27.667 3.489
2.0016 13.6865 22000 2.8787 16.724 23.1668 29.0 3.624
1.9855 13.9975 22500 2.8919 16.2858 23.1868 28.333 3.465
1.9638 14.3086 23000 2.8753 16.5986 23.2448 28.333 3.588
1.97 14.6196 23500 2.8730 16.4952 23.4396 28.0 3.468
1.9707 14.9306 24000 2.8766 16.9244 23.2697 27.667 3.524
1.9606 15.2417 24500 2.8732 16.683 23.2877 27.0 3.47
1.9464 15.5527 25000 2.8698 16.8571 23.3976 29.0 3.55
1.9599 15.8638 25500 2.8661 16.4797 23.2218 30.0 3.524
1.9541 16.1748 26000 2.8707 16.552 23.3696 29.667 3.417
1.9183 16.4858 26500 2.8647 16.6879 23.2837 29.667 3.648
1.9407 16.7969 27000 2.8746 16.6718 23.3726 29.0 3.673
1.9294 17.1079 27500 2.8588 16.6616 23.3207 28.667 3.619
1.926 17.4190 28000 2.8629 16.9872 23.3676 29.667 3.451
1.9245 17.7300 28500 2.8617 16.824 23.2857 29.333 3.507
1.9245 18.0411 29000 2.8534 17.071 23.7123 30.0 3.762
1.9081 18.3521 29500 2.8577 16.8301 23.2787 30.0 3.366
1.9178 18.6631 30000 2.8601 16.8287 23.4975 29.333 3.659
1.9147 18.9742 30500 2.8473 16.5578 23.2218 29.667 3.626
1.8953 19.2852 31000 2.8395 17.0523 23.4336 30.0 3.614
1.9098 19.5963 31500 2.8481 16.8686 23.2947 30.0 3.663
1.8851 19.9073 32000 2.8521 17.0517 23.4965 29.667 3.446
1.8846 20.2184 32500 2.8483 16.9526 23.4006 29.0 3.678
1.8953 20.5294 33000 2.8339 17.2873 23.6274 30.0 3.547
1.897 20.8404 33500 2.8412 16.8259 23.5485 29.667 3.53
1.8832 21.1515 34000 2.8292 17.4103 23.6783 29.333 3.534
1.8677 21.4625 34500 2.8387 17.2259 23.7123 29.333 3.431
1.8663 21.7736 35000 2.8319 17.382 23.5135 29.333 3.402
1.8539 22.0846 35500 2.8361 17.0419 23.4066 30.0 3.382
1.8635 22.3956 36000 2.8262 17.5244 23.7063 29.0 3.496
1.8747 22.7067 36500 2.8209 17.5117 23.5215 29.667 3.708
1.8605 23.0177 37000 2.8253 17.1874 23.3896 29.333 3.656
1.8425 23.3288 37500 2.8302 17.3545 23.4875 30.0 3.521
1.8533 23.6398 38000 2.8234 17.1979 23.4206 30.0 3.456
1.8464 23.9509 38500 2.8211 16.9943 23.4456 29.667 3.536
1.8332 24.2619 39000 2.8181 17.3187 23.6873 29.0 3.465
1.8493 24.5729 39500 2.8120 17.296 23.7353 28.333 3.542
1.8397 24.8840 40000 2.8102 17.379 23.3636 29.0 3.514
1.8275 25.1950 40500 2.8154 17.3565 23.3516 30.667 3.446
1.8333 25.5061 41000 2.8082 17.677 23.6903 30.667 3.709
1.836 25.8171 41500 2.8210 17.185 23.4735 29.0 3.564
1.8156 26.1281 42000 2.8040 17.6055 23.7073 29.667 3.575
1.8256 26.4392 42500 2.8325 17.4965 23.4545 29.667 3.825
1.8366 26.7502 43000 2.8012 17.6267 23.6014 29.667 3.501
1.8261 27.0613 43500 2.7957 17.563 23.6683 29.667 3.619
1.8321 27.3723 44000 2.8064 17.1074 23.2478 28.0 3.474
1.8068 27.6834 44500 2.7995 17.3519 23.4246 29.333 3.648
1.8236 27.9944 45000 2.7992 17.6658 23.5215 28.333 3.365
1.7985 28.3054 45500 2.7919 17.8624 23.6414 30.333 3.594
1.7963 28.6165 46000 2.7959 17.3935 23.6673 29.0 3.416
1.809 28.9275 46500 2.7913 17.3222 23.4246 29.667 3.529
1.8088 29.2386 47000 2.7905 17.6733 23.5844 29.333 3.471
1.7866 29.5496 47500 2.7915 17.389 23.4825 30.0 3.589
1.7928 29.8607 48000 2.7874 17.4279 23.3377 30.0 3.579
1.7741 30.1717 48500 2.7966 17.5687 23.3846 30.667 3.626
1.7912 30.4827 49000 2.7832 17.6559 23.4486 29.667 3.519
1.7912 30.7938 49500 2.7882 17.6686 23.4615 29.333 3.628
1.7837 31.1048 50000 2.7795 18.038 23.6943 29.333 3.534
1.77 31.4159 50500 2.7806 18.1033 23.7353 29.0 3.548
1.7954 31.7269 51000 2.7851 17.9456 23.7393 30.0 3.479
1.7741 32.0379 51500 2.7926 17.886 23.3526 30.0 3.586
1.7727 32.3490 52000 2.7772 18.0568 23.8372 29.0 3.696
1.7764 32.6600 52500 2.7746 17.9756 23.6424 29.0 3.594
1.7812 32.9711 53000 2.7746 18.1688 23.6653 30.667 3.682
1.7589 33.2821 53500 2.7804 17.8214 23.6813 29.333 3.537
1.7703 33.5932 54000 2.7704 17.869 23.6374 30.0 3.581
1.7412 33.9042 54500 2.7820 17.6582 23.4785 29.333 3.784
1.7552 34.2152 55000 2.7857 17.5605 23.4785 30.0 3.589
1.7442 34.5263 55500 2.7829 17.6995 23.2967 30.333 3.602
1.7582 34.8373 56000 2.7603 18.1319 23.5514 30.333 3.648
1.7345 35.1484 56500 2.7730 17.7749 23.5614 30.0 3.359
1.7509 35.4594 57000 2.7730 17.9115 23.7283 29.667 3.451
1.742 35.7705 57500 2.7585 17.6428 23.3806 30.0 3.683
1.7434 36.0815 58000 2.7671 18.1606 23.5804 30.667 3.462
1.7303 36.3925 58500 2.7652 18.2692 23.5914 30.0 3.563
1.7647 36.7036 59000 2.7637 18.0115 23.7423 30.0 3.446
1.7257 37.0146 59500 2.7775 18.196 23.7582 29.333 3.668
1.7377 37.3257 60000 2.7622 18.0064 23.8352 29.667 3.367
1.7501 37.6367 60500 2.7513 17.9934 23.7982 29.0 3.603
1.7372 37.9477 61000 2.7645 18.1485 23.6663 29.667 3.456
1.727 38.2588 61500 2.7651 18.1881 23.3836 28.667 3.512
1.732 38.5698 62000 2.7565 18.0689 23.6633 30.0 3.463
1.7332 38.8809 62500 2.7521 18.1615 23.6364 30.667 3.545
1.7221 39.1919 63000 2.7601 18.2152 23.6853 29.0 3.607
1.7322 39.5030 63500 2.7604 18.2851 23.4905 29.667 3.508
1.7298 39.8140 64000 2.7443 18.3403 23.7602 29.0 3.478
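
The BLEU and generation-length columns above are most likely produced by generation-based evaluation with sacrebleu, as in the standard Transformers translation examples; this is an assumption, and the sketch below uses hypothetical predictions and references only to illustrate the metric call.

```python
# Hedged sketch: computing corpus BLEU with sacrebleu via the `evaluate` library.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The weather is nice today."]         # hypothetical model output
references = [["The weather is very nice today."]]   # hypothetical reference
print(bleu.compute(predictions=predictions, references=references)["score"])
```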

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1