---
license: other
tags:
  - generated_from_trainer
base_model: boun-tabi-LMG/TURNA
metrics:
  - rouge
  - bleu
model-index:
  - name: TURNA_spell_correction_product_search
    results: []
---

# TURNA_spell_correction_product_search

This model is a fine-tuned version of [boun-tabi-LMG/TURNA](https://huggingface.co/boun-tabi-LMG/TURNA) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 0.1088
- Rouge1: 0.8437
- Rouge2: 0.7401
- Rougel: 0.8435
- Rougelsum: 0.8437
- Bleu: 0.8713
- Precisions: [0.8736109932988378, 0.8306083370157608, 0.8473118279569892, 0.9631336405529954]
- Brevity Penalty: 0.9932
- Length Ratio: 0.9933
- Translation Length: 11789
- Reference Length: 11869
- Meteor: 0.7484
- Score: 14.6658
- Num Edits: 1709
- Ref Length: 11653.0
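
Since TURNA is a T5-style encoder-decoder, the fine-tuned checkpoint should work with the standard `text2text-generation` pipeline. The sketch below is a minimal, unverified example: the repository id and the raw-query input format are assumptions, so check the files in this repository for the exact inference setup.

```python
# Minimal inference sketch. Assumptions: the repo id below, and that the
# model takes the raw misspelled query as input with no task prefix.
from transformers import pipeline

corrector = pipeline(
    "text2text-generation",
    model="Holmeister/TURNA_spell_correction_product_search",  # assumed repo id
)

# Hypothetical misspelled Turkish product-search query ("kablosuz kulaklik").
query = "kablosz kulaklik"
print(corrector(query, max_new_tokens=64)[0]["generated_text"])
```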

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):

- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
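
As a hedged illustration, these values map onto `transformers.Seq2SeqTrainingArguments` roughly as follows; the `output_dir` and anything not listed above (generation settings, logging, and the surrounding trainer wiring) are assumptions, not part of this card.

```python
# Sketch only: hyperparameters taken from the list above; output_dir is assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="TURNA_spell_correction_product_search",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```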

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Score | Num Edits | Ref Length |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0.3335 | 1253 | 0.2447 | 0.7099 | 0.5537 | 0.7097 | 0.7097 | 0.7033 | [0.7548184082863512, 0.6378386771213415, 0.6301176470588236, 0.906832298136646] | 0.9711 | 0.9715 | 22881 | 23553 | 0.5852 | 27.8452 | 6474 | 23250.0 |
| No log | 0.6670 | 2506 | 0.1886 | 0.7586 | 0.6231 | 0.7584 | 0.7585 | 0.7555 | [0.7995148154565933, 0.7114032405992051, 0.7142528735632184, 0.8698224852071006] | 0.9799 | 0.9801 | 23084 | 23553 | 0.6454 | 22.8860 | 5321 | 23250.0 |
| 0.3827 | 1.0005 | 3759 | 0.1571 | 0.7810 | 0.6561 | 0.7807 | 0.7809 | 0.7947 | [0.8183424557169332, 0.7418011058092858, 0.7430269775948788, 0.939297124600639] | 0.9850 | 0.9851 | 23203 | 23553 | 0.6742 | 20.6710 | 4806 | 23250.0 |
| 0.3827 | 1.3340 | 5012 | 0.1458 | 0.7973 | 0.6822 | 0.7973 | 0.7974 | 0.8139 | [0.8318891557995882, 0.7666015625, 0.7682954289574421, 0.9333333333333333] | 0.9897 | 0.9898 | 23312 | 23553 | 0.6955 | 19.1441 | 4451 | 23250.0 |
| 0.3827 | 1.6676 | 6265 | 0.1320 | 0.8109 | 0.6993 | 0.8107 | 0.8111 | 0.8294 | [0.8467426359922597, 0.7889852885703508, 0.788783355947535, 0.9453376205787781] | 0.9873 | 0.9873 | 23255 | 23553 | 0.7111 | 17.6258 | 4098 | 23250.0 |
| 0.1238 | 2.0011 | 7518 | 0.1218 | 0.8205 | 0.7139 | 0.8205 | 0.8206 | 0.8462 | [0.8559577028885832, 0.8045084439083233, 0.8144353369763205, 0.9607843137254902] | 0.9877 | 0.9877 | 23264 | 23553 | 0.7231 | 16.5720 | 3853 | 23250.0 |
| 0.1238 | 2.3346 | 8771 | 0.1223 | 0.8246 | 0.7219 | 0.8247 | 0.8249 | 0.8506 | [0.8575583882282488, 0.8074450590521752, 0.8080267558528428, 0.9639344262295082] | 0.9925 | 0.9926 | 23378 | 23553 | 0.7298 | 16.1978 | 3766 | 23250.0 |
| 0.1238 | 2.6681 | 10024 | 0.1177 | 0.8319 | 0.7326 | 0.8320 | 0.8321 | 0.8580 | [0.8628791114908159, 0.8155853840417598, 0.8160765976397238, 0.9671052631578947] | 0.9939 | 0.9939 | 23410 | 23553 | 0.7379 | 15.6602 | 3641 | 23250.0 |
| 0.0686 | 3.0016 | 11277 | 0.1122 | 0.8388 | 0.7400 | 0.8391 | 0.8391 | 0.8623 | [0.8686514886164624, 0.8236522257848036, 0.8239625167336011, 0.9607843137254902] | 0.9940 | 0.9940 | 23411 | 23553 | 0.7462 | 15.0237 | 3493 | 23250.0 |
| 0.0686 | 3.3351 | 12530 | 0.1184 | 0.8398 | 0.7450 | 0.8397 | 0.8398 | 0.8682 | [0.8676339190741608, 0.8243353328889876, 0.8229854689564069, 0.9735099337748344] | 0.9979 | 0.9979 | 23503 | 23553 | 0.7488 | 14.9677 | 3480 | 23250.0 |
| 0.0686 | 3.6686 | 13783 | 0.1148 | 0.8440 | 0.7484 | 0.8441 | 0.8442 | 0.8716 | [0.8706277359853798, 0.8271121294995935, 0.826640333552776, 0.9735099337748344] | 0.9990 | 0.9990 | 23529 | 23553 | 0.7533 | 14.5806 | 3390 | 23250.0 |
| 0.0383 | 4.0021 | 15036 | 0.1134 | 0.8498 | 0.7547 | 0.8498 | 0.8500 | 0.8750 | [0.8757069354084279, 0.8344307168750462, 0.8344676180021954, 0.9671052631578947] | 0.9985 | 0.9985 | 23517 | 23553 | 0.7592 | 14.0516 | 3267 | 23250.0 |
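
The ROUGE and BLEU columns above match the output keys of the Hugging Face `evaluate` metrics (`rouge1`, `precisions`, `brevity_penalty`, and so on). Below is a hedged sketch of how such scores can be computed; the actual evaluation script is not part of this card, and the strings used are placeholders.

```python
# Sketch: computing ROUGE and BLEU with the `evaluate` library.
# The prediction/reference pairs are placeholders, not data from this card.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["kablosuz kulaklik"]  # model outputs (placeholder)
references = ["kablosuz kulaklik"]   # gold corrections (placeholder)

print(rouge.compute(predictions=predictions, references=references))
# BLEU expects a list of reference lists (one or more per prediction).
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references]))
```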

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1

## Citation Information

Uludoğan, G., Balal, Z. Y., Akkurt, F., Türker, M., Güngör, O., & Üsküdarlı, S. (2024). TURNA: A Turkish encoder-decoder language model for enhanced understanding and generation. arXiv preprint arXiv:2401.14373.