TURNA_TSATweets_cond_gen_no_instruction

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0710
  • Rouge1: 0.708
  • Rouge2: 0.094
  • Rougel: 0.709
  • Rougelsum: 0.708
  • Bleu: 0.0
  • Precisions: [0.709, 0.0, 0.0, 0.0]
  • Brevity Penalty: 1.0
  • Length Ratio: 1.0
  • Translation Length: 1000
  • Reference Length: 1000
  • Meteor: 0.3545
  • Score: 29.1000
  • Num Edits: 291
  • Ref Length: 1000.0
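
The metric names above match the outputs of the Hugging Face `evaluate` implementations of ROUGE, BLEU, and METEOR, and the trailing Score / Num Edits / Ref Length fields match the keys returned by an edit-distance (TER-style) metric. As a hedged illustration only, the sketch below shows how such a metric set could be computed from decoded predictions; the example strings are placeholders, and the choice of metric implementations is an assumption rather than documentation of this card's actual evaluation code.

```python
# Hypothetical sketch of reproducing the metric set above with the Hugging Face
# `evaluate` library. The predictions and references are placeholders; the real
# evaluation data and settings of this run are not documented here.
import evaluate

predictions = ["olumlu", "olumsuz"]   # placeholder decoded model outputs
references = ["olumlu", "nötr"]       # placeholder gold targets

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
ter = evaluate.load("ter")

# One reference per prediction, wrapped in lists for the BLEU/TER interfaces.
paired_refs = [[r] for r in references]

print(rouge.compute(predictions=predictions, references=references))   # rouge1, rouge2, rougeL, rougeLsum
print(bleu.compute(predictions=predictions, references=paired_refs))   # bleu, precisions, brevity_penalty,
                                                                        # length_ratio, translation_length,
                                                                        # reference_length
print(meteor.compute(predictions=predictions, references=references))  # meteor
print(ter.compute(predictions=predictions, references=paired_refs))    # score, num_edits, ref_length
```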

Model description

More information needed

Intended uses & limitations

More information needed
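
The card does not document the expected input or output format. As a hedged illustration only, the checkpoint can be loaded like any other seq2seq Transformers model; the repository id below is taken from this page, while the example tweet and generation settings are illustrative assumptions.

```python
# Hypothetical usage sketch: load the checkpoint as a standard seq2seq model.
# The prompt format and label set are not documented, so the input text and
# generation length here are assumptions for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "Holmeister/TURNA_TSATweets_cond_gen_no_instruction"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Bugün hava harika, çok mutluyum!"  # placeholder Turkish tweet
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```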

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
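
These values map directly onto `transformers` `Seq2SeqTrainingArguments`; a minimal configuration sketch follows. The `output_dir` and `predict_with_generate` settings are assumptions added for completeness, not documented parts of this run, and the listed Adam betas/epsilon are the library defaults.

```python
# Hedged sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# output_dir and predict_with_generate are assumptions for illustration only.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="TURNA_TSATweets_cond_gen_no_instruction",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumed, since generation metrics are reported
)
```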

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Score | Num Edits | Ref Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0.5 | 82 | 0.1335 | 0.2707 | 0.0 | 0.2707 | 0.2707 | 0.0 | [0.2706896551724138, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.1353 | 72.9310 | 423 | 580.0 |
| 2.9168 | 1.0 | 164 | 0.0903 | 0.6172 | 0.0017 | 0.6190 | 0.6155 | 0.0 | [0.6172413793103448, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3086 | 38.2759 | 222 | 580.0 |
| 2.9168 | 1.5 | 246 | 0.0968 | 0.6310 | 0.0 | 0.6328 | 0.6310 | 0.0 | [0.6310344827586207, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3155 | 36.8966 | 214 | 580.0 |
| 0.1116 | 2.0 | 328 | 0.0769 | 0.6586 | 0.0328 | 0.6603 | 0.6603 | 0.0 | [0.6603448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3302 | 33.9655 | 197 | 580.0 |
| 0.1116 | 2.5 | 410 | 0.0762 | 0.6931 | 0.0707 | 0.6931 | 0.6931 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.0 | 492 | 0.0709 | 0.6931 | 0.0276 | 0.6931 | 0.6931 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.5 | 574 | 0.0897 | 0.6897 | 0.0379 | 0.6897 | 0.6897 | 0.0 | [0.6896551724137931, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3448 | 31.0345 | 180 | 580.0 |
| 0.079 | 4.0 | 656 | 0.0679 | 0.6931 | 0.0707 | 0.6948 | 0.6931 | 0.0 | [0.6948275862068966, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3474 | 30.5172 | 177 | 580.0 |
| 0.079 | 4.5 | 738 | 0.0771 | 0.7103 | 0.0345 | 0.7103 | 0.7103 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0712 | 5.0 | 820 | 0.0675 | 0.7043 | 0.0517 | 0.7034 | 0.7034 | 0.0 | [0.7051724137931035, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3526 | 29.4828 | 171 | 580.0 |
| 0.0712 | 5.5 | 902 | 0.0657 | 0.7138 | 0.0603 | 0.7138 | 0.7138 | 0.0 | [0.7137931034482758, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3569 | 28.6207 | 166 | 580.0 |
| 0.065 | 6.0 | 984 | 0.0670 | 0.7052 | 0.0621 | 0.7069 | 0.7069 | 0.0 | [0.7068965517241379, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3534 | 29.3103 | 170 | 580.0 |
| 0.065 | 6.5 | 1066 | 0.0658 | 0.7086 | 0.0672 | 0.7103 | 0.7086 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0596 | 7.0 | 1148 | 0.0741 | 0.7138 | 0.0586 | 0.7155 | 0.7155 | 0.0 | [0.7155172413793104, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3578 | 28.4483 | 165 | 580.0 |

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Citation Information

Uludoğan, G., Balal, Z. Y., Akkurt, F., Türker, M., Güngör, O., & Üsküdarlı, S. (2024). TURNA: A Turkish encoder-decoder language model for enhanced understanding and generation. arXiv preprint arXiv:2401.14373.

Model size

1.23B parameters (Safetensors, tensor type F32)

Model tree for Holmeister/TURNA_TSATweets_cond_gen_no_instruction

  • Base model: google/mt5-large