TURNA_TSATweets_cond_gen_no_instruction

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0710
  • Rouge1: 0.708
  • Rouge2: 0.094
  • Rougel: 0.709
  • Rougelsum: 0.708
  • Bleu: 0.0
  • Precisions: [0.709, 0.0, 0.0, 0.0]
  • Brevity Penalty: 1.0
  • Length Ratio: 1.0
  • Translation Length: 1000
  • Reference Length: 1000
  • Meteor: 0.3545
  • Score: 29.1000
  • Num Edits: 291
  • Ref Length: 1000.0
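
The metric names above match the outputs of the Hugging Face `evaluate` implementations of ROUGE, BLEU, and METEOR, and the trailing Score / Num Edits / Ref Length fields match the keys returned by an edit-distance (TER-style) metric. As a hedged illustration only, the sketch below shows how such a metric set could be computed from decoded predictions; the example strings are placeholders, and the choice of metric implementations is an assumption rather than documentation of this card's actual evaluation code.

```python
# Hypothetical sketch of reproducing the metric set above with the Hugging Face
# `evaluate` library. The predictions and references are placeholders; the real
# evaluation data and settings of this run are not documented here.
import evaluate

predictions = ["olumlu", "olumsuz"]   # placeholder decoded model outputs
references = ["olumlu", "nötr"]       # placeholder gold targets

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
ter = evaluate.load("ter")

# One reference per prediction, wrapped in lists for the BLEU/TER interfaces.
paired_refs = [[r] for r in references]

print(rouge.compute(predictions=predictions, references=references))   # rouge1, rouge2, rougeL, rougeLsum
print(bleu.compute(predictions=predictions, references=paired_refs))   # bleu, precisions, brevity_penalty,
                                                                        # length_ratio, translation_length,
                                                                        # reference_length
print(meteor.compute(predictions=predictions, references=references))  # meteor
print(ter.compute(predictions=predictions, references=paired_refs))    # score, num_edits, ref_length
```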

Model description

More information needed

Intended uses & limitations

More information needed
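
The card does not document the expected input or output format. As a hedged illustration only, the checkpoint can be loaded like any other seq2seq Transformers model; the repository id below is taken from this page, while the example tweet and generation settings are illustrative assumptions.

```python
# Hypothetical usage sketch: load the checkpoint as a standard seq2seq model.
# The prompt format and label set are not documented, so the input text and
# generation length here are assumptions for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "Holmeister/TURNA_TSATweets_cond_gen_no_instruction"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Bugün hava harika, çok mutluyum!"  # placeholder Turkish tweet
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```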

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
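
These values map directly onto `transformers` `Seq2SeqTrainingArguments`; a minimal configuration sketch follows. The `output_dir` and `predict_with_generate` settings are assumptions added for completeness, not documented parts of this run, and the listed Adam betas/epsilon are the library defaults.

```python
# Hedged sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# output_dir and predict_with_generate are assumptions for illustration only.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="TURNA_TSATweets_cond_gen_no_instruction",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumed, since generation metrics are reported
)
```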

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Score | Num Edits | Ref Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0.5 | 82 | 0.1335 | 0.2707 | 0.0 | 0.2707 | 0.2707 | 0.0 | [0.2706896551724138, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.1353 | 72.9310 | 423 | 580.0 |
| 2.9168 | 1.0 | 164 | 0.0903 | 0.6172 | 0.0017 | 0.6190 | 0.6155 | 0.0 | [0.6172413793103448, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3086 | 38.2759 | 222 | 580.0 |
| 2.9168 | 1.5 | 246 | 0.0968 | 0.6310 | 0.0 | 0.6328 | 0.6310 | 0.0 | [0.6310344827586207, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3155 | 36.8966 | 214 | 580.0 |
| 0.1116 | 2.0 | 328 | 0.0769 | 0.6586 | 0.0328 | 0.6603 | 0.6603 | 0.0 | [0.6603448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3302 | 33.9655 | 197 | 580.0 |
| 0.1116 | 2.5 | 410 | 0.0762 | 0.6931 | 0.0707 | 0.6931 | 0.6931 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.0 | 492 | 0.0709 | 0.6931 | 0.0276 | 0.6931 | 0.6931 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.5 | 574 | 0.0897 | 0.6897 | 0.0379 | 0.6897 | 0.6897 | 0.0 | [0.6896551724137931, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3448 | 31.0345 | 180 | 580.0 |
| 0.079 | 4.0 | 656 | 0.0679 | 0.6931 | 0.0707 | 0.6948 | 0.6931 | 0.0 | [0.6948275862068966, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3474 | 30.5172 | 177 | 580.0 |
| 0.079 | 4.5 | 738 | 0.0771 | 0.7103 | 0.0345 | 0.7103 | 0.7103 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0712 | 5.0 | 820 | 0.0675 | 0.7043 | 0.0517 | 0.7034 | 0.7034 | 0.0 | [0.7051724137931035, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3526 | 29.4828 | 171 | 580.0 |
| 0.0712 | 5.5 | 902 | 0.0657 | 0.7138 | 0.0603 | 0.7138 | 0.7138 | 0.0 | [0.7137931034482758, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3569 | 28.6207 | 166 | 580.0 |
| 0.065 | 6.0 | 984 | 0.0670 | 0.7052 | 0.0621 | 0.7069 | 0.7069 | 0.0 | [0.7068965517241379, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3534 | 29.3103 | 170 | 580.0 |
| 0.065 | 6.5 | 1066 | 0.0658 | 0.7086 | 0.0672 | 0.7103 | 0.7086 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0596 | 7.0 | 1148 | 0.0741 | 0.7138 | 0.0586 | 0.7155 | 0.7155 | 0.0 | [0.7155172413793104, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3578 | 28.4483 | 165 | 580.0 |

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Citation Information

Uludoğan, G., Balal, Z. Y., Akkurt, F., Türker, M., Güngör, O., & Üsküdarlı, S. (2024). TURNA: A Turkish encoder-decoder language model for enhanced understanding and generation. arXiv preprint arXiv:2401.14373.

Model size

1.23B parameters (Safetensors, tensor type F32)

Model tree for Holmeister/TURNA_TSATweets_cond_gen_no_instruction

  • Base model: google/mt5-large