
mt5_small_lg_inf_en_v1

This model is a fine-tuned version of MubarakB/mt5_small_lg_en on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4187
  • Bleu: 0.2171
  • Gen Len: 9.0204

Model description

More information needed

Intended uses & limitations

More information needed
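
The card does not yet document intended uses, but the model name and its seq2seq (mT5/T5-style) base suggest translation-style text generation. Below is a minimal inference sketch, not an official usage example: the input sentence is a placeholder and the generation settings are assumptions.

```python
# Minimal inference sketch (assumption: standard seq2seq usage via transformers;
# the example input and generation settings are placeholders, not from the card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "MubarakB/mt5_small_lg_inf_en_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Replace this with an input sentence in the source language."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```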

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (they are mirrored in the sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
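
As a rough sketch, these settings map onto transformers Seq2SeqTrainingArguments as shown below. The output_dir, evaluation_strategy, and predict_with_generate values are assumptions (per-epoch validation is inferred from the results table); everything else mirrors the list above.

```python
# Sketch of the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# output_dir, evaluation_strategy, and predict_with_generate are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5_small_lg_inf_en_v1",   # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",           # validation appears to run once per epoch
    predict_with_generate=True,            # needed to report Bleu / Gen Len
)
```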

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
| No log | 1.0 | 138 | 0.4669 | 0.0658 | 9.3837 |
| No log | 2.0 | 276 | 0.4559 | 0.132 | 8.0245 |
| No log | 3.0 | 414 | 0.4507 | 0.2112 | 8.1592 |
| 0.4726 | 4.0 | 552 | 0.4472 | 0.2144 | 8.0367 |
| 0.4726 | 5.0 | 690 | 0.4445 | 0.2134 | 8.0082 |
| 0.4726 | 6.0 | 828 | 0.4425 | 0.3274 | 7.8612 |
| 0.4726 | 7.0 | 966 | 0.4405 | 0.3378 | 7.5959 |
| 0.447 | 8.0 | 1104 | 0.4390 | 0.3304 | 7.3918 |
| 0.447 | 9.0 | 1242 | 0.4378 | 0.3285 | 7.3673 |
| 0.447 | 10.0 | 1380 | 0.4362 | 0.3147 | 7.6694 |
| 0.4398 | 11.0 | 1518 | 0.4350 | 0.3181 | 7.4163 |
| 0.4398 | 12.0 | 1656 | 0.4341 | 0.3166 | 7.5224 |
| 0.4398 | 13.0 | 1794 | 0.4330 | 0.3178 | 7.5592 |
| 0.4398 | 14.0 | 1932 | 0.4318 | 0.2157 | 7.8204 |
| 0.4313 | 15.0 | 2070 | 0.4312 | 0.3169 | 8.1388 |
| 0.4313 | 16.0 | 2208 | 0.4307 | 0.3169 | 7.9633 |
| 0.4313 | 17.0 | 2346 | 0.4297 | 0.3064 | 8.2245 |
| 0.4313 | 18.0 | 2484 | 0.4293 | 0.2045 | 8.2776 |
| 0.4262 | 19.0 | 2622 | 0.4286 | 0.3027 | 8.4367 |
| 0.4262 | 20.0 | 2760 | 0.4280 | 0.2042 | 8.5061 |
| 0.4262 | 21.0 | 2898 | 0.4274 | 0.3033 | 8.5633 |
| 0.4214 | 22.0 | 3036 | 0.4272 | 0.3019 | 8.7714 |
| 0.4214 | 23.0 | 3174 | 0.4264 | 0.3051 | 8.649 |
| 0.4214 | 24.0 | 3312 | 0.4263 | 0.3021 | 8.8367 |
| 0.4214 | 25.0 | 3450 | 0.4254 | 0.2981 | 8.8204 |
| 0.4161 | 26.0 | 3588 | 0.4251 | 0.2992 | 8.8776 |
| 0.4161 | 27.0 | 3726 | 0.4248 | 0.3044 | 8.8571 |
| 0.4161 | 28.0 | 3864 | 0.4246 | 0.3 | 8.8776 |
| 0.4124 | 29.0 | 4002 | 0.4246 | 0.2998 | 8.8163 |
| 0.4124 | 30.0 | 4140 | 0.4239 | 0.2983 | 9.0857 |
| 0.4124 | 31.0 | 4278 | 0.4234 | 0.2988 | 9.0163 |
| 0.4124 | 32.0 | 4416 | 0.4233 | 0.2996 | 8.8816 |
| 0.4087 | 33.0 | 4554 | 0.4232 | 0.298 | 8.9714 |
| 0.4087 | 34.0 | 4692 | 0.4226 | 0.3003 | 8.9796 |
| 0.4087 | 35.0 | 4830 | 0.4224 | 0.2992 | 9.1796 |
| 0.4087 | 36.0 | 4968 | 0.4225 | 0.3005 | 9.0571 |
| 0.4053 | 37.0 | 5106 | 0.4224 | 0.2994 | 8.8571 |
| 0.4053 | 38.0 | 5244 | 0.4220 | 0.3 | 9.1143 |
| 0.4053 | 39.0 | 5382 | 0.4216 | 0.3019 | 9.102 |
| 0.4006 | 40.0 | 5520 | 0.4215 | 0.3016 | 8.9714 |
| 0.4006 | 41.0 | 5658 | 0.4212 | 0.3011 | 8.9224 |
| 0.4006 | 42.0 | 5796 | 0.4211 | 0.2982 | 9.2816 |
| 0.4006 | 43.0 | 5934 | 0.4210 | 0.2985 | 9.1633 |
| 0.3986 | 44.0 | 6072 | 0.4210 | 0.2994 | 9.0776 |
| 0.3986 | 45.0 | 6210 | 0.4209 | 0.308 | 9.3265 |
| 0.3986 | 46.0 | 6348 | 0.4208 | 0.2963 | 9.1714 |
| 0.3986 | 47.0 | 6486 | 0.4205 | 0.3093 | 9.0531 |
| 0.3953 | 48.0 | 6624 | 0.4205 | 0.3068 | 9.4449 |
| 0.3953 | 49.0 | 6762 | 0.4202 | 0.3075 | 8.9918 |
| 0.3953 | 50.0 | 6900 | 0.4203 | 0.3071 | 9.1306 |
| 0.3929 | 51.0 | 7038 | 0.4200 | 0.3052 | 9.3143 |
| 0.3929 | 52.0 | 7176 | 0.4200 | 0.306 | 9.1796 |
| 0.3929 | 53.0 | 7314 | 0.4200 | 0.3058 | 9.2204 |
| 0.3929 | 54.0 | 7452 | 0.4200 | 0.3076 | 8.8367 |
| 0.391 | 55.0 | 7590 | 0.4196 | 0.3078 | 8.8776 |
| 0.391 | 56.0 | 7728 | 0.4197 | 0.3041 | 9.0449 |
| 0.391 | 57.0 | 7866 | 0.4198 | 0.3041 | 8.8776 |
| 0.3887 | 58.0 | 8004 | 0.4201 | 0.3171 | 8.9224 |
| 0.3887 | 59.0 | 8142 | 0.4192 | 0.3074 | 9.0449 |
| 0.3887 | 60.0 | 8280 | 0.4197 | 0.318 | 8.8571 |
| 0.3887 | 61.0 | 8418 | 0.4194 | 0.3167 | 9.1469 |
| 0.3871 | 62.0 | 8556 | 0.4194 | 0.3186 | 8.8612 |
| 0.3871 | 63.0 | 8694 | 0.4192 | 0.3181 | 8.8245 |
| 0.3871 | 64.0 | 8832 | 0.4192 | 0.3178 | 9.0449 |
| 0.3871 | 65.0 | 8970 | 0.4194 | 0.3168 | 8.9673 |
| 0.3849 | 66.0 | 9108 | 0.4191 | 0.3159 | 8.9184 |
| 0.3849 | 67.0 | 9246 | 0.4192 | 0.3191 | 8.7347 |
| 0.3849 | 68.0 | 9384 | 0.4189 | 0.3173 | 8.8367 |
| 0.3841 | 69.0 | 9522 | 0.4189 | 0.3198 | 8.7633 |
| 0.3841 | 70.0 | 9660 | 0.4189 | 0.3168 | 8.9306 |
| 0.3841 | 71.0 | 9798 | 0.4187 | 0.3182 | 8.9837 |
| 0.3841 | 72.0 | 9936 | 0.4191 | 0.3179 | 8.9918 |
| 0.3823 | 73.0 | 10074 | 0.4189 | 0.3173 | 8.951 |
| 0.3823 | 74.0 | 10212 | 0.4188 | 0.3158 | 8.9551 |
| 0.3823 | 75.0 | 10350 | 0.4188 | 0.3184 | 8.9061 |
| 0.3823 | 76.0 | 10488 | 0.4187 | 0.3174 | 8.9347 |
| 0.3809 | 77.0 | 10626 | 0.4186 | 0.2163 | 9.1061 |
| 0.3809 | 78.0 | 10764 | 0.4189 | 0.2173 | 8.8531 |
| 0.3809 | 79.0 | 10902 | 0.4187 | 0.3156 | 9.0776 |
| 0.3798 | 80.0 | 11040 | 0.4187 | 0.3166 | 8.9796 |
| 0.3798 | 81.0 | 11178 | 0.4187 | 0.3172 | 8.9796 |
| 0.3798 | 82.0 | 11316 | 0.4187 | 0.3177 | 9.0 |
| 0.3798 | 83.0 | 11454 | 0.4187 | 0.3167 | 9.0204 |
| 0.3799 | 84.0 | 11592 | 0.4187 | 0.3166 | 8.9837 |
| 0.3799 | 85.0 | 11730 | 0.4187 | 0.3174 | 9.0776 |
| 0.3799 | 86.0 | 11868 | 0.4187 | 0.2174 | 9.1469 |
| 0.3789 | 87.0 | 12006 | 0.4188 | 0.2167 | 8.9143 |
| 0.3789 | 88.0 | 12144 | 0.4187 | 0.2171 | 9.0327 |
| 0.3789 | 89.0 | 12282 | 0.4187 | 0.217 | 9.0531 |
| 0.3789 | 90.0 | 12420 | 0.4186 | 0.3176 | 9.1102 |
| 0.378 | 91.0 | 12558 | 0.4186 | 0.3182 | 9.0531 |
| 0.378 | 92.0 | 12696 | 0.4186 | 0.3186 | 9.1102 |
| 0.378 | 93.0 | 12834 | 0.4187 | 0.2177 | 9.0163 |
| 0.378 | 94.0 | 12972 | 0.4187 | 0.2172 | 9.0204 |
| 0.3768 | 95.0 | 13110 | 0.4186 | 0.2171 | 9.0204 |
| 0.3768 | 96.0 | 13248 | 0.4186 | 0.2171 | 9.0367 |
| 0.3768 | 97.0 | 13386 | 0.4187 | 0.2173 | 8.9959 |
| 0.3769 | 98.0 | 13524 | 0.4187 | 0.2172 | 8.9959 |
| 0.3769 | 99.0 | 13662 | 0.4187 | 0.2172 | 9.0 |
| 0.3769 | 100.0 | 13800 | 0.4187 | 0.2171 | 9.0204 |
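
The Bleu and Gen Len columns in the table above are the kind of metrics typically produced by a compute_metrics callback passed to Seq2SeqTrainer. The card does not include the actual callback, so the sketch below is an assumption modeled on the common sacrebleu-based pattern; the metric scale used in this run may differ.

```python
# Hypothetical compute_metrics in the style commonly used with Seq2SeqTrainer
# (not taken from this repository); it reports corpus BLEU and the average
# generated length, matching the column names above.
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MubarakB/mt5_small_lg_inf_en_v1")
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # -100 marks padded label positions; restore the pad token before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    score = sacrebleu.compute(predictions=decoded_preds,
                              references=[[ref] for ref in decoded_labels])["score"]
    gen_len = np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    return {"bleu": score, "gen_len": float(gen_len)}
```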

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.4.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0