
t5-small_6_3-hi_en-to-en

This model was trained from scratch on the cmu_hinglish_dog dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3662
  • Bleu: 18.0863
  • Gen Len: 15.2708

Model description

The student model was generated from t5-small (6 encoder layers, 3 decoder layers) using:
python make_student.py t5-small t5_small_6_3 6 3
See this link for more information.
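The resulting checkpoint is a standard T5 seq2seq model, so it can be used with the usual `AutoTokenizer`/`AutoModelForSeq2SeqLM` API. A minimal sketch is below; the Hub id `sayanmandal/t5-small_6_3-hi_en-to-en` and the beam-search settings are assumptions, not taken from the card. The imports are kept inside `load()` so the helper functions can be defined without `transformers` installed:

```python
MODEL_ID = "sayanmandal/t5-small_6_3-hi_en-to-en"  # assumed Hub id


def load(model_id=MODEL_ID):
    """Download tokenizer and model from the Hub (imports kept local on purpose)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(text, tokenizer, model, num_beams=4, max_length=64):
    """Translate one Hinglish sentence to English with beam search."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, num_beams=num_beams, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the checkpoint on first use):
# tokenizer, model = load()
# print(translate("yeh movie kaisi thi?", tokenizer, model))
```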

Intended uses & limitations

More information needed

Training and evaluation data

Trained on the cmu_hinglish_dog dataset. See this link for the dataset description.

Translation:

  • Source: hi_en: The text in Hinglish
  • Target: en: The text in English
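In `datasets`, translation pairs are typically stored under a single `translation` dict keyed by language code. Assuming cmu_hinglish_dog follows that layout (with `hi_en` and `en` keys, matching the source/target above), extracting source/target pairs looks like this:

```python
def to_pair(example):
    """Map one cmu_hinglish_dog example to a source/target pair.

    Assumes the usual `datasets` translation layout:
    {"translation": {"hi_en": <Hinglish text>, "en": <English text>}}.
    """
    tr = example["translation"]
    return {"src": tr["hi_en"], "tgt": tr["en"]}


# Example (downloads the dataset on first use):
# from datasets import load_dataset
# train = load_dataset("cmu_hinglish_dog", split="train").map(to_pair)
```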

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
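Note that total_train_batch_size is derived rather than set directly: it is the per-device batch size times the gradient accumulation steps. A small sketch of the hyperparameters as a dict (key names follow `transformers`' `Seq2SeqTrainingArguments`; this is a reconstruction, not the authors' training script):

```python
# Hyperparameters from the card; key names follow Seq2SeqTrainingArguments.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 100,
    "fp16": True,  # "Native AMP" mixed precision
}

# total_train_batch_size = 64 is derived, not set directly:
effective_batch = (hparams["per_device_train_batch_size"]
                   * hparams["gradient_accumulation_steps"])
print(effective_batch)  # → 64
```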

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| No log        | 1.0   | 126   | 3.0601          | 4.7146  | 11.9904 |
| No log        | 2.0   | 252   | 2.8885          | 5.9584  | 12.3418 |
| No log        | 3.0   | 378   | 2.7914          | 6.649   | 12.3758 |
| 3.4671        | 4.0   | 504   | 2.7347          | 7.3305  | 12.3854 |
| 3.4671        | 5.0   | 630   | 2.6832          | 8.3132  | 12.4268 |
| 3.4671        | 6.0   | 756   | 2.6485          | 8.339   | 12.3641 |
| 3.4671        | 7.0   | 882   | 2.6096          | 8.7269  | 12.414  |
| 3.0208        | 8.0   | 1008  | 2.5814          | 9.2163  | 12.2675 |
| 3.0208        | 9.0   | 1134  | 2.5542          | 9.448   | 12.3875 |
| 3.0208        | 10.0  | 1260  | 2.5339          | 9.9011  | 12.4321 |
| 3.0208        | 11.0  | 1386  | 2.5043          | 9.7529  | 12.5149 |
| 2.834         | 12.0  | 1512  | 2.4848          | 9.9606  | 12.4193 |
| 2.834         | 13.0  | 1638  | 2.4737          | 9.9368  | 12.3673 |
| 2.834         | 14.0  | 1764  | 2.4458          | 10.3182 | 12.4352 |
| 2.834         | 15.0  | 1890  | 2.4332          | 10.486  | 12.4671 |
| 2.7065        | 16.0  | 2016  | 2.4239          | 10.6921 | 12.414  |
| 2.7065        | 17.0  | 2142  | 2.4064          | 10.7426 | 12.4607 |
| 2.7065        | 18.0  | 2268  | 2.3941          | 11.0509 | 12.4087 |
| 2.7065        | 19.0  | 2394  | 2.3826          | 11.2407 | 12.3386 |
| 2.603         | 20.0  | 2520  | 2.3658          | 11.3711 | 12.3992 |
| 2.603         | 21.0  | 2646  | 2.3537          | 11.42   | 12.5032 |
| 2.603         | 22.0  | 2772  | 2.3475          | 12.0665 | 12.5074 |
| 2.603         | 23.0  | 2898  | 2.3398          | 12.0343 | 12.4342 |
| 2.5192        | 24.0  | 3024  | 2.3298          | 12.1011 | 12.5096 |
| 2.5192        | 25.0  | 3150  | 2.3216          | 12.2562 | 12.4809 |
| 2.5192        | 26.0  | 3276  | 2.3131          | 12.4585 | 12.4427 |
| 2.5192        | 27.0  | 3402  | 2.3052          | 12.7094 | 12.534  |
| 2.4445        | 28.0  | 3528  | 2.2984          | 12.7432 | 12.5053 |
| 2.4445        | 29.0  | 3654  | 2.2920          | 12.8409 | 12.4501 |
| 2.4445        | 30.0  | 3780  | 2.2869          | 12.6365 | 12.4936 |
| 2.4445        | 31.0  | 3906  | 2.2777          | 12.8523 | 12.5234 |
| 2.3844        | 32.0  | 4032  | 2.2788          | 12.9216 | 12.4204 |
| 2.3844        | 33.0  | 4158  | 2.2710          | 12.9568 | 12.5064 |
| 2.3844        | 34.0  | 4284  | 2.2643          | 12.9641 | 12.4299 |
| 2.3844        | 35.0  | 4410  | 2.2621          | 12.9787 | 12.448  |
| 2.3282        | 36.0  | 4536  | 2.2554          | 13.1264 | 12.4374 |
| 2.3282        | 37.0  | 4662  | 2.2481          | 13.1853 | 12.4416 |
| 2.3282        | 38.0  | 4788  | 2.2477          | 13.3259 | 12.4119 |
| 2.3282        | 39.0  | 4914  | 2.2448          | 13.2017 | 12.4278 |
| 2.2842        | 40.0  | 5040  | 2.2402          | 13.3772 | 12.4437 |
| 2.2842        | 41.0  | 5166  | 2.2373          | 13.2184 | 12.414  |
| 2.2842        | 42.0  | 5292  | 2.2357          | 13.5267 | 12.4342 |
| 2.2842        | 43.0  | 5418  | 2.2310          | 13.5754 | 12.4087 |
| 2.2388        | 44.0  | 5544  | 2.2244          | 13.653  | 12.4427 |
| 2.2388        | 45.0  | 5670  | 2.2243          | 13.6028 | 12.431  |
| 2.2388        | 46.0  | 5796  | 2.2216          | 13.7128 | 12.4151 |
| 2.2388        | 47.0  | 5922  | 2.2231          | 13.749  | 12.4172 |
| 2.2067        | 48.0  | 6048  | 2.2196          | 13.7256 | 12.4034 |
| 2.2067        | 49.0  | 6174  | 2.2125          | 13.8237 | 12.396  |
| 2.2067        | 50.0  | 6300  | 2.2131          | 13.6642 | 12.4416 |
| 2.2067        | 51.0  | 6426  | 2.2115          | 13.8876 | 12.4119 |
| 2.1688        | 52.0  | 6552  | 2.2091          | 14.0323 | 12.4639 |
| 2.1688        | 53.0  | 6678  | 2.2082          | 13.916  | 12.3843 |
| 2.1688        | 54.0  | 6804  | 2.2071          | 13.924  | 12.3758 |
| 2.1688        | 55.0  | 6930  | 2.2046          | 13.9563 | 12.4416 |
| 2.1401        | 56.0  | 7056  | 2.2020          | 14.0592 | 12.483  |
| 2.1401        | 57.0  | 7182  | 2.2047          | 13.8879 | 12.4076 |
| 2.1401        | 58.0  | 7308  | 2.2018          | 13.9267 | 12.3949 |
| 2.1401        | 59.0  | 7434  | 2.1964          | 14.0518 | 12.4363 |
| 2.1092        | 60.0  | 7560  | 2.1926          | 14.1518 | 12.4883 |
| 2.1092        | 61.0  | 7686  | 2.1972          | 14.132  | 12.4034 |
| 2.1092        | 62.0  | 7812  | 2.1939          | 14.2066 | 12.4151 |
| 2.1092        | 63.0  | 7938  | 2.1905          | 14.2923 | 12.4459 |
| 2.0932        | 64.0  | 8064  | 2.1932          | 14.2476 | 12.3418 |
| 2.0932        | 65.0  | 8190  | 2.1925          | 14.2057 | 12.3907 |
| 2.0932        | 66.0  | 8316  | 2.1906          | 14.2978 | 12.4055 |
| 2.0932        | 67.0  | 8442  | 2.1903          | 14.3276 | 12.4427 |
| 2.0706        | 68.0  | 8568  | 2.1918          | 14.4681 | 12.4034 |
| 2.0706        | 69.0  | 8694  | 2.1882          | 14.3751 | 12.4225 |
| 2.0706        | 70.0  | 8820  | 2.1870          | 14.5904 | 12.4204 |
| 2.0706        | 71.0  | 8946  | 2.1865          | 14.6409 | 12.4512 |
| 2.0517        | 72.0  | 9072  | 2.1831          | 14.6505 | 12.4352 |
| 2.0517        | 73.0  | 9198  | 2.1835          | 14.7485 | 12.4363 |
| 2.0517        | 74.0  | 9324  | 2.1824          | 14.7344 | 12.4586 |
| 2.0517        | 75.0  | 9450  | 2.1829          | 14.8097 | 12.4575 |
| 2.0388        | 76.0  | 9576  | 2.1822          | 14.6681 | 12.4108 |
| 2.0388        | 77.0  | 9702  | 2.1823          | 14.6421 | 12.4342 |
| 2.0388        | 78.0  | 9828  | 2.1816          | 14.7014 | 12.4459 |
| 2.0388        | 79.0  | 9954  | 2.1810          | 14.744  | 12.4565 |
| 2.0224        | 80.0  | 10080 | 2.1839          | 14.7889 | 12.4437 |
| 2.0224        | 81.0  | 10206 | 2.1793          | 14.802  | 12.4565 |
| 2.0224        | 82.0  | 10332 | 2.1776          | 14.7702 | 12.4214 |
| 2.0224        | 83.0  | 10458 | 2.1809          | 14.6772 | 12.4236 |
| 2.0115        | 84.0  | 10584 | 2.1786          | 14.709  | 12.4214 |
| 2.0115        | 85.0  | 10710 | 2.1805          | 14.7693 | 12.3981 |
| 2.0115        | 86.0  | 10836 | 2.1790          | 14.7628 | 12.4172 |
| 2.0115        | 87.0  | 10962 | 2.1785          | 14.7538 | 12.3992 |
| 2.0007        | 88.0  | 11088 | 2.1788          | 14.7493 | 12.3726 |
| 2.0007        | 89.0  | 11214 | 2.1788          | 14.8793 | 12.4045 |
| 2.0007        | 90.0  | 11340 | 2.1786          | 14.8318 | 12.3747 |
| 2.0007        | 91.0  | 11466 | 2.1769          | 14.8061 | 12.4013 |
| 1.9967        | 92.0  | 11592 | 2.1757          | 14.8108 | 12.3843 |
| 1.9967        | 93.0  | 11718 | 2.1747          | 14.8036 | 12.379  |
| 1.9967        | 94.0  | 11844 | 2.1764          | 14.7447 | 12.3737 |
| 1.9967        | 95.0  | 11970 | 2.1759          | 14.7759 | 12.3875 |
| 1.9924        | 96.0  | 12096 | 2.1760          | 14.7695 | 12.3875 |
| 1.9924        | 97.0  | 12222 | 2.1762          | 14.8022 | 12.3769 |
| 1.9924        | 98.0  | 12348 | 2.1763          | 14.7519 | 12.3822 |
| 1.9924        | 99.0  | 12474 | 2.1760          | 14.7756 | 12.3832 |
| 1.9903        | 100.0 | 12600 | 2.1761          | 14.7713 | 12.3822 |

Evaluation results

| Data Split | Bleu    |
|:----------:|:-------:|
| Validation | 17.8061 |
| Test       | 18.0863 |

Framework versions

  • Transformers 4.20.0.dev0
  • Pytorch 1.8.0
  • Datasets 2.1.0
  • Tokenizers 0.12.1
