Whisper Medium GA-EN Speech Translation Raw

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, and SpokenWords datasets. It achieves the following results on the evaluation set:

Loss: 1.5187
Bleu: 26.56
Chrf: 46.91
Wer: 76.6772
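
WER is reported here as a percentage and is not capped at 100: inserted words count as errors, so a hypothesis much longer than its reference can score above 100. The following is a minimal sketch of word-level WER, assuming whitespace tokenization and no text normalization. The evaluation script used for this model is not shown on this card, so its exact preprocessing may differ, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, a one-word reference against a three-word hypothesis that repeats it needs two insertions, which gives a WER of 200.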

Model description

More information needed

Intended uses & limitations

More information needed
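
As an illustration only, the sketch below shows one way the model might be loaded for inference with the Transformers `pipeline` API. It assumes the checkpoint is available on the Hugging Face Hub under the repository name shown on this page, that the standard automatic-speech-recognition pipeline is appropriate for it, and that audio decoding dependencies such as ffmpeg are installed. The function name and the `audio_path` argument are hypothetical.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v1.3.1-4k-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the model's English text output for an Irish speech recording."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint is downloaded from the Hub on first use.
    translator = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = translator(audio_path)
    return result["text"]
```

Calling `translate_irish_audio("sample.wav")` with a local recording would return the generated text; no output is shown here because it depends on the audio.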

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
training_steps: 4000
mixed_precision_training: Native AMP
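
With a linear scheduler, a warmup ratio of 0.03, and 4000 training steps, the learning rate would rise over the first 120 steps to the peak of 0.0001 and then decay linearly to zero at step 4000. The sketch below reproduces that arithmetic in plain Python. It assumes the warmup length is the ratio times the total steps, rounded up, and follows the usual linear-warmup, linear-decay rule; it illustrates the schedule and is not the training code itself. The function name is hypothetical.

```python
import math

PEAK_LR = 1e-4          # learning_rate
TRAINING_STEPS = 4000   # training_steps
WARMUP_RATIO = 0.03     # lr_scheduler_warmup_ratio

# Assumption: warmup length is derived from the ratio (120 steps here).
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def learning_rate_at(step: int) -> float:
    """Learning rate after `step` optimizer updates under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)
```

Under these assumptions the rate is half the peak at step 60 during warmup and again at step 2060 during decay.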

Training results

Training Loss	Epoch	Step	Bleu	Chrf	Validation Loss	Wer
2.5874	0.0539	100	4.9	19.49	2.1785	114.0027
2.3237	0.1079	200	6.48	22.77	2.1129	151.8235
2.192	0.1618	300	7.92	25.9	2.0182	148.6718
1.9861	0.2157	400	10.55	28.55	1.8607	121.0266
1.8893	0.2697	500	16.68	33.64	1.8560	89.7794
1.8526	0.3236	600	8.83	30.12	1.7738	166.9968
1.6537	0.3776	700	10.94	33.83	1.6781	152.2287
1.7103	0.4315	800	16.9	36.4	1.6389	92.2557
1.4837	0.4854	900	13.81	34.5	1.6077	124.2233
1.2784	0.5394	1000	14.79	37.53	1.6103	116.3440
1.111	0.5933	1100	19.31	39.0	1.5579	93.6965
1.167	0.6472	1200	20.88	41.7	1.5210	91.6704
1.2217	0.7012	1300	21.29	41.72	1.4719	84.9167
1.0613	0.7551	1400	28.3	44.37	1.4663	67.1319
0.9256	0.8091	1500	27.5	45.59	1.4258	68.7078
0.8023	0.8630	1600	27.1	46.27	1.4027	72.7600
0.8327	0.9169	1700	27.03	46.19	1.3784	73.0302
0.7019	0.9709	1800	28.91	46.34	1.4127	67.4921
0.2681	1.0248	1900	28.53	47.12	1.3955	68.3026
0.2659	1.0787	2000	28.37	45.85	1.4194	68.1225
0.4202	1.1327	2100	27.53	44.0	1.5449	69.8784
0.4212	1.1866	2200	25.89	43.05	1.6060	70.1036
0.4124	1.2406	2300	24.31	41.55	1.6167	75.8217
0.4696	1.2945	2400	21.79	41.86	1.5904	85.0968
0.4018	1.3484	2500	25.36	43.45	1.6300	76.4070
0.4549	1.4024	2600	26.06	44.27	1.5540	71.9946
0.4018	1.4563	2700	26.22	45.42	1.5721	72.9851
0.3534	1.5102	2800	23.65	44.43	1.5488	80.0090
0.2907	1.5642	2900	24.04	42.57	1.5494	75.3715
0.3117	1.6181	3000	28.27	45.06	1.5691	67.2670
0.3379	1.6721	3100	30.52	47.42	1.4951	65.5561
0.3686	1.7260	3200	30.7	48.13	1.5010	64.8357
0.2855	1.7799	3300	27.19	46.47	1.5197	74.5610
0.2919	1.8339	3400	31.39	48.56	1.4974	63.5299
0.2582	1.8878	3500	30.18	48.54	1.4779	64.9257
0.2523	1.9417	3600	30.29	47.07	1.4835	66.6367
0.2005	1.9957	3700	29.89	47.95	1.4682	68.2125
0.0617	2.0496	3800	29.49	47.1	1.5221	67.6272
0.0661	2.1036	3900	26.93	46.91	1.5142	75.8217
0.0609	2.1575	4000	26.56	46.91	1.5187	76.6772
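
The final checkpoint (step 4000) is not the best-scoring one in this log: BLEU, chrF, and WER are all best at step 3400, while validation loss is lowest at step 1700. The sketch below selects a checkpoint by metric from a handful of rows taken from the training results, with each value assigned to the metric it belongs to. The `EvalRow` type and variable names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    bleu: float
    chrf: float
    validation_loss: float
    wer: float


# A subset of the logged evaluations, not the full table.
rows = [
    EvalRow(1700, 27.03, 46.19, 1.3784, 73.0302),
    EvalRow(1800, 28.91, 46.34, 1.4127, 67.4921),
    EvalRow(3200, 30.70, 48.13, 1.5010, 64.8357),
    EvalRow(3400, 31.39, 48.56, 1.4974, 63.5299),
    EvalRow(4000, 26.56, 46.91, 1.5187, 76.6772),
]

# Higher is better for BLEU; lower is better for loss and WER.
best_by_bleu = max(rows, key=lambda r: r.bleu)
best_by_loss = min(rows, key=lambda r: r.validation_loss)
best_by_wer = min(rows, key=lambda r: r.wer)
```

If translation quality matters more than loss, a setup that keeps the best checkpoint by BLEU (for instance via the Trainer options `load_best_model_at_end` and `metric_for_best_model`) would retain step 3400 rather than the last step.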

Framework versions

Transformers 4.41.2
PyTorch 2.2.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

ymoslem/whisper-medium-ga2en-v1.3.1-4k-r
