w2v-bert-cv-grain-lg_both_v2

This model is a fine-tuned version of facebook/w2v-bert-2.0. The training dataset is not specified. Per-epoch results on the evaluation set are listed in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 80
mixed_precision_training: Native AMP
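
The hyperparameters above, combined with the step counts logged in the results table, imply a training-set size and a learning-rate schedule. The sketch below works through that arithmetic. It assumes a single device, no gradient accumulation, and no warmup (none of these are stated in this card), and `linear_lr` is an illustrative helper, not the code used for training.

```python
# Values copied from the hyperparameter list and the results table.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 80
STEPS_PER_EPOCH = 10812  # step count logged at epoch 1.0

# Assumption: one device and no gradient accumulation, so each optimizer
# step consumes TRAIN_BATCH_SIZE examples (the last batch may be smaller).
approx_examples_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
total_planned_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step, total_steps=total_planned_steps,
              base_lr=LEARNING_RATE, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("approx. examples per epoch:", approx_examples_per_epoch)
    print("planned optimizer steps:", total_planned_steps)
    # Learning rate at the last logged step (epoch 26.0, step 281112).
    print("lr at step 281112:", linear_lr(281112))
```

Under these assumptions the training set holds roughly 86,496 examples, the full 80-epoch run would take 864,960 steps, and the learning rate at the last logged step (epoch 26) would be about 6.75e-05.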

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
0.2889	1.0	10812	0.1708	0.1703	0.0386
0.1849	2.0	21624	0.1342	0.1274	0.0285
0.1512	3.0	32436	0.1144	0.1044	0.0244
0.1313	4.0	43248	0.1033	0.0918	0.0217
0.117	5.0	54060	0.1034	0.0738	0.0191
0.1056	6.0	64872	0.0906	0.0738	0.0181
0.0962	7.0	75684	0.0959	0.0655	0.0168
0.0885	8.0	86496	0.0860	0.0592	0.0155
0.0807	9.0	97308	0.0844	0.0603	0.0154
0.0742	10.0	108120	0.0814	0.0573	0.0144
0.0683	11.0	118932	0.0858	0.0588	0.0154
0.0629	12.0	129744	0.0944	0.0538	0.0146
0.0581	13.0	140556	0.0842	0.0558	0.0151
0.0528	14.0	151368	0.0873	0.0503	0.0141
0.0479	15.0	162180	0.0820	0.0503	0.0138
0.0429	16.0	172992	0.0815	0.0427	0.0125
0.0392	17.0	183804	0.0864	0.0466	0.0128
0.035	18.0	194616	0.0899	0.0479	0.0128
0.0316	19.0	205428	0.0872	0.0430	0.0120
0.0286	20.0	216240	0.0821	0.0425	0.0114
0.0254	21.0	227052	0.0898	0.0466	0.0122
0.0229	22.0	237864	0.0864	0.0417	0.0120
0.021	23.0	248676	0.0893	0.0408	0.0122
0.0192	24.0	259488	0.0878	0.0430	0.0118
0.0171	25.0	270300	0.0994	0.0473	0.0128
0.0156	26.0	281112	0.0892	0.0443	0.0123
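
To see which logged epoch scores best on each metric, the rows can be parsed and compared directly. The following is a minimal sketch. The rows are copied from the table above, and the names `LOG`, `FIELDS`, and `parse_log` are illustrative choices rather than part of any training script.

```python
# Rows copied from the table above, in the column order of its header:
# training loss, epoch, step, validation loss, WER, CER.
LOG = """
0.2889 1.0 10812 0.1708 0.1703 0.0386
0.1849 2.0 21624 0.1342 0.1274 0.0285
0.1512 3.0 32436 0.1144 0.1044 0.0244
0.1313 4.0 43248 0.1033 0.0918 0.0217
0.117 5.0 54060 0.1034 0.0738 0.0191
0.1056 6.0 64872 0.0906 0.0738 0.0181
0.0962 7.0 75684 0.0959 0.0655 0.0168
0.0885 8.0 86496 0.0860 0.0592 0.0155
0.0807 9.0 97308 0.0844 0.0603 0.0154
0.0742 10.0 108120 0.0814 0.0573 0.0144
0.0683 11.0 118932 0.0858 0.0588 0.0154
0.0629 12.0 129744 0.0944 0.0538 0.0146
0.0581 13.0 140556 0.0842 0.0558 0.0151
0.0528 14.0 151368 0.0873 0.0503 0.0141
0.0479 15.0 162180 0.0820 0.0503 0.0138
0.0429 16.0 172992 0.0815 0.0427 0.0125
0.0392 17.0 183804 0.0864 0.0466 0.0128
0.035 18.0 194616 0.0899 0.0479 0.0128
0.0316 19.0 205428 0.0872 0.0430 0.0120
0.0286 20.0 216240 0.0821 0.0425 0.0114
0.0254 21.0 227052 0.0898 0.0466 0.0122
0.0229 22.0 237864 0.0864 0.0417 0.0120
0.021 23.0 248676 0.0893 0.0408 0.0122
0.0192 24.0 259488 0.0878 0.0430 0.0118
0.0171 25.0 270300 0.0994 0.0473 0.0128
0.0156 26.0 281112 0.0892 0.0443 0.0123
"""

FIELDS = ("train_loss", "epoch", "step", "val_loss", "wer", "cer")


def parse_log(text):
    """Turn whitespace-separated rows into a list of dicts keyed by FIELDS."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(FIELDS, values)))
    return rows


rows = parse_log(LOG)
best_loss = min(rows, key=lambda r: r["val_loss"])
best_wer = min(rows, key=lambda r: r["wer"])
best_cer = min(rows, key=lambda r: r["cer"])

if __name__ == "__main__":
    for name, row, key in (("validation loss", best_loss, "val_loss"),
                           ("WER", best_wer, "wer"),
                           ("CER", best_cer, "cer")):
        print(f"lowest {name}: {row[key]:.4f} at epoch {int(row['epoch'])}")
```

Among the logged epochs, validation loss is lowest at epoch 10 (0.0814), WER at epoch 23 (0.0408), and CER at epoch 20 (0.0114). Training loss keeps falling throughout while validation loss stays roughly flat after epoch 10, so which checkpoint counts as best depends on the metric used for selection.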