---
language:
- ru
license: apache-2.0
tags:
- hf-asr-leaderboard
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
base_model: openai/whisper-base
model-index:
- name: whisper-base-fine_tuned-ru
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: common_voice_11_0
      type: mozilla-foundation/common_voice_11_0
      args: 'config: ru, split: test'
    metrics:
    - type: wer
      value: 41.216909250757055
      name: Wer
---

# whisper-base-fine_tuned-ru

This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the [common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset.
It achieves the following results on the evaluation set (a sketch of the WER computation follows the list):
- Loss: 0.4553
- Wer: 41.2169
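
The WER figure above is the standard word error rate, reported as a percentage. Below is a minimal sketch of how it can be computed with the `evaluate` library; the predictions and references are toy placeholders, while in practice they come from decoding the test split:

```python
import evaluate

# Load the word error rate metric.
wer_metric = evaluate.load("wer")

# Toy placeholders; real values come from decoding the
# Common Voice 11.0 Russian test split.
predictions = ["привет мир", "как дела"]
references = ["привет мир", "как твои дела"]

# Reported as a percentage, matching the value above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```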

## Model description

Same as the original model (see [whisper-base](https://huggingface.co/openai/whisper-base)). ***However, this model has been fine-tuned for transcribing Russian speech.***
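
The snippet below is a minimal inference sketch using the `transformers` ASR pipeline. The model id shown is a placeholder for this repository's Hub id, and the audio file path is an assumption:

```python
import torch
from transformers import pipeline

# Load the fine-tuned checkpoint; replace the id below with this
# repository's Hub id (a placeholder is shown here).
transcriber = pipeline(
    "automatic-speech-recognition",
    model="whisper-base-fine_tuned-ru",  # placeholder model id
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a local Russian audio file (path is an assumption);
# the pipeline accepts any format ffmpeg can decode.
print(transcriber("sample_ru.wav")["text"])
```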

## Intended uses & limitations

Same as the original model (see [whisper-base](https://huggingface.co/openai/whisper-base)).

## Training and evaluation data

The model was fine-tuned on the Russian (`ru`) configuration of [common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0); the metrics reported above are computed on its test split.

## Training procedure

The model was fine-tuned using the following notebook (in Russian only): https://github.com/blademoon/Whisper_Train

### Training hyperparameters

The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 20000
- mixed_precision_training: Native AMP
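
As a rough illustration, these settings map onto `transformers` `Seq2SeqTrainingArguments` as sketched below. This is a reconstruction, not the exact training script; `output_dir` and the 500-step evaluation cadence (visible in the results table) are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-fine_tuned-ru",  # assumption
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=20000,
    evaluation_strategy="steps",
    eval_steps=500,  # matches the 500-step cadence in the results table
    fp16=True,       # "Native AMP" mixed precision
)
```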

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|
| 0.702         | 0.25  | 500   | 0.8245          | 71.6653 |
| 0.5699        | 0.49  | 1000  | 0.6640          | 55.7048 |
| 0.5261        | 0.74  | 1500  | 0.6127          | 50.6215 |
| 0.4997        | 0.98  | 2000  | 0.5834          | 47.4541 |
| 0.4681        | 1.23  | 2500  | 0.5638          | 46.6262 |
| 0.4651        | 1.48  | 3000  | 0.5497          | 47.5090 |
| 0.4637        | 1.72  | 3500  | 0.5379          | 46.5700 |
| 0.4185        | 1.97  | 4000  | 0.5274          | 45.3160 |
| 0.3856        | 2.22  | 4500  | 0.5205          | 45.5871 |
| 0.4078        | 2.46  | 5000  | 0.5122          | 45.7190 |
| 0.4132        | 2.71  | 5500  | 0.5066          | 45.5004 |
| 0.3914        | 2.96  | 6000  | 0.4998          | 47.0011 |
| 0.3822        | 3.2   | 6500  | 0.4959          | 44.9570 |
| 0.3596        | 3.45  | 7000  | 0.4916          | 45.5578 |
| 0.3877        | 3.69  | 7500  | 0.4870          | 45.2476 |
| 0.3687        | 3.94  | 8000  | 0.4832          | 45.2159 |
| 0.3514        | 4.19  | 8500  | 0.4809          | 46.0254 |
| 0.3202        | 4.43  | 9000  | 0.4779          | 48.1306 |
| 0.3229        | 4.68  | 9500  | 0.4751          | 45.5724 |
| 0.3285        | 4.93  | 10000 | 0.4717          | 45.9436 |
| 0.3286        | 5.17  | 10500 | 0.4705          | 45.0510 |
| 0.3294        | 5.42  | 11000 | 0.4689          | 47.2111 |
| 0.3384        | 5.66  | 11500 | 0.4666          | 47.3393 |
| 0.316         | 5.91  | 12000 | 0.4650          | 43.2536 |
| 0.2988        | 6.16  | 12500 | 0.4638          | 42.9789 |
| 0.3046        | 6.4   | 13000 | 0.4629          | 42.4331 |
| 0.2962        | 6.65  | 13500 | 0.4614          | 40.2437 |
| 0.3008        | 6.9   | 14000 | 0.4602          | 39.5734 |
| 0.2749        | 7.14  | 14500 | 0.4593          | 40.1497 |
| 0.3001        | 7.39  | 15000 | 0.4588          | 42.6248 |
| 0.3054        | 7.64  | 15500 | 0.4580          | 40.3707 |
| 0.2926        | 7.88  | 16000 | 0.4574          | 39.4232 |
| 0.2938        | 8.13  | 16500 | 0.4569          | 40.9532 |
| 0.3105        | 8.37  | 17000 | 0.4566          | 40.4379 |
| 0.2799        | 8.62  | 17500 | 0.4562          | 40.3622 |
| 0.2793        | 8.87  | 18000 | 0.4557          | 41.3451 |
| 0.2819        | 9.11  | 18500 | 0.4555          | 41.4184 |
| 0.2907        | 9.36  | 19000 | 0.4555          | 39.9348 |
| 0.3113        | 9.61  | 19500 | 0.4553          | 41.0289 |
| 0.2867        | 9.85  | 20000 | 0.4553          | 41.2169 |


### Framework versions

- Transformers 4.24.0
- Pytorch 1.13.1
- Datasets 2.7.1
- Tokenizers 0.13.1