Update README.md
README.md (CHANGED)
@@ -17,13 +17,13 @@ model-index:
       type: automatic-speech-recognition
     dataset:
       name: Common Voice 13
-      type:
+      type: Mozilla-foundation/common_voice_17_0
       config: hi
       split: test
       args: hi
     metrics:
     - name: Wer
-      type:
+      type: Wer
       value: 17.39228374836173
 library_name: transformers
 pipeline_tag: automatic-speech-recognition
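The `Wer` value in this metadata is the word error rate on the Common Voice Hindi test split, and the card also reports an orthographic variant (`Wer Ortho`) further down. As a minimal sketch of how such numbers are typically computed with the `evaluate` library (the strings below are toy stand-ins for real references and model transcripts, and the orthographic-vs-normalized distinction follows the usual Whisper fine-tuning convention rather than anything stated in this card):

```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

# Toy reference/prediction pair standing in for real test transcripts.
references = ["यह एक परीक्षण वाक्य है।"]
predictions = ["यह एक परीक्षण वाक्य है"]

# Orthographic WER: computed on the raw text, punctuation included.
wer_ortho = 100 * wer_metric.compute(references=references, predictions=predictions)

# Normalized WER: both sides are normalized first (the headline Wer value).
wer = 100 * wer_metric.compute(
    references=[normalizer(r) for r in references],
    predictions=[normalizer(p) for p in predictions],
)
print(f"WER ortho: {wer_ortho:.2f}, WER: {wer:.2f}")
```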
@@ -32,9 +32,9 @@ pipeline_tag: automatic-speech-recognition
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-#
+# Whisper-Small-Finetuned-Hindi - Yash_Ratnaker
 
-This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice
+This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 17 dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.2933
 - Wer Ortho: 34.1997
@@ -42,17 +42,23 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This model is based on the Whisper-small architecture, developed by OpenAI for automatic speech recognition (ASR). Whisper was originally trained on 680k hours of labeled multilingual and multitask supervised data, which gives it a strong ability to generalize across languages and tasks. I fine-tuned this checkpoint on the Common Voice 17 Hindi dataset, which helps it better recognize and transcribe spoken Hindi.
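As a short loading sketch (the repository id below is a placeholder for wherever this fine-tuned checkpoint is published, and pinning the language and task through the generation config assumes a reasonably recent transformers release):

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder Hub id; replace with the actual repository of this fine-tuned checkpoint.
model_id = "your-username/whisper-small-hindi"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Pin generation to Hindi transcription so the model neither auto-detects nor translates.
model.generation_config.language = "hindi"
model.generation_config.task = "transcribe"
```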
 
 ## Intended uses & limitations
 
-
+This fine-tuned Whisper model is intended for automatic speech recognition in Hindi. It is suitable for transcribing spoken Hindi in contexts such as educational content, media transcription, and accessibility services, and it performs best on clear audio with minimal background noise. It may struggle with very noisy recordings, overlapping speech, or dialects that were under-represented in the training data, and it is less suitable for real-time transcription where low latency is required.
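For example, a minimal transcription sketch with the transformers ASR pipeline (placeholder model id and audio path; `chunk_length_s` lets the pipeline handle recordings longer than Whisper's 30-second window):

```python
from transformers import pipeline

# Placeholder model id and audio file; substitute the real checkpoint and input.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-small-hindi",
    chunk_length_s=30,
)

result = asr(
    "hindi_sample.wav",
    generate_kwargs={"language": "hindi", "task": "transcribe"},
)
print(result["text"])
```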
 
 ## Training and evaluation data
 
-
+The model was trained on the Common Voice 17 Hindi dataset, which consists of speech samples from a diverse set of Hindi speakers and covers a wide range of accents, pronunciations, and speech patterns. Evaluation used a held-out subset of the same dataset, selected to cover different speakers and audio conditions so that the reported performance reflects generalization to unseen data.
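A minimal sketch of loading that data with the datasets library (the exact splits used for this run are not stated in the card, so the train+validation/test choice below is an assumption; the dataset is gated on the Hub, and loader-script-based versions of datasets may also need `trust_remote_code=True`):

```python
from datasets import Audio, load_dataset

# Hindi ("hi") configuration of Common Voice 17; the split choice is an assumption.
train_data = load_dataset(
    "mozilla-foundation/common_voice_17_0", "hi", split="train+validation"
)
test_data = load_dataset("mozilla-foundation/common_voice_17_0", "hi", split="test")

# Whisper's feature extractor expects 16 kHz audio, so resample on the fly.
train_data = train_data.cast_column("audio", Audio(sampling_rate=16_000))
test_data = test_data.cast_column("audio", Audio(sampling_rate=16_000))
```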
 
 ## Training procedure
+- Learning rate: chosen to balance fast convergence with stable training; fine-tuning uses a lower learning rate than pre-training so the pre-trained weights are adjusted carefully.
+- Batch size: set to maximize GPU utilization without exhausting memory, which keeps gradient updates consistent from step to step.
+- Epochs: the model makes multiple passes over the dataset, gradually refining its parameters with each pass.
+- Optimizer: AdamW, whose adaptive per-parameter learning rates manage the gradient updates efficiently and whose weight decay term reduces the risk of overfitting.
+- Weight decay: a small weight decay regularizes the model; this matters because the model's capacity is large while the fine-tuning dataset is much smaller than the original pre-training data. A sketch of how these choices map onto the training arguments follows below.
+
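Purely as an illustration of how those choices map onto Seq2SeqTrainingArguments (the numeric values are placeholders except for the step count and mixed precision, which appear in the hyperparameters section below; AdamW is the Trainer's default optimizer, so it needs no explicit argument):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative configuration; see "Training hyperparameters" below for the recorded values.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hi",
    per_device_train_batch_size=16,  # sized to fill GPU memory without exhausting it
    learning_rate=1e-5,              # lower than pre-training, for careful fine-tuning
    warmup_steps=50,
    max_steps=1000,                  # matches training_steps: 1000
    weight_decay=0.01,               # small decoupled weight decay for regularization
    fp16=True,                       # mixed precision (Native AMP)
    per_device_eval_batch_size=8,
    predict_with_generate=True,      # generate transcripts during evaluation for WER
)
```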
 
 ### Training hyperparameters
 
@@ -67,6 +73,9 @@ The following hyperparameters were used during training:
 - training_steps: 1000
 - mixed_precision_training: Native AMP
 
+### Training output
+- global_step=1000, training_loss=0.23814286267757415, metrics={'train_runtime': 7575.8956, 'train_samples_per_second': 2.112, 'train_steps_per_second': 0.132, 'total_flos': 4.61563489271808e+18, 'train_loss': 0.23814286267757415, 'epoch': 2.247191011235955}
+
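That line is the TrainOutput named tuple returned by the Hugging Face Trainer's train() call; rebuilt here as a runnable snippet with the figures copied from above:

```python
from transformers.trainer_utils import TrainOutput

# The object logged by trainer.train(), reconstructed from the reported values.
train_output = TrainOutput(
    global_step=1000,
    training_loss=0.23814286267757415,
    metrics={
        "train_runtime": 7575.8956,   # seconds, roughly 2.1 hours
        "train_samples_per_second": 2.112,
        "train_steps_per_second": 0.132,
        "total_flos": 4.61563489271808e+18,
        "train_loss": 0.23814286267757415,
        "epoch": 2.247191011235955,   # about 2.25 passes over the training data
    },
)
print(f"{train_output.metrics['train_runtime'] / 3600:.2f} h of training")
```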
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |