Updated README.md
Provided vital information for the model.

README.md (changed)
It achieves the following results on the evaluation set:
- Loss: 0.2967
- Wer: 0.1740
## Usage

To evaluate this model on an entire dataset, the evaluation scripts available in the whisper-finetune repository can be used.

The same repository also provides scripts for faster inference using whisper-jax.
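The Wer figure reported above is the word error rate. The whisper-finetune repository's own evaluation scripts should be used for real evaluation; purely as an illustration of the metric, a minimal pure-Python WER (the function name and lack of text normalisation are our own choices, not the repository's) can be sketched as:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over words
    # (substitutions, insertions, and deletions all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

score = wer("the cat sat down", "the cat sat")  # one deletion over 4 words -> 0.25
```

In practice, a production metric would also normalise case and punctuation before comparing, which is why reported WER values depend on the normalisation used.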
## Training and evaluation data

Training data:
- [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)

Evaluation data:
- [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)
- [Kangri Translators Dataset](https://drive.google.com/drive/folders/16BdOieekGRAo2bFOQDd4YhE2LpgiRnqQ?usp=share_link)
## Training procedure

We implemented cross-lingual phoneme recognition, a process that leverages patterns in a resource-rich language such as Hindi to recognize utterances in a resource-poor language such as Kangri. By fine-tuning the pre-trained Whisper-Hindi-Large-V2 model on a customised dataset, we achieved state-of-the-art accuracy.

The customised dataset, consisting of bridgeconn/snow-mountain and sentences collected from Kangri translators, was split using the 80/20 rule. Results were evaluated over 5000 steps: the model reduces the word error rate by 0.6% after the initial 1000 steps, while the validation loss increases as more data is introduced.
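The 80/20 split described above can be sketched as follows (the seed, helper name, and stand-in sentences are illustrative assumptions, not taken from the actual training code):

```python
import random

def split_80_20(items, seed=42):
    """Shuffle deterministically, then split into 80% train / 20% eval."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

# Stand-in for the combined snow-mountain + Kangri-translator sentences.
sentences = [f"sentence-{i}" for i in range(100)]
train, evaluation = split_80_20(sentences)
print(len(train), len(evaluation))  # 80 20
```

Shuffling before cutting matters when the combined corpus is ordered by source, since otherwise the evaluation slice would contain only one of the two datasets.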
### Training hyperparameters

The following hyperparameters were used during training: