Binarybardakshat committed
Commit 80f132f · Parent(s): 0289adf
Update README.md

README.md CHANGED
```diff
@@ -49,16 +49,12 @@ model-index:
 
 # SWRA (SWARA)
 
-`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).
+`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).
 
 ## Model Description
 
 SWRA (SWARA) is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
 
-## Intended Uses & Limitations
-
-This model can be used for end-to-end speech recognition (ASR). See the [model hub](https://huggingface.co/models?filter=speech_to_text) to look for other S2T checkpoints.
-
 ### How to Use
 
 As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the transcripts by passing the speech features to the model.
```
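The "How to Use" text above only says that transcripts are produced by passing speech features to `generate`. A minimal sketch of that flow, assuming the checkpoint loads with the standard `Speech2TextProcessor`/`Speech2TextForConditionalGeneration` classes from `transformers`; the Hub id, audio file name, and class choice are assumptions, not taken from the card:

```python
# Rough sketch, not the card's own snippet: load the checkpoint, build the
# log-mel features with the processor, and decode with generate().
import torchaudio
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

model_id = "binarybardakshat/swra"  # hypothetical Hub id; substitute the real one

processor = Speech2TextProcessor.from_pretrained(model_id)
model = Speech2TextForConditionalGeneration.from_pretrained(model_id)

# Load an utterance (placeholder path) and make sure it is 16 kHz, as the training data was.
waveform, sample_rate = torchaudio.load("example.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

# The processor turns the raw waveform (first channel only) into the filter bank
# features the encoder expects.
inputs = processor(waveform[0].numpy(), sampling_rate=16_000, return_tensors="pt")

# Autoregressive decoding: the generate() call the section refers to.
generated_ids = model.generate(
    inputs["input_features"], attention_mask=inputs["attention_mask"]
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```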
```diff
@@ -134,33 +130,3 @@ print("WER:", wer.compute(predictions=result["transcription"], references=result
 
 The S2T-SMALL-LIBRISPEECH-ASR is trained on [LibriSpeech ASR Corpus](https://www.openslr.org/12), a dataset consisting of
 approximately 1000 hours of 16kHz read English speech.
-
-
-## Training procedure
-
-### Preprocessing
-
-The speech data is pre-processed by extracting Kaldi-compliant 80-channel log mel-filter bank features automatically from
-WAV/FLAC audio files via PyKaldi or torchaudio. Further utterance-level CMVN (cepstral mean and variance normalization)
-is applied to each example.
-
-The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 10,000.
-
-
-### Training
-
-The model is trained with standard autoregressive cross-entropy loss and using [SpecAugment](https://arxiv.org/abs/1904.08779).
-The encoder receives speech features, and the decoder generates the transcripts autoregressively.
-
-
-### BibTeX entry and citation info
-
-```bibtex
-@inproceedings{wang2020fairseqs2t,
-  title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq},
-  author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino},
-  booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations},
-  year = {2020},
-}
-
-```
```
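The removed "Preprocessing" paragraph describes Kaldi-compliant 80-channel log mel filter bank features plus utterance-level CMVN, extracted via PyKaldi or torchaudio. A sketch of that step using the torchaudio route; the file name is a placeholder and the epsilon guard is an added assumption:

```python
# Rough feature-extraction sketch for one utterance, following the card's description.
import torchaudio
import torchaudio.compliance.kaldi as kaldi

# Load a mono WAV/FLAC utterance (placeholder path).
waveform, sample_rate = torchaudio.load("utterance.flac")

# Kaldi-compliant 80-channel log mel filter bank features.
features = kaldi.fbank(waveform, num_mel_bins=80, sample_frequency=sample_rate)

# Utterance-level CMVN: zero mean and unit variance per feature dimension.
mean = features.mean(dim=0, keepdim=True)
std = features.std(dim=0, keepdim=True)
features = (features - mean) / (std + 1e-8)

print(features.shape)  # (num_frames, 80)
```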
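For the text side, the same removed section says transcripts are lowercased and tokenized with SentencePiece and a vocabulary of 10,000. A sketch with the `sentencepiece` package; only the vocabulary size comes from the card, while the input file, model prefix, and `unigram` model type are assumptions:

```python
# Rough sketch of building the 10k-piece tokenizer described in the removed section.
import sentencepiece as spm

# transcripts.lower.txt: one lowercased transcript per line (placeholder file).
spm.SentencePieceTrainer.train(
    input="transcripts.lower.txt",
    model_prefix="swra_sp",
    vocab_size=10_000,
    model_type="unigram",  # assumed; the card only states the vocabulary size
)

sp = spm.SentencePieceProcessor(model_file="swra_sp.model")
print(sp.encode("the quick brown fox", out_type=str))
```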
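The removed "Training" paragraph pairs the cross-entropy objective with SpecAugment. A SpecAugment-style masking sketch using torchaudio's `FrequencyMasking` and `TimeMasking` transforms; the mask widths are illustrative defaults, since the card does not give the actual augmentation settings:

```python
# Illustrative SpecAugment-style masking on a (num_frames, 80) feature matrix.
import torch
import torchaudio.transforms as T

freq_mask = T.FrequencyMasking(freq_mask_param=27)   # assumed mask width
time_mask = T.TimeMasking(time_mask_param=100)       # assumed mask width

features = torch.randn(500, 80)             # stand-in for CMVN-normalized features
spec = features.t().unsqueeze(0)             # (1, freq, time), as the transforms expect
spec = time_mask(freq_mask(spec))            # zero out random frequency and time bands
augmented = spec.squeeze(0).t()              # back to (num_frames, 80)
print(augmented.shape)
```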