|
---
license: apache-2.0
language:
- ja
library_name: espnet
tags:
- automatic-speech-recognition
---
|
|
|
# reazonspeech-espnet-v2 |
|
|
|
`reazonspeech-espnet-v2` is an automatic speech recognition (ASR) model trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).
|
|
|
## Model Architecture |
|
|
|
The general architecture is the same as [reazonspeech-espnet-v1](https://huggingface.co/reazon-research/reazonspeech-espnet-v1). |
|
|
|
* Conformer-Transducer model with 118.85M parameters. |
|
|
|
* We trained this model for 33 epochs using the Adam optimizer. The maximum learning rate was 0.02, with 15000 warmup steps.
|
|
|
* The training audio files were sampled at 16 kHz. Make sure that your input audio files have the same sampling rate (see the resampling sketch below).
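
If your recordings use a different sampling rate, resample them before inference. The following is a minimal sketch using `librosa` and `soundfile`; these libraries are not requirements of this model, just one common way to do the conversion, and the file names are placeholders:

```python
import librosa
import soundfile as sf

# librosa resamples on load when sr is given; this yields a 16 kHz waveform.
speech, rate = librosa.load("speech_48k.wav", sr=16000)

# Write the resampled audio so it can be passed to the model.
sf.write("speech.wav", speech, rate)
```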
|
|
|
## Usage |
|
|
|
We provide a `transcribe()` function that is suitable for use with this model.
|
|
|
```python
from espnet2.bin.asr_inference import Speech2Text
from reazonspeech.espnet.asr import transcribe

# Load the model from the ESPnet experiment directory.
speech2text = Speech2Text(
    "exp/asr_train_asr_conformer_raw_jp_char/config.yaml",
    "exp/asr_train_asr_conformer_raw_jp_char/valid.acc.ave_10best.pth",
    device="cuda",
)

# Print each recognized caption.
for cap in transcribe("speech.wav", speech2text):
    print(cap)
```
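
The `transcribe()` helper wraps ESPnet's standard inference call. If you prefer to work with the `Speech2Text` object directly, you can pass it a 16 kHz waveform yourself. The sketch below assumes `soundfile` is available for reading the audio and reuses the `speech2text` object created above:

```python
import soundfile as sf

# Read the 16 kHz waveform as a float array.
speech, rate = sf.read("speech.wav")
assert rate == 16000

# ESPnet returns n-best hypotheses as (text, tokens, token_ids, hypothesis) tuples.
results = speech2text(speech)
print(results[0][0])  # best hypothesis text
```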
|
|
|
## License |
|
|
|
[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
|
|