|
---
license: apache-2.0
language:
- ja
library_name: espnet
tags:
- automatic-speech-recognition
---
|
|
|
# reazonspeech-espnet-v2 |
|
|
|
`reazonspeech-espnet-v2` is an automatic speech recognition (ASR) model trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).
|
|
|
## Model Architecture |
|
|
|
The general architecture is the same as [reazonspeech-espnet-v1](https://huggingface.co/reazon-research/reazonspeech-espnet-v1). |
|
|
|
* Conformer-Transducer model with 118.85M parameters. |
|
|
|
* We trained this model for 33 epochs using the Adam optimizer. The maximum learning rate was 0.02, with 15000 warmup steps.
|
|
|
* The training audio files were sampled at 16 kHz. Make sure that your input audio files have the same sampling rate (see the resampling sketch below).
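
If your recordings use a different sampling rate, resample them before inference. The following is a minimal sketch using `librosa` and `soundfile`; these libraries are not requirements of this model, just one common way to do the conversion, and the file names are placeholders:

```python
import librosa
import soundfile as sf

# librosa resamples on load when sr is given; this yields a 16 kHz waveform.
speech, rate = librosa.load("speech_48k.wav", sr=16000)

# Write the resampled audio so it can be passed to the model.
sf.write("speech.wav", speech, rate)
```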
|
|
|
## Usage |
|
|
|
We provide a `transcribe()` function that is suitable for use with this model.
|
|
|
```python
from espnet2.bin.asr_inference import Speech2Text
from reazonspeech.espnet.asr import transcribe

# Load the model from the ESPnet experiment directory.
speech2text = Speech2Text(
    "exp/asr_train_asr_conformer_raw_jp_char/config.yaml",
    "exp/asr_train_asr_conformer_raw_jp_char/valid.acc.ave_10best.pth",
    device="cuda",
)

# Print each recognized caption.
for cap in transcribe("speech.wav", speech2text):
    print(cap)
```
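
The `transcribe()` helper wraps ESPnet's standard inference call. If you prefer to work with the `Speech2Text` object directly, you can pass it a 16 kHz waveform yourself. The sketch below assumes `soundfile` is available for reading the audio and reuses the `speech2text` object created above:

```python
import soundfile as sf

# Read the 16 kHz waveform as a float array.
speech, rate = sf.read("speech.wav")
assert rate == 16000

# ESPnet returns n-best hypotheses as (text, tokens, token_ids, hypothesis) tuples.
results = speech2text(speech)
print(results[0][0])  # best hypothesis text
```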
|
|
|
## License |
|
|
|
[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
|
|