Automatic Speech Recognition · NeMo · Japanese

fujimotos committed on
Commit f889979
1 Parent(s): a9099fc

Import reazonspeech-nemo-v2.nemo

Signed-off-by: Fujimoto Seiji <[email protected]>

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +50 -0
  3. reazonspeech-nemo-v2.nemo +3 -0
.gitattributes CHANGED
@@ -32,4 +32,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
+*.nemo filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,53 @@
 ---
 license: apache-2.0
+language:
+- ja
+library_name: nemo
+tags:
+- automatic-speech-recognition
+- NeMo
 ---
+
+# reazonspeech-nemo-v2
+
+`reazonspeech-nemo-v2` is an automatic speech recognition model trained
+on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).
+
+This model supports inference on long-form Japanese audio clips up to
+several hours in length.
+
+## Model Architecture
+
+The model features the improved Conformer architecture from
+[Fast Conformer with Linearly Scalable Attention for Efficient
+Speech Recognition](https://arxiv.org/abs/2305.05084).
+
+* Subword-based RNN-T model. The total parameter count is 619M.
+
+* The encoder uses [Longformer](https://arxiv.org/abs/2004.05150) attention
+  with a local context size of 256 and a single global token.
+
+* The decoder has a vocabulary space of 3,000 tokens constructed by a
+  [SentencePiece](https://github.com/google/sentencepiece)
+  unigram tokenizer.
+
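The Longformer-style attention described in the bullets above combines a sliding local window with a global token. As a rough illustration (not the NeMo implementation; `longformer_mask`, the toy sequence length, and the reduced window size are assumptions for readability):

```python
# Sketch of a Longformer-style attention mask: each position attends to a
# symmetric local window, plus one global token that attends (and is
# attended to) everywhere. Toy sizes; the real model uses context 256.

def longformer_mask(seq_len, local_context=256, n_global=1):
    """Return mask[i][j] = True if position i may attend to position j."""
    half = local_context // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            is_local = abs(i - j) <= half                 # sliding window
            is_global = i < n_global or j < n_global      # global token
            mask[i][j] = is_local or is_global
    return mask

mask = longformer_mask(8, local_context=2, n_global=1)
# Position 5 sees its neighbours 4..6 plus the global token at 0.
print([j for j in range(8) if mask[5][j]])  # → [0, 4, 5, 6]
```

This sparsity is what keeps attention cost roughly linear in sequence length, enabling the multi-hour inference mentioned earlier.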
+We trained this model for 1 million steps with the AdamW optimizer,
+following a Noam annealing schedule.
+
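The Noam annealing schedule mentioned above is the learning-rate formula from the original Transformer paper: linear warmup followed by inverse-square-root decay. A minimal sketch, with `d_model` and `warmup_steps` as illustrative values (the model card does not state the actual hyperparameters):

```python
# Noam annealing schedule (Vaswani et al., 2017):
#   lr(step) = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
# The values of d_model and warmup_steps below are assumptions.

def noam_lr(step, d_model=512, warmup_steps=10000):
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Learning rate rises linearly to a peak at warmup_steps, then decays.
print(noam_lr(5000) < noam_lr(10000))     # → True (still warming up)
print(noam_lr(10000) > noam_lr(1000000))  # → True (decaying afterwards)
```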
+## Usage
+
+We recommend using this model through our
+[reazonspeech](https://github.com/reazon-research/reazonspeech)
+library.
+
+```python
+from reazonspeech.nemo.asr import load_model, transcribe
+
+model = load_model()
+ret = transcribe("speech.wav", model)
+print(ret.text)
+```
+
+## License
+
+[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
reazonspeech-nemo-v2.nemo ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d196d43ad03466ca88beeda4bf5fafb07bab7202d4b663b8e4f12cb0a4381fae
+size 2477946880