speechbrainteam committed on
Commit 4aec8d8 • 1 Parent(s): 2dfe872

Update README.md

README.md CHANGED
@@ -3,38 +3,53 @@ language: "en"
|
|
3 |
thumbnail:
|
4 |
tags:
|
5 |
- embeddings
|
6 |
-
-
|
7 |
-
-
|
8 |
-
-
|
9 |
- pytorch
|
10 |
- xvectors
|
11 |
- TDNN
|
|
|
12 |
license: "apache-2.0"
|
13 |
datasets:
|
14 |
-
-
|
15 |
metrics:
|
16 |
-
-
|
17 |
-
|
18 |
---
|
19 |
|
20 |
<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
|
21 |
<br/><br/>
|
22 |
|
23 |
-
#
|
24 |
-
|
25 |
-
This repository provides all the necessary tools to
|
26 |
-
|
27 |
|
28 |
For a better experience, we encourage you to learn more about
|
29 |
-
[SpeechBrain](https://speechbrain.github.io). The given model performance on
|
30 |
|
31 |
-
| Release |
|
32 |
|:-------------:|:--------------:|
|
33 |
-
|
|
34 |
|
35 |
|
36 |
## Pipeline description
|
37 |
-
This system is composed of a TDNN model coupled with statistical pooling.
|
38 |
|
39 |
## Install SpeechBrain
|
40 |
|
@@ -47,21 +62,23 @@ pip install speechbrain
|
|
47 |
Please note that we encourage you to read our tutorials and learn more about
|
48 |
[SpeechBrain](https://speechbrain.github.io).
|
49 |
|
50 |
-
###
|
51 |
|
52 |
```python
|
53 |
import torchaudio
|
54 |
from speechbrain.pretrained import EncoderClassifier
|
55 |
-
classifier = EncoderClassifier.from_hparams(source="speechbrain/
|
56 |
-
|
57 |
-
|
|
|
|
|
58 |
```
|
59 |
|
60 |
### Inference on GPU
|
61 |
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
|
62 |
|
63 |
### Training
|
64 |
-
The model was trained with SpeechBrain (
|
65 |
To train it from scratch, follow these steps:
|
66 |
1. Clone SpeechBrain:
|
67 |
```bash
|
@@ -76,11 +93,11 @@ pip install -e .
|
|
76 |
|
77 |
3. Run Training:
|
78 |
```
|
79 |
-
cd
|
80 |
-
python
|
81 |
```
|
82 |
|
83 |
-
You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/
|
84 |
|
85 |
### Limitations
|
86 |
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
|
@@ -100,6 +117,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
|
|
100 |
}
|
101 |
```
|
102 |
|
103 |
|
104 |
#### Referencing SpeechBrain
|
105 |
|
@@ -110,7 +142,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
|
|
110 |
year = {2021},
|
111 |
publisher = {GitHub},
|
112 |
journal = {GitHub repository},
|
113 |
-
howpublished = {
|
114 |
}
|
115 |
```
|
116 |
|
|
|
3 |
thumbnail:
|
4 |
tags:
|
5 |
- embeddings
|
6 |
+
- Commands
|
7 |
+
- Keywords
|
8 |
+
- Keyword Spotting
|
9 |
- pytorch
|
10 |
- xvectors
|
11 |
- TDNN
|
12 |
+
- Command Recognition
|
13 |
license: "apache-2.0"
|
14 |
datasets:
|
15 |
+
- google speech commands
|
16 |
metrics:
|
17 |
+
- Accuracy
|
18 |
+
|
19 |
---
|
20 |
|
21 |
<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
|
22 |
<br/><br/>
|
23 |
|
24 |
+
# Command Recognition with xvector embeddings on Google Speech Commands
|
25 |
+
|
26 |
+
This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
|
27 |
+
You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands).
|
28 |
+
The dataset primarily provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:
|
29 |
+
|
30 |
+
- 'yes'
|
31 |
+
- 'no'
|
32 |
+
- 'up'
|
33 |
+
- 'down'
|
34 |
+
- 'left'
|
35 |
+
- 'right'
|
36 |
+
- 'on'
|
37 |
+
- 'off'
|
38 |
+
- 'stop'
|
39 |
+
- 'go'
|
40 |
+
- 'unknown'
|
41 |
+
- 'silence'
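
As an illustration of how this label set might be used downstream, the sketch below separates the ten real command words from the two catch-all classes. The names `KEYWORDS`, `FILLER_LABELS`, and `is_command` are hypothetical helpers and not part of SpeechBrain. The sketch assumes the predicted label is available as a plain string such as `'yes'`.

```python
# The 12 labels listed in this README, in the order given there.
KEYWORDS = [
    "yes", "no", "up", "down", "left", "right",
    "on", "off", "stop", "go", "unknown", "silence",
]

# Catch-all classes that do not correspond to an actual spoken command
# (hypothetical grouping, introduced only for this example).
FILLER_LABELS = {"unknown", "silence"}


def is_command(label: str) -> bool:
    """Return True if `label` is one of the 10 real command words."""
    if label not in KEYWORDS:
        raise ValueError(f"unexpected label: {label!r}")
    return label not in FILLER_LABELS
```

An application could, for example, ignore any prediction for which `is_command` returns `False` instead of acting on it.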
|
42 |
|
43 |
For a better experience, we encourage you to learn more about
|
44 |
+
[SpeechBrain](https://speechbrain.github.io). The model's performance on the test set is:
|
45 |
|
46 |
+
| Release | Accuracy(%) |
|
47 |
|:-------------:|:--------------:|
|
48 |
+
| 06-02-21 | 98.14 |
|
49 |
|
50 |
|
51 |
## Pipeline description
|
52 |
+
This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with categorical cross-entropy loss, is applied on top of it.
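
To make the pooling step concrete, here is a minimal pure-Python sketch of statistical pooling: it collapses a variable-length sequence of frame-level feature vectors into one fixed-size vector by concatenating the per-dimension mean and standard deviation. The function name `statistics_pooling` is hypothetical, and using the population standard deviation is an assumption made for this example; this is not SpeechBrain's own implementation, which may differ in its details.

```python
from statistics import mean, pstdev


def statistics_pooling(frames):
    """Pool T frame vectors of size D into a single vector of size 2*D.

    `frames` is a list of T lists, each holding D floats.
    The result is [mean_1..mean_D, std_1..std_D], computed over time.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose so each tuple holds one feature dimension across all frames.
    dims = list(zip(*frames))
    means = [mean(d) for d in dims]
    stds = [pstdev(d) for d in dims]  # population std (assumption)
    return means + stds
```

Because the output size depends only on the feature dimension and not on the number of frames, a classifier can consume utterances of any length.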
|
53 |
|
54 |
## Install SpeechBrain
|
55 |
|
|
|
62 |
Please note that we encourage you to read our tutorials and learn more about
|
63 |
[SpeechBrain](https://speechbrain.github.io).
|
64 |
|
65 |
+
### Perform Command Recognition
|
66 |
|
67 |
```python
|
68 |
import torchaudio
|
69 |
from speechbrain.pretrained import EncoderClassifier
|
70 |
+
classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
|
71 |
+
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
|
72 |
+
print(text_lab)
|
73 |
+
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
|
74 |
+
print(text_lab)
|
75 |
```
|
76 |
|
77 |
### Inference on GPU
|
78 |
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
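
A minimal sketch of how this option could be wired in, assuming you want to fall back to the CPU when no GPU is present. The helpers `pick_run_opts` and `load_classifier` are hypothetical names introduced here; only `from_hparams` and its `run_opts` argument come from this README.

```python
def pick_run_opts(prefer_gpu=True):
    """Build the run_opts dict, falling back to CPU when CUDA is unavailable."""
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        # Without PyTorch there is no way to probe for a GPU.
        has_cuda = False
    device = "cuda" if (prefer_gpu and has_cuda) else "cpu"
    return {"device": device}


def load_classifier(run_opts):
    """Load the pretrained model on the device named in run_opts."""
    # Imported lazily so pick_run_opts works without SpeechBrain installed.
    from speechbrain.pretrained import EncoderClassifier

    return EncoderClassifier.from_hparams(
        source="speechbrain/google_speech_command_xvector",
        savedir="pretrained_models/google_speech_command_xvector",
        run_opts=run_opts,
    )
```

Calling `load_classifier(pick_run_opts())` downloads the model on first use and then loads it on the GPU when one is available.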
|
79 |
|
80 |
### Training
|
81 |
+
The model was trained with SpeechBrain (b7ff9dc4).
|
82 |
To train it from scratch, follow these steps:
|
83 |
1. Clone SpeechBrain:
|
84 |
```bash
|
|
|
93 |
|
94 |
3. Run Training:
|
95 |
```
|
96 |
+
cd recipes/Google-speech-commands
|
97 |
+
python train.py hparams/xvect.yaml --data_folder=your_data_folder
|
98 |
```
|
99 |
|
100 |
+
You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).
|
101 |
|
102 |
### Limitations
|
103 |
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
|
|
|
117 |
}
|
118 |
```
|
119 |
|
120 |
+
#### Referencing Google Speech Commands
|
121 |
+
```bibtex
@article{speechcommands,
|
122 |
+
author = { {Warden}, P.},
|
123 |
+
title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
|
124 |
+
journal = {ArXiv e-prints},
|
125 |
+
archivePrefix = "arXiv",
|
126 |
+
eprint = {1804.03209},
|
127 |
+
primaryClass = "cs.CL",
|
128 |
+
keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
|
129 |
+
year = 2018,
|
130 |
+
month = apr,
|
131 |
+
url = {https://arxiv.org/abs/1804.03209},
|
132 |
+
}
|
133 |
+
```
|
134 |
+
|
135 |
|
136 |
#### Referencing SpeechBrain
|
137 |
|
|
|
142 |
year = {2021},
|
143 |
publisher = {GitHub},
|
144 |
journal = {GitHub repository},
|
145 |
+
howpublished = {\url{https://github.com/speechbrain/speechbrain}},
|
146 |
}
|
147 |
```
|
148 |
|