speechbrainteam committed on
Commit 4aec8d8
1 Parent(s): 2dfe872

Update README.md

Files changed (1):
  1. README.md +55 -23
README.md CHANGED
@@ -3,38 +3,53 @@ language: "en"
  thumbnail:
  tags:
  - embeddings
- - Speaker
- - Verification
- - Identification
  - pytorch
  - xvectors
  - TDNN
  license: "apache-2.0"
  datasets:
- - voxceleb
  metrics:
- - EER
- - min_dct
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # Speaker Verification with xvector embeddings on Voxceleb
-
- This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
- The system is trained on Voxceleb 1+ Voxceleb2 training data.

  For a better experience, we encourage you to learn more about
- [SpeechBrain](https://speechbrain.github.io). The given model performance on Voxceleb1-test set (Cleaned) is:

- | Release | EER(%)
  |:-------------:|:--------------:|
- | 05-03-21 | 3.2 |


  ## Pipeline description
- This system is composed of a TDNN model coupled with statistical pooling. The system is trained with Categorical Cross-Entropy Loss.

  ## Install SpeechBrain

@@ -47,21 +62,23 @@ pip install speechbrain
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Compute your speaker embeddings

  ```python
  import torchaudio
  from speechbrain.pretrained import EncoderClassifier
- classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
- signal, fs = torchaudio.load('samples/audio_samples/example1.wav')
- embeddings = classifier.encode_batch(signal)
  ```
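Downstream, speaker verification reduces to comparing two such embeddings, typically with cosine similarity against a decision threshold. A minimal pure-Python sketch on toy vectors (the threshold value is an illustrative assumption, not the one tuned on Voxceleb):

```python
import math

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embedding vectors."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_a, emb_b, threshold=0.25):
    """Accept the trial when the score clears the decision threshold."""
    return cosine_score(emb_a, emb_b) >= threshold

# Toy embeddings standing in for two encode_batch() outputs
enroll = [0.9, 0.1, 0.4]
test = [0.8, 0.2, 0.5]
print(cosine_score(enroll, test))  # high score -> same-speaker decision
```

In practice the threshold is picked on a development set to balance false accepts against false rejects (the EER operating point reported above).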

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

  ### Training
- The model was trained with SpeechBrain (aa018540).
  To train it from scratch, follow these steps:
  1. Clone SpeechBrain:
  ```bash
@@ -76,11 +93,11 @@ pip install -e .

  3. Run Training:
  ```
- cd recipes/VoxCeleb/SpeakerRec/
- python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
  ```

- You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
@@ -100,6 +117,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  }
  ```

  #### Referencing SpeechBrain

@@ -110,7 +142,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
- howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }
  ```
 
 
  thumbnail:
  tags:
  - embeddings
+ - Commands
+ - Keywords
+ - Keyword Spotting
  - pytorch
  - xvectors
  - TDNN
+ - Command Recognition
  license: "apache-2.0"
  datasets:
+ - google speech commands
  metrics:
+ - Accuracy
+
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

+ # Command Recognition with xvector embeddings on Google Speech Commands
+
+ This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
+ You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands).
+ The dataset primarily provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:
+
+ - 'yes'
+ - 'no'
+ - 'up'
+ - 'down'
+ - 'left'
+ - 'right'
+ - 'on'
+ - 'off'
+ - 'stop'
+ - 'go'
+ - 'unknown'
+ - 'silence'


  For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:

+ | Release | Accuracy(%) |
  |:-------------:|:--------------:|
+ | 06-02-21 | 98.14 |


  ## Pipeline description
+ This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of it.
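Statistical pooling is what turns the variable-length sequence of frame-level TDNN features into the single fixed-size vector the classifier consumes: the per-channel mean and standard deviation over time, concatenated. A minimal pure-Python sketch with illustrative shapes (the real system does this on batched GPU tensors):

```python
import math

def statistical_pooling(frames):
    """Collapse frame-level features (time x channels) into one fixed-size
    utterance vector: per-channel mean concatenated with per-channel std."""
    n = len(frames)
    channels = len(frames[0])
    means = [sum(f[c] for f in frames) / n for c in range(channels)]
    stds = [math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / n)
            for c in range(channels)]
    return means + stds

# 4 frames of 3-dim features -> one 6-dim vector, whatever the frame count
frames = [[1.0, 2.0, 0.0],
          [3.0, 2.0, 0.0],
          [1.0, 2.0, 0.0],
          [3.0, 2.0, 0.0]]
print(statistical_pooling(frames))  # [2.0, 2.0, 0.0, 1.0, 0.0, 0.0]
```

Because the pooled size depends only on the channel count, utterances of any duration map to vectors of the same dimension.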

  ## Install SpeechBrain

 
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

+ ### Perform Command Recognition

  ```python
  import torchaudio
  from speechbrain.pretrained import EncoderClassifier
+ classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
+ print(text_lab)
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
+ print(text_lab)
  ```
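In a deployed keyword spotter you usually act on the predicted label only when the classifier is confident. A hypothetical post-processing sketch (the posterior values and the threshold below are made up for illustration, not outputs of the model above):

```python
KEYWORDS = ['yes', 'no', 'up', 'down', 'left', 'right',
            'on', 'off', 'stop', 'go', 'unknown', 'silence']

def decide(posteriors, threshold=0.7):
    """Return the top-scoring keyword if its posterior clears the
    threshold; otherwise fall back to 'unknown'."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return KEYWORDS[best] if posteriors[best] >= threshold else 'unknown'

print(decide([0.85] + [0.015] * 10 + [0.0]))  # confident -> 'yes'
print(decide([1 / 12.0] * 12))                # too uncertain -> 'unknown'
```

Routing low-confidence frames to 'unknown' is what keeps a spotter from firing on background speech.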

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

  ### Training
+ The model was trained with SpeechBrain (b7ff9dc4).
  To train it from scratch, follow these steps:
  1. Clone SpeechBrain:
  ```bash
 

  3. Run Training:
  ```
+ cd recipes/Google-speech-commands
+ python train.py hparams/xvect.yaml --data_folder=your_data_folder
  ```

+ You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
  }
  ```

+ #### Referencing Google Speech Commands
+ ```
+ @article{speechcommands,
+ author = { {Warden}, P.},
+ title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
+ journal = {ArXiv e-prints},
+ archivePrefix = "arXiv",
+ eprint = {1804.03209},
+ primaryClass = "cs.CL",
+ keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
+ year = 2018,
+ month = apr,
+ url = {https://arxiv.org/abs/1804.03209},
+ }
+ ```
+

  #### Referencing SpeechBrain

 
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
+ howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }
  ```