mechanicalsea
committed on
Commit 5388d50
1 Parent(s): 3b66fd5
update README.md
README.md CHANGED
@@ -18,11 +18,16 @@ tags:
 - Speaker Recognition
 ---
 
-## SpeechT5 SID
+## SpeechT5 SID
 
 | [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
 
-This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). This manifest was constructed using [
+This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). This manifest was constructed using [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) containing over 100,000 utterances for 1,251 celebrities. The identification split are given as follows.
+
+| | train | valid | test |
+| ------------------- | ------: | ----: | ----: |
+| **# of speakers** | 1,251 | 1,251 | 1,251 |
+| **# of utterances** | 138,361 | 6,904 | 8,251 |
 
 ### Requirements
 
@@ -35,8 +40,9 @@ This manifest is an attempt to recreate the Speaker Identification recipe used f
 
 ### Model and Results
 
-- [`speecht5_sid.pt`](.) are reimplemented Speaker Identification fine-tuning on the released manifest **but with a smaller batch size** (Ensure the manifest is ok).
-- `results` are reproduced by the released fine-tuned model.
+- [`speecht5_sid.pt`](./speecht5_sid.pt) are reimplemented Speaker Identification fine-tuning on the released manifest **but with a smaller batch size** (Ensure the manifest is ok).
+- `results` are reproduced by the released fine-tuned model and the accuracy is $96.194\%$.
+- `log` is the tensorboard log of fine-tuning the released model.
 
 ### Reference
 