mechanicalsea committed on
Commit
5388d50
1 Parent(s): 3b66fd5

update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -18,11 +18,16 @@ tags:
18
  - Speaker Recognition
19
  ---
20
 
21
- ## SpeechT5 SID Manifest
22
 
23
  | [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
24
 
25
- This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). This manifest was constructed using [CMU ARCTIC](http://www.festvox.org/cmu_arctic/) four speakers, e.g., bdl, clb, rms, slt. There are 932 utterances for training, 100 utterances for validation, and 100 utterance for evaluation.
 
 
 
 
 
26
 
27
  ### Requirements
28
 
@@ -35,8 +40,9 @@ This manifest is an attempt to recreate the Speaker Identification recipe used f
35
 
36
  ### Model and Results
37
 
38
- - [`speecht5_sid.pt`](.) are reimplemented Speaker Identification fine-tuning on the released manifest **but with a smaller batch size** (Ensure the manifest is ok).
39
- - `results` are reproduced by the released fine-tuned model.
 
40
 
41
  ### Reference
42
 
 
18
  - Speaker Recognition
19
  ---
20
 
21
+ ## SpeechT5 SID
22
 
23
  | [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
24
 
25
+ This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). It was constructed from [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html), which contains over 100,000 utterances from 1,251 celebrities. The identification split is given as follows.
26
+
27
+ | | train | valid | test |
28
+ | ------------------- | ------: | ----: | ----: |
29
+ | **# of speakers** | 1,251 | 1,251 | 1,251 |
30
+ | **# of utterances** | 138,361 | 6,904 | 8,251 |
31
 
32
  ### Requirements
33
 
 
40
 
41
  ### Model and Results
42
 
43
+ - [`speecht5_sid.pt`](./speecht5_sid.pt) is a reimplementation of Speaker Identification fine-tuning on the released manifest, **but with a smaller batch size** (to ensure the manifest is OK).
44
+ - `results` are reproduced with the released fine-tuned model; the accuracy is $96.194\%$.
45
+ - `log` is the TensorBoard log from fine-tuning the released model.
46
 
47
  ### Reference
48