mechanicalsea/speecht5-sid

Audio Classification

self-supervised learning

Speaker Identification

Speaker Recognition


mechanicalsea committed on Jan 31, 2023

Commit 5388d50 (1 parent: 3b66fd5)

update README.md

Files changed (1)

README.md +10 -4


@@ -18,11 +18,16 @@ tags:
 - Speaker Recognition
 ---
-## SpeechT5 SID Manifest
 | [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
-This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). It was constructed from four [CMU ARCTIC](http://www.festvox.org/cmu_arctic/) speakers: bdl, clb, rms, and slt. There are 932 utterances for training, 100 utterances for validation, and 100 utterances for evaluation.
 ### Requirements
@@ -35,8 +40,9 @@ This manifest is an attempt to recreate the Speaker Identification recipe used f
 ### Model and Results
-- [`speecht5_sid.pt`](.) is a reimplementation of Speaker Identification fine-tuning on the released manifest, **but with a smaller batch size** (to check that the manifest is correct).
-- `results` are reproduced by the released fine-tuned model.
 ### Reference

 - Speaker Recognition
 ---
+## SpeechT5 SID
 | [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
+This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). It was constructed from [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html), which contains over 100,000 utterances from 1,251 celebrities. The identification splits are as follows.
+|                     |   train | valid |  test |
+| ------------------- | ------: | ----: | ----: |
+| **# of speakers**   |   1,251 | 1,251 | 1,251 |
+| **# of utterances** | 138,361 | 6,904 | 8,251 |
 ### Requirements
 ### Model and Results
+- [`speecht5_sid.pt`](./speecht5_sid.pt) is a reimplementation of Speaker Identification fine-tuning on the released manifest, **but with a smaller batch size** (to check that the manifest is correct).
+- `results` are reproduced with the released fine-tuned model; the accuracy is $96.194\%$.
+- `log` is the tensorboard log of fine-tuning the released model.
 ### Reference
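
As a quick sanity check on the numbers introduced by this commit, the sketch below totals the three splits from the table and works out how many utterances the reported accuracy would correspond to. It assumes the $96.194\%$ figure was measured on the test split, which the README does not state explicitly; the variable names are illustrative only.

```python
# Split sizes copied from the table added in this commit.
SPLITS = {"train": 138_361, "valid": 6_904, "test": 8_251}
REPORTED_ACCURACY = 96.194  # percent, as stated in the README

total = sum(SPLITS.values())

# Assumption: the reported accuracy refers to the test split.
n_test = SPLITS["test"]
n_correct = round(n_test * REPORTED_ACCURACY / 100)
recomputed = 100 * n_correct / n_test

print(f"total utterances: {total:,}")
for name, count in SPLITS.items():
    print(f"{name}: {count:,} ({100 * count / total:.2f}% of total)")
print(f"implied correct test utterances: {n_correct} of {n_test}")
print(f"recomputed accuracy: {recomputed:.3f}%")
```

Under that assumption, the reported accuracy is consistent with a whole number of correctly identified test utterances, and the split total agrees with the "over 100,000 utterances" description.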