kamilakesbi committed 83407f3
Parent(s): 117475e
Update README.md
README.md
CHANGED
@@ -7,28 +7,26 @@ sdk: static
 pinned: false
 ---
 
-[diarizers-community](https://huggingface.co/diarizers-community) aims to promote speaker diarization on the Hugging Face Hub. It
+[diarizers-community](https://huggingface.co/diarizers-community) aims to promote speaker diarization on the Hugging Face Hub. It contains:
 
-
+- A collection of [multilingual speaker diarization datasets](https://huggingface.co/collections/diarizers-community/speaker-diarization-datasets-66261b8d571552066e003788) that are compatible with the [diarizers](https://github.com/kamilakesbi/diarizers) library. They have been processed with the [diarizers scripts](https://github.com/kamilakesbi/diarizers/blob/main/datasets/README.md).
 
-
-
-The currently available datasets are the CallHome (Japanese, Chinese, German, Spanish, English), the AMI Corpus (English), Vox-Converse (English) and Simsamu (French). We aim at adding more datasets in the future to support speaker diarization on the Hub.
+The currently available datasets are the CallHome (Japanese, Chinese, German, Spanish, English), the AMI Corpus (English), Vox-Converse (English) and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.
 
 - A collection of [5 fine-tuned segmentation model](https://huggingface.co/collections/diarizers-community/models-66261d0f9277b825c807ff2a) baselines that can be used in a pyannote speaker diarization pipeline.
 
-
+Each model has been fine-tuned on a specific language of the CallHome dataset. Compared to the pre-trained pyannote [segmentation model](https://huggingface.co/pyannote/segmentation-3.0), they achieve better performance on multilingual data:
 
 
 ** ADD BENCHMARK **
 
-Note: Results have been obtained using
+Note: Results have been obtained using [test scripts](https://github.com/kamilakesbi/diarizers/blob/main/test_segmentation.py) from diarizers.
 
-
+diarizers-community comes with:
 
--
+- [diarizers](https://github.com/kamilakesbi/diarizers/tree/main), a library for fine-tuning pyannote speaker diarization models using the Hugging Face ecosystem. It can be used to improve performance on both English and multilingual diarization datasets using simple example scripts, with as little as ten hours of labelled diarization data and just 5 minutes of GPU compute time.
 
-- A Google Colab [notebook](https://colab.research.google.com/github/kamilakesbi/notebooks/blob/main/fine_tune_pyannote.ipynb), with a step-by-step guide on how to use diarizers.
+- A Google Colab [notebook](https://colab.research.google.com/github/kamilakesbi/notebooks/blob/main/fine_tune_pyannote.ipynb), with a step-by-step guide on how to use diarizers for fine-tuning a pyannote segmentation model.
 
 
 Edit this `README.md` markdown file to author your organization card.
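The card only says the datasets are "compatible with the diarizers library". To illustrate what such segment annotations turn into during training, here is a minimal sketch that converts one example's annotations into frame-level speaker-activity tracks, the kind of per-frame target a segmentation model learns to predict. It assumes each example stores parallel lists named `timestamps_start`, `timestamps_end` (seconds) and `speakers` (labels); those column names, the 0.1 s frame step and the `speaker_activity` helper are assumptions for illustration, not the diarizers preprocessing code. The pyannote segmentation-3.0 model itself works on 10 s chunks with its own frame rate and a powerset encoding, so this is a simplified picture.

```python
from typing import Dict, List, Sequence


def speaker_activity(
    timestamps_start: Sequence[float],
    timestamps_end: Sequence[float],
    speakers: Sequence[str],
    duration: float,
    frame_step: float = 0.1,
) -> Dict[str, List[int]]:
    """Build one 0/1 activity track per speaker (hypothetical helper).

    Frame i spans [i * frame_step, (i + 1) * frame_step); a speaker is marked
    active in that frame when one of their segments contains the frame midpoint.
    Overlapping speech simply yields several active speakers in the same frame.
    """
    if not (len(timestamps_start) == len(timestamps_end) == len(speakers)):
        raise ValueError("timestamps_start, timestamps_end and speakers must align")
    num_frames = int(round(duration / frame_step))
    activity = {label: [0] * num_frames for label in sorted(set(speakers))}
    for start, end, label in zip(timestamps_start, timestamps_end, speakers):
        for i in range(num_frames):
            midpoint = (i + 0.5) * frame_step
            if start <= midpoint < end:
                activity[label][i] = 1
    return activity


# Assumed diarizers-style annotation for a 1-second clip with overlapping speech.
example = {
    "timestamps_start": [0.0, 0.3],
    "timestamps_end": [0.5, 0.8],
    "speakers": ["A", "B"],
}
tracks = speaker_activity(
    example["timestamps_start"],
    example["timestamps_end"],
    example["speakers"],
    duration=1.0,
)
for label, track in tracks.items():
    print(label, track)
```

With these inputs, speaker A is active in frames 0 to 4 and speaker B in frames 3 to 7, so frames 3 and 4 carry overlapping speech. Using frame midpoints avoids ambiguity when a segment boundary falls exactly on a frame edge.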
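The benchmark is still a placeholder (`** ADD BENCHMARK **`), and the card does not say which metric the test scripts report. Speaker diarization results are conventionally given as diarization error rate (DER): missed speech plus false-alarm speech plus speaker confusion, divided by the total duration of reference speech, computed after finding the best one-to-one mapping between reference and predicted speaker labels. The sketch below computes a simplified frame-level DER from 0/1 activity tracks, using a brute-force search over label mappings that is only practical for a handful of speakers. The `frame_der` name and the track format (equal-length lists per speaker) are hypothetical; this is not the diarizers test script, and full evaluation tools such as pyannote.metrics work on continuous time and can apply a forgiveness collar around segment boundaries.

```python
from itertools import permutations
from typing import Dict, List, Optional


def frame_der(reference: Dict[str, List[int]], hypothesis: Dict[str, List[int]]) -> float:
    """Simplified frame-level diarization error rate (hypothetical helper).

    Per frame: error = max(n_ref, n_hyp) - n_correct, which equals
    missed speech + false alarm + speaker confusion for that frame.
    DER = total error / total reference speech (counted in frames).
    All tracks are assumed to have the same number of frames.
    """
    ref_speech = sum(sum(track) for track in reference.values())
    if ref_speech == 0:
        raise ValueError("reference contains no speech; DER is undefined")
    num_frames = len(next(iter(reference.values())))

    # Pad both label lists with None so a speaker may also stay unmapped.
    ref_labels: List[Optional[str]] = list(reference)
    hyp_labels: List[Optional[str]] = list(hypothesis)
    size = max(len(ref_labels), len(hyp_labels))
    ref_labels += [None] * (size - len(ref_labels))
    hyp_labels += [None] * (size - len(hyp_labels))

    def matched_frames(pairs) -> int:
        # Frames where a hypothesis speaker and its mapped reference speaker are both active.
        return sum(
            sum(h * r for h, r in zip(hypothesis[hyp], reference[ref]))
            for hyp, ref in pairs
            if hyp is not None and ref is not None
        )

    # Brute-force optimal one-to-one mapping (fine for a few speakers only).
    best_pairs = max(
        (list(zip(hyp_labels, perm)) for perm in permutations(ref_labels)),
        key=matched_frames,
    )
    mapping = {hyp: ref for hyp, ref in best_pairs if hyp is not None and ref is not None}

    errors = 0
    for t in range(num_frames):
        n_ref = sum(track[t] for track in reference.values())
        n_hyp = sum(track[t] for track in hypothesis.values())
        n_correct = sum(hypothesis[hyp][t] * reference[ref][t] for hyp, ref in mapping.items())
        errors += max(n_ref, n_hyp) - n_correct
    return errors / ref_speech


reference = {"A": [1, 1, 1, 1, 0, 0], "B": [0, 0, 0, 1, 1, 1]}
# Predicted labels are arbitrary; the mapping step pairs spk0 with A and spk1 with B.
hypothesis = {"spk0": [1, 1, 1, 0, 0, 0], "spk1": [0, 0, 0, 1, 1, 1]}
print(frame_der(reference, hypothesis))  # 1 missed overlap frame / 7 reference frames ≈ 0.143
```

Here the only error is frame 3, where both reference speakers talk but the prediction marks a single speaker, so one frame of speech is missed out of seven reference speaker-frames.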