speaker diarization // speaker recognition // speaker segmentation // voice activity detection // overlapped speech detection // speaker change detection
pyannote.audio is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
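Several of the tasks listed in the tagline can be read off a diarization output ("who spoke when"). The following is a minimal, hypothetical sketch in plain Python. It assumes a made-up list of `(start, end, speaker)` tuples rather than pyannote.audio's own data types. Speech regions are the union of all turns, overlapped speech is wherever two or more turns coincide, and speaker changes are turn starts that carry a new label. It only illustrates how the tasks relate to each other, not how pyannote.audio's models compute them.

```python
from typing import List, Optional, Tuple

# Hypothetical diarization output format (not pyannote.audio's API):
# (start_seconds, end_seconds, speaker_label)
Segment = Tuple[float, float, str]


def voice_activity(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Speech regions: the union of all speaker turns, regardless of who speaks."""
    merged: List[Tuple[float, float]] = []
    for start, end, _ in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlapped_speech(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Regions where at least two speakers are active at the same time."""
    # Sweep line: +1 at each turn start, -1 at each turn end. Ends sort before
    # starts at equal times, so back-to-back turns are not counted as overlap.
    events = sorted(
        [(s, 1) for s, _, _ in segments] + [(e, -1) for _, e, _ in segments],
        key=lambda ev: (ev[0], ev[1]),
    )
    regions: List[Tuple[float, float]] = []
    active = 0
    overlap_start: Optional[float] = None
    for time, delta in events:
        active += delta
        if active >= 2 and overlap_start is None:
            overlap_start = time
        elif active < 2 and overlap_start is not None:
            if time > overlap_start:
                regions.append((overlap_start, time))
            overlap_start = None
    return regions


def speaker_changes(segments: List[Segment]) -> List[float]:
    """Start times of turns whose speaker differs from the previous turn's speaker."""
    turns = sorted(segments)
    return [cur[0] for prev, cur in zip(turns, turns[1:]) if cur[2] != prev[2]]


if __name__ == "__main__":
    segs = [(0.0, 4.0, "A"), (3.5, 6.0, "B"), (6.0, 9.0, "A"), (10.0, 12.0, "B")]
    print(voice_activity(segs))     # [(0.0, 9.0), (10.0, 12.0)]
    print(overlapped_speech(segs))  # [(3.5, 4.0)]
    print(speaker_changes(segs))    # [3.5, 6.0, 10.0]
```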
Using it in production?
Consider switching to pyannoteAI for better and faster options.
| Benchmark | v2.1 | v3.1 | pyannoteAI |
|---|---|---|---|
| AISHELL-4 | 14.1 | 12.2 | 11.2 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 19.3 |
| AMI (IHM) | 18.9 | 18.8 | 15.8 |
| AMI (SDM) | 27.1 | 22.4 | 19.3 |
| AVA-AVD | 66.3 | 50.0 | 44.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 19.8 |
| DIHARD 3 (full) | 26.9 | 21.7 | 16.8 |
| Earnings21 | 17.0 | 9.4 | 9.1 |
| Ego4D (dev.) | 61.5 | 51.2 | 44.0 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| RAMC | 22.5 | 22.2 | 11.1 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.8 |

Diarization error rate (in %, lower is better).
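Diarization error rate adds up three kinds of error, measured as durations and normalised by the total duration of reference speech: DER = (false alarm + missed detection + speaker confusion) / total reference speech. Because speaker labels are arbitrary, hypothesis speakers are first mapped one-to-one onto reference speakers so as to maximise correctly attributed speech. The following is a minimal frame-based sketch under stated assumptions: hypothetical `(start, end, speaker)` tuples, a 10 ms frame step, no forgiveness collar, overlapping speech scored, and a brute-force mapping search (scoring tools typically use the Hungarian algorithm instead). The benchmark figures above come from the project's own evaluation, whose exact settings this sketch does not claim to reproduce.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

Segment = Tuple[float, float, str]  # hypothetical format: (start_s, end_s, speaker)


def to_frames(segments: List[Segment], step: float) -> Dict[str, Set[int]]:
    """Map each speaker label to the set of frame indices in which it is active."""
    frames: Dict[str, Set[int]] = {}
    for start, end, speaker in segments:
        frames.setdefault(speaker, set()).update(range(round(start / step), round(end / step)))
    return frames


def diarization_error_rate(reference: List[Segment], hypothesis: List[Segment],
                           step: float = 0.01) -> float:
    ref = to_frames(reference, step)
    hyp = to_frames(hypothesis, step)
    all_frames = set().union(*ref.values(), *hyp.values())

    # Number of active reference / hypothesis speakers in each frame.
    n_ref = {t: sum(t in f for f in ref.values()) for t in all_frames}
    n_hyp = {t: sum(t in f for f in hyp.values()) for t in all_frames}

    missed = sum(max(0, n_ref[t] - n_hyp[t]) for t in all_frames)
    false_alarm = sum(max(0, n_hyp[t] - n_ref[t]) for t in all_frames)

    # Brute-force search for the one-to-one mapping that maximises correctly
    # attributed frames; None means a hypothesis speaker stays unmapped.
    ref_labels: List[Optional[str]] = list(ref) + [None] * len(hyp)
    hyp_labels = list(hyp)
    best_correct = 0
    for assignment in permutations(ref_labels, len(hyp_labels)):
        correct = sum(len(hyp[h] & ref[r]) for h, r in zip(hyp_labels, assignment) if r is not None)
        best_correct = max(best_correct, correct)

    # Frames where both sides have speakers but the mapped labels disagree.
    confusion = sum(min(n_ref[t], n_hyp[t]) for t in all_frames) - best_correct
    total = sum(n_ref.values())
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


if __name__ == "__main__":
    reference = [(0.0, 5.0, "alice"), (5.0, 10.0, "bob")]
    hypothesis = [(0.0, 6.0, "spk0"), (6.0, 10.0, "spk1")]
    # One second of bob's speech is attributed to alice's cluster: 1 s / 10 s.
    print(diarization_error_rate(reference, hypothesis))  # 0.1
```

Multiplying the result by 100 gives the percentage used in the table.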
Using high-end NVIDIA hardware,