---
title: README
emoji: π
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

pyannote.audio is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used in production by dozens of companies.
| Benchmark | v2.1 | v3.1 | pyannoteAI |
|---|---|---|---|
| AISHELL-4 | 14.1 | 12.2 | 11.2 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 19.3 |
| AMI (IHM) | 18.9 | 18.8 | 15.8 |
| AMI (SDM) | 27.1 | 22.4 | 19.3 |
| AVA-AVD | 66.3 | 50.0 | 44.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 19.8 |
| DIHARD 3 (full) | 26.9 | 21.7 | 16.8 |
| Earnings21 | 17.0 | 9.4 | 9.1 |
| Ego4D (dev.) | 61.5 | 51.2 | 44.0 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| RAMC | 22.5 | 22.2 | 11.1 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.8 |

Diarization error rate (in %, lower is better)
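
For readers new to the metric, here is a minimal sketch of how a diarization error rate is commonly defined: the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The function names and the example durations are hypothetical and are not part of pyannote.audio; only the v3.1 and pyannoteAI figures are copied from the table.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER in percent; all arguments are durations in seconds."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


def relative_reduction(old_der, new_der):
    """Relative DER reduction, in percent, when moving from old_der to new_der."""
    return 100.0 * (old_der - new_der) / old_der


# Hypothetical error durations (seconds) for a recording with 30 minutes of speech.
example_der = diarization_error_rate(
    false_alarm=30.0, missed_detection=60.0, confusion=90.0, total=1800.0
)

# (v3.1, pyannoteAI) figures copied from the benchmark table.
table = {
    "AISHELL-4": (12.2, 11.2),
    "DIHARD 3 (full)": (21.7, 16.8),
    "RAMC": (22.2, 11.1),
}
reductions = {name: relative_reduction(v31, ai) for name, (v31, ai) in table.items()}

if __name__ == "__main__":
    print(f"example DER: {example_der:.1f}%")
    for name, value in reductions.items():
        print(f"{name}: {value:.1f}% relative reduction from v3.1 to pyannoteAI")
```

A relative reduction is often easier to compare across benchmarks than the absolute gap, because the baseline error rates in the table differ widely.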
Processing speed on high-end NVIDIA hardware:
- v2.1 takes around 1m30s to process 1h of audio
- v3.1 takes around 1m20s to process 1h of audio
- On-premise pyannoteAI takes less than 30s to process 1h of audio
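
The timings above can be turned into a real-time factor (processing time divided by audio duration) and a speed-up over real time. This is a rough sketch: the names are hypothetical, the v2.1 and v3.1 values are the approximate figures quoted above, and the pyannoteAI value is the stated upper bound of 30s, so its speed-up is a lower bound.

```python
AUDIO_SECONDS = 3600  # 1h of audio, as in the list above

# Approximate processing times in seconds, taken from the list above.
processing_seconds = {
    "v2.1": 90,                      # around 1m30s
    "v3.1": 80,                      # around 1m20s
    "pyannoteAI (on-premise)": 30,   # "less than 30s": an upper bound
}


def real_time_factor(processing, audio=AUDIO_SECONDS):
    """Processing time per second of audio (lower is faster)."""
    return processing / audio


def speedup(processing, audio=AUDIO_SECONDS):
    """How many times faster than real time the audio is processed."""
    return audio / processing


if __name__ == "__main__":
    for name, seconds in processing_seconds.items():
        print(f"{name}: RTF {real_time_factor(seconds):.4f}, "
              f"{speedup(seconds):.0f}x real time")
```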