whisperkittools generated README.md
README.md
@@ -10,43 +10,48 @@ tags:
- asr
- quantized
---

# Whisper Transcription Quality
## Dataset: `librispeech`
Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
| | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
| WhisperOpenAIAPI/openai_whisper-large-v2 | [2.35](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 100 | 3100 | N/A |
| [WhisperKit/openai_whisper-large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [2.77](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [WhisperKit/openai_whisper-large-v2_949MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_949MB) | [2.4](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_949MB/librispeech) | 94.6 | 949 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
| [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo) | [2.76](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [WhisperKit/openai_whisper-large-v2_turbo_955MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo_955MB) | [2.41](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_955MB/librispeech) | 94.6 | 955 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [2.04](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 95.2 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [WhisperKit/openai_whisper-large-v3_947MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_947MB) | [2.46](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_947MB/librispeech) | 93.9 | 947 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
| [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo) | [2.03](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 95.4 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [WhisperKit/openai_whisper-large-v3_turbo_954MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo_954MB) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_954MB/librispeech) | 93.9 | 954 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [WhisperKit/distil-whisper_distil-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [WhisperKit/distil-whisper_distil-large-v3_594MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_594MB) | [2.96](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_594MB/librispeech) | 85.4 | 594 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
| [WhisperKit/distil-whisper_distil-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
| [WhisperKit/distil-whisper_distil-large-v3_turbo_600MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo_600MB) | [2.78](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo_600MB/librispeech) | 86.2 | 600 | [Link](https://github.com/argmaxinc/WhisperKit/commit/ae1cf96) |
| [WhisperKit/openai_whisper-small.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small.en) | [3.12](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 85.8 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [WhisperKit/openai_whisper-small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small) | [3.45](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 83 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [3.98](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 75.3 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [WhisperKit/openai_whisper-base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [WhisperKit/openai_whisper-tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
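
The WER (↓) and QoI (↑) columns are both derived from per-clip transcripts. For readers who want to sanity-check the WER numbers against their own transcriptions, below is a minimal sketch of corpus-level word error rate using the open-source `jiwer` package; the reference/hypothesis strings are placeholders and the normalization is a simplification, not the exact scoring pipeline that produced this table.

```python
# Minimal corpus-level WER sketch using the open-source `jiwer` package.
# Illustration of the metric only, not the exact pipeline behind the table above.
import string
import jiwer

def normalize(text: str) -> str:
    # Simple normalization: lowercase and strip punctuation before scoring.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

# Placeholder reference/hypothesis pairs; in practice these would be the
# dataset transcripts and a model's outputs for each audio clip.
references = [
    "He hoped there would be stew for dinner.",
    "Turnips and carrots and bruised potatoes.",
]
hypotheses = [
    "he hoped there would be stew for dinner",
    "turnips and carrots and a bruised potato",
]

wer = jiwer.wer(
    [normalize(r) for r in references],
    [normalize(h) for h in hypotheses],
)
print(f"WER: {100 * wer:.2f}%")
```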
## Dataset: `earnings22`
Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
| | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
|:------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
| WhisperOpenAIAPI/openai_whisper-large-v2 | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 100 | 3100 | N/A |
| [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 58.5 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
| [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
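
QoI is reported relative to the `WhisperOpenAIAPI/openai_whisper-large-v2` run, which scores 100 by construction. The exact QoI definition is not reproduced in this excerpt, so the sketch below makes an assumption: it treats QoI as the percentage of clips on which a candidate model's per-clip WER does not regress relative to the reference model's.

```python
# Hedged QoI sketch. Assumption: QoI is the percentage of clips where the
# candidate model's per-clip WER is no worse than the reference model's, so the
# reference run (WhisperOpenAIAPI/openai_whisper-large-v2) scores 100 by construction.
from typing import List

def qoi(reference_wers: List[float], candidate_wers: List[float]) -> float:
    assert len(reference_wers) == len(candidate_wers)
    no_regression = sum(
        1 for ref, cand in zip(reference_wers, candidate_wers) if cand <= ref
    )
    return 100.0 * no_regression / len(reference_wers)

# Toy per-clip WER values (placeholders, not real benchmark data).
reference_wers = [0.00, 0.05, 0.10, 0.00]
candidate_wers = [0.00, 0.08, 0.10, 0.00]
print(f"QoI: {qoi(reference_wers, candidate_wers):.1f}")  # -> 75.0
```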
### Explanation
We believe that rigorously measuring the quality of inference is necessary for developers and
enterprises to make informed decisions when opting to use optimized or compressed variants of
any machine learning model in production. To contextualize `WhisperKit`, we take the following Whisper

@@ -91,11 +96,11 @@ the tooling necessary to run the same measurements on such custom test sets, ple

- [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
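
To pull this test set locally, a minimal sketch with the Hugging Face `datasets` library follows; the config/split layout and column names of `argmaxinc/earnings22` are assumptions here, so check the dataset card before relying on them.

```python
# Hedged sketch: download the earnings22 evaluation set from the Hugging Face Hub.
# Config/split names and column layout are assumptions; consult the dataset card
# at https://huggingface.co/datasets/argmaxinc/earnings22 for specifics.
from datasets import load_dataset

earnings22 = load_dataset("argmaxinc/earnings22")
print(earnings22)  # Inspect available splits and columns before iterating.
```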
### Reproducing Results
Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools). We use our cluster of Apple Silicon Macs as self-hosted runners on
GitHub Actions as our CI infrastructure to periodically recompute these benchmarks. Due to [security concerns](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners),
we are unable to open up the cluster to the public. However, any Apple Silicon Mac (even with 8GB RAM) can be used to
run identical [evaluation jobs](#evaluation) locally. For reference, our M2 Ultra devices complete a `librispeech` + `openai/whisper-large-v3`
evaluation in under 1 hour regardless of the Whisper implementation. Even the oldest Apple Silicon Macs should take less than 1 day to complete the same evaluation.
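
As a rough local approximation of such an evaluation job (before wiring up whisperkittools itself), the sketch below transcribes a folder of clips and scores them against reference transcripts. It uses the reference `openai-whisper` Python package as a stand-in transcriber rather than WhisperKit's CoreML pipeline, and the file layout (`clips/`, `references.json`) is assumed.

```python
# Hedged sketch of a local evaluation loop. Uses the reference `openai-whisper`
# Python package as a stand-in transcriber (NOT WhisperKit's CoreML pipeline);
# the real jobs on this page are driven by whisperkittools on CI.
import json
import string
from pathlib import Path

import jiwer
import whisper  # pip install openai-whisper

def normalize(text: str) -> str:
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

# Assumed layout: clips/<id>.wav plus a references.json mapping <id> -> transcript.
clips_dir = Path("clips")
references = json.loads(Path("references.json").read_text())

model = whisper.load_model("tiny")  # swap for the variant you want to sanity-check

refs, hyps = [], []
for clip_id, reference_text in references.items():
    result = model.transcribe(str(clips_dir / f"{clip_id}.wav"))
    refs.append(normalize(reference_text))
    hyps.append(normalize(result["text"]))

print(f"Corpus WER: {100 * jiwer.wer(refs, hyps):.2f}%")
```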