whisperkittools generated README.md
README.md
CHANGED
@@ -17,34 +17,34 @@ tags:
|
|
17 |
## Dataset: `librispeech`
|
18 |
Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
|
19 |
|
20 |
-
|
|
21 |
-
|
22 |
-
|
|
23 |
-
| [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech)
|
24 |
-
| [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech)
|
25 |
-
| [WhisperKit/openai_whisper-large-v3_turbo_1018MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_1018MB/librispeech) |
|
26 |
-
| [WhisperKit/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech)
|
27 |
-
| [WhisperKit/openai_whisper-large-v2_1050MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_1050MB/librispeech)
|
28 |
-
| [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech)
|
29 |
-
| [WhisperKit/openai_whisper-large-v2_turbo_1022MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_1022MB/librispeech) |
|
30 |
-
| [WhisperKit/openai_whisper-small.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech)
|
31 |
-
| [WhisperKit/openai_whisper-small](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech)
|
32 |
-
| [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech)
|
33 |
-
| [WhisperKit/openai_whisper-base](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech)
|
34 |
-
| [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech)
|
35 |
-
| [WhisperKit/openai_whisper-tiny](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech)
|
36 |
-
|
|
37 |
|
38 |
## Dataset: `earnings22`
|
39 |
Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
|
40 |
|
41 |
-
|
|
42 |
-
|
43 |
-
|
|
44 |
-
| [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22)
|
45 |
-
| [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22)
|
46 |
-
| [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22)
|
47 |
-
|
|
48 |
|
49 |
|
50 |
We believe that rigorously measuring the quality of inference is necessary for developers and
|
@@ -54,12 +54,14 @@ implementations and benchmark them using a consistent evaluation harness:
|
|
54 |
|
55 |
Server-side:
|
56 |
- `WhisperOpenAIAPI`: [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
|
|
|
57 |
($0.36 per hour of audio as of 02/29/24, 25MB file size limit per request)
|
58 |
|
59 |
On-device:
|
60 |
- `WhisperKit`: Argmax's implementation [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L100) [[Repo]](https://github.com/argmaxinc/WhisperKit)
|
61 |
- `whisper.cpp`: A C++ implementation from ggerganov [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L212) [[Repo]](https://github.com/ggerganov/whisper.cpp)
|
62 |
- `WhisperMLX`: A Python implementation from Apple MLX [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L338) [[Repo]](https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py)
|
|
|
63 |
(All on-device implementations are available for free under MIT license as of 03/19/2024)
|
64 |
|
65 |
`WhisperOpenAIAPI` sets the reference and we assume that it is using the equivalent of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
|
|
|
17 |
## Dataset: `librispeech`
|
18 |
Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
|
19 |
|
+| | WER (β) | QoI (β) | File Size (MB) |
+|:---|:---|---:|---:|
+| WhisperOpenAIAPI/openai_whisper-large-v2 | [2.35](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 100 | 3100 |
+| [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [2.04](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 95.2 | 3100 |
+| [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo) | [2.03](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 95.4 | 3100 |
+| [WhisperKit/openai_whisper-large-v3_turbo_1018MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo_1018MB) | [1.99](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_1018MB/librispeech) | 94.8 | 1018 |
+| [WhisperKit/openai_whisper-large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [2.77](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 96.6 | 3100 |
+| [WhisperKit/openai_whisper-large-v2_1050MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_1050MB) | [2.81](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_1050MB/librispeech) | 95 | 1050 |
+| [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo) | [2.76](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 96.6 | 3100 |
+| [WhisperKit/openai_whisper-large-v2_turbo_1022MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo_1022MB) | [2.66](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_1022MB/librispeech) | 94.9 | 1022 |
+| [WhisperKit/openai_whisper-small.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small.en) | [3.12](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 85.8 | 483 |
+| [WhisperKit/openai_whisper-small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small) | [3.45](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 83 | 483 |
+| [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [3.98](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 75.3 | 145 |
+| [WhisperKit/openai_whisper-base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 |
+| [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 |
+| [WhisperKit/openai_whisper-tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 |
+| whisper.cpp/openai_whisper-large-v3 | [1.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/librispeech) | 95.4 | 3100 |
37 |
|
38 |
## Dataset: `earnings22`
|
39 |
Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
|
40 |
|
+| | WER (β) | QoI (β) | File Size (MB) |
+|:---|:---|---:|---:|
+| WhisperOpenAIAPI/openai_whisper-large-v2 | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 100 | 3100 |
+| [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 58.5 | 3100 |
+| [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 |
+| [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 |
+| whisper.cpp/openai_whisper-large-v3 | [33.58](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/earnings22) | 6.5 | 3100 |
48 |
|
49 |
|
50 |
We believe that rigorously measuring the quality of inference is necessary for developers and
|
|
|
54 |
|
55 |
Server-side:
|
56 |
- `WhisperOpenAIAPI`: [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
|
57 |
+
|
58 |
($0.36 per hour of audio as of 02/29/24, 25MB file size limit per request)
|
59 |
|
60 |
On-device:
|
61 |
- `WhisperKit`: Argmax's implementation [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L100) [[Repo]](https://github.com/argmaxinc/WhisperKit)
|
62 |
- `whisper.cpp`: A C++ implementation from ggerganov [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L212) [[Repo]](https://github.com/ggerganov/whisper.cpp)
|
63 |
- `WhisperMLX`: A Python implementation from Apple MLX [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L338) [[Repo]](https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py)
|
64 |
+
|
65 |
(All on-device implementations are available for free under MIT license as of 03/19/2024)
|
66 |
|
67 |
`WhisperOpenAIAPI` sets the reference and we assume that it is using the equivalent of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
|
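
The server-side price quoted above ($0.36 per hour of audio as of 02/29/24) and the stated dataset sizes (5 hours for `librispeech`, 120 hours for `earnings22`) imply a rough cost for one `WhisperOpenAIAPI` evaluation pass. The following is a minimal sketch of that arithmetic, assuming billing is strictly linear in audio duration with no rounding or minimum charge, and that splitting long clips to fit the 25MB request limit does not change the total billed duration. Both assumptions are ours, not statements from the README, and the names `PRICE_PER_AUDIO_HOUR_USD`, `DATASET_HOURS`, and `estimate_api_cost_usd` are hypothetical.

```python
from decimal import Decimal

# Price quoted in the README as of 02/29/24; current pricing may differ.
PRICE_PER_AUDIO_HOUR_USD = Decimal("0.36")

# Dataset durations in hours of audio, as described in the README.
DATASET_HOURS = {
    "librispeech": Decimal("5"),
    "earnings22": Decimal("120"),
}


def estimate_api_cost_usd(hours: Decimal) -> Decimal:
    """Linear estimate (assumed): hours of audio times the hourly price."""
    return (hours * PRICE_PER_AUDIO_HOUR_USD).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for name, hours in DATASET_HOURS.items():
        print(f"{name}: ${estimate_api_cost_usd(hours)}")
```

`Decimal` is used instead of floats so the dollar amounts are not affected by binary rounding.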