asahi417 committed on
Commit
80fed0a
1 Parent(s): 28b3c9d

Update README.md

Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -32,20 +32,26 @@ for segment in segments:
  ```
 
  ### Benchmark
- We measure the inference speed with four different Japanese speech audio on MacBook Pro with the following spec:
  - Apple M2 Pro
  - 32GB
  - 14-inch, 2023
  - OS Sonoma Version 14.4.1 (23E224)
 
-
-
- | audio file | audio duration (min) | inference time (sec) |
- |------------|----------------------|----------------------|
- | audio 1    | 50.3                 | 2601                 |
- | audio 2    | 5.6                  | 73                   |
- | audio 3    | 4.9                  | 141                  |
- | audio 4    | 5.6                  | 126                  |
 
  ## Conversion details
 

  ```
 
  ### Benchmark
+ We measure the inference speed of different kotoba-whisper-v1.0 implementations with four different Japanese speech audio files on a MacBook Pro with the following spec:
  - Apple M2 Pro
  - 32GB
  - 14-inch, 2023
  - OS Sonoma Version 14.4.1 (23E224)
 
+ | audio file | audio duration (min) | [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml) (sec) | [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-faster) (sec) | [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) (sec) |
+ |-----------|----------------------|-----|------|-----|
+ | audio 1   | 50.3                 | 581 | 2601 | 807 |
+ | audio 2   | 5.6                  | 41  | 73   | 61  |
+ | audio 3   | 4.9                  | 30  | 141  | 54  |
+ | audio 4   | 5.6                  | 35  | 126  | 69  |
+
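To put the numbers in perspective, the inference times can be converted to real-time factors (seconds of compute per second of audio). A minimal stdlib sketch using only the audio 1 figures from the table above; the helper name is ours, not part of any benchmark script:

```python
def real_time_factor(inference_sec: float, audio_min: float) -> float:
    """Inference time divided by audio duration, both expressed in seconds."""
    return inference_sec / (audio_min * 60.0)

# Audio 1 (50.3 min) figures from the benchmark table above.
for name, sec in [("whisper.cpp", 581), ("faster-whisper", 2601), ("hf pipeline", 807)]:
    print(f"{name}: RTF = {real_time_factor(sec, 50.3):.2f}")
```

An RTF below 1.0 means the implementation transcribes faster than real time.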
+ Scripts to re-run the experiment can be found below:
+ * [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/blob/main/benchmark.sh)
+ * [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-faster/blob/main/benchmark.sh)
+ * [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0/blob/main/benchmark.sh)
+ Also note that whisper.cpp and faster-whisper currently support [sequential long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form), while only the Hugging Face pipeline supports [chunked long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form), which we empirically found to perform better than sequential long-form decoding.
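As a toy illustration of the difference (our own sketch, not the pipelines' actual implementations): chunked decoding splits the audio into fixed-length, overlapping windows that can be transcribed independently and in parallel, whereas sequential decoding walks the file one window at a time. The chunk and stride values below are made up:

```python
def chunk_spans(total_sec: float, chunk_sec: float = 30.0, stride_sec: float = 5.0):
    """Return (start, end) windows covering the audio, overlapping by stride_sec."""
    spans = []
    start = 0.0
    step = chunk_sec - stride_sec  # advance less than a full chunk to overlap
    while start < total_sec:
        spans.append((start, min(start + chunk_sec, total_sec)))
        start += step
    return spans

# e.g. a 70-second file with 30 s chunks and 5 s of overlap
print(chunk_spans(70.0))
```

The overlap lets the decoder stitch chunk boundaries back together without dropping words that straddle a cut.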
  ## Conversion details