README / README.md
asahi417's picture
Update README.md
c03d6e0 verified
---
title: README
emoji: 🐨
colorFrom: blue
colorTo: yellow
sdk: static
pinned: false
---
# Japanese ASR
This repository contains all the models and datasets for train/evaluate the Japanese ASR dataset generated through the process of achieving [kotoba-whisper models](https://huggingface.co/collections/kotoba-tech/kotoba-whisper-661d04846a2892cc27a23921).
Following table shows CER comparison with different data size of ReazonSpeech used to distill [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3). The model names follows
`japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of reazonspeech}`.
***CER***
| model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
|:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all) | 9.2 | 8.4 | 11.6 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large) | 9.4 | 8.5 | 12.2 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium) | 10.9 | 11.3 | 14.8 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small) | 30.2 | 39 | 40.7 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny) | 94.8 | 96.3 | 96.7 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 8.5 | 7.1 | 14.9 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 9.7 | 8.2 | 28.1 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 10 | 8.9 | 34.1 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.5 | 10 | 33.2 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 28.6 | 24.9 | 70.4 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.1 | 14.2 | 41.5 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 53.7 | 36.5 | 137.9 |
| [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2) | 9.1 | 7.4 | 11.2 |
***WER***
| model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
|:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all) | 58.8 | 63.7 | 55.6 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large) | 59.2 | 64.3 | 56.4 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium) | 64.6 | 72.1 | 63 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small) | 85 | 94.2 | 82.1 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny) | 100 | 100 | 99 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 55.1 | 59.2 | 60.2 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 59.3 | 63.2 | 74.1 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 61.1 | 66.4 | 74.9 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.4 | 69.5 | 76 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 87.2 | 93 | 91.8 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.2 | 81.9 | 83 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.8 | 97.6 | 94.9 |
| [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2) | 57.5 | 60.6 | 47.5 |
Note that [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) is an alias of [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large)
and [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) is an alias of [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all).
Please find more detailed results at [kotoba-whisper codebase](https://github.com/kotoba-tech/kotoba-whisper).