ASR Model Card: parakeet-ctc-1.1b-ja
Model Details
- Model Name: parakeet-ctc-1.1b-ja
- Type: Automatic Speech Recognition (ASR)
- Language: Japanese
- Framework: NVIDIA NeMo
Installation
To use this model, you need to install the NeMo toolkit:
pip install nemo-toolkit==2.0.0rc0 "nemo-toolkit[asr]==2.0.0rc0"
Usage
Here's a basic example of how to use the model:
import nemo.collections.asr as nemo_asr
# Load the model
nemo_model = nemo_asr.models.ASRModel.restore_from("/path/to/parakeet-ja.nemo")
# Transcribe audio files
audio_files = ["path/to/audio1.wav", "path/to/audio2.wav"]
transcriptions = nemo_model.transcribe(audio_files)
# Print transcriptions
for audio_file, transcription in zip(audio_files, transcriptions):
    print(f"Transcription for {audio_file}: {transcription}")
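Depending on the installed NeMo version, the items returned by `transcribe` may be plain strings or hypothesis objects that carry the text in a `.text` attribute. The sketch below normalizes both cases before pairing results with file names. The helper names `to_text` and `pair_transcriptions` are hypothetical and not part of NeMo, and the "string or `.text`" shape is an assumption to verify against your installed version.

```python
def to_text(result):
    """Return plain text for one item of a transcribe() result.

    Assumption: the item is either a str or an object exposing `.text`.
    Anything else falls back to its string representation.
    """
    if isinstance(result, str):
        return result
    return getattr(result, "text", str(result))


def pair_transcriptions(audio_files, results):
    """Map each audio path to its transcription text (hypothetical helper)."""
    if len(audio_files) != len(results):
        raise ValueError("expected one result per audio file")
    return {path: to_text(item) for path, item in zip(audio_files, results)}
```

With the model loaded as above, this could be called as `pair_transcriptions(audio_files, nemo_model.transcribe(audio_files))`.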
Limitations
- This model is trained specifically for Japanese and may not perform well on other languages.
- The accuracy of transcription may vary depending on the audio quality, background noise, and speaker accent.
- The model may struggle with specialized vocabulary or technical terms not encountered during training.
Performance
The following table compares parakeet-ctc-1.1b-ja (NeMo) with Whisper v2 large and Whisper v3 large on three Japanese ASR datasets, reporting word error rate (WER) and character error rate (CER). Lower values are better for both metrics.
| Model | Dataset | WER | CER |
|---|---|---|---|
| Whisper v2 large | japanese-asr/ja_asr.reazonspeech_test | 1.1378 | 0.3472 |
| Whisper v2 large | japanese-asr/ja_asr.jsut_basic5000 | 0.8988 | 0.1063 |
| Whisper v2 large | japanese-asr/ja_asr.common_voice_8_0 | 1.0314 | 0.1594 |
| Whisper v3 large | japanese-asr/ja_asr.reazonspeech_test | 0.9685 | 0.2107 |
| Whisper v3 large | japanese-asr/ja_asr.jsut_basic5000 | 0.9936 | 0.1360 |
| Whisper v3 large | japanese-asr/ja_asr.common_voice_8_0 | 1.0178 | 0.1548 |
| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.reazonspeech_test | 0.7785 | 0.1521 |
| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.jsut_basic5000 | 0.9462 | 0.1291 |
| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.common_voice_8_0 | 1.0002 | 0.1290 |
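For readers unfamiliar with the metrics, here is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. This card does not state which text normalization or tokenization was used for the figures above, so the whitespace-based word splitting below is an assumption, and the function names are illustrative only. The sketch also shows why WER can exceed 1.0 on unsegmented Japanese text while CER stays low.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped before comparison."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


def wer(reference, hypothesis):
    """Word error rate; assumes words are separated by whitespace."""
    return error_rate(reference.split(), hypothesis.split())


reference = "今日は晴れです"
print(cer(reference, "今日は雨です"))     # 2 character edits over 7 characters
print(wer(reference, "今日は雨です"))     # each string is one "word", so 1.0
print(wer(reference, "今日 は 雨 です"))  # 4 edits against 1 reference word, so 4.0
```

Because the error count is divided by the reference length only, insertions or segmentation mismatches can push the rate above 1.0, which is one possible reading of the WER values in the table.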
Ethical Considerations
- Ensure that you have the necessary permissions and comply with local laws when recording and transcribing audio.
- Be aware of potential biases in the model, especially regarding different Japanese dialects or accents.
- Consider the privacy implications of transcribing personal or sensitive conversations.
Additional Information
For more detailed information on using ASR models with the NeMo toolkit, please refer to the NeMo ASR documentation.