
Whisper-small OpenVINO IR

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository is openai/whisper on GitHub.

Disclaimer: Content for this model card has partly been copied and pasted from this model card.

Model details

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.

Pipeline Image

| Model Type | Parameters | n_audio_ctx | n_audio_state | n_audio_head | n_audio_layer | n_text_ctx | n_text_state | n_text_head | n_text_layer | n_mels | n_vocab |
|---|---|---|---|---|---|---|---|---|---|---|---|
| whisper-tiny | 39 M | 1500 | 384 | 6 | 4 | 224 | 384 | 6 | 4 | 80 | 51865 |
| whisper-base | 74 M | 1500 | 512 | 8 | 6 | 224 | 512 | 8 | 6 | 80 | 51865 |
| whisper-small | 244 M | 1500 | 768 | 12 | 12 | 224 | 768 | 12 | 12 | 80 | 51865 |
| whisper-medium | 769 M | 1500 | 1024 | 16 | 24 | 224 | 1024 | 16 | 24 | 80 | 51865 |
| whisper-large-v1 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 80 | 51865 |
| whisper-large-v2 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 80 | 51865 |
| distil-whisper-large-v2 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 80 | 51865 |
| whisper-large-v3 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 128 | 51866 |
| distil-whisper-large-v3 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 128 | 51866 |
| whisper-large-v3-turbo | 809 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 4 | 128 | 51866 |
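
To make the columns above easier to read, here is a minimal Python sketch that stores the whisper-small row in a small dataclass and derives two quantities from it. The names `WhisperDims`, `WHISPER_SMALL`, and `encoder_positions` are hypothetical and exist only for this illustration. The 30-second input window, 10 ms mel hop, and stride-2 convolution used to explain `n_audio_ctx` are assumptions taken from the Whisper paper, not from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperDims:
    """Hypothetical container mirroring the columns of the table above."""
    n_audio_ctx: int
    n_audio_state: int
    n_audio_head: int
    n_audio_layer: int
    n_text_ctx: int
    n_text_state: int
    n_text_head: int
    n_text_layer: int
    n_mels: int
    n_vocab: int

    @property
    def audio_head_dim(self) -> int:
        # Width of one encoder attention head: hidden size split across heads.
        if self.n_audio_state % self.n_audio_head != 0:
            raise ValueError("n_audio_state must be divisible by n_audio_head")
        return self.n_audio_state // self.n_audio_head

    @property
    def text_head_dim(self) -> int:
        # Width of one decoder attention head.
        if self.n_text_state % self.n_text_head != 0:
            raise ValueError("n_text_state must be divisible by n_text_head")
        return self.n_text_state // self.n_text_head


# Values copied from the whisper-small row of the table.
WHISPER_SMALL = WhisperDims(
    n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12,
    n_text_ctx=224, n_text_state=768, n_text_head=12, n_text_layer=12,
    n_mels=80, n_vocab=51865,
)


def encoder_positions(seconds: float = 30.0, hop_ms: int = 10,
                      conv_stride: int = 2) -> int:
    """Encoder sequence length for one input window.

    Assumption (from the Whisper paper): one mel frame every `hop_ms`
    milliseconds, then a convolution with stride `conv_stride` before
    the Transformer blocks.
    """
    mel_frames = round(seconds * 1000 / hop_ms)
    return mel_frames // conv_stride


if __name__ == "__main__":
    print("encoder head dim:", WHISPER_SMALL.audio_head_dim)
    print("decoder head dim:", WHISPER_SMALL.text_head_dim)
    print("encoder positions per window:", encoder_positions())
```

Under these assumptions, the derived encoder length matches the `n_audio_ctx` column, and dividing `n_audio_state` by `n_audio_head` gives the same per-head width for every row in the table.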