Whisper-small OpenVINO IR

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

Whisper was proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. The original code repository is openai/whisper on GitHub.

Disclaimer: Content for this model card has partly been copied from another Whisper model card.

Model details

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.
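To make the encoder-decoder split concrete, here is a minimal sketch of greedy sequence-to-sequence decoding. It is not Whisper's actual implementation: `greedy_decode`, `toy_encode`, `toy_decode_step`, and the token ids are hypothetical stand-ins chosen for illustration. The point it shows is that the encoder runs once per input, while the decoder is called repeatedly, each time conditioned on the encoder output and the tokens produced so far.

```python
from typing import Callable, List, Sequence

START, END = 0, 1  # hypothetical special token ids, for illustration only


def greedy_decode(
    encode: Callable[[Sequence[int]], List[int]],
    decode_step: Callable[[List[int], List[int]], int],
    features: Sequence[int],
    max_len: int,
) -> List[int]:
    """Run the encoder once, then generate tokens one at a time."""
    encoder_states = encode(features)  # single encoder pass over the input
    tokens = [START]
    for _ in range(max_len):
        # Each step sees the encoder output plus all previously emitted tokens.
        next_token = decode_step(encoder_states, tokens)
        tokens.append(next_token)
        if next_token == END:
            break
    return tokens


def toy_encode(features: Sequence[int]) -> List[int]:
    """Toy encoder: passes the input through unchanged."""
    return list(features)


def toy_decode_step(encoder_states: List[int], tokens: List[int]) -> int:
    """Toy decoder: copies one encoder state per step, then emits END."""
    position = len(tokens) - 1
    return encoder_states[position] if position < len(encoder_states) else END


result = greedy_decode(toy_encode, toy_decode_step, [5, 6, 7], max_len=10)
```

In a real model the two toy functions would be replaced by the encoder and decoder networks, and `decode_step` would pick the most likely next token from the decoder's output distribution.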

[Figure: pipeline image]

| Model Type | Parameters | n_audio_ctx | n_audio_state | n_audio_head | n_audio_layer | n_text_ctx | n_text_state | n_text_head | n_text_layer | n_mels | n_vocab |
|---|---|---|---|---|---|---|---|---|---|---|---|
| whisper-tiny | 39 M | 1500 | 384 | 6 | 4 | 224 | 384 | 6 | 4 | 80 | 51865 |
| whisper-base | 74 M | 1500 | 512 | 8 | 6 | 224 | 512 | 8 | 6 | 80 | 51865 |
| whisper-small | 244 M | 1500 | 768 | 12 | 12 | 224 | 768 | 12 | 12 | 80 | 51865 |
| whisper-medium | 769 M | 1500 | 1024 | 16 | 24 | 224 | 1024 | 16 | 16 | 80 | 51865 |
| whisper-large-v1 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 20 | 80 | 51865 |
| whisper-large-v2 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 20 | 80 | 51865 |
| distil-whisper-large-v2 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 80 | 51865 |
| whisper-large-v3 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 20 | 128 | 51866 |
| distil-whisper-large-v3 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 128 | 51866 |
| whisper-large-v3-turbo | 809 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 4 | 128 | 51866 |
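Two patterns in the table can be checked with a few lines of Python. The sketch below copies a handful of rows from the table into a hypothetical `WhisperDims` dataclass (the class and helper names are illustrative, not part of any library) and derives the per-head width of the audio encoder as `n_audio_state / n_audio_head`. It also shows where `n_audio_ctx = 1500` plausibly comes from, assuming the front-end settings of the reference Whisper implementation: 16 kHz audio, 30-second windows, a hop of 160 samples per mel frame, and a stride-2 convolution before the encoder. Those four constants are assumptions and do not appear in this model card.

```python
from dataclasses import dataclass

# Assumed front-end constants (from the reference Whisper implementation,
# not stated in this model card).
SAMPLE_RATE = 16_000  # audio samples per second
CHUNK_SECONDS = 30    # length of one input window in seconds
HOP_LENGTH = 160      # audio samples between consecutive mel frames
CONV_STRIDE = 2       # downsampling factor applied before the encoder


@dataclass(frozen=True)
class WhisperDims:
    """A subset of the columns from the table above."""
    name: str
    n_audio_ctx: int
    n_audio_state: int
    n_audio_head: int
    n_mels: int

    @property
    def audio_head_dim(self) -> int:
        # Width of a single attention head in the audio encoder.
        return self.n_audio_state // self.n_audio_head


# Values copied from the table; only some rows and columns are included.
ROWS = [
    WhisperDims("whisper-tiny", 1500, 384, 6, 80),
    WhisperDims("whisper-base", 1500, 512, 8, 80),
    WhisperDims("whisper-small", 1500, 768, 12, 80),
    WhisperDims("whisper-medium", 1500, 1024, 16, 80),
    WhisperDims("whisper-large-v3", 1500, 1280, 20, 128),
]


def encoder_positions() -> int:
    """Encoder sequence length implied by the assumed constants."""
    mel_frames = SAMPLE_RATE * CHUNK_SECONDS // HOP_LENGTH
    return mel_frames // CONV_STRIDE


head_dims = {row.name: row.audio_head_dim for row in ROWS}
```

Under these assumptions every listed model uses the same head width, and the model sizes differ in the number of heads and layers rather than in the size of each head.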