Best Speech Recognition (Whisper) Models

Whisper and its variants are the dominant open-source speech-to-text family. Newer additions such as Distil-Whisper (a distilled model) and Faster-Whisper (a faster inference implementation) offer 4–6× speedups. Choose based on how you want to trade latency against accuracy.

1
Whisper Large v3 Turbo
0.81B params
Optimized large Whisper model. Near-best accuracy with faster inference.
Min VRAM: 2.01 GB | Quant: Q8_0 | Size: 1.513 GB | License: MIT
2
Whisper Large v3
1.55B params
Largest Whisper model. Best accuracy across all languages and accents.
Min VRAM: 3.38 GB | Quant: Q8_0 | Size: 2.882 GB | License: MIT
3
Whisper Small
0.24B params
Compact Whisper model. Good accuracy for everyday transcription tasks.
Min VRAM: 0.95 GB | Quant: Q8_0 | Size: 0.454 GB | License: MIT
4
Whisper Base
0.074B params
Base Whisper model. Good balance of speed and accuracy. Only 142 MB.
Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT
5
Distil-Whisper Large v3
0.76B params
Distilled Whisper. About 6× faster than Large v3 while staying within roughly 1% word error rate (WER) of it.
Min VRAM: 1.92 GB | Quant: Q8_0 | Size: 1.415 GB | License: MIT
6
Whisper Tiny
0.039B params
Tiny multilingual speech recognition model. Only 75 MB. Supports 99 languages. Runs on any device.
Min VRAM: 0.2 GB | Quant: Q8_0 | Size: 0.075 GB | License: MIT
7
Whisper Medium
0.77B params
Mid-size Whisper model. Strong multilingual speech recognition.
Min VRAM: 1.93 GB | Quant: Q8_0 | Size: 1.428 GB | License: MIT
8
Whisper Tiny English (Quantized)
0.039B params
Smallest speech recognition model in this list. Only 32 MB. English only. Default speech model, guaranteed to run on any iPhone.
Min VRAM: 0.1 GB | Quant: Q5_1 | Size: 0.032 GB | License: MIT
9
Whisper Base English
0.074B params
English-only base model. Faster and more accurate on English audio than the multilingual Base model.
Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT
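
One way to apply the ranking above is to pick the highest-ranked model whose Min VRAM fits your memory budget. The following is a minimal sketch of that idea, not part of any Whisper tooling: the `SpeechModel` class and `pick_model` function are hypothetical names introduced for illustration, the ranks and Min VRAM figures are copied from the list, and only the two entries the list explicitly labels as English-only are flagged as such.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechModel:
    rank: int             # position in the list above (1 = top)
    name: str
    min_vram_gb: float    # "Min VRAM" figure from the list
    english_only: bool = False


# Data copied from the list above. The english_only flag is set only where
# the list itself says "English" (an assumption of this sketch).
MODELS = [
    SpeechModel(1, "Whisper Large v3 Turbo", 2.01),
    SpeechModel(2, "Whisper Large v3", 3.38),
    SpeechModel(3, "Whisper Small", 0.95),
    SpeechModel(4, "Whisper Base", 0.3),
    SpeechModel(5, "Distil-Whisper Large v3", 1.92),
    SpeechModel(6, "Whisper Tiny", 0.2),
    SpeechModel(7, "Whisper Medium", 1.93),
    SpeechModel(8, "Whisper Tiny English (Quantized)", 0.1, english_only=True),
    SpeechModel(9, "Whisper Base English", 0.3, english_only=True),
]


def pick_model(vram_budget_gb: float,
               need_multilingual: bool = False) -> Optional[SpeechModel]:
    """Return the best-ranked model that fits the budget, or None if none fits."""
    candidates = [
        m for m in MODELS
        if m.min_vram_gb <= vram_budget_gb
        and not (need_multilingual and m.english_only)
    ]
    # Lowest rank number wins; default=None covers the "nothing fits" case.
    return min(candidates, key=lambda m: m.rank, default=None)


if __name__ == "__main__":
    for budget in (4.0, 1.0, 0.15):
        chosen = pick_model(budget)
        print(budget, "GB ->", chosen.name if chosen else "no model fits")
```

Because this policy follows rank alone, it never selects a lower-ranked model that might suit a particular workload better (for example Whisper Large v3 when accuracy matters more than speed). Treat it as a starting point and adjust the ordering to your own latency and accuracy priorities.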
