Best Speech Recognition (Whisper) Models
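Whisper and its derivatives are the dominant open-source speech-to-text family. Newer additions such as Distil-Whisper (a smaller, distilled model) and Faster-Whisper (a reimplementation of Whisper inference on CTranslate2) report roughly 4–6× speedups over the original. Pick a model based on your latency-versus-accuracy tradeoff.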
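The tradeoff can be made concrete with the Min VRAM figures in the listing. The Python sketch below is a hypothetical helper, not part of any Whisper tooling: it picks the largest listed model that fits a VRAM budget, using parameter count as a rough proxy for accuracy (an assumption; real accuracy varies by language and audio). The names `CatalogEntry` and `pick_model` are illustrative, and the English-only flag for Distil-Whisper Large v3 comes from its model card rather than this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    params_b: float      # parameter count, in billions
    min_vram_gb: float   # "Min VRAM" figure from the listing
    english_only: bool


# Values copied from the listing; Distil-Whisper's English-only flag comes
# from its model card, not from this page.
CATALOG: List[CatalogEntry] = [
    CatalogEntry("Whisper Large v3 Turbo", 0.81, 2.01, False),
    CatalogEntry("Whisper Large v3", 1.55, 3.38, False),
    CatalogEntry("Whisper Small", 0.24, 0.95, False),
    CatalogEntry("Whisper Base", 0.074, 0.3, False),
    CatalogEntry("Distil-Whisper Large v3", 0.76, 1.92, True),
    CatalogEntry("Whisper Tiny", 0.039, 0.2, False),
    CatalogEntry("Whisper Medium", 0.77, 1.93, False),
    CatalogEntry("Whisper Tiny English (Quantized)", 0.039, 0.1, True),
    CatalogEntry("Whisper Base English", 0.074, 0.3, True),
]


def pick_model(vram_budget_gb: float, need_multilingual: bool = True) -> Optional[CatalogEntry]:
    """Return the largest listed model that fits the VRAM budget, or None if none fit."""
    candidates = [
        m for m in CATALOG
        if m.min_vram_gb <= vram_budget_gb
        and not (need_multilingual and m.english_only)
    ]
    # Heuristic: more parameters usually means higher accuracy but higher latency.
    # On a size tie, prefer the English-only variant (this can only happen when
    # need_multilingual is False, because English-only models are filtered out otherwise).
    return max(candidates, key=lambda m: (m.params_b, m.english_only), default=None)


print(pick_model(2.0).name)                           # Whisper Medium
print(pick_model(0.3, need_multilingual=False).name)  # Whisper Base English
```

Ranking by parameter count keeps the helper simple; a real deployment would more likely benchmark word error rate and latency on its own audio and rank by those instead.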
1. Whisper Large v3 Turbo
0.81B params. A pruned version of Large v3 (4 decoder layers instead of 32) with near-best accuracy and faster inference.
Min VRAM: 2.01 GB | Quant: Q8_0 | Size: 1.513 GB | License: MIT

2. Whisper Large v3
1.55B params. The largest Whisper model, with the best accuracy of the family across languages and accents.
Min VRAM: 3.38 GB | Quant: Q8_0 | Size: 2.882 GB | License: MIT

3. Whisper Small
0.24B params. Compact Whisper model with good accuracy for everyday transcription tasks.
Min VRAM: 0.95 GB | Quant: Q8_0 | Size: 0.454 GB | License: MIT

4. Whisper Base
0.074B params. Base Whisper model (142 MB) with a good balance of speed and accuracy.
Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT

5. Distil-Whisper Large v3
0.76B params. Distilled from Whisper Large v3 for English speech recognition. About 6× faster than Large v3, with a word error rate within roughly 1% of it.
Min VRAM: 1.92 GB | Quant: Q8_0 | Size: 1.415 GB | License: MIT

6. Whisper Tiny
0.039B params. Tiny multilingual speech recognition model, only 75 MB, supporting 99 languages. Light enough to run on almost any device.
Min VRAM: 0.2 GB | Quant: Q8_0 | Size: 0.075 GB | License: MIT

7. Whisper Medium
0.77B params. Mid-size Whisper model with strong multilingual speech recognition.
Min VRAM: 1.93 GB | Quant: Q8_0 | Size: 1.428 GB | License: MIT

8. Whisper Tiny English (Quantized)
0.039B params. The smallest model in this list, only 32 MB, English only. It is the default speech model and is guaranteed to run on any iPhone.
Min VRAM: 0.1 GB | Quant: Q5_1 | Size: 0.032 GB | License: MIT

9. Whisper Base English
0.074B params. English-only version of Whisper Base, typically more accurate than the multilingual model on English audio.
Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT
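Accuracy comparisons between speech models, such as the Distil-Whisper figure above, are usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below computes it in Python, assuming plain whitespace tokenization and no text normalization; real evaluations typically normalize casing and punctuation first, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping one row of the DP table at a time.
    # At the start of iteration i, prev[j] is the distance between the first
    # i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or a match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 0.16666666666666666
```

Read as an absolute difference, a model "within 1% WER" of another makes about one additional word error per 100 reference words on the same test set.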