Best Speech Recognition (Whisper) Models

Whisper is the dominant open-source speech-to-text model family. Distilled variants such as Distil-Whisper and optimized inference runtimes such as Faster-Whisper offer roughly 4–6× speedups over the original models. Pick based on your latency versus accuracy tradeoff.

  1. Whisper Large v3 Turbo

    0.81B params

    Optimized large Whisper model. Near-best accuracy with faster inference.

    Min VRAM: 2.01 GB | Quant: Q8_0 | Size: 1.513 GB | License: MIT
  2. Whisper Large v3

    1.55B params

    Largest Whisper model. Highest accuracy in the family across its supported languages and accents.

    Min VRAM: 3.38 GB | Quant: Q8_0 | Size: 2.882 GB | License: MIT
  3. Whisper Small

    0.24B params

    Compact Whisper model. Good accuracy for everyday transcription tasks.

    Min VRAM: 0.95 GB | Quant: Q8_0 | Size: 0.454 GB | License: MIT
  4. Whisper Base

    0.074B params

    Base Whisper model. Good balance of speed and accuracy at only 142 MB.

    Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT
  5. Distil-Whisper Large v3

    0.76B params

    Distilled Whisper. About 6× faster than Large v3, with word error rate within roughly 1% of it.

    Min VRAM: 1.92 GB | Quant: Q8_0 | Size: 1.415 GB | License: MIT
  6. Whisper Tiny

    0.039B params

    Tiny multilingual speech recognition model. Only 75 MB, supports 99 languages, and runs on almost any device.

    Min VRAM: 0.2 GB | Quant: Q8_0 | Size: 0.075 GB | License: MIT
  7. Whisper Medium

    0.77B params

    Mid-size Whisper model. Strong multilingual speech recognition.

    Min VRAM: 1.93 GB | Quant: Q8_0 | Size: 1.428 GB | License: MIT
  8. Whisper Tiny English (Quantized)

    0.039B params

    Smallest speech recognition model in this list. Only 32 MB. English only. Default speech model, small enough to run on any iPhone.

    Min VRAM: 0.1 GB | Quant: Q5_1 | Size: 0.032 GB | License: MIT
  9. Whisper Base English

    0.074B params

    English-only base model. Generally more accurate on English than the multilingual Base model of the same size.

    Min VRAM: 0.3 GB | Quant: Q8_0 | Size: 0.142 GB | License: MIT
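
One way to act on the latency versus accuracy tradeoff is to walk the ranking above and take the first model whose Min VRAM fits your budget. The Python sketch below is only an illustration: the `Model` class and `pick_model` helper are hypothetical names, the figures are copied from the list, and only the entries the list itself labels as English-only are flagged that way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float       # parameters, in billions (from the list above)
    min_vram_gb: float    # "Min VRAM" figure from the list above
    english_only: bool = False


# Order follows the ranking in the list above (1 through 9).
MODELS = [
    Model("Whisper Large v3 Turbo", 0.81, 2.01),
    Model("Whisper Large v3", 1.55, 3.38),
    Model("Whisper Small", 0.24, 0.95),
    Model("Whisper Base", 0.074, 0.3),
    Model("Distil-Whisper Large v3", 0.76, 1.92),
    Model("Whisper Tiny", 0.039, 0.2),
    Model("Whisper Medium", 0.77, 1.93),
    # Only entries the list explicitly calls English-only are flagged.
    Model("Whisper Tiny English (Quantized)", 0.039, 0.1, english_only=True),
    Model("Whisper Base English", 0.074, 0.3, english_only=True),
]


def pick_model(vram_gb: float, need_multilingual: bool = False) -> Optional[Model]:
    """Return the highest-ranked model that fits the VRAM budget, or None."""
    for model in MODELS:
        if model.min_vram_gb > vram_gb:
            continue  # does not fit in the available memory
        if need_multilingual and model.english_only:
            continue  # caller needs languages other than English
        return model
    return None


if __name__ == "__main__":
    for budget in (4.0, 1.0, 0.15):
        choice = pick_model(budget)
        print(budget, "GB ->", choice.name if choice else "no model fits")
```

With these figures, a 4 GB budget selects Whisper Large v3 Turbo and a 1 GB budget selects Whisper Small. Ranking-first selection is a design choice; you could instead sort by parameter count if raw accuracy matters more to you than the list's ordering.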
