Best Tiny Models (Under 2B Params)

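Tiny models punch surprisingly far above their weight. SmolLM, Qwen 2.5 0.5B, and Llama 3.2 1B all run on phones and in browser tabs. If you're targeting edge deployment, start here. The picks below cover chat LLMs, text embedding and reranking models, speech recognition (Whisper), vision, and image generation.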
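As a rough sketch, one way to shortlist entries for a given device is to compare each model's Min VRAM figure against the memory you can spare. The dictionary below copies the Min VRAM values from the list; `models_that_fit` and its `headroom_gb` reserve are hypothetical helpers for illustration, not part of any tool. Min VRAM is a floor, so longer contexts or larger batches will need more.

```python
# Min VRAM figures (GB) copied from the list in this article.
MIN_VRAM_GB = {
    "all-MiniLM-L6-v2": 0.1,
    "BGE Small EN v1.5": 0.1,
    "Nomic Embed Text v1.5": 0.3,
    "Qwen 2.5 1.5B": 1.54,
    "BGE Large EN v1.5": 0.83,
    "Whisper Large v3 Turbo": 2.01,
    "BGE Reranker v2 M3": 1.58,
    "Qwen 2.5 0.5B": 0.96,
    "Whisper Large v3": 3.38,
    "Llama 3.2 1B Instruct": 1.25,
    "TinyLlama 1.1B": 1.12,
    "Moondream 2": 1.5,
    "Whisper Small": 0.95,
    "Stable Diffusion 1.5 (CoreML)": 2.5,
    "Whisper Base": 0.3,
}


def models_that_fit(budget_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Hypothetical helper: names whose Min VRAM fits in budget_gb minus a reserve.

    Results are sorted by Min VRAM (smallest first), ties broken by name.
    """
    usable = budget_gb - headroom_gb
    fitting = [(vram, name) for name, vram in MIN_VRAM_GB.items() if vram <= usable]
    return [name for _, name in sorted(fitting)]


if __name__ == "__main__":
    # Example: a device that can spare about 2GB, keeping 0.5GB in reserve.
    for name in models_that_fit(2.0, headroom_gb=0.5):
        print(name)
```

Keeping a reserve is a design choice: the OS, the app itself, and any other loaded model compete for the same memory on phones and laptops with unified memory.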

  1. all-MiniLM-L6-v2

    0.023B params

    Tiny embedding model, only 23MB. Well suited to on-device semantic search.

    Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0
  2. BGE Small EN v1.5

    0.033B params

    Compact English embedding model. Good for basic semantic search.

    Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit
  3. Nomic Embed Text v1.5

    0.137B params

    High-quality text embedding model. Good for RAG and search.

    Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0
  4. Qwen 2.5 1.5B

    1.5B params

    Compact 1.5B model with strong multilingual and coding abilities.

    Min VRAM: 1.54GB | Quant: Q4_K_M | Size: 1.041GB | License: apache-2.0
  5. BGE Large EN v1.5

    0.335B params

    High-quality English embedding model, the most accurate of the BGE v1.5 English models.

    Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit
  6. Whisper Large v3 Turbo

    0.81B params

    Pruned variant of Whisper Large v3 with 4 decoder layers instead of 32. Accuracy close to Large v3 with much faster inference.

    Min VRAM: 2.01GB | Quant: Q8_0 | Size: 1.513GB | License: mit
  7. BGE Reranker v2 M3

    0.568B params

    Multilingual reranker supporting 100+ languages.

    Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit
  8. Qwen 2.5 0.5B

    0.5B params

    Ultra-small 0.5B model from Alibaba with minimal resource requirements.

    Min VRAM: 0.96GB | Quant: Q4_K_M | Size: 0.458GB | License: apache-2.0
  9. Whisper Large v3

    1.55B params

    Largest Whisper model, with the best accuracy of the Whisper family across languages.

    Min VRAM: 3.38GB | Quant: Q8_0 | Size: 2.882GB | License: mit
  10. Llama 3.2 1B Instruct

    1.24B params

    Ultra-compact 1B model. Runs on most devices, including smartphones.

    Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2
  11. TinyLlama 1.1B

    1.1B params

    Lightweight 1.1B chat model built on the Llama 2 architecture. Runs well on phones.

    Min VRAM: 1.12GB | Quant: Q4_K_M | Size: 0.623GB | License: apache-2.0
  12. Moondream 2

    1.8B params

    Ultra-compact vision-language model, only 1GB, that answers questions about images.

    Min VRAM: 1.5GB | Quant: Q4_K_M | Size: 1GB | License: apache-2.0
  13. Whisper Small

    0.24B params

    Compact Whisper model. Good accuracy for everyday transcription tasks.

    Min VRAM: 0.95GB | Quant: Q8_0 | Size: 0.454GB | License: mit
  14. Stable Diffusion 1.5 (CoreML)

    0.86B params

    Classic image generation model, pre-converted to Core ML for iOS and macOS. Downloads as a zip archive and is extracted automatically.

    Min VRAM: 2.5GB | Quant: CoreML-Palettized | Size: 1.46GB | License: creativeml-openrail-m
  15. Whisper Base

    0.074B params

    Base-size Whisper model with a good balance of speed and accuracy.

    Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.142GB | License: mit
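The Size figures follow roughly from parameter count times bits per weight. A minimal sketch, assuming the bits-per-weight values in the comments: 8.5 for Q8_0 follows from the ggml block layout, while about 4.85 for Q4_K_M is an assumed average, since that format mixes quantization types across tensors. `estimate_weights_gb` is a hypothetical helper, not part of llama.cpp or any other library.

```python
# Approximate storage cost per weight, in bits.
# Q8_0: blocks of 32 int8 weights plus one FP16 scale = 34 bytes / 32 weights = 8.5 bits.
# Q4_K_M: mostly 4-bit k-quants with some 6-bit tensors; 4.85 is an assumed average.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimate_weights_gb(params: float, quant: str) -> float:
    """Hypothetical helper: rough weight size in decimal GB (1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts and quant types taken from the list above.
    examples = [
        ("all-MiniLM-L6-v2", 0.023e9, "Q8_0"),
        ("Llama 3.2 1B Instruct", 1.24e9, "Q4_K_M"),
        ("BGE Reranker v2 M3", 0.568e9, "FP16"),
    ]
    for name, params, quant in examples:
        print(f"{name}: ~{estimate_weights_gb(params, quant):.3f} GB at {quant}")
```

For Llama 3.2 1B at Q4_K_M this gives about 0.75 GB, close to the listed 0.752GB. Remaining gaps can come from GB versus GiB conventions, tensors kept at higher precision, and file metadata. Min VRAM sits above Size because inference also needs memory for activations and, for LLMs, the KV cache.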
