Best Tiny Models (Under 2B Params)
Tiny models punch well above their weight. SmolLM, Qwen 2.5 0.5B, and Llama 3.2 1B all run on phones and in browser tabs. If you're targeting edge deployment, start here.
1. all-MiniLM-L6-v2
   0.023B params. Tiny embedding model. Only 23MB. Perfect for on-device search.
   Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0

2. BGE Small EN v1.5
   0.033B params. Compact English embedding model. Good for basic semantic search.
   Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit

3. Nomic Embed Text v1.5
   0.137B params. High-quality text embedding model. Good for RAG and search.
   Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0

4. Qwen 2.5 1.5B
   1.5B params. Compact model with strong multilingual and coding abilities.
   Min VRAM: 1.54GB | Quant: Q4_K_M | Size: 1.041GB | License: apache-2.0

5. BGE Large EN v1.5
   0.335B params. High-quality English embedding model. Best accuracy for English search.
   Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit

6. Whisper Large v3 Turbo
   0.81B params. Optimized large Whisper model. Near-best accuracy with faster inference.
   Min VRAM: 2.01GB | Quant: Q8_0 | Size: 1.513GB | License: mit

7. BGE Reranker v2 M3
   0.568B params. Multilingual reranker. 100+ languages. 1.1GB.
   Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit

8. Qwen 2.5 0.5B
   0.5B params. Ultra-small model from Alibaba. Minimal resource requirements.
   Min VRAM: 0.96GB | Quant: Q4_K_M | Size: 0.458GB | License: apache-2.0

9. Whisper Large v3
   1.55B params. Largest Whisper model. Best accuracy across all languages and accents.
   Min VRAM: 3.38GB | Quant: Q8_0 | Size: 2.882GB | License: mit

10. Llama 3.2 1B Instruct
    1.24B params. Ultra-compact 1B model. Runs on virtually any device, including smartphones.
    Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2

11. TinyLlama 1.1B
    1.1B params. Lightweight chat model based on the Llama architecture. Great for phones.
    Min VRAM: 1.12GB | Quant: Q4_K_M | Size: 0.623GB | License: apache-2.0

12. Moondream 2
    1.8B params. Ultra-compact vision model. Only 1GB. Answers questions about images.
    Min VRAM: 1.5GB | Quant: Q4_K_M | Size: 1GB | License: apache-2.0

13. Whisper Small
    0.24B params. Compact Whisper model. Good accuracy for everyday transcription tasks.
    Min VRAM: 0.95GB | Quant: Q8_0 | Size: 0.454GB | License: mit

14. Stable Diffusion 1.5 (CoreML)
    0.86B params. Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as a zip and auto-extracts.
    Min VRAM: 2.5GB | Quant: CoreML-Palettized | Size: 1.46GB | License: creativeml-openrail-m

15. Whisper Base
    0.074B params. Base Whisper model. Good balance of speed and accuracy. 142MB.
    Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.142GB | License: mit
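
If you have a fixed memory budget on the target device, the Min VRAM figures above can be used to shortlist candidates. The following is a minimal sketch, not part of any tool mentioned on this page: the `MODELS` table simply copies the names and Min VRAM values from the list, and the helper name `models_that_fit` is hypothetical.

```python
# Names and Min VRAM values (GB) copied from the list above, in ranked order.
MODELS = [
    ("all-MiniLM-L6-v2", 0.1),
    ("BGE Small EN v1.5", 0.1),
    ("Nomic Embed Text v1.5", 0.3),
    ("Qwen 2.5 1.5B", 1.54),
    ("BGE Large EN v1.5", 0.83),
    ("Whisper Large v3 Turbo", 2.01),
    ("BGE Reranker v2 M3", 1.58),
    ("Qwen 2.5 0.5B", 0.96),
    ("Whisper Large v3", 3.38),
    ("Llama 3.2 1B Instruct", 1.25),
    ("TinyLlama 1.1B", 1.12),
    ("Moondream 2", 1.5),
    ("Whisper Small", 0.95),
    ("Stable Diffusion 1.5 (CoreML)", 2.5),
    ("Whisper Base", 0.3),
]


def models_that_fit(budget_gb):
    """Return names of models whose listed Min VRAM is within budget_gb.

    Hypothetical helper: results are sorted by Min VRAM, smallest first.
    Ties keep their ranked order because Python's sort is stable.
    """
    fitting = [(name, vram) for name, vram in MODELS if vram <= budget_gb]
    fitting.sort(key=lambda item: item[1])
    return [name for name, _ in fitting]


if __name__ == "__main__":
    # Example: shortlist everything that fits in a 1 GB budget.
    for name in models_that_fit(1.0):
        print(name)
```

Treat the result as a first filter only. The listed minimums apply to the quantization shown for each model, and real memory use also depends on context length and the runtime you choose.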