Best Tiny Models (Under 2B Params)

Tiny models punch well above their weight. Models such as SmolLM and Qwen 2.5 0.5B run on phones and in browser tabs. If you're targeting edge deployment, start here.

1. all-MiniLM-L6-v2 (0.023B params)
Tiny embedding model. Only 23MB. Perfect for on-device search.
Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0
2. BGE Small EN v1.5 (0.033B params)
Compact English embedding model. Good for basic semantic search.
Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit
3. Nomic Embed Text v1.5 (0.137B params)
High-quality text embedding model. Good for RAG and search.
Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0
4. Qwen 2.5 1.5B (1.5B params)
Compact 1.5B model with strong multilingual and coding abilities.
Min VRAM: 1.54GB | Quant: Q4_K_M | Size: 1.041GB | License: apache-2.0
5. BGE Large EN v1.5 (0.335B params)
High-quality English embedding model. Best accuracy for English search.
Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit
6. Whisper Large v3 Turbo (0.81B params)
Optimized large Whisper model. Near-best accuracy with faster inference.
Min VRAM: 2.01GB | Quant: Q8_0 | Size: 1.513GB | License: mit
7. BGE Reranker v2 M3 (0.568B params)
Multilingual reranker covering 100+ languages. About 1.1GB.
Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit
8. Qwen 2.5 0.5B (0.5B params)
Ultra-small 0.5B model from Alibaba. Minimal resource requirements.
Min VRAM: 0.96GB | Quant: Q4_K_M | Size: 0.458GB | License: apache-2.0
9. Whisper Large v3 (1.55B params)
Largest Whisper model. Best accuracy across all languages and accents.
Min VRAM: 3.38GB | Quant: Q8_0 | Size: 2.882GB | License: mit
10. Llama 3.2 1B Instruct (1.24B params)
Ultra-compact 1B model. Runs on virtually any device including smartphones.
Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2
11. TinyLlama 1.1B (1.1B params)
Lightweight 1.1B chat model based on Llama architecture. Great for phones.
Min VRAM: 1.12GB | Quant: Q4_K_M | Size: 0.623GB | License: apache-2.0
12. Moondream 2 (1.8B params)
Ultra-compact vision model. Only 1GB. Answers questions about images.
Min VRAM: 1.5GB | Quant: Q4_K_M | Size: 1GB | License: apache-2.0
13. Whisper Small (0.24B params)
Compact Whisper model. Good accuracy for everyday transcription tasks.
Min VRAM: 0.95GB | Quant: Q8_0 | Size: 0.454GB | License: mit
14. Stable Diffusion 1.5 (CoreML) (0.86B params)
Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as a zip and auto-extracts.
Min VRAM: 2.5GB | Quant: CoreML-Palettized | Size: 1.46GB | License: creativeml-openrail-m
15. Whisper Base (0.074B params)
Base Whisper model. Good balance of speed and accuracy. 142MB.
Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.142GB | License: mit
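
To shortlist candidates for a specific device, the figures in the listing can be filtered against a memory budget. The following is a minimal sketch, not part of the original listing: the `Model` class, the `fits` and `headroom_gb` helpers, and the example budget are hypothetical names chosen for illustration, and it assumes the listed "Min VRAM" can be treated as a hard threshold. Only a subset of the entries is included, with numbers copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float      # parameters in billions, as listed
    min_vram_gb: float   # "Min VRAM" as listed
    size_gb: float       # "Size" as listed


# Subset of the listing above; values copied as published.
MODELS = [
    Model("all-MiniLM-L6-v2", 0.023, 0.1, 0.023),
    Model("Qwen 2.5 0.5B", 0.5, 0.96, 0.458),
    Model("TinyLlama 1.1B", 1.1, 1.12, 0.623),
    Model("Llama 3.2 1B Instruct", 1.24, 1.25, 0.752),
    Model("Qwen 2.5 1.5B", 1.5, 1.54, 1.041),
    Model("Whisper Large v3", 1.55, 3.38, 2.882),
]


def fits(budget_gb, models=MODELS):
    """Names of models whose listed Min VRAM is within the budget, smallest first."""
    ok = [m for m in models if m.min_vram_gb <= budget_gb]
    return [m.name for m in sorted(ok, key=lambda m: m.min_vram_gb)]


def headroom_gb(model):
    """Listed Min VRAM minus file size: memory needed beyond the weights."""
    return round(model.min_vram_gb - model.size_gb, 3)


if __name__ == "__main__":
    # Assumption: the device has 1.25 GB available for the model.
    print(fits(1.25))
    for m in MODELS:
        print(f"{m.name}: {headroom_gb(m)} GB beyond weights")
```

The headroom figure is simply the gap between the two listed numbers. It gives a rough sense of how much memory a model needs beyond its weights, but the listing does not say how "Min VRAM" was measured, so treat it as a guide only.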
