Best Long-Context Models (32K+)
When 8K tokens of context isn't enough, long-context models can keep entire codebases, manuscripts, and document corpora in working memory. Models are listed by context length and adoption.
- 1
Qwen 2.5 7B Instruct
7.6B params
Efficient 7B model with strong coding and reasoning abilities.
Min VRAM: 5.3GB | Quant: Q4_K_M | Size: 4.7GB | License: apache-2.0
- 2
Llama 3.1 8B Instruct
8B params
Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: llama3.1
- 3
Llama 3.2 3B Instruct
3.2B params
Meta's compact 3B model designed for edge and mobile deployment.
Min VRAM: 2.38GB | Quant: Q4_K_M | Size: 1.881GB | License: llama3.2
- 4
Llama 3.2 1B Instruct
1.24B params
Ultra-compact 1B model. Runs on virtually any device, including smartphones.
Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2
- 5
Qwen 2.5 32B
32B params
Premium 32B model with top-tier reasoning. Suited to a Mac with 32GB+ RAM.
Min VRAM: 18.99GB | Quant: Q4_K_M | Size: 18.488GB | License: apache-2.0
- 6
DeepSeek R1 Distill 8B
8B params
Compact reasoning model. Good reasoning capabilities in a small package.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: mit
- 7
Phi-3.5 Vision
4.2B params
Vision-language model from Microsoft. Can understand images and documents.
Min VRAM: 3.2GB | Quant: Q4_K_M | Size: 2.5GB | License: mit
- 8
Qwen 2.5 1.5B
1.5B params
Compact 1.5B model with strong multilingual and coding abilities.
Min VRAM: 1.54GB | Quant: Q4_K_M | Size: 1.041GB | License: apache-2.0
- 9
Qwen 2.5 3B
3B params
Versatile 3B model with strong reasoning and multilingual capabilities.
Min VRAM: 2.46GB | Quant: Q4_K_M | Size: 1.96GB | License: apache-2.0
- 10
Qwen 2.5 0.5B
0.5B params
Ultra-small 0.5B model from Alibaba. Minimal resource requirements.
Min VRAM: 0.96GB | Quant: Q4_K_M | Size: 0.458GB | License: apache-2.0
- 11
Gemma 3 12B
12B params
High-quality 12B model. Excellent for iPad Pro and Mac.
Min VRAM: 7.3GB | Quant: Q4_K_M | Size: 6.799GB | License: gemma
- 12
Qwen 2.5 Coder 7B
7.6B params
Strong 7B code model rivaling larger coding models. Excellent for local development.
Min VRAM: 4.86GB | Quant: Q4_K_M | Size: 4.361GB | License: apache-2.0
- 13
Mistral 7B Instruct v0.3
7.3B params
Efficient 7B model from Mistral AI with strong performance for its size.
Min VRAM: 4.57GB | Quant: Q4_K_M | Size: 4.072GB | License: apache-2.0
- 14
Qwen2-VL 2B
2.2B params
Compact vision-language model. Default multimodal model. Can understand images and answer questions about them.
Min VRAM: 1.42GB | Quant: Q4_K_M | Size: 0.918GB | License: apache-2.0
- 15
Gemma 3 4B
4B params
Balanced 4B model with strong reasoning. Great for iPhones.
Min VRAM: 2.82GB | Quant: Q4_K_M | Size: 2.319GB | License: gemma
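
To shortlist models from the list above for a given memory budget, a minimal sketch follows. It assumes the listed Min VRAM figures can be taken at face value for the Q4_K_M quant; actual memory use may be higher at long context lengths. The function name `models_that_fit` and the `MODELS` table are hypothetical helpers for illustration, with values copied from the list.

```python
# (name, min_vram_gb) pairs copied from the list above (Q4_K_M quant).
MODELS = [
    ("Qwen 2.5 7B Instruct", 5.3),
    ("Llama 3.1 8B Instruct", 5.08),
    ("Llama 3.2 3B Instruct", 2.38),
    ("Llama 3.2 1B Instruct", 1.25),
    ("Qwen 2.5 32B", 18.99),
    ("DeepSeek R1 Distill 8B", 5.08),
    ("Phi-3.5 Vision", 3.2),
    ("Qwen 2.5 1.5B", 1.54),
    ("Qwen 2.5 3B", 2.46),
    ("Qwen 2.5 0.5B", 0.96),
    ("Gemma 3 12B", 7.3),
    ("Qwen 2.5 Coder 7B", 4.86),
    ("Mistral 7B Instruct v0.3", 4.57),
    ("Qwen2-VL 2B", 1.42),
    ("Gemma 3 4B", 2.82),
]


def models_that_fit(vram_gb, models=MODELS):
    """Return names of models whose listed Min VRAM is <= vram_gb, largest first."""
    fitting = [(name, need) for name, need in models if need <= vram_gb]
    # Sort by memory requirement, descending, so the largest model that fits comes first.
    fitting.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in fitting]


if __name__ == "__main__":
    # Example: shortlist for a device with 4 GB of available VRAM.
    for name in models_that_fit(4.0):
        print(name)
```

Sorting largest-first is only one possible heuristic; it treats a higher memory requirement as a rough proxy for a more capable model, which the list itself does not claim.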