Best Models for Apple Silicon
Apple Silicon's unified memory makes it surprisingly competent at running large local models: an M2 Max with 96GB can fit models that would otherwise need an A100. The picks below run well on M-series chips with MLX or llama.cpp's Metal backend; the image models use CoreML or stable-diffusion.cpp instead.
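To sanity-check whether a model fits, you can estimate a quantized model's weight size from its parameter count and average bits per weight. This is a rough sketch built on stated assumptions. About 4.85 bits per weight for Q4_K_M is inferred from the sizes in this list (for example, Llama 3.1 70B at 39.6 GB; the listed sizes appear to be GiB). The 0.5 GB overhead mirrors the usual gap between "Size" and "Min VRAM" in the entries. The 75% GPU share of unified memory is a placeholder, because the real Metal limit varies by machine and macOS version. The function names `weights_gib` and `fits` are hypothetical, and the estimate ignores the KV cache, which grows with context length.

```python
# Rough fit check for a quantized model in Apple Silicon unified memory.
# Assumptions (not from any official source):
#   - ~4.85 bits/weight approximates Q4_K_M (inferred from the sizes in this list)
#   - ~75% of unified memory is usable by the GPU (varies by machine and macOS version)
#   - a fixed 0.5 GiB overhead, mirroring the "Size" vs "Min VRAM" gap in the list
GIB = 2**30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion: float, bits_per_weight: float, unified_gib: float,
         gpu_fraction: float = 0.75, overhead_gib: float = 0.5):
    """Return (fits?, estimated need in GiB, assumed GPU budget in GiB)."""
    need = weights_gib(params_billion, bits_per_weight) + overhead_gib
    budget = unified_gib * gpu_fraction
    return need <= budget, need, budget


# Example: a 70B model at Q4_K_M on Macs with different amounts of memory.
for ram in (32, 64, 96):
    ok, need, budget = fits(70, 4.85, ram)
    print(f"70B Q4_K_M on {ram} GB: need ~{need:.1f} GiB, budget ~{budget:.1f} GiB, fits={ok}")
```

Raising `gpu_fraction` models a machine where more memory has been handed to the GPU; lowering it leaves room for other apps.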
- 1
Llama 3.2 1B Instruct
1.24B params. Ultra-compact 1B model. Runs on virtually any device, including smartphones.
Min VRAM: 1.25 GB | Quant: Q4_K_M | Size: 0.752 GB | License: llama3.2
- 2
Llama 3.2 3B Instruct
3.2B params. Meta's compact 3B model, designed for edge and mobile deployment.
Min VRAM: 2.38 GB | Quant: Q4_K_M | Size: 1.881 GB | License: llama3.2
- 3
Phi-3.5 Mini 3.8B
3.8B params. Tiny but capable model. Runs on almost any hardware, including phones.
Min VRAM: 2.73 GB | Quant: Q4_K_M | Size: 2.229 GB | License: mit
- 4
Qwen 2.5 7B Instruct
7.6B params. Efficient 7B model with strong coding and reasoning abilities.
Min VRAM: 5.3 GB | Quant: Q4_K_M | Size: 4.7 GB | License: apache-2.0
- 5
Gemma 2 9B Instruct
9.2B params. Google's efficient 9B model, with a great performance-to-size ratio.
Min VRAM: 5.87 GB | Quant: Q4_K_M | Size: 5.365 GB | License: gemma
- 6
DeepSeek R1 Distill 8B
8B params. Compact model distilled from DeepSeek R1, offering solid step-by-step reasoning in a small package.
Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: mit
- 7
Mistral 7B Instruct v0.3
7.3B params. Efficient 7B model from Mistral AI with strong performance for its size.
Min VRAM: 4.57 GB | Quant: Q4_K_M | Size: 4.072 GB | License: apache-2.0
- 8
Phi-4
14B params. Microsoft's 14B-parameter model. Punches well above its weight on reasoning.
Min VRAM: 8.93 GB | Quant: Q4_K_M | Size: 8.431 GB | License: mit
- 9
Llama 3.1 8B Instruct
8B params. Meta's 8B-parameter instruction-tuned model. A great balance of performance and efficiency for local deployment.
Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: llama3.1
- 10
Llama 3.1 70B Instruct
70B params. Meta's flagship 70B-parameter model, rivaling GPT-4 on many benchmarks.
Min VRAM: 40.1 GB | Quant: Q4_K_M | Size: 39.6 GB | License: llama3.1
- 11
Qwen 2.5 Coder 7B
7.6B params. Strong 7B code model that rivals larger coding models. Excellent for local development.
Min VRAM: 4.86 GB | Quant: Q4_K_M | Size: 4.361 GB | License: apache-2.0
- 12
Stable Diffusion 2.1 Base (CoreML)
0.86B params. The smallest CoreML image generation model here, palettized down to 1.14 GB. Runs on any iPhone with 6 GB of RAM. Default image generation model.
Min VRAM: 1.56 GB | Quant: CoreML-Palettized | Size: 1.063 GB | License: creativeml-openrail-m
- 13
Stable Diffusion XL (CoreML)
3.5B params. Higher-quality image generation, optimized with CoreML for iOS. Requires 6 GB or more of usable memory (iPad or Mac recommended).
Min VRAM: 3.34 GB | Quant: CoreML | Size: 2.843 GB | License: creativeml-openrail-m
- 14
Stable Diffusion 1.5 (GGUF)
0.86B params. SD 1.5 as a single GGUF file, an alternative to the CoreML builds. Runs through stable-diffusion.cpp with Metal acceleration.
Min VRAM: 2.13 GB | Quant: Q4_0 | Size: 1.627 GB | License: creativeml-openrail-m
- 15
Whisper Tiny English (Quantized)
0.039B params. The smallest Whisper speech recognition model, only 32 MB and English only. Default speech model; small enough to run on any iPhone.
Min VRAM: 0.1 GB | Quant: Q5_1 | Size: 0.032 GB | License: mit
- 16
Whisper Medium
0.77B params. Mid-size Whisper model with strong multilingual speech recognition.
Min VRAM: 1.93 GB | Quant: Q8_0 | Size: 1.428 GB | License: mit
- 17
Whisper Large v3
1.55B params. The largest Whisper model, with the best accuracy in the Whisper family across languages and accents.
Min VRAM: 3.38 GB | Quant: Q8_0 | Size: 2.882 GB | License: mit
- 18
Whisper Large v3 Turbo
0.81B params. Optimized variant of Whisper Large v3, with near-best accuracy and faster inference.
Min VRAM: 2.01 GB | Quant: Q8_0 | Size: 1.513 GB | License: mit
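To narrow the list for a particular Mac, one option is to filter on each entry's Min VRAM. The hypothetical helper `shortlist` below copies the Min VRAM figures from this list and assumes that roughly 75% of unified memory is available to the GPU. That share is a placeholder, not Apple's documented limit. Treat the result as a starting point, since Min VRAM does not account for long contexts or for other apps sharing memory.

```python
# Hypothetical helper: shortlist the models in this list for a given Mac.
# Min VRAM figures are copied from the list; gpu_fraction=0.75 is an assumption,
# not Apple's documented limit, and long-context KV cache is not included.
MIN_VRAM_GB = {
    "Llama 3.2 1B Instruct": 1.25,
    "Llama 3.2 3B Instruct": 2.38,
    "Phi-3.5 Mini 3.8B": 2.73,
    "Qwen 2.5 7B Instruct": 5.3,
    "Gemma 2 9B Instruct": 5.87,
    "DeepSeek R1 Distill 8B": 5.08,
    "Mistral 7B Instruct v0.3": 4.57,
    "Phi-4": 8.93,
    "Llama 3.1 8B Instruct": 5.08,
    "Llama 3.1 70B Instruct": 40.1,
    "Qwen 2.5 Coder 7B": 4.86,
    "Stable Diffusion 2.1 Base (CoreML)": 1.56,
    "Stable Diffusion XL (CoreML)": 3.34,
    "Stable Diffusion 1.5 (GGUF)": 2.13,
    "Whisper Tiny English (Quantized)": 0.1,
    "Whisper Medium": 1.93,
    "Whisper Large v3": 3.38,
    "Whisper Large v3 Turbo": 2.01,
}


def shortlist(unified_gb: float, gpu_fraction: float = 0.75) -> list[str]:
    """Return models whose Min VRAM fits the assumed GPU budget, smallest first."""
    budget = unified_gb * gpu_fraction
    fitting = [name for name, need in MIN_VRAM_GB.items() if need <= budget]
    return sorted(fitting, key=MIN_VRAM_GB.get)


# Example: what fits on a 16 GB Mac under the assumed 75% budget?
print(shortlist(16))
```

Sorting by Min VRAM puts the lightest options first, which leaves headroom for longer prompts on smaller machines.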