Best Models for Apple Silicon

Apple Silicon's unified memory lets the GPU address most of system RAM, which makes M-series Macs surprisingly capable at running large local models: an M2 Max with 96GB can hold models that would otherwise need a data-center GPU such as an A100. The picks below run well on M-series chips with MLX or llama.cpp's Metal backend.
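
As a rough sketch of what "fits" means here, the snippet below checks a few Min VRAM figures from the list against a machine's unified memory. The helper name `models_that_fit` and the `gpu_fraction` default of 0.75 are assumptions made for illustration (macOS keeps part of unified memory for the system, and the exact share varies by machine); the Min VRAM values are taken at face value from the listing.

```python
# Hypothetical helper, not part of MLX or llama.cpp.
# (name, min_vram_gb) pairs copied from a few entries in the listing.
MODELS = [
    ("Llama 3.2 1B Instruct", 1.25),
    ("Qwen 2.5 7B Instruct", 5.3),
    ("Phi-4", 8.93),
    ("Llama 3.1 70B Instruct", 40.1),
]


def models_that_fit(unified_memory_gb, gpu_fraction=0.75):
    """Return names of models whose Min VRAM fits the GPU-usable share.

    gpu_fraction is an assumed value, not a measured one: only part of
    unified memory is normally available to the GPU.
    """
    budget_gb = unified_memory_gb * gpu_fraction
    return [name for name, min_vram_gb in MODELS if min_vram_gb <= budget_gb]


if __name__ == "__main__":
    for ram_gb in (8, 16, 96):
        print(f"{ram_gb}GB:", models_that_fit(ram_gb))
```

Under these assumptions an 8GB machine is limited to the smaller entries, while the 70B model only becomes an option on the high-memory configurations. Longer contexts need extra memory beyond the listed minimum, so treat the result as a lower bound.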

1
Llama 3.2 1B Instruct
1.24B params
Ultra-compact 1B model. Runs on virtually any device including smartphones.
Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2
2
Llama 3.2 3B Instruct
3.2B params
Meta's compact 3B model designed for edge and mobile deployment.
Min VRAM: 2.38GB | Quant: Q4_K_M | Size: 1.881GB | License: llama3.2
3
Phi-3.5 Mini 3.8B
3.8B params
Tiny but capable 3.8B model. Runs on almost any hardware including phones.
Min VRAM: 2.73GB | Quant: Q4_K_M | Size: 2.229GB | License: mit
4
Qwen 2.5 7B Instruct
7.6B params
Efficient 7B model with strong coding and reasoning abilities.
Min VRAM: 5.3GB | Quant: Q4_K_M | Size: 4.7GB | License: apache-2.0
5
Gemma 2 9B Instruct
9.2B params
Google's efficient 9B model. Great performance-to-size ratio.
Min VRAM: 5.87GB | Quant: Q4_K_M | Size: 5.365GB | License: gemma
6
DeepSeek R1 Distill 8B
8B params
Compact model distilled from DeepSeek R1. Good reasoning capabilities in a small package.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: mit
7
Mistral 7B Instruct v0.3
7.3B params
Efficient 7B model from Mistral AI with strong performance for its size.
Min VRAM: 4.57GB | Quant: Q4_K_M | Size: 4.072GB | License: apache-2.0
8
Phi-4
14B params
Microsoft's 14B parameter model. Punches well above its weight on reasoning.
Min VRAM: 8.93GB | Quant: Q4_K_M | Size: 8.431GB | License: mit
9
Llama 3.1 8B Instruct
8B params
Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: llama3.1
10
Llama 3.1 70B Instruct
70B params
Meta's flagship 70B parameter model. Excellent performance rivaling GPT-4 on many benchmarks.
Min VRAM: 40.1GB | Quant: Q4_K_M | Size: 39.6GB | License: llama3.1
11
Qwen 2.5 Coder 7B
7.6B params
Strong 7B code model rivaling larger coding models. Excellent for local development.
Min VRAM: 4.86GB | Quant: Q4_K_M | Size: 4.361GB | License: apache-2.0
12
Stable Diffusion 2.1 Base (CoreML)
0.86B params
Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on any iPhone with 6GB RAM. Default image generation model.
Min VRAM: 1.56GB | Quant: CoreML-Palettized | Size: 1.063GB | License: creativeml-openrail-m
13
Stable Diffusion XL (CoreML)
3.5B params
Higher quality image generation. CoreML optimized for iOS. Requires 6GB+ usable memory (iPad/Mac recommended).
Min VRAM: 3.34GB | Quant: CoreML | Size: 2.843GB | License: creativeml-openrail-m
14
Stable Diffusion 1.5 (GGUF)
0.86B params
SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with Metal acceleration.
Min VRAM: 2.13GB | Quant: Q4_0 | Size: 1.627GB | License: creativeml-openrail-m
15
Whisper Tiny English (Quantized)
0.039B params
Smallest speech recognition model in this list, at only 32MB. English only. Default speech model; small enough to run on any iPhone.
Min VRAM: 0.1GB | Quant: Q5_1 | Size: 0.032GB | License: mit
16
Whisper Medium
0.77B params
Mid-size Whisper model. Strong multilingual speech recognition.
Min VRAM: 1.93GB | Quant: Q8_0 | Size: 1.428GB | License: mit
17
Whisper Large v3
1.55B params
Largest Whisper model. Best accuracy across all languages and accents.
Min VRAM: 3.38GB | Quant: Q8_0 | Size: 2.882GB | License: mit
18
Whisper Large v3 Turbo
0.81B params
Optimized large Whisper model. Near-best accuracy with faster inference.
Min VRAM: 2.01GB | Quant: Q8_0 | Size: 1.513GB | License: mit
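
Comparing the two memory figures on each entry suggests that the listing budgets roughly 0.5GB on top of the model file for most models, with the smallest Whisper model as the exception. This is an observation drawn from the numbers above, not a documented rule, and the helper name `overhead_gb` is made up for this sketch.

```python
# (name, min_vram_gb, size_gb) copied from entries in the listing above.
ENTRIES = [
    ("Llama 3.2 1B Instruct", 1.25, 0.752),
    ("Phi-4", 8.93, 8.431),
    ("Llama 3.1 70B Instruct", 40.1, 39.6),
    ("Whisper Large v3", 3.38, 2.882),
    ("Whisper Tiny English (Quantized)", 0.1, 0.032),
]


def overhead_gb(min_vram_gb, size_gb):
    """Memory the listing budgets beyond the model file itself, in GB."""
    return round(min_vram_gb - size_gb, 3)


for name, min_vram_gb, size_gb in ENTRIES:
    print(f"{name}: {overhead_gb(min_vram_gb, size_gb)}GB over file size")
```

If the pattern holds, the Size figure is the better guide to how a model scales, and the Min VRAM figure is that size plus a small fixed allowance for runtime buffers.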
