Best Models for Apple Silicon

Apple Silicon's unified memory makes it surprisingly competent at running large local models: an M2 Max with 96GB can hold models that would otherwise need a data-center GPU such as an A100. Below are picks that run well on M-series chips with MLX or llama.cpp's Metal backend.

  1. Llama 3.2 1B Instruct

    1.24B params

    Ultra-compact 1B model. Runs on virtually any device including smartphones.

    Min VRAM: 1.25GB | Quant: Q4_K_M | Size: 0.752GB | License: llama3.2
  2. Llama 3.2 3B Instruct

    3.2B params

    Meta's compact 3B model designed for edge and mobile deployment.

    Min VRAM: 2.38GB | Quant: Q4_K_M | Size: 1.881GB | License: llama3.2
  3. Phi-3.5 Mini 3.8B

    3.8B params

    Tiny but capable 3.8B model. Runs on almost any hardware including phones.

    Min VRAM: 2.73GB | Quant: Q4_K_M | Size: 2.229GB | License: mit
  4. Qwen 2.5 7B Instruct

    7.6B params

    Efficient 7B model with strong coding and reasoning abilities.

    Min VRAM: 5.3GB | Quant: Q4_K_M | Size: 4.7GB | License: apache-2.0
  5. Gemma 2 9B Instruct

    9.2B params

    Google's efficient 9B model. Great performance-to-size ratio.

    Min VRAM: 5.87GB | Quant: Q4_K_M | Size: 5.365GB | License: gemma
  6. DeepSeek R1 Distill 8B

    8B params

    Compact reasoning model. Good reasoning capabilities in a small package.

    Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: mit
  7. Mistral 7B Instruct v0.3

    7.3B params

    Efficient 7B model from Mistral AI with strong performance for its size.

    Min VRAM: 4.57GB | Quant: Q4_K_M | Size: 4.072GB | License: apache-2.0
  8. Phi-4

    14B params

    Microsoft's 14B parameter model. Punches well above its weight on reasoning.

    Min VRAM: 8.93GB | Quant: Q4_K_M | Size: 8.431GB | License: mit
  9. Llama 3.1 8B Instruct

    8B params

    Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

    Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: llama3.1
  10. Llama 3.1 70B Instruct

    70B params

    Meta's flagship 70B parameter model. Excellent performance rivaling GPT-4 on many benchmarks.

    Min VRAM: 40.1GB | Quant: Q4_K_M | Size: 39.6GB | License: llama3.1
  11. Qwen 2.5 Coder 7B

    7.6B params

    Strong 7B code model rivaling larger coding models. Excellent for local development.

    Min VRAM: 4.86GB | Quant: Q4_K_M | Size: 4.361GB | License: apache-2.0
  12. Stable Diffusion 2.1 Base (CoreML)

    0.86B params

    Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on any iPhone with 6GB RAM. Default image generation model.

    Min VRAM: 1.56GB | Quant: CoreML-Palettized | Size: 1.063GB | License: creativeml-openrail-m
  13. Stable Diffusion XL (CoreML)

    3.5B params

    Higher quality image generation. CoreML optimized for iOS. Requires 6GB+ usable memory (iPad/Mac recommended).

    Min VRAM: 3.34GB | Quant: CoreML | Size: 2.843GB | License: creativeml-openrail-m
  14. Stable Diffusion 1.5 (GGUF)

    0.86B params

    SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with Metal acceleration.

    Min VRAM: 2.13GB | Quant: Q4_0 | Size: 1.627GB | License: creativeml-openrail-m
  15. Whisper Tiny English (Quantized)

    0.039B params

    Smallest speech recognition model in this list. Only 32MB. English only. Default speech model, guaranteed to run on any iPhone.

    Min VRAM: 0.1GB | Quant: Q5_1 | Size: 0.032GB | License: mit
  16. Whisper Medium

    0.77B params

    Mid-size Whisper model. Strong multilingual speech recognition.

    Min VRAM: 1.93GB | Quant: Q8_0 | Size: 1.428GB | License: mit
  17. Whisper Large v3

    1.55B params

    Largest Whisper model. Best accuracy across all languages and accents.

    Min VRAM: 3.38GB | Quant: Q8_0 | Size: 2.882GB | License: mit
  18. Whisper Large v3 Turbo

    0.81B params

    Optimized large Whisper model. Near-best accuracy with faster inference.

    Min VRAM: 2.01GB | Quant: Q8_0 | Size: 1.513GB | License: mit
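
To see which of the picks above fit a given Mac, one rough approach is to compare each model's Min VRAM figure against a share of the machine's unified memory. The sketch below copies a few entries from the list. The `Model` class, the `models_that_fit` helper, and the `usable_fraction=0.7` headroom value are illustrative assumptions, not measured macOS limits; the real usable share depends on the OS, other running apps, and context length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    min_vram_gb: float  # "Min VRAM" figure copied from the list above


# A few entries from the list above; extend as needed.
CATALOG = [
    Model("Llama 3.2 1B Instruct", 1.25),
    Model("Qwen 2.5 7B Instruct", 5.3),
    Model("Phi-4", 8.93),
    Model("Llama 3.1 70B Instruct", 40.1),
]


def models_that_fit(unified_memory_gb: float, usable_fraction: float = 0.7) -> list[str]:
    """Return names of catalog entries whose Min VRAM fits in the memory budget.

    usable_fraction is an assumed headroom factor (hypothetical default),
    leaving the rest of unified memory for the OS and other apps.
    """
    budget_gb = unified_memory_gb * usable_fraction
    return [m.name for m in CATALOG if m.min_vram_gb <= budget_gb]


if __name__ == "__main__":
    for ram_gb in (8, 16, 96):
        print(f"{ram_gb}GB unified memory:", models_that_fit(ram_gb))
```

Under these assumptions, the 70B model only appears once the budget exceeds its 40.1GB Min VRAM figure, which is why it is practical on high-memory configurations such as the 96GB M2 Max mentioned at the top.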
