Best Long-Context Models (32K+)

When 8K tokens of context isn't enough, long-context models can keep entire codebases, manuscripts, and document corpora in working memory. Listed by context length and adoption.

  1. Qwen 2.5 7B Instruct

    7.6B params

    Efficient 7B model with strong coding and reasoning abilities.

    Min VRAM: 5.3 GB | Quant: Q4_K_M | Size: 4.7 GB | License: apache-2.0
  2. Llama 3.1 8B Instruct

    8B params

    Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

    Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: llama3.1
  3. Llama 3.2 3B Instruct

    3.2B params

    Meta's compact 3B model designed for edge and mobile deployment.

    Min VRAM: 2.38 GB | Quant: Q4_K_M | Size: 1.881 GB | License: llama3.2
  4. Llama 3.2 1B Instruct

    1.24B params

    Ultra-compact 1B model. Runs on virtually any device including smartphones.

    Min VRAM: 1.25 GB | Quant: Q4_K_M | Size: 0.752 GB | License: llama3.2
  5. Qwen 2.5 32B

    32B params

    Premium 32B model with top-tier reasoning. Recommended for Macs with 32GB+ RAM.

    Min VRAM: 18.99 GB | Quant: Q4_K_M | Size: 18.488 GB | License: apache-2.0
  6. DeepSeek R1 Distill 8B

    8B params

    Compact reasoning model. Good reasoning capabilities in a small package.

    Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: mit
  7. Phi-3.5 Vision

    4.2B params

    Vision-language model from Microsoft. Can understand images and documents.

    Min VRAM: 3.2 GB | Quant: Q4_K_M | Size: 2.5 GB | License: mit
  8. Qwen 2.5 1.5B

    1.5B params

    Compact 1.5B model with strong multilingual and coding abilities.

    Min VRAM: 1.54 GB | Quant: Q4_K_M | Size: 1.041 GB | License: apache-2.0
  9. Qwen 2.5 3B

    3B params

    Versatile 3B model with strong reasoning and multilingual capabilities.

    Min VRAM: 2.46 GB | Quant: Q4_K_M | Size: 1.96 GB | License: apache-2.0
  10. Qwen 2.5 0.5B

    0.5B params

    Ultra-small 0.5B model from Alibaba. Minimal resource requirements.

    Min VRAM: 0.96 GB | Quant: Q4_K_M | Size: 0.458 GB | License: apache-2.0
  11. Gemma 3 12B

    12B params

    High quality 12B model. Excellent for iPad Pro and Mac.

    Min VRAM: 7.3 GB | Quant: Q4_K_M | Size: 6.799 GB | License: gemma
  12. Qwen 2.5 Coder 7B

    7.6B params

    Strong 7B code model rivaling larger coding models. Excellent for local development.

    Min VRAM: 4.86 GB | Quant: Q4_K_M | Size: 4.361 GB | License: apache-2.0
  13. Mistral 7B Instruct v0.3

    7.3B params

    Efficient 7B model from Mistral AI with strong performance for its size.

    Min VRAM: 4.57 GB | Quant: Q4_K_M | Size: 4.072 GB | License: apache-2.0
  14. Qwen2-VL 2B

    2.2B params

    Compact vision-language model. Default multimodal model. Can understand images and answer questions about them.

    Min VRAM: 1.42 GB | Quant: Q4_K_M | Size: 0.918 GB | License: apache-2.0
  15. Gemma 3 4B

    4B params

    Balanced 4B model with strong reasoning. Great for iPhones.

    Min VRAM: 2.82 GB | Quant: Q4_K_M | Size: 2.319 GB | License: gemma
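
To shortlist models for a given machine, the Min VRAM figures in the list above can be compared against the memory you have available. The snippet below is a minimal sketch of that check, not part of the original listing: the function name `models_that_fit` and the `headroom_gb` parameter are hypothetical, and the numbers are copied from the entries above. Long prompts consume additional memory beyond the model weights, so it is safest to treat the listed minimum as a lower bound and reserve some headroom.

```python
# (name, minimum VRAM in GB), copied from the list above.
MODELS = [
    ("Qwen 2.5 7B Instruct", 5.3),
    ("Llama 3.1 8B Instruct", 5.08),
    ("Llama 3.2 3B Instruct", 2.38),
    ("Llama 3.2 1B Instruct", 1.25),
    ("Qwen 2.5 32B", 18.99),
    ("DeepSeek R1 Distill 8B", 5.08),
    ("Phi-3.5 Vision", 3.2),
    ("Qwen 2.5 1.5B", 1.54),
    ("Qwen 2.5 3B", 2.46),
    ("Qwen 2.5 0.5B", 0.96),
    ("Gemma 3 12B", 7.3),
    ("Qwen 2.5 Coder 7B", 4.86),
    ("Mistral 7B Instruct v0.3", 4.57),
    ("Qwen2-VL 2B", 1.42),
    ("Gemma 3 4B", 2.82),
]


def models_that_fit(vram_gb, headroom_gb=0.0):
    """Return names of models whose listed Min VRAM fits the budget.

    headroom_gb is a hypothetical safety margin (an assumption of this
    sketch) subtracted from the budget before comparing.
    """
    usable = vram_gb - headroom_gb
    return [name for name, min_vram in MODELS if min_vram <= usable]


if __name__ == "__main__":
    # Example: an 8 GB GPU with 1 GB kept free for context and overhead.
    for name in models_that_fit(8.0, headroom_gb=1.0):
        print(name)
```

The result preserves the ranking order of the list, so the first name returned is the highest-ranked model that fits.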
