Best LLMs Under 8GB VRAM

If you have an 8GB GPU (RTX 3060 8GB, RTX 4060, RX 7600, or many laptop GPUs), the language models below fit with room to spare: each needs at most 5.3 GB of VRAM at the listed quantization. All entries were graded against an 8GB reference card and are sorted by quality.

  1. DeepSeek R1 Distill 8B

    8B params | S grade

    Compact reasoning model distilled from DeepSeek R1. Solid reasoning in a small package.

    Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: mit
  2. Llama 3.1 8B Instruct

    8B params | S grade

    Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

    Min VRAM: 5.08 GB | Quant: Q4_K_M | Size: 4.583 GB | License: llama3.1
  3. Granite 3.3 8B

    8B params | S grade

    IBM's 8B instruction model. Enterprise quality.

    Min VRAM: 5.1 GB | Quant: Q4_K_M | Size: 4.603 GB | License: apache-2.0
  4. EXAONE 3.5 7.8B

    7.8B params | S grade

    7.8B model from LG. Strong bilingual Korean/English.

    Min VRAM: 4.94 GB | Quant: Q4_K_M | Size: 4.443 GB | License: other
  5. InternLM 2.5 7B

    7.7B params | S grade

    Strong 7B model from Shanghai AI Laboratory. Good at tool use and math.

    Min VRAM: 4.89 GB | Quant: Q4_K_M | Size: 4.389 GB | License: apache-2.0
  6. Qwen 2.5 7B Instruct

    7.6B params | S grade

    Efficient 7B model with strong coding and reasoning abilities.

    Min VRAM: 5.3 GB | Quant: Q4_K_M | Size: 4.7 GB | License: apache-2.0
  7. Mistral 7B Instruct v0.3

    7.3B params | S grade

    Efficient 7B model from Mistral AI with strong performance for its size.

    Min VRAM: 5.28 GB | Quant: Q5_K_M | Size: 4.783 GB | License: apache-2.0
  8. Falcon 3 7B

    7B params | S grade

    Full-size Falcon 3 with strong performance across benchmarks.

    Min VRAM: 5 GB | Quant: Q4_K_M | Size: 4.4 GB | License: apache-2.0
  9. OLMo 2 7B

    7B params | S grade

    Fully open research model. Transparent training.

    Min VRAM: 4.67 GB | Quant: Q4_K_M | Size: 4.165 GB | License: apache-2.0
  10. OpenChat 3.5 7B

    7B params | S grade

    Fine-tuned Mistral 7B for chat. Strong instruction following.

    Min VRAM: 4.57 GB | Quant: Q4_K_M | Size: 4.068 GB | License: apache-2.0
  11. Yi 1.5 6B Chat

    6B params | S grade

    Efficient 6B bilingual (English/Chinese) model.

    Min VRAM: 3.92 GB | Quant: Q4_K_M | Size: 3.422 GB | License: apache-2.0
  12. Gemma 3 4B

    4B params | S grade

    Balanced 4B model with strong reasoning. Great for iPhones.

    Min VRAM: 4.35 GB | Quant: Q8_0 | Size: 3.847 GB | License: gemma
  13. Nemotron Mini 4B

    4B params | S grade

    NVIDIA's compact 4B model optimized for edge deployment.

    Min VRAM: 4.65 GB | Quant: Q8_0 | Size: 4.154 GB | License: other
  14. Danube 3 4B

    4B params | S grade

    Capable 4B model from H2O.ai. Good for phones.

    Min VRAM: 4.42 GB | Quant: Q8_0 | Size: 3.922 GB | License: apache-2.0
  15. Phi-3.5 Mini 3.8B

    3.8B params | S grade

    Tiny but capable 3.8B model. Runs on almost any hardware including phones.

    Min VRAM: 4.28 GB | Quant: Q8_0 | Size: 3.782 GB | License: mit
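
To make the headroom concrete, here is a minimal Python sketch that takes the Min VRAM and Size figures from the list and computes two things per model: the VRAM left over on an 8GB card, and the gap between Min VRAM and file size. The function names `headroom_gb` and `overhead_gb` are illustrative, and treating that gap as a fixed runtime overhead is an inference from the listed numbers, not something the list states. Actual usage also grows with context length, because the KV cache takes additional VRAM.

```python
# (name, min_vram_gb, file_size_gb), copied from the list above.
MODELS = [
    ("DeepSeek R1 Distill 8B", 5.08, 4.583),
    ("Llama 3.1 8B Instruct", 5.08, 4.583),
    ("Granite 3.3 8B", 5.1, 4.603),
    ("EXAONE 3.5 7.8B", 4.94, 4.443),
    ("InternLM 2.5 7B", 4.89, 4.389),
    ("Qwen 2.5 7B Instruct", 5.3, 4.7),
    ("Mistral 7B Instruct v0.3", 5.28, 4.783),
    ("Falcon 3 7B", 5.0, 4.4),
    ("OLMo 2 7B", 4.67, 4.165),
    ("OpenChat 3.5 7B", 4.57, 4.068),
    ("Yi 1.5 6B Chat", 3.92, 3.422),
    ("Gemma 3 4B", 4.35, 3.847),
    ("Nemotron Mini 4B", 4.65, 4.154),
    ("Danube 3 4B", 4.42, 3.922),
    ("Phi-3.5 Mini 3.8B", 4.28, 3.782),
]

CARD_VRAM_GB = 8.0  # the 8GB reference card used for grading


def headroom_gb(min_vram_gb, card_vram_gb=CARD_VRAM_GB):
    """VRAM left on the card after the model's listed minimum."""
    return round(card_vram_gb - min_vram_gb, 2)


def overhead_gb(min_vram_gb, size_gb):
    """Gap between listed Min VRAM and file size (assumed runtime overhead)."""
    return round(min_vram_gb - size_gb, 3)


if __name__ == "__main__":
    for name, min_vram, size in MODELS:
        print(f"{name:26s} headroom {headroom_gb(min_vram):.2f} GB, "
              f"overhead {overhead_gb(min_vram, size):.3f} GB")
```

On these figures every model leaves at least 2.7 GB free on an 8GB card, and the gap between Min VRAM and file size stays between roughly 0.5 and 0.6 GB across the whole list.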
