Best LLMs Under 8GB VRAM

If you have an 8GB GPU (RTX 3060 8GB, RTX 4060, RX 7600, or many laptop GPUs), these are the language models that run on it with room to spare. Every entry below needs at most 5.3GB of VRAM at the listed quantization, which leaves at least 2.7GB free on an 8GB card. All entries have been graded against an 8GB reference card and are sorted by quality.

1. DeepSeek R1 Distill 8B
8B params | S grade
Compact reasoning model. Good reasoning capabilities in a small package.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: mit
2. Llama 3.1 8B Instruct
8B params | S grade
Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.
Min VRAM: 5.08GB | Quant: Q4_K_M | Size: 4.583GB | License: llama3.1
3. Granite 3.3 8B
8B params | S grade
IBM's 8B instruction model. Enterprise quality.
Min VRAM: 5.1GB | Quant: Q4_K_M | Size: 4.603GB | License: apache-2.0
4. EXAONE 3.5 7.8B
7.8B params | S grade
7.8B model from LG. Strong bilingual Korean/English.
Min VRAM: 4.94GB | Quant: Q4_K_M | Size: 4.443GB | License: other
5. InternLM 2.5 7B
7.7B params | S grade
Strong 7B model from China. Good at tool use and math.
Min VRAM: 4.89GB | Quant: Q4_K_M | Size: 4.389GB | License: apache-2.0
6. Qwen 2.5 7B Instruct
7.6B params | S grade
Efficient 7B model with strong coding and reasoning abilities.
Min VRAM: 5.3GB | Quant: Q4_K_M | Size: 4.7GB | License: apache-2.0
7. Mistral 7B Instruct v0.3
7.3B params | S grade
Efficient 7B model from Mistral AI with strong performance for its size.
Min VRAM: 5.28GB | Quant: Q5_K_M | Size: 4.783GB | License: apache-2.0
8. Falcon 3 7B
7B params | S grade
Full-size Falcon 3 with strong performance across benchmarks.
Min VRAM: 5GB | Quant: Q4_K_M | Size: 4.4GB | License: apache-2.0
9. OLMo 2 7B
7B params | S grade
Fully open research model. Transparent training.
Min VRAM: 4.67GB | Quant: Q4_K_M | Size: 4.165GB | License: apache-2.0
10. OpenChat 3.5 7B
7B params | S grade
Fine-tuned Mistral 7B for chat. Strong instruction following.
Min VRAM: 4.57GB | Quant: Q4_K_M | Size: 4.068GB | License: apache-2.0
11. Yi 1.5 6B Chat
6B params | S grade
Efficient 6B bilingual (English/Chinese) model.
Min VRAM: 3.92GB | Quant: Q4_K_M | Size: 3.422GB | License: apache-2.0
12. Gemma 3 4B
4B params | S grade
Balanced 4B model with strong reasoning. Great for iPhones.
Min VRAM: 4.35GB | Quant: Q8_0 | Size: 3.847GB | License: gemma
13. Nemotron Mini 4B
4B params | S grade
NVIDIA's compact 4B model optimized for edge deployment.
Min VRAM: 4.65GB | Quant: Q8_0 | Size: 4.154GB | License: other
14. Danube 3 4B
4B params | S grade
Capable 4B model from H2O.ai. Good for phones.
Min VRAM: 4.42GB | Quant: Q8_0 | Size: 3.922GB | License: apache-2.0
15. Phi-3.5 Mini 3.8B
3.8B params | S grade
Tiny but capable 3.8B model. Runs on almost any hardware including phones.
Min VRAM: 4.28GB | Quant: Q8_0 | Size: 3.782GB | License: mit
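
To make the headroom claim concrete, here is a minimal Python sketch that checks how much VRAM is left on a card after loading a model at its listed Min VRAM. The Min VRAM figures are copied from a few entries in the list; the function names and the 1GB `reserve_gb` margin (for context growth, the desktop, and other processes) are illustrative assumptions, not part of the grading method used for this list.

```python
# (model name, listed Min VRAM in GB) copied from entries in the list above
MODELS = [
    ("DeepSeek R1 Distill 8B", 5.08),
    ("Qwen 2.5 7B Instruct", 5.30),
    ("Yi 1.5 6B Chat", 3.92),
    ("Phi-3.5 Mini 3.8B", 4.28),
]


def headroom_gb(min_vram_gb, card_gb=8.0):
    """VRAM left on the card after the model's listed minimum is used."""
    return round(card_gb - min_vram_gb, 2)


def fits(min_vram_gb, card_gb=8.0, reserve_gb=1.0):
    """True if at least reserve_gb stays free (reserve_gb is an assumed margin)."""
    return headroom_gb(min_vram_gb, card_gb) >= reserve_gb


if __name__ == "__main__":
    for name, need in MODELS:
        print(f"{name}: {headroom_gb(need)}GB free, fits={fits(need)}")
```

Passing a different `card_gb` (for example 6.0) shows which entries would still leave the assumed margin on a smaller card.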