Best LLMs for 12GB VRAM
12GB GPUs (RTX 3060 12GB, RTX 4070, RTX 5070) hit a sweet spot for local LLMs. You can run 7B models at Q8_0, which is close to FP16 quality, and 13B-class models at Q4–Q5. The top picks below are ranked by grade against a 12GB reference card.
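The sizing claim above can be checked with rough arithmetic: weight size is approximately parameter count times bits per weight, divided by 8. The sketch below is an estimate under assumptions, not the method this page uses for its figures. The bits-per-weight values for Q4_K_M and Q5_K_M are approximate averages assumed for illustration, and the function name `estimate_weights_gb` is hypothetical.

```python
# Approximate bits per weight for each format.
# The K-quant values are rough averages (assumptions), not exact figures.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,   # assumed approximate average
    "Q4_K_M": 4.85,  # assumed approximate average
}


def estimate_weights_gb(params_billions: float, quant: str) -> float:
    """Estimate weight size in GB (10^9 bytes) for a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * bits / 8


if __name__ == "__main__":
    vram_gb = 12.0
    for params, quant in [(7, "FP16"), (7, "Q8_0"), (13, "Q5_K_M"), (13, "Q4_K_M")]:
        size = estimate_weights_gb(params, quant)
        verdict = "fits" if size < vram_gb else "does not fit"
        print(f"{params}B at {quant}: ~{size:.1f} GB of weights, {verdict} in {vram_gb:.0f} GB")
```

This counts weights only. The KV cache and runtime buffers need additional memory, so a model whose weights barely fit will likely not run well at long context lengths.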
- 1
Gemma 3 12B
12B params | S grade
High quality 12B model. Excellent for iPad Pro and Mac.
Min VRAM: 7.3GB | Quant: Q4_K_M | Size: 6.799GB | License: gemma
- 2
Mistral Nemo 12B
12B params | S grade
Mistral's 12B model with excellent instruction following.
Min VRAM: 7.46GB | Quant: Q4_K_M | Size: 6.964GB | License: apache-2.0
- 3
Solar 10.7B
10.7B params | S grade
Depth-upscaled 10.7B model. Strong reasoning.
Min VRAM: 6.52GB | Quant: Q4_K_M | Size: 6.018GB | License: apache-2.0
- 4
Falcon 3 10B
10B params | S grade
10B Falcon model. Good iPad model.
Min VRAM: 6.36GB | Quant: Q4_K_M | Size: 5.856GB | License: apache-2.0
- 5
Gemma 2 9B Instruct
9.2B params | S grade
Google's efficient 9B model. Great performance-to-size ratio.
Min VRAM: 6.69GB | Quant: Q5_K_M | Size: 6.191GB | License: gemma
- 6
Yi 1.5 9B Chat
9B params | S grade
9B bilingual model with strong reasoning.
Min VRAM: 5.46GB | Quant: Q4_K_M | Size: 4.963GB | License: apache-2.0
- 7
DeepSeek R1 Distill 8B
8B params | S grade
Compact reasoning model. Good reasoning capabilities in a small package.
Min VRAM: 5.84GB | Quant: Q5_K_M | Size: 5.339GB | License: mit
- 8
Llama 3.1 8B Instruct
8B params | S grade
Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.
Min VRAM: 5.84GB | Quant: Q5_K_M | Size: 5.339GB | License: llama3.1
- 9
Granite 3.3 8B
8B params | S grade
IBM's 8B instruction model. Enterprise quality.
Min VRAM: 5.1GB | Quant: Q4_K_M | Size: 4.603GB | License: apache-2.0
- 10
EXAONE 3.5 7.8B
7.8B params | S grade
7.8B model from LG. Strong bilingual Korean/English.
Min VRAM: 4.94GB | Quant: Q4_K_M | Size: 4.443GB | License: other
- 11
InternLM 2.5 7B
7.7B params | S grade
Strong 7B model from China. Good at tool use and math.
Min VRAM: 4.89GB | Quant: Q4_K_M | Size: 4.389GB | License: apache-2.0
- 12
Qwen 2.5 7B Instruct
7.6B params | S grade
Efficient 7B model with strong coding and reasoning abilities.
Min VRAM: 6.2GB | Quant: Q5_K_M | Size: 5.5GB | License: apache-2.0
- 13
Mistral 7B Instruct v0.3
7.3B params | S grade
Efficient 7B model from Mistral AI with strong performance for its size.
Min VRAM: 7.67GB | Quant: Q8_0 | Size: 7.174GB | License: apache-2.0
- 14
Falcon 3 7B
7B params | S grade
Full-size Falcon 3 with strong performance across benchmarks.
Min VRAM: 5GB | Quant: Q4_K_M | Size: 4.4GB | License: apache-2.0
- 15
OLMo 2 7B
7B params | S grade
Fully open research model. Transparent training.
Min VRAM: 7.73GB | Quant: Q8_0 | Size: 7.227GB | License: apache-2.0
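To compare entries, it can help to compute how much of a 12GB card is left over after each model's listed Min VRAM. The sketch below uses a few Min VRAM figures copied from the list above. The helper names `headroom_gb` and `rank_by_headroom` are hypothetical, and the page does not say whether Min VRAM already includes any context (KV cache) allowance, so treat the headroom as a rough budget rather than a guaranteed context size.

```python
VRAM_GB = 12.0  # reference card size used by the ranking

# (name, min_vram_gb) copied from the list above; only a few entries shown.
MODELS = [
    ("Gemma 3 12B", 7.3),
    ("Mistral Nemo 12B", 7.46),
    ("Llama 3.1 8B Instruct", 5.84),
    ("OLMo 2 7B", 7.73),
]


def headroom_gb(min_vram_gb: float, vram_gb: float = VRAM_GB) -> float:
    """VRAM left over after the listed minimum, rounded to 2 decimals."""
    return round(vram_gb - min_vram_gb, 2)


def rank_by_headroom(models):
    """Return (name, headroom) pairs, largest headroom first."""
    pairs = [(name, headroom_gb(min_vram)) for name, min_vram in models]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, spare in rank_by_headroom(MODELS):
        print(f"{name}: {spare} GB spare on a {VRAM_GB:.0f} GB card")
```

More headroom generally leaves more room for longer contexts or a higher-precision quant of the same model.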