NVIDIA · Ampere

NVIDIA RTX 3060 Laptop AI Model Compatibility

What AI models can you run on an NVIDIA RTX 3060 Laptop? With 6GB VRAM, this card runs 103 of 109 models in our database. Below: full grades, recommended quantizations, and tokens-per-second estimates for every model.

VRAM: 6GB
Excellent fit: 86 (S + A grade)
Will run: 12 (B + C grade)
Too large: 6 (cannot run any quant)

Language Models: 43 of 47 run

Grade · Model · Parameters · Developer · Quantization · Required VRAM · Estimated speed (where listed)

S · Yi 1.5 6B Chat · 6B · 01.AI · Q4_K_M · 3.92GB · 46 tok/s
S · Gemma 3 4B · 4B · Google · Q4_K_M · 2.82GB · 78 tok/s
S · Nemotron Mini 4B · 4B · NVIDIA · Q4_K_M · 3.01GB · 78 tok/s
S · Danube 3 4B · 4B · H2O.ai · Q4_K_M · 2.73GB · 78 tok/s
S · Phi-3.5 Mini 3.8B · 3.8B · Microsoft · Q5_K_M · 3.12GB · 78 tok/s
S · Phi-4 Mini 3.8B · 3.8B · Microsoft · Q4_K_M · 2.82GB · 78 tok/s
S · Llama 3.2 3B Instruct · 3.2B · Meta · Q8_0 · 3.69GB · 78 tok/s
S · Qwen 2.5 3B · 3B · Alibaba · Q8_0 · 3.87GB · 78 tok/s
S · Falcon 3 3B · 3B · TII · Q8_0 · 3.8GB · 78 tok/s
S · StableLM Zephyr 3B · 3B · Stability AI · Q8_0 · 3.27GB · 78 tok/s
S · Rocket 3B · 3B · Pansophic · Q8_0 · 3.27GB · 78 tok/s
S · Gemma 2 2B · 2.6B · Google · Q8_0 · 3.09GB · 78 tok/s
S · EXAONE 3.5 2.4B · 2.4B · LG AI · Q8_0 · 3.14GB · 78 tok/s
S · Granite 3.3 2B · 2B · IBM · Q8_0 · 3.01GB · 114 tok/s
S · SmolLM2 1.7B · 1.7B · HuggingFace · Q8_0 · 2.2GB · 114 tok/s
S · Qwen 2.5 1.5B · 1.5B · Alibaba · Q8_0 · 2.26GB · 114 tok/s
S · DeepSeek R1 Distill 1.5B · 1.5B · DeepSeek · Q8_0 · 2.26GB · 114 tok/s
S · Llama 3.2 1B Instruct · 1.24B · Meta · FP16 · 2.81GB · 114 tok/s
S · TinyLlama 1.1B · 1.1B · TinyLlama · Q8_0 · 1.59GB · 114 tok/s
S · Gemma 3 1B · 1B · Google · Q8_0 · 1.5GB · 114 tok/s
S · Falcon 3 1B · 1B · TII · Q8_0 · 2.16GB · 114 tok/s
S · Qwen 2.5 0.5B · 0.5B · Alibaba · Q8_0 · 1.13GB · 114 tok/s
S · Danube 3 500M · 0.5B · H2O.ai · Q8_0 · 1.01GB · 114 tok/s
S · SmolLM2 360M · 0.36B · HuggingFace · Q8_0 · 0.86GB · 114 tok/s
S · SmolLM2 135M · 0.135B · HuggingFace · FP16 · 0.75GB · 114 tok/s
A · EXAONE 3.5 7.8B · 7.8B · LG AI · Q4_K_M · 4.94GB · 46 tok/s
A · InternLM 2.5 7B · 7.7B · Shanghai AI Lab · Q4_K_M · 4.89GB · 46 tok/s
A · Mistral 7B Instruct v0.3 · 7.3B · Mistral AI · Q4_K_M · 4.57GB · 46 tok/s
A · Falcon 3 7B · 7B · TII · Q4_K_M · 5GB · 46 tok/s
A · OLMo 2 7B · 7B · Allen AI · Q4_K_M · 4.67GB · 46 tok/s
A · OpenChat 3.5 7B · 7B · OpenChat · Q4_K_M · 4.57GB · 46 tok/s
B · Gemma 2 9B Instruct · 9.2B · Google · Q4_K_M · 5.87GB · 39 tok/s
B · Yi 1.5 9B Chat · 9B · 01.AI · Q4_K_M · 5.46GB · 42 tok/s
B · DeepSeek R1 Distill 8B · 8B · DeepSeek · Q5_K_M · 5.84GB · 39 tok/s
B · Llama 3.1 8B Instruct · 8B · Meta · Q5_K_M · 5.84GB · 39 tok/s
B · Granite 3.3 8B · 8B · IBM · Q4_K_M · 5.1GB · 45 tok/s
B · Qwen 2.5 7B Instruct · 7.6B · Alibaba · Q4_K_M · 5.3GB · 43 tok/s
C · Gemma 3 12B · 12B · Google · Q4_K_M · 7.3GB
C · Mistral Nemo 12B · 12B · Mistral AI · Q4_K_M · 7.46GB
C · Solar 10.7B · 10.7B · Upstage · Q4_K_M · 6.52GB
C · Falcon 3 10B · 10B · TII · Q4_K_M · 6.36GB · 36 tok/s
D · Phi-4 · 14B · Microsoft · Q5_K_M · 10.38GB
D · Qwen 2.5 14B · 14B · Alibaba · Q4_K_M · 8.87GB
F · Llama 3.1 70B Instruct · 70B · Meta · Q4_K_M · 40.1GB
F · Qwen 2.5 32B · 32B · Alibaba · Q4_K_M · 18.99GB
F · Gemma 3 27B · 27B · Google · Q4_K_M · 15.91GB
F · Mistral Small 22B · 22B · Mistral AI · Q4_K_M · 12.93GB

Code Models: 16 of 16 run

Multimodal & Vision: 6 of 6 run

Image Generation: 7 of 9 run

Speech Recognition: 9 of 9 run

Text-to-Speech: 14 of 14 run

Audio Generation: 1 of 1 run

Embedding Models: 5 of 5 run

Reranker Models: 2 of 2 run

How these grades work

Grades are computed from the ratio of the NVIDIA RTX 3060 Laptop's effective VRAM (6GB) to each model's required VRAM at its highest-quality quantization that still fits. S: comfortable headroom (1.5× or more). A: smooth (1.2× or more). B: tight but works (1.0× or more). C: partial offload (0.8× or more). D: heavy offload (0.5× or more). F: cannot run. For example, Falcon 3 7B needs 5GB at Q4_K_M, so its ratio is 6 / 5 = 1.2× and it earns an A.
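The grading rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the site's actual implementation: the names `GRADE_THRESHOLDS` and `grade_model` are hypothetical, and mapping every ratio below 0.5× to F is an assumption inferred from the D cutoff.

```python
# Minimum VRAM ratio for each grade, taken from the grade descriptions above.
# Anything below the lowest threshold is assumed to be an F (cannot run).
GRADE_THRESHOLDS = [
    (1.5, "S"),  # comfortable headroom
    (1.2, "A"),  # smooth
    (1.0, "B"),  # tight but works
    (0.8, "C"),  # partial offload
    (0.5, "D"),  # heavy offload
]


def grade_model(gpu_vram_gb: float, required_vram_gb: float) -> str:
    """Return a letter grade from the ratio of GPU VRAM to required VRAM."""
    if required_vram_gb <= 0:
        raise ValueError("required_vram_gb must be positive")
    ratio = gpu_vram_gb / required_vram_gb
    for minimum_ratio, grade in GRADE_THRESHOLDS:
        if ratio >= minimum_ratio:
            return grade
    return "F"


if __name__ == "__main__":
    # Required-VRAM figures copied from the Language Models listing.
    examples = {
        "Yi 1.5 6B Chat": 3.92,
        "Falcon 3 7B": 5.0,
        "Gemma 2 9B Instruct": 5.87,
        "Gemma 3 12B": 7.3,
        "Phi-4": 10.38,
        "Mistral Small 22B": 12.93,
    }
    for name, required in examples.items():
        print(f"{name}: {grade_model(6.0, required)}")
```

Run against the 6GB card, these six examples reproduce the grades shown in the listing (S, A, B, C, D, and F respectively), which suggests the listed sizes are the required-VRAM figures the grades are based on.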

Tokens-per-second figures are based on real community benchmarks (llama.cpp discussions, MLX, vLLM) scaled to model size. Real-world numbers vary with batch size, context length, and driver version.