NVIDIA · Ampere

NVIDIA RTX 3060 Laptop AI Model Compatibility

What AI models can you run on an NVIDIA RTX 3060 Laptop? With 6GB of VRAM, this card runs 103 of 109 models in our database. Below: full grades, recommended quantizations, and tokens-per-second estimates for every model.

VRAM: 6GB

Excellent fit: 86 (S + A grade)

Will run: 12 (B + C grade)

Too large: 6 (cannot run any quant)

Language Models: 43 of 47 run

Grade | Model | Best quant · VRAM | Speed | Experience

Q4_K_M · 3.92GB

Q4_K_M · 2.82GB

Nemotron Mini 4B

Q4_K_M · 3.01GB

Q4_K_M · 2.73GB

Phi-3.5 Mini 3.8B

3.8B · Microsoft

Q5_K_M · 3.12GB

Phi-4 Mini 3.8B

3.8B · Microsoft

Q4_K_M · 2.82GB

Llama 3.2 3B Instruct

StableLM Zephyr 3B

3B · Stability AI

3B · Pansophic

EXAONE 3.5 2.4B

1.7B · HuggingFace

1.5B · Alibaba

DeepSeek R1 Distill 1.5B

1.5B · DeepSeek

Llama 3.2 1B Instruct

1.1B · TinyLlama

0.5B · Alibaba

0.36B · HuggingFace

0.135B · HuggingFace

EXAONE 3.5 7.8B

Q4_K_M · 4.94GB

InternLM 2.5 7B

7.7B · Shanghai AI Lab

Q4_K_M · 4.89GB

Mistral 7B Instruct v0.3

7.3B · Mistral AI

Q4_K_M · 4.57GB

Q4_K_M · 4.67GB

OpenChat 3.5 7B

Q4_K_M · 4.57GB

Gemma 2 9B Instruct

Q4_K_M · 5.87GB

Q4_K_M · 5.46GB

DeepSeek R1 Distill 8B

Q5_K_M · 5.84GB

Llama 3.1 8B Instruct

Q5_K_M · 5.84GB

Q4_K_M · 5.1GB

Qwen 2.5 7B Instruct

7.6B · Alibaba

Q4_K_M · 5.3GB

Q4_K_M · 7.3GB

Mistral Nemo 12B

12B · Mistral AI

Q4_K_M · 7.46GB

10.7B · Upstage

Q4_K_M · 6.52GB

Q4_K_M · 6.36GB

14B · Microsoft

Q5_K_M · 10.38GB

Q4_K_M · 8.87GB

Llama 3.1 70B Instruct

Q4_K_M · 40.1GB

Q4_K_M · 18.99GB

Q4_K_M · 15.91GB

Mistral Small 22B

22B · Mistral AI

Q4_K_M · 12.93GB

Code Models: 16 of 16 run

Grade | Model | Best quant · VRAM | Speed | Experience

Qwen 2.5 Coder 3B

3B · Stability AI

Qwen 2.5 Coder 1.5B

1.5B · Alibaba

DeepSeek Coder 1.3B

1.3B · DeepSeek

Qwen 2.5 Coder 0.5B

0.5B · Alibaba

Qwen 2.5 Coder 7B

7.6B · Alibaba

Q4_K_M · 4.86GB

Q4_K_M · 4.66GB

Q4_K_M · 4.3GB

DeepSeek Coder 6.7B

6.7B · DeepSeek

Q4_K_M · 4.3GB

Q4_K_M · 5.46GB

Q4_K_M · 5.46GB

Qwen 2.5 Coder 14B

Q4_K_M · 8.87GB

Code Llama 13B Instruct

Q4_K_M · 7.83GB

Multimodal & Vision: 6 of 6 run

Grade | Model | Best quant · VRAM | Speed | Experience

4.2B · Microsoft

Q4_K_M · 3.2GB

Q4_K_M · 2.5GB

2.2B · Alibaba

1.8B · Moondream

Q4_K_M · 1.5GB

Image Generation: 7 of 9 run

Grade | Model | Best quant · VRAM | Speed | Experience

Stable Diffusion XL (CoreML)

3.5B · Stability AI

CoreML · 3.34GB

Stable Diffusion 2.1 Base (CoreML)

0.86B · Stability AI / Apple

CoreML-Palettized · 1.56GB

Stable Diffusion 1.5 (CoreML)

0.86B · Runway

CoreML-Palettized · 2.5GB

Stable Diffusion 1.5 (GGUF)

0.86B · Runway / GPUStack

Stable Diffusion 2.1 (GGUF)

0.86B · Stability AI

SDXL Turbo (GGUF)

3.5B · Stability AI

Stable Diffusion 3 Medium (GGUF)

2.5B · Stability AI

FLUX.1 Schnell (GGUF)

12B · Black Forest Labs

FLUX.1 Dev (GGUF)

12B · Black Forest Labs

Speech Recognition: 9 of 9 run

Grade | Model | Best quant · VRAM | Speed | Experience

Whisper Large v3

1.55B · OpenAI

Whisper Large v3 Turbo

0.81B · OpenAI

0.77B · OpenAI

Distil-Whisper Large v3

0.76B · HuggingFace

0.24B · OpenAI

0.074B · OpenAI

Whisper Base English

0.074B · OpenAI

Whisper Tiny English (Quantized)

0.039B · OpenAI

0.039B · OpenAI

Text-to-Speech: 14 of 14 run

Grade | Model | Best quant · VRAM | Speed | Experience

0.082B · Kokoro

ONNX-Q8F16 · 0.58GB

Piper TTS - Amy (English)

0.02B · Rhasspy

Piper TTS - Lessac (English)

0.02B · Rhasspy

Piper TTS - LibriTTS-R (English)

0.02B · Rhasspy

Piper TTS - Spanish (MLS)

0.02B · Rhasspy

Piper TTS - French (Siwis)

0.02B · Rhasspy

Piper TTS - German (Thorsten)

0.02B · Rhasspy

Piper TTS - Chinese (Huayan)

0.02B · Rhasspy

Piper TTS - Japanese (Kokoro)

0.02B · Rhasspy

Piper TTS - Korean

0.02B · Rhasspy

Piper TTS - Russian (Irina)

0.02B · Rhasspy

Piper TTS - Portuguese (Faber)

0.02B · Rhasspy

Piper TTS - Italian (Riccardo)

0.02B · Rhasspy

Piper TTS - Arabic (Kareem)

0.02B · Rhasspy

Audio Generation: 1 of 1 run

Grade | Model | Best quant · VRAM | Speed | Experience

ONNX-Q4F16 · 0.78GB

Embedding Models: 5 of 5 run

Grade | Model | Best quant · VRAM | Speed | Experience

BGE Large EN v1.5

Nomic Embed Text v1.5

0.137B · Nomic AI

BGE Small EN v1.5

Snowflake Arctic Embed S

0.033B · Snowflake

all-MiniLM-L6-v2

0.023B · Sentence Transformers

Reranker Models: 2 of 2 run

Grade | Model | Best quant · VRAM | Speed | Experience

BGE Reranker v2 M3

Jina Reranker Tiny EN

0.033B · Jina AI

How these grades work

Grades are computed from the ratio of the NVIDIA RTX 3060 Laptop's effective VRAM (6GB) to each model's required VRAM at its highest-quality quantization that still fits. S: comfortable headroom (1.5×+). A: smooth (1.2×+). B: tight but works (1.0×+). C: partial offload (0.8×+). D: heavy offload (0.5×+). F: cannot run (below 0.5×).
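
As a rough illustration of the grading rule above, the thresholds can be written as a small lookup. This is a minimal sketch, not the site's actual implementation: the function name `grade` and the table `GRADE_THRESHOLDS` are hypothetical, and it assumes the ratio is simply GPU VRAM divided by the model's required VRAM, with anything under 0.5× treated as F.

```python
# Minimum VRAM ratio for each grade, taken from the description above.
# The names and structure here are illustrative assumptions.
GRADE_THRESHOLDS = [
    (1.5, "S"),  # comfortable headroom
    (1.2, "A"),  # smooth
    (1.0, "B"),  # tight but works
    (0.8, "C"),  # partial offload
    (0.5, "D"),  # heavy offload
]


def grade(gpu_vram_gb: float, model_vram_gb: float) -> str:
    """Return a letter grade from the ratio of GPU VRAM to model VRAM."""
    if gpu_vram_gb <= 0 or model_vram_gb <= 0:
        raise ValueError("VRAM values must be positive")
    ratio = gpu_vram_gb / model_vram_gb
    for minimum, letter in GRADE_THRESHOLDS:
        if ratio >= minimum:
            return letter
    return "F"  # assumed: below 0.5x means the model cannot run


if __name__ == "__main__":
    # VRAM figures below are quant sizes that appear in the tables on this page.
    for required in (3.92, 4.57, 5.87, 7.46, 8.87, 40.1):
        print(f"{required:>5} GB -> {grade(6.0, required)}")
```

Under these assumptions a 3.92GB quant lands in S (ratio about 1.53) and a 40.1GB quant in F (ratio about 0.15). The grades shown on the page may account for factors this sketch ignores, such as how "effective" VRAM is measured.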

Tokens-per-second figures are based on real community benchmarks (llama.cpp discussions, MLX, vLLM) scaled to model size. Real-world numbers vary with batch size, context length, and driver version.
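
The page does not say how benchmarks are scaled to model size. One plausible approach, shown here purely as an assumption, is inverse-proportional scaling: token generation is often limited by memory bandwidth, so a model that occupies twice the memory would be expected to run at roughly half the speed on the same hardware. The function name `scale_tokens_per_second` and the numbers in the usage example are hypothetical, not measured results.

```python
def scale_tokens_per_second(
    reference_tps: float,
    reference_size_gb: float,
    target_size_gb: float,
) -> float:
    """Estimate tokens/s for a target model from a reference benchmark.

    Assumes speed is inversely proportional to model size in memory.
    This is an illustrative simplification, not the site's documented method.
    """
    if reference_tps <= 0 or reference_size_gb <= 0 or target_size_gb <= 0:
        raise ValueError("all inputs must be positive")
    return reference_tps * reference_size_gb / target_size_gb


if __name__ == "__main__":
    # Made-up reference point: 40 tokens/s measured on a 4.0GB quant.
    estimate = scale_tokens_per_second(40.0, 4.0, 5.0)
    print(f"Estimated speed for a 5.0GB quant: {estimate:.1f} tokens/s")
```

An estimate like this ignores context length, batch size, and offloading to system RAM, which is part of why real-world numbers differ.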