NVIDIA · Ada Lovelace · MSRP $499
NVIDIA GeForce RTX 4060 Ti 16GB AI Model Compatibility
What AI models can you run on an NVIDIA GeForce RTX 4060 Ti 16GB? With 16GB of VRAM, this card runs 108 of 109 models in our database. Below: full grades, recommended quantizations, and tokens-per-second estimates for every model.
VRAM: 16GB
Excellent fit (S + A grade): 104
Will run (B + C grade): 4
Too large (cannot run any quant): 1
Language Models (46 of 47 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | Phi-4 | 14B · Microsoft | Q5_K_M · 10.38GB | n/a | Cannot run
S | Qwen 2.5 14B | 14B · Alibaba | Q4_K_M · 8.87GB | n/a | Cannot run
S | Gemma 3 12B | 12B · Google | Q4_K_M · 7.3GB | n/a | Cannot run
S | Mistral Nemo 12B | 12B · Mistral AI | Q4_K_M · 7.46GB | n/a | Cannot run
S | Solar 10.7B | 10.7B · Upstage | Q4_K_M · 6.52GB | n/a | Cannot run
S | Falcon 3 10B | 10B · TII | Q4_K_M · 6.36GB | 46 tok/s | Fast
S | Gemma 2 9B Instruct | 9.2B · Google | Q8_0 · 9.65GB | 46 tok/s | Fast
S | Yi 1.5 9B Chat | 9B · 01.AI | Q8_0 · 9.24GB | 46 tok/s | Fast
S | DeepSeek R1 Distill 8B | 8B · DeepSeek | Q8_0 · 8.45GB | 46 tok/s | Fast
S | Llama 3.1 8B Instruct | 8B · Meta | Q8_0 · 8.45GB | 46 tok/s | Fast
S | Granite 3.3 8B | 8B · IBM | Q8_0 · 8.59GB | 46 tok/s | Fast
S | EXAONE 3.5 7.8B | 7.8B · LG AI | Q8_0 · 8.24GB | 46 tok/s | Fast
S | InternLM 2.5 7B | 7.7B · Shanghai AI Lab | Q8_0 · 8.16GB | 46 tok/s | Fast
S | Qwen 2.5 7B Instruct | 7.6B · Alibaba | Q8_0 · 9GB | 46 tok/s | Fast
S | Mistral 7B Instruct v0.3 | 7.3B · Mistral AI | Q8_0 · 7.67GB | 46 tok/s | Fast
S | Falcon 3 7B | 7B · TII | Q8_0 · 8.3GB | 46 tok/s | Fast
S | OLMo 2 7B | 7B · Allen AI | Q8_0 · 7.73GB | 46 tok/s | Fast
S | OpenChat 3.5 7B | 7B · OpenChat | Q8_0 · 7.67GB | 46 tok/s | Fast
S | Yi 1.5 6B Chat | 6B · 01.AI | Q8_0 · 6.5GB | 46 tok/s | Fast
S | Gemma 3 4B | 4B · Google | Q8_0 · 4.35GB | 78 tok/s | Instant
S | Nemotron Mini 4B | 4B · NVIDIA | Q8_0 · 4.65GB | 78 tok/s | Instant
S | Danube 3 4B | 4B · H2O.ai | Q8_0 · 4.42GB | 78 tok/s | Instant
S | Phi-3.5 Mini 3.8B | 3.8B · Microsoft | Q8_0 · 4.28GB | 78 tok/s | Instant
S | Phi-4 Mini 3.8B | 3.8B · Microsoft | Q8_0 · 4.3GB | 78 tok/s | Instant
S | Llama 3.2 3B Instruct | 3.2B · Meta | Q8_0 · 3.69GB | 78 tok/s | Instant
S | Qwen 2.5 3B | 3B · Alibaba | Q8_0 · 3.87GB | 78 tok/s | Instant
S | Falcon 3 3B | 3B · TII | Q8_0 · 3.8GB | 78 tok/s | Instant
S | StableLM Zephyr 3B | 3B · Stability AI | Q8_0 · 3.27GB | 78 tok/s | Instant
S | Rocket 3B | 3B · Pansophic | Q8_0 · 3.27GB | 78 tok/s | Instant
S | Gemma 2 2B | 2.6B · Google | Q8_0 · 3.09GB | 78 tok/s | Instant
S | EXAONE 3.5 2.4B | 2.4B · LG AI | Q8_0 · 3.14GB | 78 tok/s | Instant
S | Granite 3.3 2B | 2B · IBM | Q8_0 · 3.01GB | 114 tok/s | Instant
S | SmolLM2 1.7B | 1.7B · HuggingFace | Q8_0 · 2.2GB | 114 tok/s | Instant
S | Qwen 2.5 1.5B | 1.5B · Alibaba | Q8_0 · 2.26GB | 114 tok/s | Instant
S | DeepSeek R1 Distill 1.5B | 1.5B · DeepSeek | Q8_0 · 2.26GB | 114 tok/s | Instant
S | Llama 3.2 1B Instruct | 1.24B · Meta | FP16 · 2.81GB | 114 tok/s | Instant
S | TinyLlama 1.1B | 1.1B · TinyLlama | Q8_0 · 1.59GB | 114 tok/s | Instant
S | Gemma 3 1B | 1B · Google | Q8_0 · 1.5GB | 114 tok/s | Instant
S | Falcon 3 1B | 1B · TII | Q8_0 · 2.16GB | 114 tok/s | Instant
S | Qwen 2.5 0.5B | 0.5B · Alibaba | Q8_0 · 1.13GB | 114 tok/s | Instant
S | Danube 3 500M | 0.5B · H2O.ai | Q8_0 · 1.01GB | 114 tok/s | Instant
S | SmolLM2 360M | 0.36B · HuggingFace | Q8_0 · 0.86GB | 114 tok/s | Instant
S | SmolLM2 135M | 0.135B · HuggingFace | FP16 · 0.75GB | 114 tok/s | Instant
A | Mistral Small 22B | 22B · Mistral AI | Q4_K_M · 12.93GB | n/a | Cannot run
B | Gemma 3 27B | 27B · Google | Q4_K_M · 15.91GB | n/a | Cannot run
C | Qwen 2.5 32B | 32B · Alibaba | Q4_K_M · 18.99GB | n/a | Cannot run
F | Llama 3.1 70B Instruct | 70B · Meta | Q4_K_M · 40.1GB | n/a | Cannot run
Code Models (16 of 16 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | Qwen 2.5 Coder 14B | 14B · Alibaba | Q4_K_M · 8.87GB | n/a | Cannot run
S | Code Llama 13B Instruct | 13B · Meta | Q4_K_M · 7.83GB | n/a | Cannot run
S | Yi Coder 9B | 9B · 01.AI | Q8_0 · 9.24GB | 46 tok/s | Fast
S | CodeGemma 7B | 8.5B · Google | Q8_0 · 8.95GB | 46 tok/s | Fast
S | Qwen 2.5 Coder 7B | 7.6B · Alibaba | Q8_0 · 8.04GB | 46 tok/s | Fast
S | StarCoder2 7B | 7B · BigCode | Q8_0 · 7.61GB | 46 tok/s | Fast
S | Code Llama 7B | 7B · Meta | Q8_0 · 7.17GB | 46 tok/s | Fast
S | DeepSeek Coder 6.7B | 6.7B · DeepSeek | Q8_0 · 7.17GB | 46 tok/s | Fast
S | Qwen 2.5 Coder 3B | 3B · Alibaba | Q8_0 · 3.87GB | 78 tok/s | Instant
S | StarCoder2 3B | 3B · BigCode | Q8_0 · 3.5GB | 78 tok/s | Instant
S | Stable Code 3B | 3B · Stability AI | Q8_0 · 3.27GB | 78 tok/s | Instant
S | CodeGemma 2B | 2B · Google | Q8_0 · 2.99GB | 114 tok/s | Instant
S | Qwen 2.5 Coder 1.5B | 1.5B · Alibaba | Q8_0 · 2.26GB | 114 tok/s | Instant
S | Yi Coder 1.5B | 1.5B · 01.AI | Q8_0 · 1.96GB | 114 tok/s | Instant
S | DeepSeek Coder 1.3B | 1.3B · DeepSeek | Q8_0 · 1.83GB | 114 tok/s | Instant
S | Qwen 2.5 Coder 0.5B | 0.5B · Alibaba | Q8_0 · 1.13GB | 114 tok/s | Instant
Multimodal & Vision (6 of 6 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | LLaVA 1.6 7B | 7B · LLaVA | Q8_0 · 8.5GB | 46 tok/s | Fast
S | Phi-3.5 Vision | 4.2B · Microsoft | Q4_K_M · 3.2GB | 78 tok/s | Instant
S | PaliGemma 3B | 3B · Google | Q4_K_M · 2.5GB | 78 tok/s | Instant
S | Qwen2-VL 2B | 2.2B · Alibaba | Q8_0 · 2.03GB | 78 tok/s | Instant
S | MiniCPM-V 2.6 | 2B · OpenBMB | Q8_0 · 3GB | 114 tok/s | Instant
S | Moondream 2 | 1.8B · Moondream | Q4_K_M · 1.5GB | 114 tok/s | Instant
Image Generation (9 of 9 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | Stable Diffusion XL (CoreML) | 3.5B · Stability AI | CoreML · 3.34GB | 78 tok/s | Instant
S | SDXL Turbo (GGUF) | 3.5B · Stability AI | Q5_0 · 5GB | 78 tok/s | Instant
S | Stable Diffusion 3 Medium (GGUF) | 2.5B · Stability AI | Q8_0 · 9.15GB | 78 tok/s | Instant
S | Stable Diffusion 2.1 Base (CoreML) | 0.86B · Stability AI / Apple | CoreML-Palettized · 1.56GB | 114 tok/s | Instant
S | Stable Diffusion 1.5 (CoreML) | 0.86B · Runway | CoreML-Palettized · 2.5GB | 114 tok/s | Instant
S | Stable Diffusion 1.5 (GGUF) | 0.86B · Runway / GPUStack | Q8_0 · 2.25GB | 114 tok/s | Instant
S | Stable Diffusion 2.1 (GGUF) | 0.86B · Stability AI | Q8_0 · 2.66GB | 114 tok/s | Instant
B | FLUX.1 Schnell (GGUF) | 12B · Black Forest Labs | Q5_0 · 14GB | n/a | Cannot run
B | FLUX.1 Dev (GGUF) | 12B · Black Forest Labs | Q5_0 · 14GB | n/a | Cannot run
Speech Recognition (9 of 9 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | Whisper Large v3 | 1.55B · OpenAI | Q8_0 · 3.38GB | 114 tok/s | Instant
S | Whisper Large v3 Turbo | 0.81B · OpenAI | Q8_0 · 2.01GB | 114 tok/s | Instant
S | Whisper Medium | 0.77B · OpenAI | Q8_0 · 1.93GB | 114 tok/s | Instant
S | Distil-Whisper Large v3 | 0.76B · HuggingFace | Q8_0 · 1.92GB | 114 tok/s | Instant
S | Whisper Small | 0.24B · OpenAI | Q8_0 · 0.95GB | 114 tok/s | Instant
S | Whisper Base | 0.074B · OpenAI | Q8_0 · 0.3GB | 114 tok/s | Instant
S | Whisper Base English | 0.074B · OpenAI | Q8_0 · 0.3GB | 114 tok/s | Instant
S | Whisper Tiny English (Quantized) | 0.039B · OpenAI | Q5_1 · 0.1GB | 114 tok/s | Instant
S | Whisper Tiny | 0.039B · OpenAI | Q8_0 · 0.2GB | 114 tok/s | Instant
Text-to-Speech (14 of 14 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | Kokoro 82M TTS | 0.082B · Kokoro | ONNX-Q8F16 · 0.58GB | 114 tok/s | Instant
S | Piper TTS - Amy (English) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Lessac (English) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - LibriTTS-R (English) | 0.02B · Rhasspy | ONNX · 0.57GB | 114 tok/s | Instant
S | Piper TTS - Spanish (MLS) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - French (Siwis) | 0.02B · Rhasspy | ONNX · 0.53GB | 114 tok/s | Instant
S | Piper TTS - German (Thorsten) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Chinese (Huayan) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Japanese (Kokoro) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Korean | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Russian (Irina) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Portuguese (Faber) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
S | Piper TTS - Italian (Riccardo) | 0.02B · Rhasspy | ONNX · 0.53GB | 114 tok/s | Instant
S | Piper TTS - Arabic (Kareem) | 0.02B · Rhasspy | ONNX · 0.15GB | 114 tok/s | Instant
Audio Generation (1 of 1 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
Embedding Models (5 of 5 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
S | BGE Large EN v1.5 | 0.335B · BAAI | FP16 · 1.12GB | 114 tok/s | Instant
S | Nomic Embed Text v1.5 | 0.137B · Nomic AI | FP16 · 0.76GB | 114 tok/s | Instant
S | BGE Small EN v1.5 | 0.033B · BAAI | Q8_0 · 0.1GB | 114 tok/s | Instant
S | Snowflake Arctic Embed S | 0.033B · Snowflake | Q8_0 · 0.1GB | 114 tok/s | Instant
S | all-MiniLM-L6-v2 | 0.023B · Sentence Transformers | Q8_0 · 0.1GB | 114 tok/s | Instant
Reranker Models (2 of 2 run)
Grade | Model | Size · Developer | Best quant · VRAM | Speed | Experience
How these grades work
Grades are computed from the ratio of NVIDIA GeForce RTX 4060 Ti 16GB's effective VRAM (16GB) to each model's required VRAM at its highest-quality quantization that still fits. S: comfortable headroom (1.5×+). A: smooth (1.2×+). B: tight but works (1.0×+). C: partial offload (0.8×+). D: heavy offload (0.5×+). F: cannot run.
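The thresholds above can be sketched as a small lookup. This is a minimal illustration, not the site's actual implementation: the function name grade and the GRADE_THRESHOLDS table are hypothetical, and the sketch assumes the grade depends only on the ratio of card VRAM to the model's listed VRAM requirement, with anything below 0.5× treated as F.

```python
# Minimum ratio of card VRAM to model VRAM for each grade, as stated in the text.
# Names here are illustrative assumptions, not the site's own code.
GRADE_THRESHOLDS = [
    ("S", 1.5),  # comfortable headroom
    ("A", 1.2),  # smooth
    ("B", 1.0),  # tight but works
    ("C", 0.8),  # partial offload
    ("D", 0.5),  # heavy offload
]


def grade(card_vram_gb: float, model_vram_gb: float) -> str:
    """Return the letter grade for a model on a card with the given VRAM."""
    if model_vram_gb <= 0:
        raise ValueError("model_vram_gb must be positive")
    ratio = card_vram_gb / model_vram_gb
    for letter, minimum in GRADE_THRESHOLDS:
        if ratio >= minimum:
            return letter
    return "F"  # below 0.5x: treated as cannot run


if __name__ == "__main__":
    # VRAM figures taken from the tables on this page (16GB card).
    for name, vram in [
        ("Phi-4", 10.38),
        ("Mistral Small 22B", 12.93),
        ("Gemma 3 27B", 15.91),
        ("Qwen 2.5 32B", 18.99),
        ("Llama 3.1 70B Instruct", 40.1),
    ]:
        print(f"{name}: {grade(16, vram)}")
```

Run against the figures listed on this page, the sketch gives S for Phi-4 (16 / 10.38 ≈ 1.54), A for Mistral Small 22B (≈ 1.24), B for Gemma 3 27B (≈ 1.01), C for Qwen 2.5 32B (≈ 0.84), and F for Llama 3.1 70B Instruct (≈ 0.40), matching the grades shown in the tables.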
Tokens-per-second figures are based on real community benchmarks (llama.cpp discussions, MLX, vLLM) scaled to model size. Real-world numbers vary with batch size, context length, and driver version.