Question 1

Can I run Llama 3.1 70B Instruct on my device?

Accepted Answer

Llama 3.1 70B Instruct needs at least 40.1 GB of VRAM, using Q4_K_M quantization. Use RunThisModel to check your specific hardware and find the best quantization for your device.
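As a rough illustration of what such a compatibility check does, the sketch below picks the highest-quality quantization whose VRAM requirement fits in a given amount of memory. It hard-codes the VRAM figures from the quantization table on this page. The function name `best_quantization` is hypothetical and is not part of RunThisModel.

```python
from typing import Optional

# Minimum VRAM (GB) per quantization, copied from the table on this page,
# ordered from lowest to highest quality.
VRAM_NEEDED_GB = [
    ("Q4_K_M", 40.1),
    ("Q5_K_M", 50.0),
    ("Q8_0", 76.0),
    ("FP16", 142.0),
]


def best_quantization(available_vram_gb: float) -> Optional[str]:
    """Return the highest-quality quantization that fits, or None if none fits."""
    best = None
    for name, needed_gb in VRAM_NEEDED_GB:
        if needed_gb <= available_vram_gb:
            best = name  # entries are ordered by quality, so the last fit wins
    return best


print(best_quantization(24))   # None
print(best_quantization(64))   # Q5_K_M
print(best_quantization(192))  # FP16
```

These figures are minimums. A longer context window needs extra memory for the KV cache, so leave some headroom.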

Question 2

How much VRAM does Llama 3.1 70B Instruct need?

Accepted Answer

Llama 3.1 70B Instruct needs at least 40.1 GB of VRAM, at Q4_K_M quantization. Higher-quality quantizations need more: Q4_K_M: 40.1 GB, Q5_K_M: 50 GB, Q8_0: 76 GB, FP16: 142 GB.
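These sizes follow roughly from parameter count × bits per weight ÷ 8. The sketch below assumes about 70.6 billion parameters (an approximation, not a figure from this page) and uses the bits-per-weight values from the quantization table. The estimates land within a few GB of the listed file sizes; real GGUF files differ somewhat because they mix tensor types and include metadata.

```python
PARAMS = 70.6e9  # assumed approximate parameter count for Llama 3.1 70B

# Approximate bits per weight for each quantization, taken from the table.
BITS_PER_WEIGHT = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8, "FP16": 16}


def weights_size_gb(params: float, bits_per_weight: float) -> float:
    """Estimate the size of the model weights in decimal gigabytes (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weights_size_gb(PARAMS, bits):.1f} GB")
```

In the table, the VRAM column sits 0.5 to 2 GB above the file size for each quantization.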

Question 3

How do I download Llama 3.1 70B Instruct?

Accepted Answer

You can download Llama 3.1 70B Instruct in GGUF format from Hugging Face; the smallest file (Q4_K_M) is 39.6 GB. Use the RunThisModel iOS app to download and run it directly on your device, or download the file manually from Hugging Face.
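If you download manually, it is worth confirming you have enough free disk space first. The sketch below checks free space with Python's standard `shutil.disk_usage` and then fetches a single file with `hf_hub_download` from the `huggingface_hub` library. The helper names are hypothetical, and you must supply the repository ID and filename yourself; this page does not name a specific GGUF repository.

```python
import shutil

# Smallest download listed on this page (Q4_K_M), in decimal GB.
MIN_DOWNLOAD_GB = 39.6


def has_free_space(directory: str, needed_gb: float) -> bool:
    """Return True if the filesystem holding `directory` has at least `needed_gb` free."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= needed_gb * 1e9


def download_gguf(repo_id: str, filename: str, target_dir: str,
                  size_gb: float = MIN_DOWNLOAD_GB) -> str:
    """Download one GGUF file into target_dir and return its local path."""
    if not has_free_space(target_dir, size_gb):
        raise RuntimeError(f"Need at least {size_gb} GB free in {target_dir}")
    # Imported here so the disk-space check works without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
```

Usage: call `download_gguf("<repo-id>", "<file>.gguf", ".")` with the repository and quantization file you actually want, passing `size_gb` for larger quantizations such as Q8_0.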

Question 4

Can Llama 3.1 70B Instruct run on iPhone?

Accepted Answer

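At 70 billion parameters, Llama 3.1 70B Instruct is too large for iPhones. Even the Q4_K_M quantization needs about 40.6 GB of RAM, more than any iPhone or M-series iPad currently ships with, so consider a Mac with Apple Silicon and at least that much unified memory.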
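On a Mac or Linux machine, a quick way to compare your total memory against the table's "RAM Needed" figure is sketched below. It reads physical memory through POSIX `os.sysconf`, which should work on Linux and macOS; on platforms without it (such as Windows) the function returns None. The 40.6 GB threshold is the Q4_K_M value from the table on this page.

```python
import os
from typing import Optional

# "RAM Needed" for the smallest quantization (Q4_K_M), from the table on this page.
Q4_K_M_RAM_GB = 40.6


def total_ram_gb() -> Optional[float]:
    """Return total physical memory in decimal GB, or None if it can't be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        num_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # os.sysconf or these names are unavailable on this platform
    if page_size <= 0 or num_pages <= 0:
        return None
    return page_size * num_pages / 1e9


ram = total_ram_gb()
if ram is None:
    print("Could not determine physical memory on this platform.")
elif ram >= Q4_K_M_RAM_GB:
    print(f"{ram:.1f} GB total: meets the Q4_K_M minimum of {Q4_K_M_RAM_GB} GB.")
else:
    print(f"{ram:.1f} GB total: below the Q4_K_M minimum of {Q4_K_M_RAM_GB} GB.")
```

Total memory is an upper bound: the operating system and other apps also need room, so a machine only slightly above 40.6 GB may still struggle.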

Quantization options for Llama 3.1 70B Instruct:

Quantization	Bits/weight	File Size	VRAM Needed	RAM Needed	Quality (vs. FP16)
Q4_K_M	4.5	39.6 GB	40.1 GB	40.6 GB	85%
Q5_K_M	5.5	48 GB	50 GB	56 GB	90%
Q8_0	8	74 GB	76 GB	80 GB	98%
FP16	16	140 GB	142 GB	148 GB	100%
