Best local LLMs for 8GB VRAM

For 8GB GPUs, start with small Q4 models that fit entirely in VRAM. Larger models marked yellow may still run by offloading part of the model out of VRAM, but local inference can then become slow or unstable.

Clean Q4 fits: 8

Offload candidates: 21

Best first route: small local

Fallback trigger: long context

Start green

Use clean Q4 fits first for responsive local testing.

Keep context short

Long prompts, tools, and RAG increase memory pressure.
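To make the memory pressure concrete, here is a rough sketch of how the key/value cache grows with context length. The function names and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache values) are assumptions chosen to resemble an 8B-class model with grouped-query attention; they are not taken from the calculator, so check the real values in the model's own configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: keys and values for every layer and token."""
    # The factor of 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Assumed, illustrative model shape (not from the calculator).
    for ctx in (2048, 8192, 32768):
        size = kv_cache_bytes(32, 8, 128, ctx)
        print(f"{ctx:>6} tokens -> {to_gib(size):.2f} GiB of KV cache")
```

Under these assumptions the cache costs about 0.25 GiB at 2,048 tokens and about 4 GiB at 32,768 tokens, on top of the model weights. That is why a model that fits cleanly with a short prompt can stop fitting once the context gets long.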

Scale later

If the model is yellow or red, use a smaller model, larger GPU, or cloud/API fallback.
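The green/yellow/red idea can be sketched as a simple threshold check. The exact rules behind the calculator are not given here, so the function name, the 1.5 GB reserve for context and runtime overhead, and the 2 GB offload budget are all hypothetical values picked for illustration.

```python
def classify_fit(model_gb, vram_gb=8.0, reserve_gb=1.5, offload_budget_gb=2.0):
    """Label a model green, yellow, or red for a given GPU (assumed thresholds)."""
    # VRAM left for weights after reserving room for context and overhead.
    usable_gb = vram_gb - reserve_gb
    if model_gb <= usable_gb:
        return "green"   # fits cleanly in VRAM
    if model_gb <= usable_gb + offload_budget_gb:
        return "yellow"  # may run with partial offload, likely slower
    return "red"         # use a smaller model, larger GPU, or cloud/API fallback


if __name__ == "__main__":
    for size in (3.2, 6.0, 8.0, 12.0):
        print(f"{size:>5} GB -> {classify_fit(size)}")
```

Raising `reserve_gb` is one way to model long prompts or RAG workloads: the same model can move from green to yellow without its weights changing.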

Recommended 8GB local starts

Generated from current Q4 fit data.

6 models

Qwen3 4B Thinking 2507

Runs locally | Q4, about 3.2 GB | qwen3:4b-thinking-2507-q4_K_M | Coding, Agents, Reasoning

Good 8GB starting point when you want a clean local Q4 fit before trying larger models.

Gemma 3 4B

Runs locally | Q4, about 3.5 GB | gemma3:4b | Agents, Vision, Chat

Qwen2.5 Coder 7B

Runs locally | Q4, about 5.5 GB | qwen2.5-coder:7b | Coding, Agents, Chat

Mistral 7B

Runs locally | Q4, about 5.5 GB | mistral:7b | Agents, Chat

Llama 3.1 8B Instruct

Runs locally | Q4, about 6 GB | llama3.1:8b | Agents, Chat

Qwen3 8B

Runs locally | Q4, about 6 GB | qwen3:8b | Coding, Agents, Reasoning

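As a rough way to compare the models listed above, the sketch below computes how much VRAM each one leaves free on an 8GB card, using the approximate Q4 sizes from this list. The function name and the headroom threshold are hypothetical, and the calculation ignores runtime overhead, so treat the results as a starting point rather than a guarantee.

```python
# Approximate Q4 sizes in GB, copied from the list above.
MODELS = [
    ("Qwen3 4B Thinking 2507", 3.2),
    ("Gemma 3 4B", 3.5),
    ("Qwen2.5 Coder 7B", 5.5),
    ("Mistral 7B", 5.5),
    ("Llama 3.1 8B Instruct", 6.0),
    ("Qwen3 8B", 6.0),
]


def models_with_headroom(models, vram_gb, min_headroom_gb):
    """Return (name, free_gb) for models leaving at least min_headroom_gb free."""
    result = []
    for name, size_gb in models:
        free_gb = vram_gb - size_gb
        if free_gb >= min_headroom_gb:
            result.append((name, round(free_gb, 1)))
    return result


if __name__ == "__main__":
    # Example: ask for at least 3 GB free for context and overhead (assumed).
    for name, free_gb in models_with_headroom(MODELS, vram_gb=8.0, min_headroom_gb=3.0):
        print(f"{name}: about {free_gb} GB free")
```

With a 3 GB headroom requirement only the two 4B models qualify, which matches the advice to start small and keep context short before moving up to the 7B and 8B models.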