Best local LLMs for 8GB VRAM

For 8GB GPUs, start with small Q4 models that fit cleanly. Larger models marked yellow (offload candidates) may still run with part of the model offloaded to system RAM, but the local experience can become slow or fragile.
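
As a rough illustration of why small Q4 models fit cleanly in 8GB, the sketch below estimates weight memory from parameter count. The 4.5 bits-per-weight figure and the 1.5 GiB overhead reserve are assumptions for illustration, not values taken from the fit data behind this guide, and the function names are hypothetical.

```python
def q4_weight_gib(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GiB for a quantized model.

    bits_per_weight=4.5 is an assumed average for a Q4-style format;
    real quantization schemes vary.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_cleanly(
    params_billion: float,
    vram_gib: float = 8.0,
    overhead_gib: float = 1.5,  # assumed reserve for context, buffers, display
    bits_per_weight: float = 4.5,
) -> bool:
    """True if estimated weights plus the assumed overhead fit in VRAM."""
    needed = q4_weight_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    for size in (3, 7, 14):
        print(f"{size}B: ~{q4_weight_gib(size):.2f} GiB weights, "
              f"clean fit: {fits_cleanly(size)}")
```

Under these assumptions a 7B model needs roughly 3.7 GiB for weights and fits with room to spare, while a 14B model does not fit without offload.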

Clean Q4 fits: 6
Offload candidates: 15
Best first route: small local
Fallback trigger: long context

Start green

Use clean Q4 fits first for responsive local testing.

Keep context short

Long prompts, tool calls, and RAG all add tokens to the context, which increases memory pressure.
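
To make the memory pressure concrete, here is a minimal sketch of the standard key/value cache estimate for a transformer: two tensors (keys and values) per layer, per KV head, per token. The default model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is a hypothetical example, not a specific model from this guide.

```python
def kv_cache_gib(
    context_tokens: int,
    n_layers: int = 32,      # hypothetical model shape
    n_kv_heads: int = 8,     # hypothetical; depends on the architecture
    head_dim: int = 128,     # hypothetical
    bytes_per_value: int = 2,  # 16-bit cache entries
) -> float:
    """Approximate KV cache size in GiB for a given context length."""
    # Factor of 2 covers the separate key and value tensors.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 2**30


if __name__ == "__main__":
    for tokens in (4096, 8192, 32768):
        print(f"{tokens} tokens: ~{kv_cache_gib(tokens):.2f} GiB KV cache")
```

With this assumed shape the cache grows linearly with context: about 0.5 GiB at 4,096 tokens and about 4 GiB at 32,768 tokens, which is half of an 8GB card before any weights are loaded.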

Scale later

If a model is yellow or red, switch to a smaller model, a larger GPU, or a cloud/API fallback.
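
The rule above can be sketched as a small lookup. The function name and the string labels are hypothetical; only the green/yellow/red colors and the three fallback options come from this guide.

```python
# Fallback options listed in this guide for models that do not fit cleanly.
FALLBACKS = ["smaller model", "larger GPU", "cloud/API fallback"]


def next_steps(fit_color: str) -> list[str]:
    """Map a fit color to suggested next steps (illustrative only)."""
    color = fit_color.strip().lower()
    if color == "green":
        return ["run locally"]
    if color in ("yellow", "red"):
        return list(FALLBACKS)  # copy so callers cannot mutate the constant
    raise ValueError(f"unknown fit color: {fit_color!r}")


if __name__ == "__main__":
    for color in ("green", "yellow", "red"):
        print(color, "->", ", ".join(next_steps(color)))
```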

Recommended 8GB local starts

Generated from current Q4 fit data.
6 models