GPU requirements

Local LLM compatibility by GPU VRAM

Each preset groups GPUs by memory size and shows which models fit cleanly in VRAM at Q4 quantization, which need partial offload to system RAM, and which are too large for local inference.
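You can approximate the three categories with simple arithmetic. The sketch below only illustrates the idea and is not this tool's actual method. It assumes about 4.5 bits per weight for a Q4 quantization and a hypothetical 1.5 GB allowance for KV cache and runtime buffers. The names `q4_footprint_gb`, `classify`, and `Fit` are made up for this example.

```python
from enum import Enum


class Fit(Enum):
    FITS_IN_VRAM = "fits in VRAM"
    NEEDS_RAM_OFFLOAD = "needs RAM offload"
    TOO_LARGE = "too large"


# Assumption: average bits per weight for a Q4 quantization (real Q4 variants differ slightly).
Q4_BITS_PER_WEIGHT = 4.5
# Assumption: hypothetical allowance (decimal GB) for KV cache, activations, and runtime buffers.
OVERHEAD_GB = 1.5


def q4_footprint_gb(params_billion: float) -> float:
    """Rough memory need in decimal GB for a model quantized to about Q4."""
    # 1e9 params * bits / 8 bytes per param = params_billion * bits / 8 GB
    weights_gb = params_billion * Q4_BITS_PER_WEIGHT / 8
    return weights_gb + OVERHEAD_GB


def classify(params_billion: float, vram_gb: float, free_ram_gb: float) -> Fit:
    """Place a model in one of the three categories for a given GPU and free system RAM."""
    need = q4_footprint_gb(params_billion)
    if need <= vram_gb:
        return Fit.FITS_IN_VRAM
    if need <= vram_gb + free_ram_gb:
        return Fit.NEEDS_RAM_OFFLOAD
    return Fit.TOO_LARGE


if __name__ == "__main__":
    print(classify(8, vram_gb=8, free_ram_gb=32).value)    # fits in VRAM
    print(classify(70, vram_gb=24, free_ram_gb=64).value)  # needs RAM offload
```

Treating free system RAM as extra capacity is optimistic. Layers offloaded to RAM run much more slowly than layers held in VRAM.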

6 GB VRAM entry GPU

Small local chat models only · GTX 1660, RTX 2060 6GB

8 GB VRAM mainstream GPU

Good for 7B/8B Q4 models · RTX 3060 Ti, RTX 4060, RTX 3070

10 GB VRAM older high-end GPU

Runs 7B/8B well, tight for 14B · RTX 3080 10GB
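A back-of-envelope estimate explains the "tight for 14B" note for 10 GB cards. Two assumptions go into it: roughly 4.5 bits per weight ($b$) for a Q4 quantization, and a hypothetical 1.5 GB allowance for KV cache and runtime buffers. Real figures depend on the quantization variant, context length, and runtime. With $N$ parameters:

```latex
\begin{aligned}
M_{\text{weights}} &\approx N \cdot \frac{b}{8} = 14\times10^{9} \cdot \frac{4.5}{8}\ \text{bytes} \approx 7.9\ \text{GB} \\
M_{\text{total}} &\approx M_{\text{weights}} + M_{\text{overhead}} \approx 7.9 + 1.5 \approx 9.4\ \text{GB}
\end{aligned}
```

That leaves well under 1 GB of headroom on a 10 GB card. A longer context grows the KV cache and can push a 14B model into RAM offload.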

12 GB VRAM local agent GPU

Local routing, agents, and model testing · RTX 3060 12GB, RTX 4070

16 GB VRAM creator GPU

Comfortable for 14B Q4, fits some 20B-class models · RTX 4060 Ti 16GB, RTX 4080

24 GB VRAM homelab workstation

Heavy local models and homelab inference · RTX 3090, RTX 4090

48 GB VRAM workstation

Large local models and long context · RTX A6000, L40S 48GB

Apple Silicon 32 GB unified memory

Unified memory is shared by the CPU and GPU, so it is not directly comparable to discrete VRAM · M2 Max 32GB, M3 Max 36GB
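On Apple Silicon, the operating system, running apps, and the model all share one memory pool. The model's budget is therefore what remains after everything else takes its share. A discrete card's VRAM, by contrast, is dedicated to the GPU. The sketch below makes that concrete. The reserved amount is a hypothetical parameter, not a documented macOS limit, and the helper names are made up for this example.

```python
def unified_model_budget_gb(unified_gb: float, reserved_gb: float) -> float:
    """Memory left for model weights and KV cache after a hypothetical reservation
    for macOS, running apps, and any GPU working-set limit."""
    if reserved_gb < 0 or reserved_gb > unified_gb:
        raise ValueError("reserved_gb must be between 0 and unified_gb")
    return unified_gb - reserved_gb


def fits_unified(model_gb: float, unified_gb: float, reserved_gb: float) -> bool:
    """True if a model's estimated footprint fits in the remaining unified-memory budget."""
    return model_gb <= unified_model_budget_gb(unified_gb, reserved_gb)


if __name__ == "__main__":
    # Assumed 8 GB reserve on a 32 GB machine leaves 24 GB for the model.
    print(unified_model_budget_gb(32, reserved_gb=8))  # 24
    print(fits_unified(20.0, 32, reserved_gb=8))       # True
```

Under this assumption, a 32 GB Mac offers about the same model budget as a 24 GB discrete card. That card's host system, however, still has its own RAM available for offload.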