GPU requirements
Choose a GPU preset to see which models fit entirely in VRAM at Q4 quantization, which need RAM offload, and which are too large for local inference.
Small local chat models only · GTX 1660, RTX 2060 6GB
Good for 7B/8B Q4 models · RTX 3060 Ti, RTX 4060, RTX 3070
Strong 7B/8B, tight for 14B · RTX 3080 10GB
Local routing, agents, and model testing · RTX 3060 12GB, RTX 4070
Comfortable 14B Q4, some 20B-class models · RTX 4060 Ti 16GB, RTX 4080
Heavy local models and homelab inference · RTX 3090, RTX 4090
Large local models and long context · RTX A6000, L40S 48GB
Unified memory; not directly comparable to discrete VRAM · M2 Max 32GB, M3 Max 36GB
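
The three outcomes above (clean fit, RAM offload, too large) can be approximated with simple arithmetic. The sketch below is a rough rule of thumb, not the calculation this page uses. The function names, the 4.5 bits-per-weight figure for Q4, and the 1.5 GB allowance for KV cache and runtime overhead are all assumptions chosen for illustration. Real usage varies with quantization variant, context length, and runtime.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized weights in GB.

    bits_per_weight=4.5 is an assumed average for a Q4-style format;
    actual formats differ.
    """
    return params_billion * bits_per_weight / 8


def classify(params_billion: float, vram_gb: float,
             ram_gb: float = 0.0, overhead_gb: float = 1.5) -> str:
    """Classify a model against a GPU preset.

    overhead_gb is an assumed allowance for KV cache and runtime buffers.
    Returns "fits", "offload", or "too large".
    """
    need = q4_weight_gb(params_billion) + overhead_gb
    if need <= vram_gb:
        return "fits"          # everything stays in VRAM
    if need <= vram_gb + ram_gb:
        return "offload"       # some layers spill to system RAM
    return "too large"


if __name__ == "__main__":
    # Hypothetical presets: (label, VRAM in GB, system RAM in GB)
    for label, vram, ram in [("8GB card", 8, 16), ("24GB card", 24, 32)]:
        for size in (8, 14, 70):
            print(f"{label}: {size}B -> {classify(size, vram, ram)}")
```

Under these assumptions a 14B model needs a little over 9 GB, which is consistent with the "tight for 14B" label on a 10GB card. Unified-memory machines are not modeled here, since their memory is shared with the system and is not directly comparable to discrete VRAM.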