Local LLM model fit
Qwen3.6 27B is a 27-billion-parameter model in the Qwen3.6 family. This page estimates whether its Q4 quantization fits in VRAM on common local AI GPUs, and covers the Ollama command, context planning, and fallback choices.
Best for: newest 27B-class local multimodal coding, agent, and reasoning workloads on 24GB-class GPUs. Weakness: long multimodal context can still eat the headroom on a single 24GB GPU.

| Hardware | Examples | Clean capacity | Q4 need | Status |
|---|---|---|---|---|
| 6 GB VRAM entry GPU | GTX 1660, RTX 2060 6GB | 4.5 GB | 17 GB | RAM offload |
| 8 GB VRAM mainstream GPU | RTX 3060 Ti, RTX 4060, RTX 3070 | 6.5 GB | 17 GB | RAM offload |
| 10 GB VRAM older high-end GPU | RTX 3080 10GB | 8.5 GB | 17 GB | RAM offload |
| 12 GB VRAM local agent GPU | RTX 3060 12GB, RTX 4070, RTX 5070 | 10.5 GB | 17 GB | RAM offload |
| 16 GB VRAM creator GPU | RTX 4060 Ti 16GB, RTX 4080, RTX 5070 Ti, RTX 5080 | 14.5 GB | 17 GB | RAM offload |
| 24 GB VRAM homelab workstation | RTX 3090, RTX 4090 | 22.5 GB | 17 GB | Runs locally |
| 32 GB VRAM Blackwell workstation | RTX 5090 | 30.5 GB | 17 GB | Runs locally |
| 48 GB VRAM workstation | RTX A6000, L40S 48GB | 46.5 GB | 17 GB | Runs locally |
| Apple Silicon 32 GB unified memory | M2 Max 32GB, M3 Max 36GB | 26 GB (unified) | 17 GB | Runs locally |

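
The Status column above appears to follow a simple rule: a row is marked "Runs locally" when its clean capacity is at least the Q4 need, and "RAM offload" otherwise. The sketch below reproduces that rule in Python. The 1.5 GB reserve is an assumption inferred from the discrete GPU rows (total VRAM minus clean capacity); the page does not state how it derives clean capacity, and the function names are illustrative only.

```python
# Sketch: reproduce the "Status" column from clean capacity vs. the Q4 estimate.
Q4_NEED_GB = 17.0  # Q4 estimate taken from the table

# Assumption: inferred from the table, where every discrete GPU row lists
# clean capacity as total VRAM minus 1.5 GB. Not a documented rule.
DISCRETE_RESERVE_GB = 1.5


def clean_capacity_gb(total_vram_gb: float) -> float:
    """Estimate usable VRAM on a discrete GPU after the assumed reserve."""
    return total_vram_gb - DISCRETE_RESERVE_GB


def fit_status(clean_gb: float, need_gb: float = Q4_NEED_GB) -> str:
    """Return the table's status label for a given clean capacity."""
    return "Runs locally" if clean_gb >= need_gb else "RAM offload"


if __name__ == "__main__":
    for vram in (6, 8, 10, 12, 16, 24, 32, 48):
        clean = clean_capacity_gb(vram)
        print(f"{vram} GB VRAM: {clean} GB clean, {fit_status(clean)}")
    # The Apple Silicon row lists 26 GB unified directly, so it is passed as-is.
    print(f"Apple Silicon 32 GB: 26.0 GB unified, {fit_status(26.0)}")
```

Under this rule a 16 GB card falls short by about 2.5 GB, which is why it is listed as RAM offload even though the weights nearly fit.
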
| Quantization | Estimated memory | Use case |
|---|---|---|
| Q4 / 4-bit | 17 GB | Default local inference balance |
| Q5 / 5-bit | 21.3 GB | Better quality, more VRAM |
| Q8 / 8-bit | 34 GB | High quality, much more VRAM |
| FP16 / 16-bit | 68 GB | Mostly workstation/server use |
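
The estimates in the quantization table scale linearly with bit width: each row is roughly the Q4 figure multiplied by bits / 4. The sketch below makes that relationship explicit and also shows the overhead factor it implies over raw weight size. This is an observation about the table's numbers, not the page's stated method, and the function names are hypothetical.

```python
# Sketch: the table's estimates scale linearly with bit width from the Q4 row.
PARAMS_BILLION = 27.0   # parameter count from the page
Q4_ESTIMATE_GB = 17.0   # Q4 row from the table


def estimate_gb(bits: float) -> float:
    """Scale the Q4 estimate linearly to another bit width (assumed rule)."""
    return Q4_ESTIMATE_GB * bits / 4.0


def implied_overhead() -> float:
    """Ratio of the Q4 estimate to raw 4-bit weight size (params * 0.5 bytes)."""
    raw_q4_gb = PARAMS_BILLION * 4.0 / 8.0  # 13.5 GB of raw weights
    return Q4_ESTIMATE_GB / raw_q4_gb


if __name__ == "__main__":
    for label, bits in (("Q4", 4), ("Q5", 5), ("Q8", 8), ("FP16", 16)):
        print(f"{label}: {estimate_gb(bits):.2f} GB")
    print(f"Implied overhead over raw weights: {implied_overhead():.2f}x")
```

The implied overhead of roughly a quarter above raw weight size presumably stands in for runtime buffers and a modest context, which is why real usage can differ from these planning figures.
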
This is a practical planning estimate, not a benchmark. Real memory use changes with backend, context length, KV cache, quantization file, drivers, and offloading settings.
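
To see why context length matters for headroom, the following sketch applies the standard KV cache size formula for a transformer that uses ordinary attention in every layer. The layer count, KV head count, and head dimension below are hypothetical placeholders, not published Qwen3.6 27B values, and a model with a different attention design would not follow this formula. Substitute the real architecture numbers before relying on the result.

```python
# Sketch: generic KV cache sizing for context planning.
# All architecture values used in the example are HYPOTHETICAL placeholders.


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for keys and values across all layers at a given context length."""
    # Factor of 2 covers one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def headroom_gb(clean_gb: float, weights_gb: float) -> float:
    """Memory left for KV cache and buffers after loading the weights."""
    return clean_gb - weights_gb


if __name__ == "__main__":
    # Hypothetical architecture: 48 layers, 8 KV heads, head dimension 128, FP16 cache.
    cache = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                           context_tokens=32768)
    print(f"KV cache at 32768 tokens: {cache / 2**30:.1f} GiB")
    # 24 GB row from the hardware table: 22.5 GB clean, 17 GB Q4 estimate.
    print(f"Headroom on a 24 GB GPU: {headroom_gb(22.5, 17.0):.1f} GB")
```

Because the cache grows linearly with context length, halving the context or quantizing the cache to 8-bit (where the backend supports it) halves this figure.
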
27B · 24GB-class multimodal agent, coding assistant, and reasoning workloads
27B · High-quality multimodal local assistant on 24GB GPUs
24B · Software engineering agents, repo navigation, patch planning, and local coding workflows
20B · Local reasoning, agent planning, and tool-use workflows on 16GB+ GPUs