Local LLM model fit

Can my GPU run Qwen3-Coder 30B-A3B?

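Qwen3-Coder 30B-A3B is a mixture-of-experts coding model in the Qwen3-Coder family with about 30B total parameters, of which roughly 3B are active per token (the "A3B" in the name). All 30B parameters still have to be held in memory, so fit is planned against the full model size. This page estimates Q4 VRAM fit and covers the Ollama command, context planning, and fallback choices for common local AI GPUs.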

Q4 runtime estimate: 19 GB

Ollama command: ollama run qwen3-coder:30b

Recommended GPU: RTX 3090/4090 (24 GB) minimum, 32 GB+ preferred

Best use

Agentic coding, repository-scale local code review, and tool-heavy development loops. Weakness: long context leaves little headroom on a single 24 GB GPU, where the 19 GB Q4 estimate uses most of the 22.5 GB of clean VRAM.

GPU fit table

Hardware	Examples	Clean capacity	Q4 need	Status
6 GB VRAM entry GPU	GTX 1660, RTX 2060 6GB	4.5 GB clean VRAM	19 GB	RAM offload
8 GB VRAM mainstream GPU	RTX 3060 Ti, RTX 4060, RTX 3070	6.5 GB clean VRAM	19 GB	RAM offload
10 GB VRAM older high-end GPU	RTX 3080 10GB	8.5 GB clean VRAM	19 GB	RAM offload
12 GB VRAM local agent GPU	RTX 3060 12GB, RTX 4070, RTX 5070	10.5 GB clean VRAM	19 GB	RAM offload
16 GB VRAM creator GPU	RTX 4060 Ti 16GB, RTX 4080, RTX 5070 Ti, RTX 5080	14.5 GB clean VRAM	19 GB	RAM offload
24 GB VRAM homelab workstation	RTX 3090, RTX 4090	22.5 GB clean VRAM	19 GB	Runs locally
32 GB VRAM Blackwell workstation	RTX 5090	30.5 GB clean VRAM	19 GB	Runs locally
48 GB VRAM workstation	RTX A6000, L40S 48GB	46.5 GB clean VRAM	19 GB	Runs locally
Apple Silicon 32 GB unified memory	M2 Max 32GB, M3 Max 36GB	26 GB unified	19 GB	Runs locally
Apple Silicon 256 GB unified memory	Mac Studio M3 Ultra 256GB, Mac Studio M4 Ultra 256GB	250 GB unified	19 GB	Runs locally
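
The status column appears to follow a simple rule: clean capacity is the nominal memory minus a fixed reserve (1.5 GB in every discrete GPU row, 6 GB in both Apple Silicon rows), and the model runs locally when clean capacity covers the 19 GB Q4 estimate. A minimal Python sketch of that rule follows. The function names and reserve values are assumptions inferred from the table, not the calculator's actual code.

```python
Q4_NEED_GB = 19.0  # Q4 estimate taken from the fit table


def clean_capacity_gb(total_gb: float, unified: bool = False) -> float:
    """Usable memory after a fixed reserve.

    Assumed reserves (inferred from the table rows, not documented):
    1.5 GB for discrete GPUs, 6 GB for Apple Silicon unified memory.
    """
    reserve = 6.0 if unified else 1.5
    return total_gb - reserve


def fit_status(total_gb: float, need_gb: float = Q4_NEED_GB,
               unified: bool = False) -> str:
    """Return the table's status label for a given memory size."""
    if clean_capacity_gb(total_gb, unified) >= need_gb:
        return "Runs locally"
    return "RAM offload"


if __name__ == "__main__":
    # (nominal memory in GB, unified memory?)
    for total, unified in [(12, False), (16, False), (24, False), (32, True)]:
        print(total, clean_capacity_gb(total, unified),
              fit_status(total, unified=unified))
```

The 16 GB row is the instructive case: 14.5 GB of clean VRAM falls short of 19 GB, so part of the model has to be offloaded to system RAM.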

Quantization memory estimate on a 12GB GPU preset

Quantization	Estimated memory	Use case
Q4 / 4-bit	19 GB	Default local inference balance
Q5 / 5-bit	23.8 GB	Better quality, more VRAM
Q8 / 8-bit	38 GB	High quality, much more VRAM
FP16 / 16-bit	76 GB	Mostly workstation/server use
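
The figures in this table scale linearly with bits per weight from the 19 GB Q4 estimate (for example, 19 × 5 / 4 = 23.75, shown as 23.8 GB). A small Python sketch of that scaling follows. The helper name is hypothetical, and real quantized files will deviate from a pure bit ratio because of mixed-precision layers and runtime overhead.

```python
Q4_BASELINE_GB = 19.0  # Q4 estimate from the quantization table


def estimate_memory_gb(bits: int, baseline_gb: float = Q4_BASELINE_GB,
                       baseline_bits: int = 4) -> float:
    """Scale the Q4 baseline linearly by bits per weight.

    This is a planning approximation only; it ignores per-format
    overhead and any layers kept at higher precision.
    """
    return baseline_gb * bits / baseline_bits


if __name__ == "__main__":
    for label, bits in [("Q4", 4), ("Q5", 5), ("Q8", 8), ("FP16", 16)]:
        print(f"{label}: {estimate_memory_gb(bits):.2f} GB")
```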

Data sources and confidence

This is a practical planning estimate, not a benchmark. Real memory use changes with backend, context length, KV cache, quantization file, drivers, and offloading settings.
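
To see why context length matters, the KV cache can be estimated separately from the weights: it grows linearly with the number of tokens held in context. The Python sketch below uses the standard formula for a transformer with grouped-query attention. The architecture values in the usage example (48 layers, 4 KV heads, head dimension 128) are assumptions to check against the model's published config, and the 2 bytes per element assumes an unquantized FP16 cache.

```python
def kv_cache_gib(context_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for a given context length.

    Per token, each layer stores one key and one value vector per KV head,
    hence the leading factor of 2.
    """
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_tokens * bytes_per_token / (1024 ** 3)


if __name__ == "__main__":
    # ASSUMED architecture values; verify against the model's config file.
    layers, kv_heads, head_dim = 48, 4, 128
    for ctx in (8_192, 32_768, 131_072):
        size = kv_cache_gib(ctx, layers, kv_heads, head_dim)
        print(f"{ctx} tokens: {size:.2f} GiB")
```

Under these assumptions, the cache alone would take a meaningful share of the headroom left on a 24 GB card at longer contexts, which is consistent with the weakness noted for single 24 GB GPUs.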

Verified: 2026-05-26

Confidence: high

Cloud fallback

Compare API costs on apiroute.dev

Similar local models

GLM-4.7-Flash

30B · Local coding agents, terminal workflows, tool-heavy engineering tasks, and 30B-class reasoning on 24GB+ GPUs

Qwen3-VL 30B-A3B Instruct

30B · Higher-quality local multimodal reasoning, screenshot analysis, document/image extraction, and GUI-agent planning

Qwen3.5 27B

27B · 24GB-class multimodal agent, coding assistant, and reasoning workloads

Qwen3.6 27B

27B · Newest 27B-class local multimodal coding, agent, and reasoning workloads on 24GB-class GPUs