GPU compatibility

What LLMs can run on a 32 GB VRAM Blackwell workstation?

Ultra high-end consumer workstation for 30B+ models with extra headroom. Example: RTX 5090.


Clean local fits: 22
Offload / slower: 2
Too large: 0
Planning capacity: 30.5 GB clean VRAM

Recommended route: local-first

This preset has enough clean Q4 headroom for most curated local models in this dataset.

Use locally for: coding assistants, chat, routing, agents, and larger 14B-32B class models, depending on context length.
Watch out for: very large models, long context windows, and parallel workloads, all of which can still exceed practical memory.
Fallback trigger: use cloud/API when you need 70B+ models, long context, or reliable multi-user throughput.
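To make the context-length caveat concrete, here is a minimal sketch of the standard KV-cache estimate (keys plus values, per layer, per token). The function name `kv_cache_bytes` is hypothetical, the layer and head counts are the published Llama 3.1 8B configuration, and a 16-bit cache is assumed. Whether the Q4 need figures on this page already include a context allowance is not stated, so treat the result as an extra cost to check rather than an exact budget.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV-cache size for a single sequence.

    The leading 2 accounts for storing both keys and values in every layer.
    bytes_per_value=2 assumes an fp16/bf16 cache (an assumption, not a page value).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
for ctx in (8_192, 32_768, 131_072):
    size = kv_cache_bytes(32, 8, 128, ctx)
    print(f"{ctx:>7} tokens: {size / GIB:.1f} GiB")
```

Under these assumptions the cache grows linearly with context, from about 1 GiB at 8K tokens to about 16 GiB at 128K tokens, which is why a model that fits comfortably at short context can still run out of memory at long context or with several parallel sequences.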
Green

22 models fit inside the clean planning capacity.

Yellow

2 models can run with RAM/offload tradeoffs.

Red

Examples to avoid locally: None in this dataset.

Best clean fits

Start here for responsive local inference.
22 models

Possible with offload

Use only when slower generation and tighter settings are acceptable.
2 models

Full Q4 model fit table

| Model | Size | Q4 need | Status |
| --- | --- | --- | --- |
| Gemma 3 4B | 4B | 3.5 GB | Runs locally |
| Qwen2.5 Coder 7B | 7B | 5.5 GB | Runs locally |
| Mistral 7B | 7B | 5.5 GB | Runs locally |
| Llama 3.1 8B Instruct | 8B | 6 GB | Runs locally |
| Qwen3 8B | 8B | 6 GB | Runs locally |
| DeepSeek-R1-0528-Qwen3-8B | 8B | 6 GB | Runs locally |
| Qwen3.5 9B | 9B | 6.6 GB | Runs locally |
| Gemma 3 12B | 12B | 9 GB | Runs locally |
| Gemma 4 E4B | 4B | 9.6 GB | Runs locally |
| Qwen2.5 Coder 14B | 14B | 10.5 GB | Runs locally |
| DeepSeek R1 Distill Qwen 14B | 14B | 10.5 GB | Runs locally |
| Phi-4 14B | 14B | 10.5 GB | Runs locally |
| gpt-oss 20B | 20B | 14 GB | Runs locally |
| Devstral Small 2 24B | 24B | 15 GB | Runs locally |
| Qwen3.5 27B | 27B | 17 GB | Runs locally |
| Qwen3.6 27B | 27B | 17 GB | Runs locally |
| Gemma 3 27B | 27B | 18 GB | Runs locally |
| Qwen3-Coder 30B-A3B | 30B | 19 GB | Runs locally |
| Gemma 4 31B | 31B | 20 GB | Runs locally |
| Qwen2.5 Coder 32B | 32B | 21 GB | Runs locally |
| DeepSeek R1 Distill Qwen 32B | 32B | 21 GB | Runs locally |
| Mixtral 8x7B | 46.7B | 28 GB | Runs locally |
| Llama 3.1 70B Instruct | 70B | 44 GB | RAM offload |
| gpt-oss 120B | 120B | 65 GB | RAM offload |
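The status column can be approximated by comparing each model's Q4 need against the 30.5 GB planning capacity. The sketch below is one plausible reading of that rule, not the calculator's actual logic: the `classify` function is hypothetical, and the 64 GB system RAM figure used for the offload ceiling is an assumption that does not appear on this page.

```python
PLANNING_CAPACITY_GB = 30.5   # clean VRAM planning capacity stated on this page
ASSUMED_SYSTEM_RAM_GB = 64.0  # hypothetical offload budget; not from this page


def classify(q4_need_gb, vram_gb=PLANNING_CAPACITY_GB, ram_gb=ASSUMED_SYSTEM_RAM_GB):
    """Return a status label for a model given its Q4 memory need in GB."""
    if q4_need_gb <= vram_gb:
        return "Runs locally"   # fits entirely in clean VRAM
    if q4_need_gb <= vram_gb + ram_gb:
        return "RAM offload"    # spills into system RAM, slower generation
    return "Too large"          # exceeds VRAM plus the assumed RAM budget


# A few rows taken from the table above.
models = {
    "Qwen2.5 Coder 32B": 21.0,
    "Mixtral 8x7B": 28.0,
    "Llama 3.1 70B Instruct": 44.0,
    "gpt-oss 120B": 65.0,
}

for name, need in models.items():
    headroom = PLANNING_CAPACITY_GB - need
    print(f"{name}: {classify(need)} (VRAM headroom {headroom:+.1f} GB)")
```

With these assumptions the four sample rows reproduce the statuses in the table. Mixtral 8x7B is the tightest clean fit, with only 2.5 GB of headroom, so it is the first one likely to tip into offload as context grows.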