RTX 5090 guide

Can the RTX 5090 run local LLMs?

The RTX 5090's 32GB of VRAM puts it in a strong local-AI workstation tier. This guide uses the same 1.5GB system reserve and Q4 planning estimates as the calculator, so it treats about 30.5GB (32GB minus the 1.5GB reserve) as clean practical VRAM.
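A minimal sketch of that budget arithmetic, assuming the calculator simply subtracts the reserve from total VRAM and compares each model's Q4 estimate against what remains. The helper names (`usable_vram`, `fits`) are hypothetical; the sizes are the Q4 runtime estimates listed in this guide.

```python
# Sketch of the RTX 5090 VRAM budget. The 1.5 GB reserve and the per-model
# Q4 sizes come from this guide; the helper names are hypothetical.

TOTAL_VRAM_GB = 32.0
SYSTEM_RESERVE_GB = 1.5

def usable_vram(total_gb: float, reserve_gb: float) -> float:
    """VRAM left for model weights and KV cache after the system reserve."""
    return total_gb - reserve_gb

def fits(q4_runtime_gb: float, usable_gb: float) -> bool:
    """True when the Q4 runtime estimate fits, ignoring long-context overhead."""
    return q4_runtime_gb <= usable_gb

# Q4 runtime estimates listed in this guide (GB).
Q4_ESTIMATES = {
    "Qwen3.6 27B": 17.0,
    "Qwen3 4B Thinking 2507": 3.2,
    "Gemma 3 4B": 3.5,
    "Qwen2.5 Coder 7B": 5.5,
    "Mistral 7B": 5.5,
    "Llama 3.1 8B Instruct": 6.0,
    "Qwen3 8B": 6.0,
    "DeepSeek-R1-0528-Qwen3-8B": 6.0,
}

usable = usable_vram(TOTAL_VRAM_GB, SYSTEM_RESERVE_GB)
for name, size_gb in Q4_ESTIMATES.items():
    headroom = usable - size_gb
    print(f"{name}: fits={fits(size_gb, usable)}, headroom={headroom:.1f} GB")
```

The remaining headroom per model is what long-context KV cache and runtime buffers have to share.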

Clean Q4 fits: 29

Usable VRAM estimate: 30.5 GB

Best route: local-first

Fallback trigger: 70B+ / long context
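The 70B+ trigger follows from simple weight arithmetic: parameters times bits per weight, divided by 8. The sketch below assumes roughly 4.5 effective bits per weight for a Q4 quant, which is an illustrative assumption rather than the calculator's exact figure. It counts weights only, so runtime estimates such as this guide's 17 GB for Qwen3.6 27B sit somewhat higher.

```python
# Rough Q4 weight-size estimate: params x bits per weight / 8.
# ASSUMPTION: ~4.5 effective bits per weight for a Q4 quant; real files vary
# by quant type and by which tensors are kept at higher precision.

USABLE_VRAM_GB = 30.5  # 32 GB card minus the guide's 1.5 GB reserve

def q4_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB (1e9 params x bits / 8 = GB)."""
    return params_billion * bits_per_weight / 8

for size in (8, 27, 32, 70):
    gb = q4_weights_gb(size)
    verdict = "fits" if gb <= USABLE_VRAM_GB else "exceeds usable VRAM"
    print(f"{size}B: ~{gb:.1f} GB of weights -> {verdict}")
```

Under this assumption a 32B model needs about 18 GB of weights, while a 70B model needs about 39 GB before any KV cache, which is why 70B+ models are routed elsewhere.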

Good fit

The card has room for 27B-32B Q4 models in local coding, chat, and agent testing.

Qwen3.6 27B

Qwen3.6 27B fits cleanly on the RTX 5090, needing about 17 GB at Q4 before long-context overhead.

Watch context

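Long-context agent loops still add KV-cache pressure: the cache grows with every token kept in context, so it eats into the headroom left after the model weights.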
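KV-cache memory is roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below applies that to a hypothetical architecture (64 layers, 8 KV heads, head dimension 128, FP16 cache); these are placeholder values, not the published config of Qwen3.6 27B or any other model here. The 13.5 GB headroom is 30.5 GB usable minus the ~17 GB Q4 runtime quoted above.

```python
# KV-cache growth sketch: memory scales linearly with context length.
# ASSUMPTION: the architecture numbers are illustrative placeholders,
# not the published config of any model in this guide.

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Keys + values (factor 2) for every layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128   # hypothetical GQA model, FP16 cache
HEADROOM_GB = 30.5 - 17.0                 # usable VRAM minus ~17 GB Q4 runtime

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
for ctx in (8_192, 32_768, 131_072):
    gb = kv_cache_bytes(ctx, LAYERS, KV_HEADS, HEAD_DIM) / 1e9
    print(f"{ctx:>7} tokens -> {gb:5.1f} GB KV cache")

# Largest context whose KV cache still fits in the remaining headroom.
max_tokens = int(HEADROOM_GB * 1e9 // per_token)
print(f"~{max_tokens} tokens fit in {HEADROOM_GB:.1f} GB of headroom")
```

With these placeholder numbers, an 8K context costs about 2.1 GB, 32K about 8.6 GB, and 128K about 34.4 GB, more than the entire headroom. That linear growth is why long context is listed as a fallback trigger.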

Cloud fallback

Use apiroute.dev when the workload needs frontier hosted APIs or very large models.
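A local-first routing rule can be sketched as: keep the job on the RTX 5090 while the Q4 model plus its KV cache fits the 30.5 GB budget, and fall back to a hosted API otherwise. The `Job` fields and `route` function are hypothetical, and the sketch only returns a label; it does not call apiroute.dev or any real API.

```python
# Local-first routing sketch mirroring the guide's fallback triggers.
# ASSUMPTION: route labels and fields are illustrative; no API is called.
from dataclasses import dataclass

USABLE_VRAM_GB = 30.5  # 32 GB card minus the guide's 1.5 GB reserve

@dataclass
class Job:
    model_q4_gb: float      # Q4 runtime estimate for the model
    kv_cache_gb: float      # estimated KV cache for the requested context
    needs_frontier: bool    # workload requires a hosted frontier model

def route(job: Job) -> str:
    """Return "local" if the job fits the card, else "cloud"."""
    if job.needs_frontier:
        return "cloud"
    if job.model_q4_gb + job.kv_cache_gb > USABLE_VRAM_GB:
        return "cloud"      # 70B+ models or very long contexts land here
    return "local"

print(route(Job(17.0, 8.6, False)))   # 27B-class model, moderate context
print(route(Job(39.4, 2.0, False)))   # 70B-class Q4 weights exceed the budget
```

Checking weights and KV cache together, rather than weights alone, is what lets a model that fits at short context still trigger the fallback once an agent loop grows its context.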

Strong RTX 5090 candidates

Generated from current Q4 fit data.

8 models

Qwen3.6 27B

Runs locally | Q4: about 17 GB | qwen3.6:27b | Coding, Agents, Reasoning

Clean 32GB Q4 fit with useful headroom for agent, coding, or reasoning workflows.

Qwen3 4B Thinking 2507

Runs locally | Q4: about 3.2 GB | qwen3:4b-thinking-2507-q4_K_M | Coding, Agents, Reasoning

Gemma 3 4B

Runs locally | Q4: about 3.5 GB | gemma3:4b | Agents, Vision, Chat

Qwen2.5 Coder 7B

Runs locally | Q4: about 5.5 GB | qwen2.5-coder:7b | Coding, Agents, Chat

Mistral 7B

Runs locally | Q4: about 5.5 GB | mistral:7b | Agents, Chat

Llama 3.1 8B Instruct

Runs locally | Q4: about 6 GB | llama3.1:8b | Agents, Chat

Qwen3 8B

Runs locally | Q4: about 6 GB | qwen3:8b | Coding, Agents, Reasoning

DeepSeek-R1-0528-Qwen3-8B

Runs locally | Q4: about 6 GB | deepseek-r1:8b | Coding, Reasoning, Chat