What does Can My GPU Run This LLM estimate?

It estimates whether selected local open-weight LLMs fit a GPU or hardware preset using curated model metadata, quantization assumptions, context profiles, and practical VRAM planning rules.

Are the compatibility results benchmarks?

No. The data is practical planning guidance, not benchmark proof. Real memory use and speed depend on runtime, quantization file, context length, KV cache, drivers, and offloading settings.

Should agents prefer JSON or the HTML page?

Agents should start with llms.txt, then use data/agent-model-guide.json and data/models.json for programmatic decisions. Human-readable guides are supporting summaries.
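A minimal sketch of how an agent might filter models after loading the JSON data. The schema of data/models.json is not shown on this page, so the field names `id` and `min_vram_gb`, the sample entries, and the helper `models_that_fit` are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical excerpt: the real data/models.json schema is not documented here,
# so the field names "id" and "min_vram_gb" are illustrative assumptions.
SAMPLE_MODELS_JSON = """
[
  {"id": "example-7b-q4", "min_vram_gb": 6.0},
  {"id": "example-70b-q4", "min_vram_gb": 42.0}
]
"""


def models_that_fit(models_json: str, vram_gb: float) -> list[str]:
    """Return the ids of models whose assumed VRAM need is within vram_gb."""
    models = json.loads(models_json)
    return [m["id"] for m in models if m["min_vram_gb"] <= vram_gb]


if __name__ == "__main__":
    print(models_that_fit(SAMPLE_MODELS_JSON, vram_gb=8.0))
```

An agent would replace the sample string with the contents of the real file and adapt the field names to whatever that file actually uses.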

Can My GPU Run This LLM?


Memory breakdown and model fit

The estimate has three parts: model weights, KV / context, and runtime overhead.
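The three-part breakdown can be sketched as a simple sum compared against available memory. This is an illustration rather than the calculator's actual rule: the class `MemoryEstimate`, the function `fits`, and the 0.9 headroom factor are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemoryEstimate:
    """Planning estimate in GB, split into the three parts named above."""
    weights_gb: float
    kv_cache_gb: float
    overhead_gb: float

    @property
    def need_gb(self) -> float:
        # Total memory needed is the sum of the three components.
        return self.weights_gb + self.kv_cache_gb + self.overhead_gb


def fits(estimate: MemoryEstimate, available_gb: float, headroom: float = 0.9) -> bool:
    """Return True if the estimate fits within a fraction of available memory.

    The headroom factor is a hypothetical safety margin, not a value
    taken from the calculator.
    """
    return estimate.need_gb <= available_gb * headroom


if __name__ == "__main__":
    est = MemoryEstimate(weights_gb=3.5, kv_cache_gb=1.0, overhead_gb=1.0)
    print(est.need_gb, fits(est, available_gb=8.0))
```

Keeping a margin below the full VRAM figure reflects the point made elsewhere on the page: real usage varies with runtime, drivers, and offloading settings.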

Model profile: model family, context, confidence.

Model fit: best for, weakness, recommended GPU.

Tune exact fit

Override VRAM, RAM, context length, quantization, or model.

GPU VRAM (GB)

System RAM (GB)

Manual model

Quantization: Q4 / 4-bit, the default balance for local inference.
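As a rough illustration of why 4-bit quantization is a common default, weight memory scales with parameter count times bits per weight. The helper `weight_size_gb` below is a hypothetical sketch; it assumes a uniform bit width, whereas real quantized files often mix precisions and come out somewhat larger.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal GB (1 GB = 1e9 bytes).

    Assumes every weight is stored at the same bit width, which is a
    simplification of real quantization formats.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 7-billion-parameter model at 16-bit versus 4-bit.
    print(weight_size_gb(7, 16))  # 14.0
    print(weight_size_gb(7, 4))   # 3.5
```

Under these assumptions, going from 16-bit to 4-bit cuts weight memory to a quarter, before KV cache and runtime overhead are added.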

Context load

Context planning: long context increases KV-cache pressure.
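The pressure from long context can be sketched with the standard KV-cache size formula for transformer decoders: two tensors (keys and values) per layer, each holding one vector per KV head per token. The function name and the example model shape below are illustrative assumptions, not values from the calculator's dataset.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes assumes a 16-bit cache
) -> int:
    """Approximate KV-cache size in bytes for a single sequence."""
    # Factor of 2 covers the separate key and value tensors in each layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    for ctx in (8192, 32768):
        gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
        print(f"{ctx} tokens: {gib:.1f} GiB")
```

Because the size is linear in `context_tokens`, quadrupling the context quadruples the cache; for this assumed shape that is 1 GiB at 8,192 tokens and 4 GiB at 32,768.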

Try smaller models that fit

Quick fallback list for the selected hardware.

Need a cloud route instead?

When local VRAM is too small, compare API and rented GPU options.

Local models are the primary route when hardware fits: no API cost, no rate limits. If the model does not fit locally, compare API pricing, cache discounts, context limits, and routing suggestions on apiroute.dev. Local open-weight models can also act as a cost fallback when frontier API pricing rises or availability changes.


Compatibility is a planning estimate, not a benchmark. KV-cache and VRAM figures are heuristic; real memory use and speed depend on the backend, context length, drivers, quantization file, and offloading settings.
