Local AI Compatibility

Can my GPU run this LLM?

Pick your hardware and a model to get a VRAM fit check and a matching Ollama command. If local inference is not the right route, compare cloud API costs on apiroute.dev.

Use presets or override VRAM/RAM manually.

Q4 / 4-bit

The default for local inference: a balance between memory use and output quality.
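As a rough guide, weight memory scales with parameter count times bits per weight. The sketch below assumes a dense model and ignores per-block scale data. Real 4-bit files often use somewhat more than 4 bits per weight, so treat the result as a lower bound. The function name and the 8B example model are illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough memory for model weights only, in decimal GB (1 GB = 10**9 bytes)."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("parameter count and bits per weight must be positive")
    # Each parameter occupies bits_per_weight / 8 bytes.
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # A hypothetical 8B-parameter model at common precisions.
    for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{label}: {weight_memory_gb(8, bits):.1f} GB")
```

For the hypothetical 8B model this gives about 16 GB at FP16 and about 4 GB at 4-bit, before the KV cache and runtime overhead are added.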

Context planning

Longer contexts enlarge the KV cache, which adds to the memory the model needs.
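In a standard transformer, the KV cache stores one key vector and one value vector per layer, per KV head, per token, so its size grows linearly with context length. Below is a minimal sketch. It assumes an FP16 cache (2 bytes per element) and a hypothetical architecture with 32 layers, 8 KV heads, and a head dimension of 128. Real models, cache quantization, and backend allocation strategies will change the numbers.

```python
def kv_cache_gb(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_elem: int = 2,  # assumes an FP16 cache
) -> float:
    """Approximate KV-cache size in decimal GB for a single sequence."""
    # Factor 2: both keys and values are cached for every layer, KV head, and token.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical architecture: 32 layers, 8 KV heads, head dimension 128.
    for ctx in (4096, 8192, 32768):
        print(f"{ctx:>6} tokens: {kv_cache_gb(32, 8, 128, ctx):.2f} GB")
```

Quadrupling the context from 8,192 to 32,768 tokens quadruples the cache, from about 1.07 GB to about 4.29 GB in this example. That memory comes on top of the weights.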

Choose a purpose

The fit check becomes available once the application dataset has loaded.
Local fit: --
VRAM target: --
Route: --
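A fit check of this kind can be approximated in four steps. Add the weight memory, the KV-cache size, and a runtime overhead allowance, then compare the total against VRAM and system RAM. This is a sketch under stated assumptions, not the calculator's own logic. The 1 GB overhead and the route labels are hypothetical, as is the rule that anything too large for VRAM plus RAM goes to a cloud API. The sketch also treats all system RAM as usable, which overstates the real headroom.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    need_gb: float  # estimated total memory requirement
    route: str      # hypothetical route label


def fit_check(
    weights_gb: float,
    kv_cache_gb: float,
    vram_gb: float,
    ram_gb: float,
    overhead_gb: float = 1.0,  # assumed allowance for runtime buffers
) -> FitResult:
    need = weights_gb + kv_cache_gb + overhead_gb
    if need <= vram_gb:
        route = "full GPU"  # everything stays in VRAM
    elif need <= vram_gb + ram_gb:
        route = "partial offload"  # some layers spill to system RAM, which is slower
    else:
        route = "cloud API"  # too large for this machine under these assumptions
    return FitResult(need_gb=need, route=route)


if __name__ == "__main__":
    cases = [(4.5, 1.1, 8, 16), (16.0, 4.3, 12, 32), (40.0, 4.3, 8, 16)]
    for weights, kv, vram, ram in cases:
        r = fit_check(weights, kv, vram_gb=vram, ram_gb=ram)
        print(f"need {r.need_gb:.1f} GB on {vram} GB VRAM / {ram} GB RAM -> {r.route}")
```

Partial offload keeps the model usable when it exceeds VRAM, but the layers held in system RAM run much more slowly.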

Need: -- GB
VRAM fit: -- GB
Speed: --
Ollama
ollama run ...
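Speed can be estimated with a common first-order heuristic. Token-by-token generation on a dense model is usually limited by memory bandwidth, because each new token reads roughly all of the weights once. Dividing memory bandwidth by model size therefore gives an upper bound on decode speed. The sketch assumes a dense model held entirely in VRAM, and the 450 GB/s bandwidth figure is hypothetical. Real throughput is lower, and offloading to system RAM reduces it further.

```python
def decode_tokens_per_sec_upper_bound(model_gb: float, bandwidth_gb_s: float) -> float:
    """Heuristic ceiling on generation speed for a dense, fully GPU-resident model."""
    if model_gb <= 0:
        raise ValueError("model size must be positive")
    # Each generated token reads about model_gb of weights from memory.
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Hypothetical: a 4.5 GB quantized model on a GPU with 450 GB/s of bandwidth.
    print(f"<= {decode_tokens_per_sec_upper_bound(4.5, 450):.0f} tokens/s")
```

Mixture-of-experts models read only their active experts for each token, so this estimate does not apply to them directly.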

Compatibility is a planning estimate. Actual memory use and speed depend on the backend, context length, KV cache, drivers, quantization file, and offloading settings.