Compatibility Input
Use presets or override VRAM/RAM manually.
Q4 / 4-bit
Default balance between quality and memory use for local inference.
Long context increases KV-cache pressure.
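To see why longer context raises KV-cache pressure, here is a rough sketch using the standard transformer KV-cache size formula (two tensors, keys and values, per layer). The model shape below (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is a hypothetical example, not a value taken from this calculator, and real backends may quantize or page the cache differently.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size: keys + values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
for ctx in (4096, 8192, 32768):
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=ctx)
    print(f"{ctx:>6} tokens: {size / GIB:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, so a 4x longer context needs 4x the cache memory on top of the model weights.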
Need: -- GB
VRAM fit: -- GB
Speed: --
Ollama: ollama run ...
Best alternatives for this hardware
Need a cloud route instead?
If the model does not fit locally, compare API pricing, cache discounts, context limits, and routing suggestions on apiroute.dev.
Compatibility is an estimate for planning. Real memory and speed depend on backend, context length, KV cache, drivers, quantization file, and offloading settings.
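As a back-of-the-envelope illustration of this kind of planning estimate, the sketch below adds up weight memory, KV cache, and a fixed overhead, then compares the total with available VRAM and RAM. The function names, the 1 GB overhead default, and the three-way placement rule are all assumptions made for this example; they are not the calculator's actual logic, and real quantization files usually use somewhat more than their nominal bits per weight.

```python
def weights_gb(params_billion, bits_per_weight):
    """Weight memory in GB (1 GB = 1e9 bytes) at a nominal bit width."""
    return params_billion * bits_per_weight / 8


def estimate_need_gb(params_billion, bits_per_weight, kv_cache_gb, overhead_gb=1.0):
    """Total estimate: weights + KV cache + an assumed fixed runtime overhead."""
    return weights_gb(params_billion, bits_per_weight) + kv_cache_gb + overhead_gb


def placement(need_gb, vram_gb, ram_gb):
    """Hypothetical rule: all on GPU, partly offloaded to RAM, or too large."""
    if need_gb <= vram_gb:
        return "gpu"
    if need_gb <= vram_gb + ram_gb:
        return "offload"
    return "too large"


# Example: a 7B-parameter model at a nominal 4 bits with a 1 GB KV cache.
need = estimate_need_gb(params_billion=7, bits_per_weight=4, kv_cache_gb=1.0)
print(f"need ~{need:.1f} GB ->", placement(need, vram_gb=8, ram_gb=16))
```

Treat the result as a starting point only: measured usage on a given backend and driver is the number that matters.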