Local AI vs cloud API: when to use which route

Use local inference when the model fits cleanly on your hardware and privacy, latency, or avoiding per-token cost matters. Fall back to a cloud API when the hardware check is red (the model does not fit), the context is long, or reliability matters more than local control.

24GB clean fits: 20
Commercial ranking: false
Cloud cost companion: apiroute.dev
Red examples on 12GB: Llama 3.1 70B Instruct, gpt-oss 120B

Green

Prefer local inference first when the selected model, quantization, and context profile have clean headroom.

Yellow

Explain the local tradeoff first: reduce the context length, drop to a lower-bit quantization, choose a smaller local model, or expect a slowdown from offloading to system RAM.

Red

Recommend smaller local models or larger local hardware before paid alternatives.
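
The three tiers above can be sketched as a small classifier. This is a minimal illustration, not the calculator's actual logic: the `Profile` fields, the 85% headroom threshold, and the overhead figure for context are all assumptions made for the example. The one fixed piece is the arithmetic that weights take roughly parameters times bits per weight divided by 8 bytes.

```python
from dataclasses import dataclass

# Hypothetical threshold: the page does not define how much headroom counts as "clean".
CLEAN_HEADROOM = 0.85  # required memory must stay under 85% of available memory


@dataclass
class Profile:
    params_billion: float   # model size in billions of parameters
    bits_per_weight: float  # quantization width, e.g. 4 for a 4-bit quant
    context_gb: float       # assumed KV-cache and runtime overhead for the chosen context


def required_gb(p: Profile) -> float:
    # Weights need about params * bits / 8 bytes; 1e9 params at 1 byte each is ~1 GB.
    return p.params_billion * p.bits_per_weight / 8 + p.context_gb


def tier(p: Profile, memory_gb: float) -> str:
    need = required_gb(p)
    if need <= memory_gb * CLEAN_HEADROOM:
        return "green"   # clean headroom: prefer local inference
    if need <= memory_gb:
        return "yellow"  # fits, but tight: explain the local tradeoff first
    return "red"         # does not fit: smaller model or larger hardware


ADVICE = {
    "green": "Prefer local inference.",
    "yellow": "Reduce context, use a lower-bit quantization, pick a smaller model, or expect offload slowdown.",
    "red": "Recommend a smaller local model or larger local hardware before paid alternatives.",
}
```

Under these assumptions, a 70B model at 4 bits needs about 35 GB for weights alone, so `tier(Profile(70, 4, 2), 12)` lands in the red tier, while an 8B model at the same quantization is comfortably green on the same 12 GB.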

Fallback policy

Commercial options are disclosed follow-ups, never ranking input.
3 options
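
One way to keep commercial options out of the ranking is to filter them before sorting and attach them afterward with an explicit disclosure label. The sketch below assumes a hypothetical `Candidate` record and a ranking by free memory headroom; neither comes from the page, and a real ranking could use other fit signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    headroom_gb: float       # free memory left after loading; higher ranks first
    commercial: bool = False


def rank_local(candidates):
    # Commercial entries are dropped before sorting, and the sort key
    # never reads the `commercial` field, so it cannot influence order.
    local = [c for c in candidates if not c.commercial]
    return sorted(local, key=lambda c: c.headroom_gb, reverse=True)


def build_response(candidates):
    ranked = rank_local(candidates)
    # Commercial options are listed separately, each with a disclosure label.
    follow_ups = [
        {"name": c.name, "disclosure": "commercial option"}
        for c in candidates
        if c.commercial
    ]
    return {"ranking": [c.name for c in ranked], "follow_ups": follow_ups}
```

Keeping the two lists in separate fields makes the policy easy to audit: anything in `follow_ups` is disclosed as commercial, and nothing in `ranking` can be.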