Local versus cloud

Local AI vs cloud API: when to use which route

Use local inference when the model fits cleanly and privacy, latency, or no-per-token cost matters. Use cloud/API fallback when hardware is red, context is long, or reliability matters more than local control.

Open compatibility calculator

24GB clean fits27

Commercial rankingfalse

Cloud cost companionapiroute.dev

Red examples on 12GBgpt-oss-safeguard 120B, Devstral 2 123B, GLM-5.2

Green

Prefer local inference first when there is clean headroom.

Yellow

Reduce context, lower quantization, or expect offload slowdown.

Red

Use a smaller local model, larger hardware, or cloud/API fallback.

Fallback policy

Commercial options are disclosed follow-ups, never ranking input.

3 options

RunPod cloud GPU fallback

RunPodreferral_creditranking: false

Referral link. It may generate credits or commission for the site owner. It must not change the compatibility result or model ranking.

Open guide ->

Cloud/API cost comparison

apiroute.devowned_companion_projectranking: false

Owned companion project. It may contain separately disclosed commercial provider options.

Open guide ->

Agent usage license

apiroute.dev / localai.apiroute.devpaid_product_conceptranking: false

Concept only. No paid license is currently active.

Open guide ->