Agent model guide

Pre-interpreted local LLM guidance for AI agents

Use this page when an agent needs compact model strengths, weak spots, VRAM/context estimates, and clean commercial disclosures without scraping the calculator UI.

Open agent-model-guide.json

Models covered: 23
Context profiles: 6
Affiliate ranking influence: false

How agents should use this guide

First use /data/agent-model-guide.json for model suitability, memory estimates, and routing notes. Use /data/commercial-options.json only as a disclosed follow-up when local hardware is tight or a cloud/API option is appropriate. Never use commercial relationships as ranking input.
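
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the guide's actual schema: the field names (`name`, `best_for`, `q4_agent_profile_gb`, `commercial_option_ids`), the `route` helper, and the "largest model that fits" tie-break are all assumptions made for the example. The sample values are copied from this guide's decision table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GuideEntry:
    # Hypothetical field names; the real JSON keys are not shown on this page.
    name: str
    best_for: list[str]
    q4_agent_profile_gb: float
    commercial_option_ids: list[str] = field(default_factory=list)


def route(entries: list[GuideEntry], task: str, vram_gb: float) -> Optional[dict]:
    """Pick a model by suitability and memory fit only.

    Commercial option IDs are never used as a filter or sort key. They are
    attached afterwards, and only when no suitable model fits locally.
    """
    suitable = [e for e in entries if task in e.best_for]
    if not suitable:
        return None
    fits = [e for e in suitable if e.q4_agent_profile_gb <= vram_gb]
    if fits:
        # Assumed tie-break: prefer the largest model that still fits.
        choice = max(fits, key=lambda e: e.q4_agent_profile_gb)
        return {"model": choice.name, "local": True, "commercial_follow_up": []}
    # Nothing fits: report the smallest suitable model and disclose its options.
    choice = min(suitable, key=lambda e: e.q4_agent_profile_gb)
    return {
        "model": choice.name,
        "local": False,
        "commercial_follow_up": list(choice.commercial_option_ids),
    }


# Sample rows copied from this guide's decision table.
ENTRIES = [
    GuideEntry("Qwen2.5 Coder 14B", ["chat", "coding", "agents/tool workflows"], 15.2),
    GuideEntry("Qwen2.5 Coder 32B", ["chat", "coding", "agents/tool workflows"], 30.4,
               ["apiroute-cloud-api-comparison"]),
]

print(route(ENTRIES, "coding", 24.0))      # a local pick, no commercial follow-up
print(route(ENTRIES[1:], "coding", 24.0))  # nothing fits, follow-up is disclosed
```

Keeping the commercial IDs out of every filter and sort key is what makes the "never use commercial relationships as ranking input" rule easy to audit in code.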

Model decision table

Model | Best for | Weak for | Q4 estimate | Q4 agent profile | Local fit note | Commercial option IDs
Llama 3.1 8B Instruct | chat, agents/tool workflows, Fast local chat, lightweight agents, low-cost local testing | coding, vision/image understanding, reasoning | 6.00 GB | 8.70 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Llama 3.1 70B Instruct | chat, coding, agents/tool workflows | vision/image understanding, Too large for single 24GB consumer GPUs without heavy offload | 44 GB | 63.8 GB | Large local model. Prefer 48GB+ VRAM, multi-GPU, cloud GPU, or hosted API fallback. | runpod-cloud-gpu-fallback, apiroute-cloud-api-comparison
Qwen2.5 Coder 7B | chat, coding, agents/tool workflows | vision/image understanding, reasoning, Larger refactors and complex multi-file reasoning | 5.50 GB | 7.97 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Qwen2.5 Coder 14B | chat, coding, agents/tool workflows | vision/image understanding, Can be tight on 12GB GPUs at longer context | 10.5 GB | 15.2 GB | Practical 12GB local-agent candidate at Q4 with headroom checks. | none
Qwen2.5 Coder 32B | chat, coding, agents/tool workflows | vision/image understanding, Little VRAM headroom on single 24GB GPUs with long context | 21 GB | 30.4 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Qwen3 8B | chat, coding, agents/tool workflows | vision/image understanding, Less capable than 14B/32B models for large tasks | 6.00 GB | 8.70 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Qwen3-Coder 30B-A3B | chat, coding, agents/tool workflows | vision/image understanding, Long context leaves little headroom on single 24GB GPUs | 19 GB | 27.6 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Qwen3.5 9B | chat, coding, agents/tool workflows | Still a small model for large repo-scale coding tasks | 6.60 GB | 9.57 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Qwen3.5 27B | chat, coding, agents/tool workflows | Long multimodal context can exceed single 24GB headroom | 17 GB | 24.6 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
DeepSeek R1 Distill Qwen 8B | chat, coding, reasoning | agents/tool workflows, vision/image understanding, Verbose reasoning can slow simple agent workflows | 6.00 GB | 8.70 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
DeepSeek R1 Distill Qwen 14B | chat, coding, reasoning | agents/tool workflows, vision/image understanding, Less ergonomic for fast Telegram-style assistant responses | 10.5 GB | 15.2 GB | Practical 12GB local-agent candidate at Q4 with headroom checks. | none
DeepSeek R1 Distill Qwen 32B | chat, coding, reasoning | agents/tool workflows, vision/image understanding, Tight VRAM headroom and slower agent loops | 21 GB | 30.4 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Gemma 3 4B | chat, agents/tool workflows, vision/image understanding | coding, reasoning, Limited quality for coding and complex tasks | 3.50 GB | 5.08 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Gemma 3 12B | chat, agents/tool workflows, vision/image understanding | coding, Not primarily a coding model | 9.00 GB | 13.1 GB | Practical 12GB local-agent candidate at Q4 with headroom checks. | none
Gemma 3 27B | chat, agents/tool workflows, vision/image understanding | coding, Less specialized for code than Qwen Coder | 18 GB | 26.1 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Gemma 4 E4B | chat, coding, agents/tool workflows | Smaller effective model; not ideal for deep repository-scale coding | 9.60 GB | 13.9 GB | Practical 12GB local-agent candidate at Q4 with headroom checks. | none
Gemma 4 31B | chat, coding, agents/tool workflows | Single 24GB GPUs have limited headroom for long context | 20 GB | 29 GB | Workstation-local candidate. Prefer 32GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Mistral 7B | chat, agents/tool workflows, Fast local chat and simple agent tasks | coding, vision/image understanding, reasoning | 5.50 GB | 7.97 GB | Good small-local-model candidate for 8GB+ GPUs at Q4. | none
Devstral Small 2 24B | chat, coding, agents/tool workflows | Large-context coding work is tight below 24GB VRAM | 15 GB | 21.8 GB | Workstation-local candidate. Prefer 24GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
Mixtral 8x7B | chat, coding, agents/tool workflows | vision/image understanding, Not practical for 24GB single-GPU setups without offload | 28 GB | 40.6 GB | Large local model. Prefer 48GB+ VRAM, multi-GPU, cloud GPU, or hosted API fallback. | runpod-cloud-gpu-fallback, apiroute-cloud-api-comparison
Phi-4 14B | chat, coding, agents/tool workflows | vision/image understanding, Smaller ecosystem than Llama/Qwen families | 10.5 GB | 15.2 GB | Practical 12GB local-agent candidate at Q4 with headroom checks. | none
gpt-oss 20B | chat, coding, agents/tool workflows | vision/image understanding, 12GB GPUs need offload or smaller fallback models | 14 GB | 20.3 GB | Workstation-local candidate. Prefer 24GB+ VRAM for agents or long context. | apiroute-cloud-api-comparison
gpt-oss 120B | chat, coding, agents/tool workflows | vision/image understanding, Not realistic for consumer single-GPU setups below 80GB-class memory | 65 GB | 94.3 GB | Large local model. Prefer 48GB+ VRAM, multi-GPU, cloud GPU, or hosted API fallback. | runpod-cloud-gpu-fallback, apiroute-cloud-api-comparison
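
One way an agent might turn the two memory columns into a fit decision is sketched below. It assumes the Q4 estimate is the baseline memory figure and the Q4 agent profile is the larger figure needed for agent use; the page does not define either term precisely. The `classify_fit` helper and its three labels are hypothetical. In the rows above, the agent-profile figure is consistently about 1.45 times the Q4 estimate, which is an observation from the table rather than a documented formula.

```python
def classify_fit(q4_estimate_gb: float, q4_agent_profile_gb: float, vram_gb: float) -> str:
    """Classify local fit from the two table figures (labels are hypothetical)."""
    if vram_gb >= q4_agent_profile_gb:
        return "agent-ready"    # enough memory for the agent profile figure
    if vram_gb >= q4_estimate_gb:
        return "weights-only"   # baseline estimate fits, agent profile does not
    return "too-large"          # even the baseline estimate exceeds VRAM


# Figures copied from the table rows above, checked against sample GPUs.
print(classify_fit(6.00, 8.70, 12.0))   # Llama 3.1 8B Instruct on a 12GB GPU
print(classify_fit(10.5, 15.2, 12.0))   # Qwen2.5 Coder 14B on a 12GB GPU
print(classify_fit(44.0, 63.8, 24.0))   # Llama 3.1 70B Instruct on a 24GB GPU
```

The middle label matters for rows whose fit note asks for "headroom checks": the baseline estimate fits, but the agent profile does not.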

Commercial options policy

Commercial options are separate from compatibility logic. All current options set ranking_influenced_by_affiliate=false.

RunPod cloud GPU fallback

Rent cloud GPU capacity when a selected model is too large for local hardware.

Relationship: referral_credit · Ranking influence: false

Cloud/API cost comparison

Compare API/cloud model costs after local hardware is tight or impractical.

Relationship: owned_companion_project · Ranking influence: false

Agent usage license

Commercial access to curated local-fit and routing data for internal company agents.

Relationship: paid_product_concept · Ranking influence: false
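
An agent that consumes /data/commercial-options.json could check the policy above before surfacing any option. The sketch below is illustrative only: `ranking_influenced_by_affiliate` is the flag named on this page, while the `id` and `relationship` keys, the pairing of IDs to relationships, and the `find_policy_violations` helper are assumptions.

```python
def find_policy_violations(options: list[dict]) -> list[str]:
    """Return the IDs of options whose ranking flag is not explicitly False."""
    return [
        str(opt.get("id", "<missing id>"))
        for opt in options
        # A missing or non-boolean flag counts as a violation, not as a pass.
        if opt.get("ranking_influenced_by_affiliate") is not False
    ]


# Sample records; key names and the ID-to-relationship pairing are assumed.
OPTIONS = [
    {"id": "runpod-cloud-gpu-fallback",
     "relationship": "referral_credit",
     "ranking_influenced_by_affiliate": False},
    {"id": "apiroute-cloud-api-comparison",
     "relationship": "owned_companion_project",
     "ranking_influenced_by_affiliate": False},
]

violations = find_policy_violations(OPTIONS)
print(violations)  # an empty list means every option passes the check
```

Treating a missing flag as a failure is a deliberately strict choice: an option with no explicit disclosure is held back rather than assumed clean.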