Agentic Cost Control
Core Idea
Token tracking alone is insufficient for cost control in agentic AI systems. Production agent pipelines need per-task spend caps, trajectory scoring, and webhook stop signals built into the AI gateway — not bolted on after the fact.
Why This Matters
A single poorly scoped agentic task can silently consume hundreds of dollars. Devin averages ~800 LLM turns per task. Without hard stops, a runaway agent can exhaust a monthly budget in one bad run. This is an infrastructure-level risk, not a prompt problem.
Key Points
- Per-task spend caps: set a `max_budget_usd` on each agentic call; cut off the session if the cap is hit
- Trajectory scoring: evaluate whether the agent is making progress on each turn; abort if it is stuck in a loop or producing low-value output
- Webhook stop signals: your AI gateway should expose a kill signal that external monitoring can trigger (e.g. a cost alert fires and a webhook stops the session)
- Token tracking is a lagging indicator: by the time you see high token counts, the cost is already incurred; you need predictive budget accounting
- Model selection matters: routing cheap/fast tasks to smaller models (MiniMax, Haiku) and reserving Opus/Sonnet for hard reasoning tasks can cut costs 3–5× without quality loss
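The first four points can be combined into a single pre-turn check. The following is a minimal sketch, not the API of LiteLLM, the Claude Agent SDK, or any particular gateway: `SessionGuard`, `stall_limit`, `estimated_next_cost_usd`, and `stop_signal` are hypothetical names, and only `max_budget_usd` is borrowed from the text above. It assumes the caller can estimate the cost of the next turn and that a webhook handler elsewhere sets the stop flag. The "trajectory score" here is deliberately crude (identical actions repeated in a row).

```python
from dataclasses import dataclass, field
from threading import Event


@dataclass
class SessionGuard:
    """Hypothetical guard that decides, before each turn, whether a session may continue."""
    max_budget_usd: float
    stall_limit: int = 3      # identical consecutive actions treated as a loop (assumption)
    spent_usd: float = 0.0
    # A webhook handler (not shown) would call stop_signal.set() to kill the session.
    stop_signal: Event = field(default_factory=Event)
    _recent_actions: list = field(default_factory=list, repr=False)

    def record_turn(self, cost_usd: float, action: str) -> None:
        """Book the actual cost of a finished turn and remember what the agent did."""
        self.spent_usd += cost_usd
        self._recent_actions.append(action)
        self._recent_actions = self._recent_actions[-self.stall_limit:]

    def is_stalled(self) -> bool:
        """Crude trajectory check: the last `stall_limit` actions were all identical."""
        recent = self._recent_actions
        return len(recent) == self.stall_limit and len(set(recent)) == 1

    def check(self, estimated_next_cost_usd: float) -> str:
        """Return "ok", or the reason to stop, before the next turn's cost is incurred."""
        if self.stop_signal.is_set():
            return "stopped_by_webhook"
        # Predictive accounting: compare projected spend, not just spend so far.
        if self.spent_usd + estimated_next_cost_usd > self.max_budget_usd:
            return "budget_exceeded"
        if self.is_stalled():
            return "stalled"
        return "ok"


guard = SessionGuard(max_budget_usd=1.00)
guard.record_turn(0.40, "run_tests")
print(guard.check(estimated_next_cost_usd=0.40))  # ok: projected 0.80 is under the cap
guard.record_turn(0.40, "run_tests")
print(guard.check(estimated_next_cost_usd=0.40))  # budget_exceeded: projected 1.20 is over
```

Checking the projected spend before the call, rather than the running total after it, is what keeps the cap from being overshot by one expensive turn. A real trajectory scorer would likely look at richer signals than repeated actions.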
Benchmark
- Devin: ~800 LLM turns per task; a single bug-fix task can cost $180 and still return a non-compiling PR
- Claude Code: ~30 turns for equivalent tasks
- Rule of thumb: 1 active agentic Claude Code session = 2–5 concurrent API requests at the gateway level
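As a rough illustration of how these figures might feed capacity planning, the sketch below turns the rule of thumb into a concurrency range and compares the two turn counts. The function names and the example of 10 active sessions are assumptions for illustration; the 2–5 multiplier and the ~800 and ~30 turn counts come from the list above.

```python
def gateway_concurrency(active_sessions: int, per_session=(2, 5)) -> tuple[int, int]:
    """Estimate the (low, high) number of concurrent API requests at the gateway."""
    low, high = per_session
    return active_sessions * low, active_sessions * high


def turn_ratio(turns_a: int, turns_b: int) -> float:
    """How many times more LLM turns agent A uses than agent B on equivalent tasks."""
    return turns_a / turns_b


# 10 active sessions is an illustrative number, not a figure from the benchmark.
print(gateway_concurrency(10))          # (20, 50)
print(round(turn_ratio(800, 30), 1))    # 26.7
```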
Connections
- [[hermes-agent-orchestration]]: Hermes gateway capacity planning uses this model
- [[minimax-litellm-cost]]: the LiteLLM proxy is the right layer to implement spend caps
- [[claude-code-2026-capabilities]]: the per-task spend cap (`max_budget_usd`) is a first-class Claude Agent SDK parameter
Source
Conversation: "LLM-powered news search and summarization sites" — 2026-05-23 AI Dev Brief; GSD autonomous-dev pipeline analysis — 2026-05-19/20