The MCP Token Tax — Your Agent Is Burning Tokens Before It Starts
Every MCP server you connect loads its tool schemas into the context window before the first user turn. Here's the arithmetic on how expensive that gets, why most teams never measure it, and how to stop paying for tools the agent will never call.
The first time I ran a token count on an MCP-heavy agent setup, I thought I'd broken the counter.
The user turn was one sentence — "summarize yesterday's sales" — maybe 6 tokens. The system prompt was a few hundred more. The tools the agent would actually use? Two. Both simple. A query and a format call.
Billable input tokens for the first turn: 11,400.
The other 10,800 tokens were tool schemas. Descriptions, argument types, return types, enum values, the JSON Schema drafts around each one, all loaded into context for the model's benefit. Ten MCP servers × an average of 7 tools each × a couple hundred tokens per tool definition. Most of them the agent was never going to touch.
This post is about the quiet cost structure of that — the MCP token tax — and why most teams ship it without ever measuring it. It's also about when paying the tax is clearly worth it, when it clearly isn't, and the three or four things you can do to reduce it without giving up what MCP is actually good for.
Where The Tokens Go
A single MCP tool entry in the context, in typical form, looks something like this:
```json
{
  "name": "create_calendar_event",
  "description": "Create a new event on the user's primary calendar. Use this when the user asks to schedule, book, or add an event. Always confirm the timezone.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "title": {"type": "string", "description": "Event title"},
      "start_time": {"type": "string", "format": "date-time"},
      "end_time": {"type": "string", "format": "date-time"},
      "attendees": {"type": "array", "items": {"type": "string"}},
      "location": {"type": "string"},
      "description": {"type": "string"}
    },
    "required": ["title", "start_time", "end_time"]
  }
}
```
That's ~160 tokens. A simple tool. A Google Calendar MCP server exposes more than one of these — list events, get event, update event, delete event, check free/busy, list calendars. Suddenly one server is ~1,200 tokens.
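You can estimate a server's schema weight before wiring it in, using the crude but serviceable heuristic of roughly 4 characters per token. A sketch, not a tokenizer; a real count (tiktoken, or a provider's count-tokens endpoint) will differ by 10 to 20 percent:

```python
import json

def estimate_tokens(obj) -> int:
    # ~4 chars per token is a rough heuristic for English prose and JSON;
    # a real tokenizer will differ by 10-20%.
    return len(json.dumps(obj)) // 4

create_event = {
    "name": "create_calendar_event",
    "description": ("Create a new event on the user's primary calendar. "
                    "Use this when the user asks to schedule, book, or add "
                    "an event. Always confirm the timezone."),
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Event title"},
            "start_time": {"type": "string", "format": "date-time"},
            "end_time": {"type": "string", "format": "date-time"},
            "attendees": {"type": "array", "items": {"type": "string"}},
            "location": {"type": "string"},
            "description": {"type": "string"},
        },
        "required": ["title", "start_time", "end_time"],
    },
}

per_tool = estimate_tokens(create_event)  # in the neighborhood of 150
server_total = per_tool * 6               # a six-tool calendar server
```

Run this against every server you plan to connect before you connect it; the per-server totals are what the table below adds up.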
Now multiply by a realistic stack:
| Server | Tools | ~Tokens |
|---|---|---|
| Calendar | 6 | 1,000 |
| Gmail | 5 | 900 |
| Drive | 7 | 1,200 |
| Postgres | 4 | 700 |
| Slack | 8 | 1,500 |
| Notion | 6 | 1,100 |
| GitHub | 12 | 2,400 |
| Jira | 6 | 1,200 |
| Filesystem | 6 | 900 |
| Custom app | 5 | 800 |
| Total | 65 | ~11,700 |
Nearly twelve thousand input tokens, loaded every session, whether the agent's job that day is "check my calendar" or "summarize the PR." Most sessions will use 1 to 3 of those tools. You're paying to put the other 62 in front of the model in case it needs them.
The Arithmetic, Brutal Edition
Sticker prices, early 2026:
- Claude Sonnet 4.6 — ~$3 / 1M input tokens, ~$0.30 / 1M cached input tokens.
- Gemini 1.5 Pro — ~$1.25 / 1M input tokens, ~$0.3125 / 1M cached.
- Gemini 2.0 Flash — ~$0.075 / 1M input tokens.
Take the 11,700-token tool prelude at Sonnet 4.6, no caching, 10,000 sessions per day:
```
11,700 * 10,000 = 117M tokens/day
117M * $3/M = $351/day
~$128,000/year
```
That's the uncached worst case. Most teams don't live there, because prompt caching reduces the repeated-reads cost dramatically. Let's be honest about that, too.
With Anthropic prompt caching, a stable tool prefix is written once (write = 25% premium over input) and read on subsequent calls at 10% of normal input cost:
```
Cache write (once per cache lifetime, ~5 min): 11,700 * $3.75/M = $0.044
Cache reads (9,999 queries): 11,700 * 9,999 * $0.30/M = $35.10/day
~$12,800/year
```
Much better. But still five figures annually, for tool schemas the agent mostly doesn't use.
And caching assumes your tool prefix is stable. In practice, the minute your tool set changes — a new MCP server connected, a version bump — the cache invalidates and the next few calls are priced at full rate. Teams that adopt MCP aggressively and iterate on their tool mix end up hitting cache-write pricing often enough that the "10× discount" narrative stops being true.
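The arithmetic above folds into a small cost model. Prices are the Sonnet-class numbers quoted earlier; writes_per_day = 1 is the optimistic case, which exactly this cache-churn caveat can break:

```python
PRICE_INPUT = 3.00        # $/M input tokens, uncached
PRICE_CACHE_WRITE = 3.75  # 25% premium over input
PRICE_CACHE_READ = 0.30   # 10% of input

def annual_cost(prelude_tokens: int, sessions_per_day: int,
                cached: bool = True, writes_per_day: int = 1) -> float:
    """Yearly cost of the tool prelude alone, in dollars."""
    if not cached:
        daily = prelude_tokens * sessions_per_day * PRICE_INPUT / 1e6
    else:
        reads = sessions_per_day - writes_per_day
        daily = (prelude_tokens * writes_per_day * PRICE_CACHE_WRITE
                 + prelude_tokens * reads * PRICE_CACHE_READ) / 1e6
    return daily * 365

uncached = annual_cost(11_700, 10_000, cached=False)  # ~$128k/year
cached = annual_cost(11_700, 10_000)                  # ~$12.8k/year
```

Rerun it with your own prelude size and a realistic writes_per_day before trusting the ten-times discount.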
What You're Actually Buying
MCP evangelists will (correctly) push back: "The token tax buys you something real." It does. Let's name it honestly, because the case for MCP is only interesting when you've quantified the cost first.
1. Runtime tool discovery. Your agent can be handed a brand-new MCP server and figure out what it does without redeploying. For agent platforms that need to integrate with unknown downstream systems (Claude Desktop, Cursor, vendor tools), this is the feature — it's why the protocol exists. For a single-purpose agent you own end-to-end, it's a feature you're not using.
2. Stateful sessions. MCP maintains state across calls in a way ad-hoc REST tool-calls don't. A multi-step workflow benefits from this; a one-shot lookup doesn't.
3. Portability. An MCP server works with every MCP-compatible client. If you expose your data as an MCP server, Claude Desktop, Cursor, your own agent, a hypothetical future one — they all just work. That's real, durable value for infrastructure-shaped projects.
4. Typed I/O. Structured arguments in, structured results out, validated on both ends. You can do this with REST; MCP makes it default.
If any of those matter for your use case, the token tax is cheap insurance. If none of them do — and for most in-house agents on a known set of actions, none of them do — you're paying for features you don't use.
The Three Scenarios
Scenario A — in-house agent, known action set, one model
You're building an agent inside your product. You know the 4 things it needs to do. You control the model, the prompt, the deployment.
MCP is almost certainly overkill. You're paying ~10,000 tokens per session for runtime discoverability you don't need and portability you'll never use. Ship REST endpoints, document them in the system prompt, move on. Your agent will be faster, cheaper, and easier to debug.
Scenario B — multi-tenant platform, many agent clients
You're building something like Claude Desktop, Cursor, a LangGraph-based platform, or an internal agent framework that different teams plug into. You don't know in advance which tools each instance needs.
MCP is the right call. The token tax is the entry fee for the interoperability you're buying. Without it you're reinventing a tool-discovery protocol badly. Pay the tax, use prompt caching aggressively, invest in lazy-load optimizations (see below).
Scenario C — hybrid, some tools dynamic, most static
You know the 5 core tools the agent uses daily. You also want the option for the agent to pull in third-party MCP servers for edge cases.
Ship both surfaces. The daily 5 go in as tool definitions loaded from your own tool catalog (cheap, stable, debuggable). The long-tail MCP servers are loaded on demand when a user requests them, not baked into every session. This is the pattern most mature MCP deployments converge toward, and it's the one that keeps the token tax bounded.
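A minimal sketch of that split. The names (HybridToolset, attach_mcp_server) are illustrative, not from any particular framework:

```python
# Core tools ship in every session: cheap, stable, cache-friendly.
CORE_TOOLS = {
    "query_sales": {"description": "Run a read-only sales query."},
    "format_report": {"description": "Render query results as a report."},
}

class HybridToolset:
    """Core tools always in the prelude; MCP servers attach on demand."""

    def __init__(self):
        self.tools = dict(CORE_TOOLS)
        self.attached = set()

    def attach_mcp_server(self, name, fetch_tools):
        # fetch_tools stands in for the server's tools/list call; it is
        # invoked only when a session actually asks for this server.
        if name not in self.attached:
            self.tools.update(fetch_tools())
            self.attached.add(name)

session = HybridToolset()  # baseline prelude: 2 tools, not 65
session.attach_mcp_server(
    "github", lambda: {"create_pr": {"description": "Open a pull request."}}
)
```

The baseline prelude stays small and cacheable; the long tail only costs tokens in the sessions that use it.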
Reducing The Tax Without Killing The Feature
Assuming you've decided MCP is the right answer, there's still a lot of daylight between "pay full tax on every session" and "pay nothing." Four levers worth pulling:
1. Prompt caching, correctly
Put the stable tool definitions at the very front of the prompt so they hash to the same cache key across sessions. Don't interleave tools with user-specific system context — that invalidates the cache for every user. If you're on Anthropic, mark the boundary explicitly so the provider knows where the static prefix ends.
Done right, this alone takes you from "full rate every call" to "10% rate after the first call per 5-minute window." It is the single highest-leverage optimization available.
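With the Anthropic Messages API, the boundary is a cache_control marker on the last tool definition; everything before it becomes the reusable prefix. A sketch with illustrative tool names and model id (check the current docs for exact cache_control semantics):

```python
tools = [
    {
        "name": "query_sales",
        "description": "Run a read-only sales query.",
        "input_schema": {"type": "object",
                         "properties": {"sql": {"type": "string"}}},
    },
    {
        "name": "format_report",
        "description": "Render query results as a report.",
        "input_schema": {"type": "object",
                         "properties": {"rows": {"type": "array"}}},
    },
]
# The marker on the LAST tool tells the provider: everything up to here
# is the static prefix; hash it once and reuse it across sessions.
tools[-1]["cache_control"] = {"type": "ephemeral"}

request = {
    "model": "claude-sonnet-4-5",  # illustrative model id
    "max_tokens": 1024,
    "tools": tools,
    # User- and session-specific content lives in messages, after the
    # cached prefix, so it never invalidates the tool cache.
    "messages": [{"role": "user", "content": "Summarize yesterday's sales."}],
}
```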
2. Lazy tool loading
Most agent frameworks support loading tools on-demand rather than bulk-loading everything at session start. The pattern: at session start, expose a meta-tool called list_available_tools. The model calls it, sees the names and one-line descriptions (cheap), decides which ones it actually needs, and only then requests the full schemas for those specific tools.
You trade one extra round trip for 80%+ reduction in prelude tokens on any session where the agent only uses a handful of tools. For most sessions, that's a win.
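A sketch of the two-call pattern, with a toy two-tool registry (names hypothetical):

```python
import json

# Full schemas stay out of the prelude; only the summaries from
# list_available_tools are loaded into context at session start.
FULL_SCHEMAS = {
    "create_calendar_event": {
        "description": ("Create a new event on the user's primary calendar. "
                        "Always confirm the timezone."),
        "inputSchema": {"type": "object", "properties": {
            "title": {"type": "string"},
            "start_time": {"type": "string", "format": "date-time"},
            "end_time": {"type": "string", "format": "date-time"}}},
    },
    "list_events": {
        "description": "List events in a date range.",
        "inputSchema": {"type": "object", "properties": {
            "start": {"type": "string"}, "end": {"type": "string"}}},
    },
}

def list_available_tools() -> dict:
    # Meta-tool: name plus first sentence only, ~10 tokens per tool.
    return {name: spec["description"].split(". ")[0]
            for name, spec in FULL_SCHEMAS.items()}

def get_tool_schemas(names: list) -> dict:
    # Second round trip: full schemas for just the tools the model picked.
    return {n: FULL_SCHEMAS[n] for n in names}
```

The savings scale with catalog size: the summary payload grows by a line per tool, while the full-schema payload grows by a couple hundred tokens per tool.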
3. Namespace pruning
Not every MCP server needs to be present in every session. A coding agent doesn't need a calendar. A scheduling agent doesn't need a Postgres client. Route sessions to tool subsets based on the user's stated task (or, better, based on a lightweight classifier that reads the first user turn).
This is less elegant than "expose everything, let the model decide," but it's also cheaper by 3–5× for the sessions where it fires.
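Even a crude keyword router captures most of the win. A sketch (a real deployment would swap the substring match for a small, cheap classifier model; all names are illustrative):

```python
TOOL_NAMESPACES = {
    "coding":     ["github", "filesystem", "postgres"],
    "scheduling": ["calendar", "gmail"],
    "general":    ["slack", "notion", "drive"],
}

TASK_KEYWORDS = {
    "coding":     ("pr", "repo", "bug", "deploy", "branch"),
    "scheduling": ("meeting", "schedule", "calendar", "tomorrow"),
}

def route(first_user_turn: str) -> list:
    """Pick the tool subset to load, based on the first user turn."""
    turn = first_user_turn.lower()
    for task, words in TASK_KEYWORDS.items():
        if any(w in turn for w in words):
            return TOOL_NAMESPACES[task]
    return TOOL_NAMESPACES["general"]  # no match: fall back to broad set
```

The fallback matters: misrouting a session to too few tools is a worse failure than loading a broad set, so unmatched turns get the widest namespace.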
4. Code Mode — the speculative bet
Cloudflare recently made the case for letting the agent generate code that calls MCP tools rather than emitting JSON-RPC calls directly. The argument: LLMs are better at writing idiomatic code than they are at filling out JSON schemas, and the code representation of "call tool X with args Y, then use the result as input to tool Z" is more compact than the equivalent multi-turn JSON-RPC dance.
The approach is young, not standardized, and not every MCP ecosystem supports it. But it's a live direction worth tracking, because it reframes the token-tax problem as an I/O-format problem rather than a protocol-design problem.
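A toy comparison of the two representations, using a rough 4-characters-per-token estimate (tool names and payloads are illustrative):

```python
import json

# JSON-RPC style: one request per call, and step 1's result must be
# ferried back through the model's context to become step 2's argument.
jsonrpc_turns = [
    {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
     "params": {"name": "query_sales", "arguments": {"day": "2026-02-12"}}},
    {"jsonrpc": "2.0", "id": 1,
     "result": {"rows": ["SKU-14, $420", "SKU-9, $87"]}},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "format_report",
                "arguments": {"rows": ["SKU-14, $420", "SKU-9, $87"]}}},
]

# Code Mode style: one emitted snippet; the intermediate result stays
# in the sandbox and never re-enters the context window.
code_turn = 'rows = query_sales(day="2026-02-12")\nreport = format_report(rows)'

def est(s: str) -> int:
    return len(s) // 4  # rough 4-chars-per-token heuristic
```

The gap widens with every added step, because each JSON-RPC hop repeats the envelope and replays the intermediate data.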
The Rule
Before you adopt MCP for a given integration, run this check:
- Count the tokens. Add up the schema weight of every tool you'd expose. Multiply by expected session volume. Multiply by input token pricing. Is the number uncomfortable? If not, stop worrying and ship MCP.
- Ask what you're buying. Runtime discovery? Multi-client portability? Stateful sessions? Typed I/O you weren't going to build yourself? If you can't name the feature, you're not using it.
- Check your caching story. Is the tool prefix stable enough to stay in cache for a meaningful fraction of sessions? Or are you iterating fast enough that cache invalidations eat the discount?
- Ship the cheap version first if in doubt. REST endpoints documented in a system prompt get you an agent that works, fast, cheap, and debuggable. You can graduate to MCP later when you have evidence you need it.
The mistake isn't adopting MCP. The mistake is adopting it by default, without measuring the tax, because a blog post said it was the future. The future might include MCP. It also includes an invoice.
Closing
Every LLM integration is a tradeoff between what the model should know and what the model costs to inform. MCP is a protocol that picks a specific point on that tradeoff — high upfront context cost, high runtime flexibility. That's a great fit for some agents and a terrible fit for others. The job of the builder is to know which one you're shipping.
If you take one thing from this: treat your tool prelude as a cost center, not a feature list. Count the tokens. Know what the agent actually used. Prune what it didn't. The teams that do this end up with cheaper, faster, more reliable agents — and the ones that don't end up with a $128,000/year line item they can't explain to the CFO.