The Token Bill Caught Up With Us

Uber's CTO, Praveen Neppalli Naga, said the quiet part out loud last week. "I'm back to the drawing board because the budget I thought I would need is blown away already." The number he was talking about is the 2026 AI tooling line. It is blown away because Claude Code adoption inside Uber went from 32% of his 5,000 engineers in February to 84% in March. Token spend runs $500 to $2,000 per engineer per month, billed by consumption, not by seat. Four months in, the full annual envelope was gone.

I have been reading that quote on a loop for three days.

The reason is not the headline number. It is the shape of the problem. The cost curve of the tool they installed does not match the curve they planned around, and most finance teams do not have the instruments to even read the gauge.

The token bill is the new headcount line.

Per-seat SaaS pricing rewarded you for not using the tool. With a Slack license, a Notion seat, or a Linear seat, the marginal cost did not move when the user opened the app twice as much as last month. The CFO understood that contract. So did the CTO. The line item was flat, predictable, renegotiated once a year.

Token-priced agents flip the curve. Marginal cost is positive, sometimes steeply so, and the variable is not headcount. The variable is how hard each engineer leans on the agent that week, which is exactly the metric the company is now rewarding. Internal leaderboards, perf rubrics that score AI-assisted output, "you should be using Claude Code on every PR" guidance from staff+. The same pressure that drives the productivity gain drives the bill. Same dial.

This is not a bug. The pricing model is working as designed. Ramp's May 2026 AI Index, published Wednesday, shows 34.4% of US businesses paying for Anthropic against 32.3% for OpenAI, the first time the order has flipped. A year ago the Anthropic number was 9%. Demand is real. The vendor is pricing scarce inference about as well as a young market can.

Finance teams are looking at usage-based AI as a percentage of total opex for the first time. That is the new part. At Uber's reported level, the coding-agent bill works out to $6,000 to $24,000 per engineer per year. If that band holds across the 84% of 5,000 engineers who have adopted the tool, the total is roughly $25 million to $100 million annually. A coworker shaped as a line item, with a usage knob the CFO does not own.
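
Annualizing the Uber figures is simple arithmetic, and it is worth seeing how far the adoption jump alone moves the total. A minimal sketch, assuming the $500 to $2,000 monthly band applies uniformly to every adopting engineer (the reporting does not say that, and the function name is mine):

```python
# Reported figures from the paragraph above. The uniform-band
# assumption and the helper name are illustrative, not Uber's model.
ENGINEERS = 5_000
ADOPTION_FEB = 0.32
ADOPTION_MAR = 0.84
MONTHLY_LOW_USD = 500      # per adopting engineer, per month
MONTHLY_HIGH_USD = 2_000


def annual_spend_range(engineers, adoption, low, high):
    """Return (low, high) annualized token spend in USD."""
    users = round(engineers * adoption)
    return users * low * 12, users * high * 12


per_engineer_annual = (MONTHLY_LOW_USD * 12, MONTHLY_HIGH_USD * 12)
feb_range = annual_spend_range(ENGINEERS, ADOPTION_FEB, MONTHLY_LOW_USD, MONTHLY_HIGH_USD)
mar_range = annual_spend_range(ENGINEERS, ADOPTION_MAR, MONTHLY_LOW_USD, MONTHLY_HIGH_USD)

print(f"per engineer, per year: ${per_engineer_annual[0]:,} to ${per_engineer_annual[1]:,}")
print(f"February run rate:      ${feb_range[0]:,} to ${feb_range[1]:,}")
print(f"March run rate:         ${mar_range[0]:,} to ${mar_range[1]:,}")
```

Under those assumptions the run rate goes from $9.6 million to $38.4 million a year at February adoption to $25.2 million to $100.8 million at March adoption, with no change in headcount.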

FinOps has a playbook. This is not in it.

Cloud FinOps is a fifteen-year-old discipline. It has units, dashboards, tagging conventions, rightsizing playbooks, reserved-instance math, vendor scorecards. It assumed the unit of work was a VM-hour or a request-second. Token consumption breaks the abstraction, because the unit is downstream of a human decision (which agent to invoke, how long to let it run, how aggressive to set the context window), and the human is being told to lean harder.

I was talking to a head of platform at a Series C this week. Real numbers, no names. Their monthly Claude Code spend is 3.2x what they spent on AWS compute on the dev-tools cluster eighteen months ago. They have no per-developer cap. They are reluctant to install one, because they are watching internal velocity metrics climb in exactly the way they were promised. They asked me, point-blank, what their peer set was doing. I had no clean answer. I have heard six different versions in the last month, none stable.

The playbook arrives in roughly four quarters. Tooling vendors will ship per-developer caps, per-org pools, per-PR budgets, prompt-cache analytics, and model-routing policies that downshift to cheaper models for cheap tasks. It will look obvious in retrospect, the same way Datadog looked obvious in 2014. Between now and then, every CTO is improvising, and the improvisation is the diligence opportunity.
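
To make two of those controls concrete, here is a rough sketch of a per-developer cap combined with a routing rule that downshifts cheap tasks. Everything in it is hypothetical: the tier names, the task labels, and the 80% soft limit are placeholders I made up, not any vendor's API or policy.

```python
from dataclasses import dataclass

# Hypothetical labels for tasks that do not need the expensive model.
CHEAP_TASKS = {"rename", "format", "docstring"}
SOFT_LIMIT = 0.8  # assumed: downshift everything past 80% of the cap


@dataclass
class Budget:
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        return max(self.monthly_cap_usd - self.spent_usd, 0.0)


def route(task_kind: str, estimated_cost_usd: float, budget: Budget) -> str:
    """Return 'premium', 'economy', or 'blocked' for one agent request."""
    # Hard cap: refuse work that would push the developer over budget.
    if estimated_cost_usd > budget.remaining():
        return "blocked"
    # Cheap tasks always go to the cheaper tier.
    if task_kind in CHEAP_TASKS:
        return "economy"
    # Near the cap, downshift even the expensive tasks.
    if budget.spent_usd >= SOFT_LIMIT * budget.monthly_cap_usd:
        return "economy"
    return "premium"


print(route("refactor", 5.0, Budget(1_000, 100)))   # early in the month
print(route("rename", 5.0, Budget(1_000, 100)))     # cheap task
print(route("refactor", 5.0, Budget(1_000, 850)))   # past the soft limit
print(route("refactor", 5.0, Budget(1_000, 998)))   # would exceed the cap
```

The point of the sketch is the ordering of the checks. The hard cap is the control teams hesitate to install, because it is the only one of the three that can stop an engineer mid-task.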

The new question I am asking founders.

If you are pitching an AI-native company in 2026, I no longer ask only about your gross margin on the agent unit. I ask about your own engineering org's token bill.

I want four numbers, ranked by how much they move my underwriting. Token spend per engineer last month, as a real figure, not a band. Then the ratio of that spend to fully loaded engineer cost: anything north of fifteen percent says the cost curve is the strategy now. Then whether the line is forecasted inside the FP&A model, or whether it is sitting in a quiet R&D opex bucket the CFO has not separated out yet. Then the throttling tools the team is willing to deploy, meaning per-user caps, model routing, cache discipline. Teams that have those controls ship them with the same care they used to give CI/CD pipelines. Teams that do not are still in the honeymoon phase.
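
The second number is the one founders most often get wrong on the spot, so here is the calculation as a short sketch. The fifteen percent threshold is mine, from the paragraph above. The $300,000 fully loaded cost is a made-up placeholder, not a benchmark.

```python
THRESHOLD = 0.15  # the fifteen percent line from the paragraph above


def token_spend_ratio(monthly_token_spend_usd, annual_loaded_cost_usd):
    """Annualized token spend as a fraction of fully loaded engineer cost."""
    return (monthly_token_spend_usd * 12) / annual_loaded_cost_usd


def monthly_spend_at_threshold(annual_loaded_cost_usd, threshold=THRESHOLD):
    """Monthly token spend at which the ratio reaches the threshold."""
    return threshold * annual_loaded_cost_usd / 12


# Hypothetical fully loaded cost, for illustration only.
LOADED_COST_USD = 300_000

for monthly in (500, 2_000, 4_000):
    ratio = token_spend_ratio(monthly, LOADED_COST_USD)
    flag = "above" if ratio > THRESHOLD else "below"
    print(f"${monthly:,}/month -> {ratio:.1%} of loaded cost ({flag} threshold)")

print(f"threshold reached at ${monthly_spend_at_threshold(LOADED_COST_USD):,.0f}/month")
```

At that placeholder cost, the $500 to $2,000 band lands at 2% to 8%, and the threshold sits at $3,750 a month. The ratio moves with the denominator, which is why I ask for the real loaded cost and not an industry average.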

A founder who can answer those four cleanly is running a different company than one who cannot, and the difference does not show up on the deck. It shows up in the data room six weeks into diligence, in the part of the cloud-cost waterfall where the new line item is hiding.

To be clear, none of this means Claude Code is overpriced. The productivity story is real, and I am living in it. What is mismatched is a vendor pricing model that scales with use, an internal incentive system that scales with use, and a budgeting process that does not scale with anything yet.

As for me?

I went into my own usage tab on Friday. I had not looked at it in three weeks. The number was higher than my rent, and the rent in this city is also higher than it should be. I closed the tab. I opened a new task. I let the agent run.

I do not know yet whether the productivity gain compounds faster than the bill does, or whether the answer is the same in twelve months as it is today. Uber's CTO does not know yet. The platform lead I talked to this week does not know yet. The first cohort of companies to figure this out is going to look very different from the cohort that figures it out second, and on a five-year fund clock, that gap is the whole game.

I will look at the tab again on Friday. I will probably let the agent run anyway.

— SWEdonym
