SecondBrain
Updated: Sat Jun 20 2026 08:00:00 GMT+0800 (Philippine Standard Time)

Token Scarcity

Tags: token-scarcity, token-efficiency, cost, enterprise-ai, model-routing, agentic

The rationing regime that follows the token-subsidy era. As usage shifts from assisted (a human in the loop, bounded turns) to agentic (continuous loops, fan-out sub-agents), AI demand outpaces a physically constrained token supply — GPUs, power, and data-center capacity can't scale as fast as agent-driven consumption. The subsidy era ends, prices rise, and every AI company is "now in the token-efficiency business."

The thesis

  • Demand outruns supply. Agentic use multiplies token consumption per task; supply is gated by physical infrastructure. The gap closes by rationing, not by more subsidy.
  • The subsidy era ends. Labs have been selling tokens below cost to win adoption. As demand hardens, that stops — prices rise toward true cost.
  • Firms ration. The enterprise response is a managed-input posture: mixed-basket model routing (cheapest sufficient model per task), token budgets, and per-seat caps.
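
The managed-input posture described above can be sketched in a few lines. This is only an illustration: the model names, prices, capability scores, and the `route` / `charge` helpers are all hypothetical, and a real router would need some way to estimate what a task requires before picking a tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float  # USD per 1M tokens (illustrative, not a real price list)
    capability: int        # hypothetical 1-10 capability score


class BudgetExceeded(Exception):
    """Raised when a request would push a seat past its spending cap."""


def route(models, required_capability):
    """Return the cheapest model whose capability meets the task's requirement."""
    sufficient = [m for m in models if m.capability >= required_capability]
    if not sufficient:
        raise ValueError("no model is capable enough for this task")
    return min(sufficient, key=lambda m: m.price_per_mtok)


def charge(spent_usd, cap_usd, model, tokens):
    """Return the seat's new running spend, or raise if the cap would be exceeded."""
    cost = model.price_per_mtok * tokens / 1_000_000
    if spent_usd + cost > cap_usd:
        raise BudgetExceeded(
            f"{model.name}: ${cost:.2f} would exceed the ${cap_usd:.2f} cap"
        )
    return spent_usd + cost


# Hypothetical three-tier basket.
MODELS = [
    Model("large", 40.0, 9),
    Model("medium", 2.0, 6),
    Model("small", 0.1, 3),
]

chosen = route(MODELS, required_capability=5)
spent = charge(0.0, cap_usd=1500.0, model=chosen, tokens=2_000_000)
print(chosen.name, spent)
```

The design point is that routing and capping are separate decisions: routing picks the cheapest sufficient tier per task, while the cap bounds total spend regardless of which tier was chosen.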

Distinct from Token Maxing

Token Maxing is the prior behavior — the abundance/subsidy-era pathology of burning tokens because they were cheap and consumption was the scoreboard. Token scarcity is the regime that follows: once tokens are priced at cost and supply is constrained, the maximalist behavior is no longer affordable and rationing discipline becomes mandatory. Maxing is the disease of cheap tokens; scarcity is the economics that forces the cure.

2026-06-20 — "Token Reckoning" Economist coverage

Companies Are Scrambling to Curtail Soaring AI Costs (Economist) is the Business-section confirmation that the regime is now operating in the wild:

  • Ramp spend data: AI spending up 13× year-on-year. Token-heavy applications (reasoning models, agents that build agents) are the growth driver.
  • Uber spent its annual AI budget in four months. One unnamed firm spent $500m on tokens in a single month.
  • The distribution is bimodal — top-1% spenders at ~$7,450/month per employee, median client at $11. The Advantage Gap expressed financially.
  • Three corporate response patterns are now visible: (1) Meta and Amazon killed their token-usage leaderboards; (2) routing down-tier, with Sonnet at ~1/20 the cost of Opus and Kimi at ~1/20 of Sonnet (a ~400× spread, between two and three orders of magnitude, across the routing decision); (3) per-seat / per-task caps (Uber's $1,500/month per coding tool is the canonical case).
  • Outcome-based pricing emerging: Intercom charges customers only for queries actually resolved by its IT-support agent — the SaaS pricing model that survives the agentic era.
  • The lab-subsidy era ends with the IPOs. Sam Altman called mounting customer costs "a huge issue." OpenAI's strategy for winning customers from Anthropic reportedly involves drastic price cuts — but once both labs IPO later in 2026, prices have to rise toward true cost.
  • The geography axis: AI bills are "low compared with hiring a developer in San Francisco, but high compared with employing one in Delhi" — token-cost is now a variable in the on/offshore equation. See Indian IT and AI.
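
The ratios quoted in these bullets are easy to check by hand. The sketch below only does the arithmetic, using the article's rounded figures, and it assumes the $11 median is in the same per-employee, per-month unit as the $7,450 top-1% figure, which the source does not state explicitly.

```python
import math

# Figures as quoted in the bullets above (all approximate).
OPUS_TO_SONNET = 20       # Sonnet at ~1/20 the cost of Opus
SONNET_TO_KIMI = 20       # Kimi at ~1/20 the cost of Sonnet
TOP_1PCT_MONTHLY = 7450   # USD per employee per month
MEDIAN_MONTHLY = 11       # USD; assumed to be the same unit as above

# End-to-end spread between the most and least expensive routing tier.
routing_spread = OPUS_TO_SONNET * SONNET_TO_KIMI
orders_of_magnitude = math.log10(routing_spread)

# How far the top 1% of spenders sit above the median client.
spend_gap = TOP_1PCT_MONTHLY / MEDIAN_MONTHLY

print(f"routing spread: {routing_spread}x "
      f"({orders_of_magnitude:.1f} orders of magnitude)")
print(f"top-1% vs median spend: {spend_gap:.0f}x")
```

Two successive 1/20 steps compound to roughly 400×, about 2.6 orders of magnitude, and the top-1% figure is several hundred times the median.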

2026-06-27 — Rationing is now operational at the labs (Economist)

America's Data-Centre Backlash Puts the AI Boom at Risk (Economist) gives the top-of-funnel evidence that inference capacity is genuinely constrained:

  • "Anthropic has throttled model usage."
  • "OpenAI has scrapped its compute-intensive video tool."
  • "Microsoft has repriced its coding assistant so steeply that some programmers are returning to the lost art of writing software themselves."

The physical numbers behind it: ~12 GW of US AI compute today; ~10 GW dedicated to inference across the majors; and demand growing faster than the ~30 GW new-build queue through 2028, with training alone potentially absorbing 5–16 GW per frontier model.

2026-06-27 — Total-cost-not-per-token is the buried lede on Chinese "cheap AI"

China Is Having Another AI Moment (Economist) adds the second-order refinement that stops the "we'll just switch to DeepSeek/Zhipu" reflex:

  • DeepSeek v4: $0.87 per 1M output tokens; Anthropic Fable 5: $50 per 1M output tokens. ~57× per-token gap.
  • Du Zheng (Georgia Tech) et al., June 2026: DeepSeek used 23× more tokens than an OpenAI rival to achieve basically the same result on the same tasks.
  • Correct comparison metric = total cost of tokens used, not price per token.
  • On a software-engineering benchmark, GLM 5.2 ended up costing more than systems from Anthropic and OpenAI.
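
The total-cost comparison can be worked through with the figures above. Treat this as an illustration rather than a reproduction of the study: the 23× multiplier was measured against an OpenAI rival, not the $50 Anthropic model, so combining the two numbers is an assumption, as is the 1M-token baseline.

```python
def total_cost_usd(price_per_mtok, tokens_used):
    """Total spend for a task: price per 1M tokens times tokens consumed."""
    return price_per_mtok * tokens_used / 1_000_000


CHEAP_PRICE = 0.87           # USD per 1M output tokens (DeepSeek v4, as quoted)
PREMIUM_PRICE = 50.0         # USD per 1M output tokens (Anthropic, as quoted)
TOKEN_MULTIPLIER = 23        # extra tokens the cheap model used for the same result
BASELINE_TOKENS = 1_000_000  # hypothetical tokens the premium model needs

premium = total_cost_usd(PREMIUM_PRICE, BASELINE_TOKENS)
cheap = total_cost_usd(CHEAP_PRICE, BASELINE_TOKENS * TOKEN_MULTIPLIER)

per_token_gap = PREMIUM_PRICE / CHEAP_PRICE  # headline price ratio
effective_gap = premium / cheap              # ratio once token usage is counted

print(f"per-token gap: {per_token_gap:.1f}x")
print(f"total-cost gap: {effective_gap:.1f}x")
print(f"premium ${premium:.2f} vs cheap ${cheap:.2f}")
```

Under these assumptions the headline gap of about 57× shrinks to about 2.5× once token usage is counted. The cheap model becomes the more expensive option whenever its token multiplier exceeds the per-token price ratio, which is consistent with the benchmark result noted above.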

The Token Scarcity regime therefore extends: even the "escape hatch to open-source Chinese models" has token-efficiency limits. Whichever model you route to, tokens are the variable cost that matters, not seat licenses or per-token headline prices.

Cross-references

Sources

  • The New Dumbest Chart in AI (AI Daily Brief) — "every AI company is now in the token-efficiency business"; the assisted → agentic demand-vs-supply framing
  • Companies Are Scrambling to Curtail Soaring AI Costs (Economist) — 2026-06-20; the operating data (Ramp 13×, Uber $1,500/seat cap, $7,450 top-1%, Sonnet 1/20 routing factor, Intercom outcome-pricing)
  • America's Data-Centre Backlash Puts the AI Boom at Risk (Economist) — 2026-06-27; the top-of-funnel evidence (Anthropic throttling, OpenAI killing video tool, Microsoft repricing Copilot)
  • China Is Having Another AI Moment (Economist) — 2026-06-27; the 23× token overuse rebuttal to "we'll just switch to Chinese models"