Token Scarcity

The rationing regime that follows the token-subsidy era. As usage shifts from assisted (a human in the loop, bounded turns) to agentic (continuous loops, fan-out sub-agents), AI demand outpaces a physically constrained token supply — GPUs, power, and data-center capacity can't scale as fast as agent-driven consumption. The subsidy era ends, prices rise, and every AI company is "now in the token-efficiency business."

The thesis

Demand outruns supply. Agentic use multiplies token consumption per task; supply is gated by physical infrastructure. The gap closes by rationing, not by more subsidy.
The subsidy era ends. Labs have been selling tokens below cost to win adoption. As demand hardens, that stops — prices rise toward true cost.
Firms ration. The enterprise response is a managed-input posture: mixed-basket model routing (cheapest sufficient model per task), token budgets, and per-seat caps.
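Token budgets and per-seat caps come down to a running ledger that refuses any request that would push a seat past its limit. The sketch below is minimal and rests on two assumptions: a flat monthly dollar cap per seat, and a single per-million-token price per request. `SeatBudget` and its methods are hypothetical names, not any vendor's billing API. The $1,500 cap in the usage lines only mirrors the Uber figure cited in the Economist coverage.

```python
from dataclasses import dataclass


@dataclass
class SeatBudget:
    """Hypothetical per-seat monthly token budget, tracked in dollars (not a real billing API)."""

    monthly_cap_usd: float
    spent_usd: float = 0.0

    def cost(self, tokens: int, usd_per_million: float) -> float:
        # Dollar cost of one request priced per million tokens.
        return tokens / 1_000_000 * usd_per_million

    def try_charge(self, tokens: int, usd_per_million: float) -> bool:
        # Reject the request, recording nothing, if it would breach the cap.
        cost = self.cost(tokens, usd_per_million)
        if self.spent_usd + cost > self.monthly_cap_usd:
            return False
        self.spent_usd += cost
        return True

    def remaining_usd(self) -> float:
        return self.monthly_cap_usd - self.spent_usd


if __name__ == "__main__":
    seat = SeatBudget(monthly_cap_usd=1_500.0)  # illustrative cap
    print(seat.try_charge(2_000_000, 50.0))     # True: $100 charged
    print(seat.remaining_usd())                 # 1400.0
    print(seat.try_charge(30_000_000, 50.0))    # False: another $1,500 would breach the cap
```

A production version would also need month rollover and concurrency control. Both are left out so the cap logic stays visible.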

Distinct from Token Maxing

Token Maxing is the prior behavior — the abundance/subsidy-era pathology of burning tokens because they were cheap and consumption was the scoreboard. Token scarcity is the regime that follows: once tokens are priced at cost and supply is constrained, the maximalist behavior is no longer affordable and rationing discipline becomes mandatory. Maxing is the disease of cheap tokens; scarcity is the economics that forces the cure.

2026-06-20 — "Token Reckoning" Economist coverage

Companies Are Scrambling to Curtail Soaring AI Costs (Economist) is the Business-section confirmation that the regime is now operating in the wild:

Ramp spend data: AI spending up 13× year-on-year. Token-heavy applications (reasoning models, agents that build agents) are the growth driver.
Uber spent its annual AI budget in four months. One unnamed firm spent $500m on tokens in a single month.
The distribution is extremely skewed: top-1% spenders at ~$7,450/month per employee, median client at $11. The Advantage Gap expressed financially.
Three corporate response patterns are now visible: (1) Meta and Amazon killed their token-usage leaderboards; (2) routing down-tier: Sonnet costs ~1/20 of Opus and Kimi ~1/20 of Sonnet, so a full down-tier cuts the per-token price ~400× (between two and three orders of magnitude); (3) per-seat / per-task caps (Uber's $1,500/month per coding tool is the canonical case).
Outcome-based pricing is emerging: Intercom charges customers only for queries its customer-service agent actually resolves. This is the SaaS pricing model that survives the agentic era.
The lab-subsidy era ends with the IPOs. Sam Altman called mounting customer costs "a huge issue." OpenAI's strategy for winning customers from Anthropic reportedly involves drastic price cuts — but once both labs IPO later in 2026, prices have to rise toward true cost.
The geography axis: AI bills are "low compared with hiring a developer in San Francisco, but high compared with employing one in Delhi", so token cost is now a variable in the on/offshore equation. See Indian IT and AI.
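Down-tier routing ("cheapest sufficient model per task") can be sketched as a filter-then-minimize over a model basket. The relative prices below come from the cited ratios (Opus = 1, Sonnet ≈ 1/20, Kimi ≈ 1/20 of Sonnet). The integer capability tiers, and the idea of tagging each task with a required tier, are hypothetical placeholders. A real deployment would calibrate them against its own evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    relative_price: float  # per-token price, normalized so Opus = 1.0
    tier: int              # hypothetical capability tier: higher = more capable


# Relative prices follow the cited ~1/20 steps; the tiers are illustrative assumptions.
BASKET: tuple[Model, ...] = (
    Model("Opus", 1.0, 3),
    Model("Sonnet", 1 / 20, 2),
    Model("Kimi", 1 / 400, 1),
)


def route(required_tier: int, basket: tuple[Model, ...] = BASKET) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    eligible = [m for m in basket if m.tier >= required_tier]
    if not eligible:
        raise ValueError(f"no model in the basket reaches tier {required_tier}")
    return min(eligible, key=lambda m: m.relative_price)


if __name__ == "__main__":
    for tier in (1, 2, 3):
        print(tier, route(tier).name)  # 1 Kimi, 2 Sonnet, 3 Opus
```

The selection step itself is trivial. The hard part is deciding which tier each task genuinely requires.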

2026-06-27 — Rationing is now operational at the labs (Economist)

America's Data-Centre Backlash Puts the AI Boom at Risk (Economist) gives the top-of-funnel evidence that inference capacity is genuinely constrained:

"Anthropic has throttled model usage."
"OpenAI has scrapped its compute-intensive video tool."
"Microsoft has repriced its coding assistant so steeply that some programmers are returning to the lost art of writing software themselves."

The physical numbers behind it: ~12 GW of US AI compute today; ~10 GW dedicated to inference across the majors; demand growing faster than the ~30 GW new-build queue through 2028 can supply; and training alone could take 5–16 GW per frontier model.
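Setting the cited figures side by side shows the scale of a single frontier training run relative to supply. The sketch assumes all three gigawatt figures are on a comparable basis (the article's exact accounting may differ) and does nothing beyond division.

```python
# Gigawatt figures from the Economist piece cited above; the ratios are plain arithmetic.
CURRENT_US_AI_GW = 12.0
NEW_BUILD_QUEUE_GW = 30.0      # queued through 2028
TRAINING_RUN_GW = (5.0, 16.0)  # per frontier model, low and high estimates


def share(load_gw: float, capacity_gw: float) -> float:
    """Fraction of a capacity figure that a single load would absorb."""
    return load_gw / capacity_gw


if __name__ == "__main__":
    for run_gw in TRAINING_RUN_GW:
        print(
            f"{run_gw:g} GW run = {share(run_gw, NEW_BUILD_QUEUE_GW):.0%} of the new-build queue, "
            f"{share(run_gw, CURRENT_US_AI_GW):.0%} of today's US AI compute"
        )
```

At the low end, one frontier run takes about 17% of the whole new-build queue (42% of today's US AI compute). At the high end, it takes about 53% (133%).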

2026-06-27 — Total-cost-not-per-token is the buried lede on Chinese "cheap AI"

China Is Having Another AI Moment (Economist) adds the second-order refinement that stops the "we'll just switch to DeepSeek/Zhipu" reflex:

DeepSeek v4: $0.87 per 1M output tokens; Anthropic Fable 5: $50 per 1M output tokens. ~57× per-token gap.
Du Zheng (Georgia Tech) et al., June 2026: DeepSeek used 23× more tokens than an OpenAI rival to reach essentially the same results on the same tasks.
Correct comparison metric = total cost of tokens used, not price per token.
On a software-engineering benchmark, GLM 5.2 ended up costing more than systems from Anthropic and OpenAI.
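The total-cost point can be made concrete. A cheaper model wins on total cost only while its token multiplier stays below the per-token price ratio. The sketch below uses the two list prices cited above. The 100k-token task is an assumption for illustration, and so is pairing the 23× overuse with Fable 5: the Georgia Tech comparison was against an OpenAI model, not Fable 5.

```python
def task_cost_usd(usd_per_million_output: float, output_tokens: int) -> float:
    """Total output-token cost of one task at a per-million-token list price."""
    return usd_per_million_output * output_tokens / 1_000_000


def breakeven_multiplier(cheap_price: float, premium_price: float) -> float:
    """How many times more tokens the cheaper model can use before it stops being cheaper."""
    return premium_price / cheap_price


DEEPSEEK_V4 = 0.87  # $ per 1M output tokens (cited above)
FABLE_5 = 50.0      # $ per 1M output tokens (cited above)

if __name__ == "__main__":
    # Hypothetical task: the premium model needs 100k output tokens, and we ASSUME the
    # cheap model needs 23x as many. The study's 23x was measured against an OpenAI
    # model, not Fable 5, so this pairing is illustrative only.
    premium = task_cost_usd(FABLE_5, 100_000)
    cheap = task_cost_usd(DEEPSEEK_V4, 23 * 100_000)
    print(f"premium ${premium:.2f}, cheap ${cheap:.2f}")
    print(f"break-even token multiplier ~{breakeven_multiplier(DEEPSEEK_V4, FABLE_5):.0f}x")
```

Under those assumptions DeepSeek still comes out cheaper (about $2.00 versus $5.00), but only because 23× sits below the ~57× break-even. A smaller price advantage or heavier token overuse flips the comparison, as in the GLM 5.2 benchmark result.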

The Token Scarcity regime therefore extends: even the "escape hatch to open-source Chinese models" has token-efficiency limits. Whichever model you route to, total token spend (price per token × tokens consumed) is the variable cost that matters, not seat licenses or headline per-token prices.

Cross-references

Token Maxing — the maximalist behavior this regime supersedes
Managing Enterprise IT Development in the Era of Token Scarcity — the enterprise-IT operating playbook for this regime
Agentic Loop — the usage shift (assisted → agentic) that drives demand past supply
Indian IT and AI — the on/offshore axis Token Scarcity compresses
SaaSpocalypse — outcome-based pricing as the survivable SaaS landing spot

Sources

The New Dumbest Chart in AI (AI Daily Brief) — "every AI company is now in the token-efficiency business"; the assisted → agentic demand-vs-supply framing
Companies Are Scrambling to Curtail Soaring AI Costs (Economist) — 2026-06-20; the operating data (Ramp 13×, Uber $1,500/seat cap, $7,450 top-1%, Sonnet 1/20 routing factor, Intercom outcome-pricing)
America's Data-Centre Backlash Puts the AI Boom at Risk (Economist) — 2026-06-27; the top-of-funnel evidence (Anthropic throttling, OpenAI killing video tool, Microsoft repricing Copilot)
China Is Having Another AI Moment (Economist) — 2026-06-27; the 23× token overuse rebuttal to "we'll just switch to Chinese models"