Jagged Intelligence

The observation that frontier LLMs simultaneously demonstrate superhuman capability in some domains and fail at trivial tasks in others: the capability profile is spiky, not smooth.

Two independent sources in this wiki use the term, suggesting it's becoming standard vocabulary for this phenomenon.

Two canonical examples

Karpathy's car wash (Andrej Karpathy on Agentic Engineering (Sequoia AI Ascent)):

"How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase or find zero-day vulnerabilities and yet tells me to walk to a 50-meter car wash? This is insane."
Strawberry letter-counting (models miscounting the letter "r" in "strawberry"): an older example with the same shape; mostly patched now.

Why it happens (Karpathy's hypothesis)

Verifiable domains scale via RL. Frontier models are trained in giant RL environments with verification rewards. Math, code, and adjacent verifiable tasks get RL'd hard, so capability peaks there.
Lab focus matters too. Capabilities often follow what labs decided to put in the data distribution, not just what's verifiable. The chess jump from GPT-3.5 to GPT-4 is the example: per Karpathy, someone at OpenAI added a lot of chess data, and capability peaked.
Combined: verifiable + lab cares. If you're in the circuits the labs RL'd, you fly. If you're outside, you struggle.
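
To make "verifiable" concrete, here is a minimal sketch of a verification reward. It is an illustration only, not how any lab actually builds its RL environments: the `Task` record, the `verifiable_reward` function, and the two sample tasks are all hypothetical names invented for this note. The point is that a task with a cheap programmatic checker yields a clean reward signal, while a common-sense task yields none.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    """Hypothetical task: a prompt plus an optional programmatic checker."""
    prompt: str
    checker: Optional[Callable[[str], bool]] = None  # None = not verifiable


def verifiable_reward(task: Task, model_output: str) -> Optional[float]:
    """Return 1.0 or 0.0 when the output can be checked, None when it cannot."""
    if task.checker is None:
        return None  # no automatic signal, so RL cannot easily scale here
    return 1.0 if task.checker(model_output) else 0.0


# A verifiable task: the answer is either right or wrong.
math_task = Task("What is 17 * 23?", checker=lambda out: out.strip() == "391")

# A task with no cheap checker (common-sense judgment).
advice_task = Task("Should I walk or drive to a car wash 50 meters away?")

print(verifiable_reward(math_task, "391"))     # 1.0
print(verifiable_reward(math_task, "400"))     # 0.0
print(verifiable_reward(advice_task, "Walk"))  # None
```

Under this framing, the car wash question sits outside the set of tasks that produce a reward at all, which is one way to read why capability there lags.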

Implications

For founders (Andrej Karpathy on Agentic Engineering (Sequoia AI Ascent)): identify verifiable domains the labs aren't focused on; build RL environments; fine-tune for spiky capability.
For enterprise (Agentic AI in the Enterprise (Praveen Akkiraju, CXOTalk)): jaggedness is why the Harness (LLM Agents) matters so much. The harness encodes the missing context, guardrails, and observability needed to keep the agent in the circuits where it works.
For users: stay in the loop. Treat the model as a tool, verify outputs, especially in unfamiliar territory.
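
The "verify outputs" and "harness" advice above can be sketched as a small wrapper. This is a toy under stated assumptions, not a real agent framework: `run_with_harness`, the `model` callable (a stand-in for an LLM call), and the `verify` callable (a domain-specific check) are hypothetical names. It shows the three ingredients in miniature: a guardrail (the check), observability (the attempt log), and a human in the loop (the escalation path).

```python
from typing import Callable


def run_with_harness(
    model: Callable[[str], str],    # hypothetical stand-in for an LLM call
    verify: Callable[[str], bool],  # domain-specific check (guardrail)
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, accept only verified outputs, escalate otherwise."""
    log = []  # minimal observability: record every attempt
    for attempt in range(1, max_attempts + 1):
        output = model(prompt)
        ok = verify(output)
        log.append({"attempt": attempt, "output": output, "verified": ok})
        if ok:
            return {"status": "accepted", "output": output, "log": log}
    # Nothing passed verification: hand off to a human instead of guessing.
    return {"status": "needs_human_review", "output": None, "log": log}


# Toy model that gets it wrong on the first call and right on the second.
answers = iter(["400", "391"])
result = run_with_harness(
    lambda prompt: next(answers),
    lambda out: out == "391",
    "What is 17 * 23?",
)
print(result["status"], result["output"], len(result["log"]))  # accepted 391 2
```

The design choice worth noting is that the wrapper never returns an unverified answer; when the check cannot be satisfied, the failure is surfaced rather than hidden.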

Karpathy's broader framing

The "ghosts not animals" piece: these aren't intelligences with intrinsic motivation; they're statistical simulation circuits. Yelling at them doesn't help; understanding what's in their training distribution does.

Sources