SecondBrain
Updated Sat May 09 2026 08:00:00 GMT+0800 (Philippine Standard Time)

Jagged Intelligence

llm, capabilities, jagged-intelligence, verifiability

The observation that frontier LLMs show superhuman capability in some domains while failing at trivial tasks in others: the capability profile is spiky, not smooth.

Two independent sources in this wiki use the term, suggesting it's becoming standard vocabulary for this phenomenon.

Two canonical examples

  • Karpathy's car wash (Andrej Karpathy on Agentic Engineering (Sequoia AI Ascent)):

    "How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase or find zero-day vulnerabilities and yet tells me to walk to a 50-meter car wash? This is insane."

  • Strawberry letter-counting: models miscounting the r's in "strawberry". Older, but the same shape; mostly patched now.

Why it happens (Karpathy's hypothesis)

  • Verifiable domains scale via RL. Frontier models are trained in giant RL environments with verification rewards. Math, code, and adjacent verifiable tasks get RL'd hard → capability peaks there.
  • Lab focus matters too. Capabilities often follow what labs decided to put in the data distribution, not just what's verifiable. The jump in chess ability from GPT-3.5 to GPT-4 is the example: someone at OpenAI added a lot of chess data, and capability peaked.
  • Combined: verifiable + lab cares. If you're in the circuits the labs RL'd, you fly. If you're outside, you struggle.
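
To make "verifiable" concrete, here is a minimal sketch of what a verification reward could look like for a toy arithmetic task. Everything in it is an assumption for illustration: the `Task` type, the `verify` function, and the "last integer in the output" rule are hypothetical and not taken from any lab's actual RL setup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A toy task whose answer can be checked mechanically (hypothetical type)."""
    prompt: str
    expected: int


def verify(task: Task, model_output: str) -> float:
    """Binary reward: 1.0 if the last integer in the output equals the
    expected answer, otherwise 0.0 (including when no integer is found)."""
    numbers = re.findall(r"-?\d+", model_output)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == task.expected else 0.0


task = Task(prompt="What is 17 * 3?", expected=51)
reward_right = verify(task, "17 * 3 = 51")
reward_wrong = verify(task, "I think it's 50.")
```

The point of the sketch is that the reward needs no human judgment, so it can in principle be computed at scale. A question like the car wash one has no comparably mechanical checker, which is the kind of task the hypothesis predicts would lag.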

Implications

Karpathy's broader framing

The "ghosts not animals" piece — these aren't intelligences with intrinsic motivation, they're statistical simulation circuits. Yelling at them doesn't help; understanding what's in their training distribution does.

Sources