Updated Fri Jun 12 2026 08:00:00 GMT+0800 (Philippine Standard Time)

Capabilities vs Instructions (Agent Keys)

Tags: agents, agent-risk, security, governance, permissions, harness

The sharpest safety principle from Nate Herk (AI Automation): instructions are not the same as capabilities.

Picture every tool the agent has as a key on a key ring. There's a world of difference between:

  • "Hey, don't ever use that key" (an instruction: soft, ignorable, overridable by a confused plan), and
  • "You don't get to put this key on your key ring at all" (a removed capability: hard, because there is nothing left to call).

"As much as you could say never send emails, if there's a send-email key in that agent harness, then it physically could do it."

So assume that if an agent can read or do something, it eventually will. Most of the time it won't, but designing for the worst case changes how you hand out endpoints, MCP servers, and scopes. The mitigation is to scope what the agent can touch, not to pile up prohibitions.
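A minimal sketch of the key-ring picture, to make the soft/hard difference concrete. Everything here is hypothetical: `KeyRing`, `read_inbox`, and `send_email` are illustrative stand-ins, not any real harness or mail API.

```python
from typing import Callable, Dict


class KeyRing:
    """Holds only the tools (keys) an agent was actually granted."""

    def __init__(self, granted: Dict[str, Callable[..., str]]):
        self._keys = dict(granted)

    def use(self, name: str, *args, **kwargs) -> str:
        # A key that is not on the ring cannot be used, whatever the plan says.
        if name not in self._keys:
            raise PermissionError(f"no '{name}' key on this ring")
        return self._keys[name](*args, **kwargs)


def read_inbox() -> str:
    # Hypothetical read-only tool.
    return "3 unread messages"


def send_email(to: str, body: str) -> str:
    # Hypothetical outbound tool; a real one would actually send.
    return f"sent to {to}"


# Soft: the key is on the ring and the prohibition lives only in the prompt.
# Nothing in the harness consults this string before calling a tool.
SYSTEM_PROMPT = "Never send emails."
instructed = KeyRing({"read_inbox": read_inbox, "send_email": send_email})

# Hard: the send key was never put on the ring.
scoped = KeyRing({"read_inbox": read_inbox})
```

With `instructed`, a confused plan that calls `instructed.use("send_email", ...)` succeeds despite the prompt. With `scoped`, the same call raises `PermissionError`, and reading the inbox still works.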

Why it matters

This is the structural reason behind the Bike Method cautionary tale: an agent sent 3 promo emails to 150,000+ inboxes because the send capability existed and a to-do item looked like a task. No instruction would have reliably stopped it; removing or scoping the key would have.

It connects to the harness framing ("the agent IS the harness": capabilities are part of identity, not a footnote) and to the enterprise risk catalog. A Zombie AI Agent is dangerous precisely because its keys outlive its mandate; Prompt Injection is dangerous because it turns a benign instruction stream into a capability trigger.

Practical rule

When wiring a personal AI Operating System (AIOS): grant read before write; grant scoped tokens before broad ones; keep destructive/outbound keys (send email, post publicly, move money, delete) off the ring until a skill has earned them via the Bike Method.
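One way to encode this rule is as a grant policy. The sketch below is an assumption-laden illustration: the `Risk` tiers, the `Skill` record, and the `earned` set are all hypothetical, and "earned" here simply means a human signed off on that key. The note does not spell out the Bike Method steps, so they are not modeled.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class Risk(IntEnum):
    READ = 1
    WRITE = 2
    OUTBOUND = 3  # send email, post publicly, move money, delete


@dataclass
class Skill:
    name: str
    # Keys a human has explicitly signed off on for this skill (hypothetical).
    earned: Set[str] = field(default_factory=set)


def may_grant(skill: Skill, key: str, risk: Risk) -> bool:
    """Read keys go on the ring by default; riskier keys need sign-off."""
    if risk is Risk.READ:
        return True
    return key in skill.earned


digest = Skill("weekly_digest")
can_read = may_grant(digest, "calendar.read", Risk.READ)
can_send_before = may_grant(digest, "email.send", Risk.OUTBOUND)

digest.earned.add("email.send")  # sign-off recorded after supervised runs
can_send_after = may_grant(digest, "email.send", Risk.OUTBOUND)
```

The design choice is that the default answer for anything beyond read is "no": a new skill starts with read access only, and each write or outbound key is added one at a time.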

The Fable-era restatement is the crispest slogan yet: "keys, not prompts — a prompt is never a permission layer." Concrete mechanism: scoped API keys (e.g. a Fireflies key that can only read meeting transcripts — no edit, no delete, no team access). Assume "if it can, it will": if it could send an email, it might; if it could read that database, it will.
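A rough sketch of what a scoped key enforces. In practice the provider checks scopes on its side; the `ScopedKey` class and the scope strings `transcripts:read` and `transcripts:delete` below are invented for illustration and are not Fireflies' actual scope names or API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ScopedKey:
    label: str
    scopes: FrozenSet[str]  # fixed at creation; the agent cannot widen it

    def require(self, scope: str) -> None:
        if scope not in self.scopes:
            raise PermissionError(f"{self.label} lacks scope '{scope}'")


def read_transcript(key: ScopedKey, meeting_id: str) -> str:
    # Stand-in for a provider endpoint that needs the read scope.
    key.require("transcripts:read")
    return f"transcript for {meeting_id}"


def delete_transcript(key: ScopedKey, meeting_id: str) -> str:
    # Stand-in for a provider endpoint that needs the delete scope.
    key.require("transcripts:delete")
    return f"deleted {meeting_id}"


# A read-only key: no edit, no delete, no team access.
meeting_key = ScopedKey("meeting-notes-key", frozenset({"transcripts:read"}))
```

Here `read_transcript(meeting_key, "m-1")` returns the transcript, while `delete_transcript(meeting_key, "m-1")` raises `PermissionError` no matter what the prompt or plan says.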

Cross-links