Skills (Claude Code)
Skills are reusable instruction files that encode how you specifically do a task, invoked in natural language or as a /slash-command. In the Four C's Framework they are the Capabilities layer: the thing that turns a generic model into one that writes a LinkedIn post in your voice, runs your monthly report, or hands off a session.
Skills can live locally (scoped to one project) or globally (available in any directory). Nate Herk (AI Automation) moves frequently-used skills to global; project-specific ones stay local.
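Moving a skill from local to global can be sketched as a folder move. This is a minimal sketch, assuming the layout where project skills sit in `.claude/skills/<name>/SKILL.md` and personal (global) skills in `~/.claude/skills/<name>/SKILL.md`; the function name `promote_to_global` is hypothetical and not something the source describes.

```python
import shutil
from pathlib import Path
from typing import Optional


def promote_to_global(project_root: Path, name: str, home: Optional[Path] = None) -> Path:
    """Move a project-scoped skill folder into the user-level skills directory.

    Assumed layout (not taken from the source):
      <project_root>/.claude/skills/<name>/SKILL.md   (local)
      <home>/.claude/skills/<name>/SKILL.md           (global)
    """
    home = home if home is not None else Path.home()
    src = project_root / ".claude" / "skills" / name
    dst = home / ".claude" / "skills" / name

    if not (src / "SKILL.md").is_file():
        raise FileNotFoundError(f"no local skill named {name!r} under {src.parent}")
    if dst.exists():
        # Refuse to overwrite an existing global skill with the same name.
        raise FileExistsError(f"a global skill named {name!r} already exists")

    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

Refusing to overwrite is a deliberate choice in this sketch: a global skill may already have accumulated feedback edits that a blind copy would destroy.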
Two ways to build a skill
- Forward — name a recurring task, invoke a skill-creator, give the end goal + the tools/considerations involved, then iterate. "Sometimes 50 tries until you like it." Then keep evolving it: every use, give feedback ("this was good, this wasn't — change it so next time it doesn't happen").
- Reverse-engineer (Nate's more common method) — do the task end-to-end first, get a good output, then ask Claude: "Look back at our conversation. What did we do to get there? What tools did you need? What did you ask me?" and build a skill that reproduces that output.
Skills aren't only big SOPs
A skill can be as small as a prompt you keep retyping. Nate's example: a global /session-handoff skill that outputs a full breakdown — what was done, files created, open decisions, what's next — so he can clear context or move from Claude Code to Codex and pick up cleanly. Just a prompt, but worth a slash command because he ran it many times a day. (This vault's own CLAUDE.md operations — ingest/query/lint/journal/crm — are the same idea: codified repeatable workflows.)
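To make the "just a prompt" point concrete, here is a hedged sketch that scaffolds such a skill as a SKILL.md file with YAML frontmatter (`name`, `description`) followed by the prompt body. The prompt wording and the helper names `render_skill` and `write_skill` are assumptions for illustration; the source only says which four sections the handoff covers.

```python
from pathlib import Path

# Hypothetical prompt text: the source lists the four sections, not the wording.
HANDOFF_PROMPT = """\
Summarize this session so a fresh agent can continue it:
1. What was done
2. Files created
3. Open decisions
4. What's next
"""


def render_skill(name: str, description: str, body: str) -> str:
    """Return SKILL.md text: YAML frontmatter, a blank line, then the instructions.

    Values are written unquoted, so this sketch assumes they contain no
    characters that need YAML quoting (such as ': ').
    """
    return f"---\nname: {name}\ndescription: {description}\n---\n\n{body}"


def write_skill(skills_root: Path, name: str, description: str, body: str) -> Path:
    """Create <skills_root>/<name>/SKILL.md and return its path."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(render_skill(name, description, body), encoding="utf-8")
    return path
```

Calling `write_skill(Path.home() / ".claude" / "skills", "session-handoff", "Summarize the session for a clean handoff", HANDOFF_PROMPT)` would place it in the user-level directory, assuming that is where global skills live on your setup.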
Another small-skill archetype from the Fable walkthrough: "grill me" (adapted from Matt PCO) — the skill interviews you, 15–30 questions deep, writing answers to brainstorm docs. Elicitation as a skill: it pulls knowledge out of the owner's head into the AIOS instead of waiting for them to write context docs. Nate used it to plan the very video it appears in.
The feedback ritual, made explicit in the same source: "every single time I use a skill, I give it feedback and say update the skill" — preferences, models, and endpoints drift, so a four-month-old skill still gets iterated on every use. "There's no such thing as a finished product." (The manual counterpart of the overnight self-improvement loop below.)
Governance
A skill's reach is governed by the Bike Method (earn autonomy in phases) and by Capabilities vs Instructions (Agent Keys): a skill can only do what the keys on its ring allow. Ryan Lopopolo's harness engineering is the industrial-scale version of the same skill-as-capability discipline.
Self-improvement (third construction path)
Beyond forward (build from a spec) and reverse (build from a good output), Simon Scrapes's video adds a third: let the skill iterate on itself overnight against a fixed eval set. Two distinct layers, two distinct loops:
| Layer | Optimizes | Loop | Status |
|---|---|---|---|
| 1 — Trigger | YAML description → activation rate | Skill-creator's built-in description-improvement loop (improve_description.py + run_loop.py) | Already ships in Anthropic's skill-creator (Skills 2.0) |
| 2 — Behavior | SKILL.md body → output passes structural assertions | Custom auto-research loop over Binary Eval Assertions | DIY — Simon's worked example: 25 binary assertions, 5 tests, overnight loop |
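The Layer 2 loop in the table can be sketched as a plain keep/revert hill climb. This is a minimal sketch under stated assumptions, not Simon's implementation: `propose_edit` (something that rewrites the SKILL.md body, for example a model call) and `evaluate` (runs the fixed tests and returns how many assertions pass) are hypothetical callables supplied by the caller.

```python
from typing import Callable, Optional, Tuple


def improve_skill(
    skill_text: str,
    propose_edit: Callable[[str], str],  # hypothetical: returns a revised SKILL.md body
    evaluate: Callable[[str], int],      # hypothetical: number of binary assertions passed
    max_rounds: int = 20,
    target: Optional[int] = None,
) -> Tuple[str, int]:
    """Iterate on a skill body, keeping only edits that raise the score."""
    best_text, best_score = skill_text, evaluate(skill_text)
    for _ in range(max_rounds):
        if target is not None and best_score >= target:
            break  # every assertion passes; nothing left to optimize
        candidate = propose_edit(best_text)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:
            # Keep: the candidate becomes the new baseline.
            best_text, best_score = candidate, candidate_score
        # Otherwise revert: the candidate is discarded and the baseline stands.
    return best_text, best_score
```

Because the eval set is fixed and the score is an integer count, every round yields an unambiguous decision, which is what lets a loop like this run unattended.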
The binary-eval discipline is the make-or-break: subjective evals under optimization pressure surface Reward Hacking (see MAC paper), whereas binary evals (regex / parser / boolean check) give a clean keep/revert signal. The video's headline win: a 5th-version marketing-copy skill hit 23/24 on its first run, surfaced a tone-of-voice rule missing from SKILL.md, added it, and hit a perfect score on the rerun.
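A binary assertion is just a named function from output text to True or False. The sketch below shows the shape with regex and boolean checks; the four specific checks are invented for illustration and are not the assertions used in the video.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Assertion(NamedTuple):
    name: str
    check: Callable[[str], bool]  # must return a plain pass/fail, never a graded score


# Hypothetical checks for a marketing-copy skill.
ASSERTIONS: List[Assertion] = [
    Assertion("has_headline", lambda out: re.match(r"# \S", out) is not None),
    Assertion("under_150_words", lambda out: len(out.split()) <= 150),
    Assertion(
        "has_call_to_action",
        lambda out: re.search(r"\b(sign up|book a call|try it)\b", out, re.I) is not None,
    ),
    Assertion("no_exclamation_marks", lambda out: "!" not in out),
]


def score(output: str, assertions: List[Assertion] = ASSERTIONS) -> Tuple[int, List[str]]:
    """Return (number passed, names of failed assertions) for one skill output."""
    failed = [a.name for a in assertions if not a.check(output)]
    return len(assertions) - len(failed), failed
```

Returning the names of the failed assertions alongside the count matters in practice: a named failure is what points back at the rule missing from SKILL.md.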
Limitations Simon flags: the binary loop does not handle tone, creativity, or whether the skill is actually using its reference files properly. Those need a complementary subjective tier (human or LLM as Judge).
This applies to this vault's own skills too: ingest/query/lint/journal/crm are codified workflows with implicit binary invariants (index.md updated, log.md entry well-formed, raw file moved to processed). See Auto Research Loop (Karpathy) for the concrete vault application.
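The three invariants named above could be checked along these lines. Everything about the layout is an assumption made for the sketch: the `raw/` and `processed/` folder names, the idea that index.md mentions the page title verbatim, and the log line format are hypothetical, since the source does not specify them.

```python
import re
from pathlib import Path
from typing import Dict

# Hypothetical log format: "YYYY-MM-DD ingest: <description>".
LOG_ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} ingest: .+$")


def check_ingest(vault: Path, source_name: str, page_title: str) -> Dict[str, bool]:
    """Return one boolean per invariant after an ingest of `source_name`."""
    index = vault / "index.md"
    log = vault / "log.md"
    log_lines = log.read_text(encoding="utf-8").splitlines() if log.is_file() else []
    return {
        # index.md mentions the page that the ingest created or updated
        "index_updated": index.is_file()
        and page_title in index.read_text(encoding="utf-8"),
        # the most recent log line matches the assumed entry format
        "log_entry_well_formed": bool(log_lines)
        and LOG_ENTRY.match(log_lines[-1]) is not None,
        # the source file exists in processed/ and no longer exists in raw/
        "raw_moved_to_processed": (vault / "processed" / source_name).is_file()
        and not (vault / "raw" / source_name).exists(),
    }
```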
Cross-links
- Four C's Framework — Capabilities = skills
- AI Operating System (AIOS) · Claude Code · Harness (LLM Agents)
- Printing Press — wraps each CLI in a skill so it's callable in natural language
- Self-improvement · Auto Research Loop (Karpathy) · Binary Eval Assertions · Recursive Self-Improvement · Self-Evolving Agents
- Org-scale twin: What an Enterprise Context Layer Is (Prukalpa) makes skills the substrate of the enterprise context layer: the same versioned-procedure primitive, but the consumer is the org's reps/managers/agents rather than a single coding agent. The user's Data Democratisation in Sales — Governed Context Layer, Not Dashboard Access is the worked sales-domain instance.