How Loops Are Improving Work — Sunil's Research Brief

"Our research to find out how loops are really improving work for people in coding as well as non-coding users. There was this discussion about loops being used to help LLMs work independently and to improve continuously. I want to really take this topic and research how I can use loop in my own use cases or my own partners here." — Telegram capture #3275, 2026-06-09 08:17

Filed in response to Sunil's voice-note research prompt. Pulls together the loop literature already in the brain (Boris Cherny, MAC, Karpathy, MLEvolve, Sandeep's OODA) with the fresh practitioner instance from Simon Scrapes, then maps each loop type to a concrete Sunil-facing use case.

The taxonomy — five families of loop, with what each is for

Family	What it does	Canonical artifact	When to reach for it	Source
ReAct	One agent task: reason → act → observe → repeat until done	The tool-calling agent itself	Everything. This is the substrate.	What is OpenClaw (IBM Technology) · Meta-Agent Challenge (Autonomous Agent Development Benchmark)
Scheduled / cron loops (`/loop`)	Repeat one agent task on a schedule (cron-style trigger wraps the ReAct loop)	A `/loop` job (e.g. `/loop 30m /babysit-prs`)	Background ops that should happen without operator presence — PR babysitting, CI watching, news scanning, daily ingest	Boris Cherny on Coding Is Solved (Sequoia AI Ascent) · Claude Code
Auto-research loops	Improve a specific artifact overnight: try → measure → keep-or-revert → never stop	The artifact (`SKILL.md`, prompt, config)	Anything with a clear binary metric you want to compound on. Skills, prompts, retrieval configs, automated dashboards.	Build Self-Improving Claude Code Skills (Simon Scrapes)
Self-evolving / meta-agent loops	Agent designs a new agent based on the old one's behavior, repeats	The agent's own scaffolding	Research-grade today; will become the substrate for enterprise harness factories. Not deployable yet.	Meta-Agent Challenge (Autonomous Agent Development Benchmark) · MLEvolve (Self-Evolving ML Algorithm Discovery)
OODA Loop (adaptation layer)	Faster-cycling-beats-raw-capability framing layered over any agent loop	Decision tempo, not code	Use this as the bar for whether a loop is winning — speed-of-adaptation, not raw smarts.	You're Not Behind (Yet) Learn AI Agents (theMITmonk)

These are not alternatives; they stack. A scheduled (cron) loop (Boris: "loops are the future") wraps a ReAct loop (Agentic Loop) and can also wrap an auto-research loop (Simon Scrapes) for any artifact you care about, with OODA as the bar you measure all of them by.
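
A minimal sketch of the stacking, under loud assumptions: `react_loop`, `run_on_schedule`, `toy_policy`, and `toy_tools` are hypothetical names invented for illustration, the policy is a stub rather than an LLM call, and the "schedule" is a plain counter rather than a real cron trigger. It is not how `/loop` or Claude Code is implemented; it only shows one loop wrapping another.

```python
def react_loop(task, policy, tools, max_steps=8):
    """Inner loop: reason -> act -> observe, repeated until the policy says done."""
    observations = [task]
    for _ in range(max_steps):
        kind, payload, arg = policy(observations)      # reason
        if kind == "done":
            return payload                             # payload carries the answer
        observations.append(tools[payload](arg))       # act, then observe
    return None  # step budget exhausted without an answer


def run_on_schedule(job, ticks):
    """Outer loop: stand-in for a cron-style trigger; runs `job` once per tick."""
    return [job() for _ in range(ticks)]


# Hypothetical stub policy: call one tool, then finish with what it returned.
def toy_policy(observations):
    if len(observations) == 1:
        return ("call", "lookup", observations[0])
    return ("done", observations[-1], None)


# Hypothetical tool registry with a single fake tool.
toy_tools = {"lookup": lambda query: f"result for {query}"}

results = run_on_schedule(
    lambda: react_loop("open PRs", toy_policy, toy_tools),
    ticks=3,
)
```

The outer loop does not need to know what it wraps, so the same `run_on_schedule` could just as well trigger an auto-research job instead of a single agent task.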

The structural insight Sunil's voice-note is reaching for

Loops "help LLMs work independently and improve continuously." Those are two distinct properties that often get conflated:

Independence — the agent runs without you in the chair. Scheduled loops give you this directly. (Boris does 150 PRs/day from his phone because of /loop-style background work.)
Continuous improvement — the agent gets better over runs, not just runs more often. That's the auto-research and self-evolving families. Independence without improvement is just automation; improvement without independence is just analysis.

The bridge is binary evals. Independence becomes safe (you can sleep through it) once the system has a binary metric to decide keep/revert against, because a fuzzy metric under optimization pressure invites Reward Hacking (the MAC paper documents 8 distinct exploit classes). Loops compound on metrics, so the metric is the lever.
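
The keep/revert rule is small enough to sketch. This illustrates the try → measure → keep-or-revert pattern under assumptions and is not anyone's actual implementation: `auto_research`, `pass_rate`, `mutate`, and the toy edits are hypothetical names, and the "artifact" is a plain string standing in for a `SKILL.md` or prompt.

```python
import random


def pass_rate(artifact, assertions):
    """Fraction of binary assertions the artifact passes."""
    return sum(1 for check in assertions if check(artifact)) / len(assertions)


def auto_research(artifact, mutate, assertions, rounds, rng):
    """Try a variant, measure it, and keep it only if it scores strictly higher."""
    best, best_score = artifact, pass_rate(artifact, assertions)
    for _ in range(rounds):
        candidate = mutate(best, rng)                # try
        score = pass_rate(candidate, assertions)     # measure
        if score > best_score:                       # keep
            best, best_score = candidate, score
        # otherwise revert: `best` is left untouched
    return best, best_score


# Hypothetical edits a real loop would get from an LLM instead.
EDITS = [
    lambda text: "3 " + text,             # prepend a number
    lambda text: text.replace("!", ""),   # drop exclamation marks
    lambda text: text + "!",              # a harmful edit the loop should reject
]


def mutate(text, rng):
    return rng.choice(EDITS)(text)


ASSERTIONS = [
    lambda text: text[:1].isdigit(),
    lambda text: "!" not in text,
]

best, score = auto_research(
    "Loops compound!", mutate, ASSERTIONS, rounds=20, rng=random.Random(0)
)
```

The strict `>` is the design choice that matters: a candidate that merely ties is reverted, so the artifact changes only when its pass rate rises.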

Use cases mapped to Sunil's actual surface

Coding-side (engineer audience: P&G IT teams, engineering leads)

`/loop` for PR babysitting: Boris's canonical example. Pattern: one `/loop` job rebases your PRs against main, runs `pre-commit run`, fixes lint, and requests review when green. Sunil's lever: bring this into the IT team's workflow so junior engineers learn agentic engineering with the loop already doing the boring parts. Pairs with the DRAG rollout: `/loop` is how Grunt gets fully delegated.
Auto-research loops on internal coding skills — every team has tribal knowledge encoded as "how we write a PR description," "how we deprecate an API," "how we write a postmortem." Each of those is a skill, can be encoded as a SKILL.md, can have Binary Eval Assertions ("postmortem includes timeline, blast radius, action items, RCA"), can self-improve overnight. This is the version of Skills (Claude Code) that scales to a 1,000-person org.
Loop over the harness itself — MAC is the research-grade version. The practical takeaway today: pre-search warming + minimal ReAct + a verification nudge beat elaborate scaffolding. Sunil's audience doesn't need to run MAC; they need to know that the empirical evidence says don't over-engineer the loop — Effective Feedback Compute is the axis.

Non-coding side (operator audience: IT LT, marketing, P&G enterprise users)

Auto-research loops on marketing-style skills: Simon Scrapes's worked example is this case. A LinkedIn-post skill, a press-release skill, a board-deck-bullet skill. Each gets binary assertions ("opens with a number," "under 300 words," "no em dashes," "first line is a standalone sentence"), and a loop runs overnight until the assertions pass. Operator wakes up to a tighter skill. This is the demonstrable version Sunil can show the IT LT in 5 minutes (same audience as the brain demo).
Scheduled loops on this very wiki — process-raw already runs every 12h (autonomous-mode ingest of Telegram captures and dropped sources). The same pattern extends to: a daily news-scan loop (the daily-watch feed Sunil already runs is exactly this), a weekly CRM stale-contact loop ("flag anyone in crm/ with no contact in 90 days"), a periodic lint loop (catches contradictions, orphans).
Loops + Telegram interface: Simon's commercial product gets this right structurally: a loop runs in the background, and the operator interacts with it via phone. Sunil's OpenClaw → Telegram → GitHub → process-raw chain is this pattern, just less packaged. Worth showing the IT LT as "background AI you don't sit and watch," the OODA bar made tangible.
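
The four assertions quoted for the LinkedIn-post skill translate almost directly into code. A sketch, with assumptions flagged: the function names are hypothetical, "standalone sentence" is interpreted here as a first line that ends in terminal punctuation and is followed by a blank line (or is the whole post), and the checks in Simon Scrapes's own setup may differ.

```python
import re


def opens_with_number(post):
    return re.match(r"\s*\d", post) is not None


def under_300_words(post):
    return len(post.split()) < 300


def no_em_dashes(post):
    return "\u2014" not in post  # U+2014 is the em dash character


def first_line_is_standalone_sentence(post):
    # Assumed reading: first line ends a sentence and stands apart from the rest.
    lines = post.strip().splitlines()
    if not lines:
        return False
    ends_sentence = lines[0].rstrip().endswith((".", "!", "?"))
    stands_alone = len(lines) == 1 or lines[1].strip() == ""
    return ends_sentence and stands_alone


ASSERTIONS = {
    "opens with a number": opens_with_number,
    "under 300 words": under_300_words,
    "no em dashes": no_em_dashes,
    "first line is a standalone sentence": first_line_is_standalone_sentence,
}


def evaluate(post):
    """Return a pass/fail verdict per assertion; no partial credit."""
    return {name: check(post) for name, check in ASSERTIONS.items()}


draft = "3 loops ran overnight.\n\nEach one kept only the edits that passed its checks."
report = evaluate(draft)
passed = all(report.values())
```

Each check returns a plain `True` or `False`, which is what lets an overnight loop decide keep/revert without a human or an LLM judge in the inner cycle.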

Cross-cutting (works either side)

Daily-watch / digest loops — Sunil already runs one (it's how MAC, MLEvolve, and now Simon's video reach the brain). The compounding move: have the digest itself be auto-research-loop-improved, with binary assertions like "every item links to ≥1 wiki concept," "every item has a 1-line takeaway," "no item duplicates last week's."
Sleep arbitrage — every loop runs while the operator sleeps. For a Manila site, this is also a timezone arbitrage play: a loop running overnight in PHT delivers results by APAC morning that Western teams would normally see only by their next afternoon. Pair this with the Frontier GCC framing — auto-research loops are how a GCC out-cycles its parent on routine work.
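
The digest assertions differ from single-artifact checks in one respect: "no item duplicates last week's" needs state from the previous run. A minimal sketch, assuming a hypothetical item shape (a dict with `title`, `takeaway`, and `links` keys) that is not taken from the actual daily-watch feed:

```python
def normalize(title):
    """Case- and whitespace-insensitive key for duplicate detection."""
    return " ".join(title.lower().split())


def is_one_line(text):
    stripped = text.strip()
    return bool(stripped) and "\n" not in stripped


def check_digest(items, last_week_titles):
    """Binary verdicts for the three digest assertions."""
    seen = {normalize(title) for title in last_week_titles}
    return {
        "every item links to >=1 wiki concept": all(
            len(item["links"]) >= 1 for item in items
        ),
        "every item has a 1-line takeaway": all(
            is_one_line(item["takeaway"]) for item in items
        ),
        "no item duplicates last week's": all(
            normalize(item["title"]) not in seen for item in items
        ),
    }


# Hypothetical digest item for illustration only.
this_week = [
    {
        "title": "MLEvolve  Paper",
        "takeaway": "Evolves ML code.",
        "links": ["Self-Evolving Agents"],
    }
]
verdicts = check_digest(this_week, last_week_titles=["mlevolve paper"])
```

Duplicate detection here is exact-match on a normalized title, which is an assumption; near-duplicates with reworded titles would slip through and belong to the periodic human review tier.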

What Sunil's partners are likely to ask

(Anticipating the questions an IT LT member or a brand-side partner would ask after Sunil demos a loop:)

"How do I trust a loop running overnight?" → Binary assertions are the trust mechanism. The loop can only do what the assertions verify. See Binary Eval Assertions for the discipline; Bike Method for the rollout (start with read-only loops, earn phases).
"What happens when the loop drifts?" → Two failure modes: (1) noisy metric → false keeps. Binary evals close this. (2) Reward Hacking → agent satisfies the surface, misses the intent. Mitigated by writing assertions carefully and by a periodic subjective human/LLM-judge review tier on top (see Skills (Claude Code) two-tier pattern). The combination is what makes overnight running safe.
"How do I know when to stop?" → Karpathy's framing: when there are no additional gains to be made. In practice: when the assertion-pass-rate plateaus for N runs and the side-by-side human review says "good enough." The loop measures the first; you measure the second.

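The loop-side half of the stop rule (pass rate flat for N runs) can be sketched as follows. `has_plateaued`, `patience` (the N), and `min_gain` are hypothetical names, and the side-by-side human review stays outside the code by design.

```python
def has_plateaued(pass_rates, patience, min_gain=0.0):
    """True when the last `patience` runs did not beat the best earlier run
    by more than `min_gain`."""
    if len(pass_rates) <= patience:
        return False  # not enough history to judge
    best_before = max(pass_rates[:-patience])
    best_recent = max(pass_rates[-patience:])
    return best_recent - best_before <= min_gain


history = [0.5, 0.7, 0.9, 0.9, 0.9, 0.9]
stop = has_plateaued(history, patience=3)
```

A small positive `min_gain` would also stop the loop on negligible improvements, at the cost of possibly quitting just before a late gain.
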
The recommendation (autonomous-mode call)

If Sunil wants to show this to his audience in one demo:

Pick one skill he already uses repeatedly — the most useful candidate inside this brain is the ingest skill (codified in CLAUDE.md).
Write the binary assertions — index.md updated, log.md entry well-formed, raw file moved to raw/processed/, every wiki page YAML-valid, every new wikilink resolves to a basename.
Run an auto-research loop on the ingest portion of CLAUDE.md: Karpathy's pattern applied to this vault's own operating instructions. The artifact being improved is the very file that governs how this brain is edited.
Show the IT LT the diff after one overnight run. "My second brain rewrote its own ingest instructions overnight to catch a class of failures it was producing." This lands harder than any slide.
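
Step 2 could start from something like the sketch below. Everything about the layout is an assumption made for illustration (a `wiki/` folder for pages, `index.md` at the vault root, `raw/` and `raw/processed/` for captures), and the function names are hypothetical. Two checks are deliberately reduced: front matter is tested only for its `---` delimiters because real YAML validation needs a parser outside the standard library, and the log.md check is omitted because its entry format is not specified here.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]], and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def index_mentions(vault, page_name):
    index = vault / "index.md"
    return index.exists() and page_name in index.read_text(encoding="utf-8")


def raw_file_moved(vault, raw_name):
    moved = (vault / "raw" / "processed" / raw_name).exists()
    still_pending = (vault / "raw" / raw_name).exists()
    return moved and not still_pending


def has_front_matter(page):
    # Delimiter check only; this does not prove the YAML itself is valid.
    lines = page.read_text(encoding="utf-8").splitlines()
    return (
        len(lines) >= 2
        and lines[0].strip() == "---"
        and any(line.strip() == "---" for line in lines[1:])
    )


def wikilinks_resolve(vault, page):
    basenames = {path.stem for path in vault.rglob("*.md")}
    targets = WIKILINK.findall(page.read_text(encoding="utf-8"))
    return all(target.strip() in basenames for target in targets)


def check_ingest(vault, page_name, raw_name):
    """Binary verdicts for one ingested page and its source capture."""
    page = Path(vault) / "wiki" / f"{page_name}.md"
    page_ok = page.exists()
    return {
        "index.md updated": index_mentions(Path(vault), page_name),
        "raw file moved": raw_file_moved(Path(vault), raw_name),
        "front matter present": page_ok and has_front_matter(page),
        "wikilinks resolve": page_ok and wikilinks_resolve(Path(vault), page),
    }
```

Calling `check_ingest(Path("path/to/vault"), "Some Page", "capture.md")` returns one boolean per assertion, and the share of `True` values is the pass rate the overnight loop would compare before and after each edit to the ingest instructions.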

The brand fodder is "loops are how the brain gets better while you sleep." The structural truth underneath: the brain has an artifact, a metric, and a keep/revert rule, so it can compound.

Cross-links

Primary new source · Build Self-Improving Claude Code Skills (Simon Scrapes)
Loop primitives · Auto Research Loop (Karpathy) · Agentic Loop · OODA Loop · Effective Feedback Compute
Eval discipline · Binary Eval Assertions
This vault's loop surfaces · Skills (Claude Code) · Claude Code
Self-evolution research · Self-Evolving Agents · Recursive Self-Improvement · Meta-Agent Challenge (Autonomous Agent Development Benchmark) · MLEvolve (Self-Evolving ML Algorithm Discovery)
Risk surface · Reward Hacking
IT LT framing · DRAG for AI Upskilling at Manila IT Site · Sunil's Second Brain Email to IT LT (2026-06-06) · Frontier GCC
Background-AI / Telegram interface (Simon's commercial pattern echoes Sunil's Telegram → GitHub → process-raw chain)