Updated Sat Jun 13 2026 08:00:00 GMT+0800 (Philippine Standard Time)

Self-Evolving Agents

self-evolving-agents · meta-agent · recursive-self-improvement · long-horizon · agentic-engineering · auto-research

A research thread Sunil tracks via a daily-watch feed (see Telegram triggers #3236, #3239). The umbrella concept: agents that improve themselves or their offspring across runs, rather than starting cold every iteration. Adjacent to but narrower than Recursive Self-Improvement — RSI is the limit of this trajectory; self-evolving agents are the practical, near-term work.

Three layers

  1. Method papers — propose mechanisms by which an agent can compound across runs:

    • MLEvolve (Self-Evolving ML Algorithm Discovery) (arxiv 2606.06473): Progressive Monte Carlo Graph Search + Retrospective Memory for ML algorithm discovery. Branches share information via graph edges; a dynamic global memory accumulates lessons.
    • Earlier references in the lineage: AlphaEvolve, Gödel Agent, Darwin-Gödel Machine, Alita-G, Live-SWE-Agent, Memento-Skills, MemEvolve.
  2. Benchmark papers — measure whether current systems are capable:

  3. Practitioner implementations — what an end user can run today, no paper attached:

The three layers map roughly to: how to make it work (method papers) · whether it works (benchmarks) · how to deploy it today (practitioners).
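MLEvolve's two named mechanisms (graph search whose branches share information, plus a global memory of lessons) can be illustrated with a toy search. This is a minimal sketch, not the paper's algorithm: it assumes a hypothetical objective over a float vector, uses UCT selection with progressive widening as a stand-in for "Progressive Monte Carlo Graph Search", lets a new node take the current best node as a second parent to model cross-branch graph edges, and keeps a global list of improving mutations as a stand-in for Retrospective Memory. `Node`, `objective`, `uct` and `search` are all hypothetical names.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so nodes can go in a set
class Node:
    """A candidate solution; hypothetical stand-in for an ML pipeline config."""
    params: list
    score: float
    parents: list = field(default_factory=list)   # graph, not tree: may have >1 parent
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of scores backed up through this node


def objective(params):
    # Hypothetical evaluation (higher is better), optimum at all-ones.
    return -sum((p - 1.0) ** 2 for p in params)


def uct(child, parent_visits, c=1.0):
    # Mean backed-up score plus an exploration bonus.
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits + 1) / child.visits)


def search(dim=3, iterations=200, seed=0, widen=1.0, alpha=0.5):
    rng = random.Random(seed)
    root = Node([0.0] * dim, objective([0.0] * dim))
    nodes = [root]
    memory = []  # global memory: (delta, gain) for mutations that improved a score

    for _ in range(iterations):
        # Selection with progressive widening: descend only once a node
        # already has as many children as its visit count allows.
        node = root
        while node.children and len(node.children) >= math.ceil(
                widen * (node.visits + 1) ** alpha):
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))

        # Expansion: half the time, reuse the best lesson from global memory.
        if memory and rng.random() < 0.5:
            delta = max(memory, key=lambda m: m[1])[0]
        else:
            delta = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        child = Node([p + d for p, d in zip(node.params, delta)], 0.0, parents=[node])

        # Cross-branch edge: sometimes blend with the global best and link it as a parent.
        best = max(nodes, key=lambda n: n.score)
        if best is not node and rng.random() < 0.3:
            child.params = [(a + b) / 2 for a, b in zip(child.params, best.params)]
            child.parents.append(best)
        child.score = objective(child.params)
        for p in child.parents:
            p.children.append(child)
        nodes.append(child)

        # Retrospective memory: keep the mutation if the child beat the selected node.
        gain = child.score - node.score
        if gain > 0:
            memory.append((delta, gain))

        # Backpropagation to every ancestor in the DAG, each exactly once.
        stack, seen = [child], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            n.visits += 1
            n.value += child.score
            stack.extend(n.parents)

    return max(nodes, key=lambda n: n.score), len(memory)


best, n_lessons = search()
print(best.score, n_lessons)
```

Lowering `alpha` lets each node accumulate fewer children before the search descends, trading breadth for depth; the shared `memory` list is what lets a lesson found in one branch steer expansion in another.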

Throughline

Across the method and benchmark layers, the same failure mode keeps surfacing: an inability to compound across iterations. MLEvolve names it ("inter-branch information isolation" + "memoryless search" + "lack of hierarchical control"). MAC sees its behavioral analog in underperforming meta-agents that "get trapped in design local optima" and "rarely monitor remaining time budget."

The recurring success pattern, conversely: think longer between decisions, persist what you learned, separate planning from execution. This restates Effective Feedback Compute's criteria (feedback that is informative, valid, non-redundant and retained) as a research program.
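"Persist what you learned" can be made concrete as an admission filter on what an agent carries into its next run. A minimal sketch, assuming a hypothetical `Feedback` record and a JSON file as the persistence layer; mapping each Effective Feedback Compute criterion to a single check (a verification flag for valid, a minimum metric change for informative, duplicate detection for non-redundant, writing to disk for retained) is this sketch's own simplification, not a definition from that note.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Feedback:
    """One piece of feedback from a run (hypothetical schema)."""
    lesson: str         # natural-language takeaway
    score_delta: float  # change in the evaluation metric the run produced
    verified: bool      # did an independent check (e.g. a held-out eval) confirm it?


class LessonStore:
    """Admits feedback only if valid, informative and non-redundant;
    persists it to a JSON file so it is retained across runs."""

    def __init__(self, path, min_delta=0.01):
        self.path = path
        self.min_delta = min_delta
        self.lessons = []
        if os.path.exists(path):  # a later run starts warm, not cold
            with open(path) as f:
                self.lessons = json.load(f)

    def admit(self, fb):
        if not fb.verified:                        # valid
            return False
        if abs(fb.score_delta) < self.min_delta:   # informative
            return False
        key = fb.lesson.strip().lower()
        if any(e["lesson"].strip().lower() == key for e in self.lessons):  # non-redundant
            return False
        self.lessons.append(asdict(fb))
        with open(self.path, "w") as f:            # retained
            json.dump(self.lessons, f)
        return True


path = os.path.join(tempfile.mkdtemp(), "lessons.json")
store = LessonStore(path)
results = [
    store.admit(Feedback("Warm-start from the best config", 0.04, True)),
    store.admit(Feedback("warm-start from the best config", 0.05, True)),  # duplicate
    store.admit(Feedback("Try a larger batch", 0.001, True)),              # too small
    store.admit(Feedback("Remove weight decay", 0.2, False)),              # unverified
]
print(results)                         # [True, False, False, False]
print(len(LessonStore(path).lessons))  # 1: a fresh "run" reloads the kept lesson
```

Negative deltas pass the informative check on purpose: a verified regression is also worth carrying forward.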

Why Sunil tracks this

The daily-watch feed consistently rates papers in this cluster 8.5/10 or higher. The throughline (agents that get better the more you use them) is the missing piece for enterprise Agentic Engineering at scale. A harness that doesn't compound is just a fancy CLI.

2026-06-13 — live examples (Economist)

How AI Got Better at Building Itself (Economist) supplies two deployed datapoints for the lineage above: AlphaEvolve (Google DeepMind, already listed there as an earlier reference) optimizing data-centre compute and matrix multiplication, and Andrej Karpathy's agent autonomously cutting Nanochat training ~18% (3h → 1h39m). Both show the self-improvement loop running in the wild rather than on a benchmark; see Recursive Self-Improvement for the full case-study set and the CSET forecast.
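Taken at face value, the 3h → 1h39m interval implies a larger change than the ~18% quoted alongside it. The arithmetic, using only the two times given above:

```python
def minutes(hours, mins=0):
    return 60 * hours + mins


before = minutes(3)        # 180 min
after = minutes(1, 39)     # 99 min
reduction = 1 - after / before  # fraction of wall-clock time saved
speedup = before / after        # how many times faster
print(f"{reduction:.0%} less time, {speedup:.2f}x faster")  # 45% less time, 1.82x faster
```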

Cross-links