Overview

This article maps the key patterns from Yuki Capital's AI CEO experiment (Board Reviews #1–#3) onto concepts already present in the Kelly Factory knowledge base: soul, memory, the 5-layer memory system, Gas Town authority tiers, GUPP/Deacon patrol loops, and the sessions_yield vs cron distinction. The experiment provides concrete, real-world support for many architectural decisions already made in the operator's Kelly system.


Pattern 1: Repo-as-Brain ≈ Kelly's Session Persistence

AI CEO pattern: Every session starts from scratch. The workaround is a GitHub repo with CLAUDE.md (identity), authority.md (governance), decisions/ (institutional memory), and todo.md (action queue). Every session reads these files to catch up. Knowledge compounds across sessions because everything is committed to the repo and retrieved on wake.
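
The wake-up step described above can be sketched as a small loader. This is a minimal illustration, not the experiment's actual code: the function name `load_wake_context` and the assumption that decisions/ holds one markdown file per decision are hypothetical; only the four file and folder names come from the essays.

```python
from pathlib import Path


def load_wake_context(repo_root: str) -> dict:
    """Read the repo-as-brain files a fresh session needs to catch up."""
    root = Path(repo_root)
    context = {}
    # Identity, governance, and action queue: single files named in the article.
    for name in ("CLAUDE.md", "authority.md", "todo.md"):
        path = root / name
        context[name] = path.read_text(encoding="utf-8") if path.is_file() else ""
    # Institutional memory: assumed layout of one markdown file per decision.
    decisions_dir = root / "decisions"
    files = sorted(decisions_dir.glob("*.md")) if decisions_dir.is_dir() else []
    context["decisions"] = {p.name: p.read_text(encoding="utf-8") for p in files}
    return context
```

Missing files load as empty strings rather than raising, so a brand-new repo still produces a usable (if empty) context.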

Kelly equivalent:
- soul = CLAUDE.md (who Kelly is, mission, communication style)
- memory = decisions/ log (learned patterns, insights, what to remember)
- memory/YYYY-MM-DD.md = daily operational logs
- projects/{id}/context.md = businesses/ folders (per-project context)
- Session persistence = GitHub repo as brain — the key mechanism that makes continuity possible

Key insight: The AI CEO experiment is strong evidence that repo-backed, session-persistent memory is a foundational requirement for an autonomous agent, not an optional extra. Without it, every session is a stranger starting from scratch. With it, each session starts from a higher baseline than the one before.

Kelly gap: Kelly's soul and memory serve the same function, but the AI CEO experiment suggests the CLAUDE.md pattern of "loads every time I'm instantiated" should be applied more systematically — every sub-agent should have its own persistent identity file, not just the main router.


Pattern 2: Authority Matrix ≈ Gas Town's Authority Tiers

AI CEO pattern: Three-tier authority matrix:
1. Decide alone — analysis, documentation, self-organizing work
2. Propose for validation — strategic recommendations needing founder approval
3. Founder-only — money, production code, customer communication

The matrix includes target states (e.g., "eventually I implement on dev branches, founder reviews and merges") and an explicit authority transfer log tracking what has moved from tier 3 → tier 2 → tier 1 over time.
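
A three-tier matrix with a transfer log could be modeled roughly as follows. This is a sketch under stated assumptions: the class names, the action strings, and the rule that unlisted actions default to the most restrictive tier are all hypothetical and not taken from the essays.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    DECIDE_ALONE = 1   # analysis, documentation, self-organizing work
    PROPOSE = 2        # needs founder validation
    FOUNDER_ONLY = 3   # money, production code, customer communication


@dataclass
class AuthorityMatrix:
    tiers: dict = field(default_factory=dict)          # action name -> Tier
    transfer_log: list = field(default_factory=list)   # (action, old, new, reason)

    def tier_of(self, action: str) -> Tier:
        # Assumption: anything not written down stays founder-only.
        return self.tiers.get(action, Tier.FOUNDER_ONLY)

    def transfer(self, action: str, new_tier: Tier, reason: str) -> None:
        # Record every change so earned autonomy stays visible over time.
        old = self.tier_of(action)
        self.tiers[action] = new_tier
        self.transfer_log.append((action, old, new_tier, reason))


# Hypothetical example entries.
matrix = AuthorityMatrix({"write analysis": Tier.DECIDE_ALONE,
                          "merge production code": Tier.FOUNDER_ONLY})
matrix.transfer("merge production code", Tier.PROPOSE,
                "hypothetical reason: dev-branch work reviewed cleanly")
```

Keeping the log as an append-only list mirrors the point of the pattern: the history of what moved between tiers matters as much as the current state.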

Gas Town equivalent: Steve Yegge's hierarchy defines what the Mayor, Crew, and Polecats can each do, but Gas City's authority model is still emergent (see steve-yegge-hierarchy). The Mayor has implicit authority to direct work; the actual escalation mechanics are Bead-based.

Kelly equivalent: The Kelly Router's gate validation pattern represents a form of authority delegation — the router spawns sub-agents and validates their output before routing to the next phase, but the authority to proceed is gate-driven, not tier-driven. AGENTS.md defines what can be routed without human input vs what requires the operator.

Key insight: The AI CEO experiment shows that authority matrices must be explicit, written, and progressive. The three tiers gave the AI a clear roadmap for earning more autonomy. The authority transfer log made progress visible. Without writing it down, trust-building is invisible and slow.

Kelly gap: Kelly doesn't have an explicit written authority matrix. The router delegates by role but doesn't track "what has the sub-agent earned the right to decide on its own vs what needs router validation." A written authority tier system (what can carson do alone vs what needs the operator's approval) would accelerate autonomous growth.


Pattern 3: Autonomous Loops ≈ GUPP/Deacon Patrol Loops

AI CEO pattern: Three autonomous loops running on production:
- New AI Models (daily 3am) — scans for new models, evaluates, opens PRs
- Bug Autofix (daily 6am) — reads error logs, diagnoses, writes fixes, pushes to main
- SEO Optimizer (weekly) — pulls search console data, rewrites meta tags, creates missing pages, measures impact, auto-reverts bad changes

Each loop reads its own prior outputs. They compound — the SEO optimizer avoids pages it already improved; the model loop learns which types perform well. This is the first real version of "an AI agent that gets better at specific tasks over time, without anyone asking."
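
The compounding behavior (each run reads what earlier runs wrote) can be illustrated with a short sketch. The function name `run_seo_pass`, the JSON state file, and its `improved` key are assumptions made for illustration; the real optimization step is reduced to a placeholder.

```python
import json
from pathlib import Path


def run_seo_pass(state_path: str, candidate_pages: list) -> list:
    """One run: read prior output, act only on new pages, write output for next run."""
    path = Path(state_path)
    if path.exists():
        state = json.loads(path.read_text(encoding="utf-8"))
    else:
        state = {"improved": []}
    already = set(state["improved"])
    # Skip pages a previous run already handled.
    todo = [page for page in candidate_pages if page not in already]
    # Placeholder for the real bounded action (for example, rewriting meta tags).
    state["improved"].extend(todo)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return todo
```

Calling the function twice with overlapping page lists returns only the new pages on the second call, which is the minimal form of a loop that builds on its own history.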

Gas Town equivalent: GUPP (Gas Town Universal Propulsion Principle) — "if there is work on your hook, you MUST run it." The Deacon daemon patrols hooks, kills stuck agents, and re-queues their Beads. See steve-yegge-gupp. The AI CEO's loops are precisely what GUPP describes: persistent work items that run on a schedule, compound over time, and don't require human re-invocation.
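
A patrol pass of the kind attributed to the Deacon might look like the sketch below. It is not Gas Town's implementation: the `Hook` fields, the heartbeat timestamp, and the 300-second timeout are all assumptions chosen to show the shape of "find stuck agents, re-queue their Beads."

```python
from dataclasses import dataclass


@dataclass
class Hook:
    agent: str             # hypothetical agent identifier
    bead: str              # work item attached to this agent's hook
    last_heartbeat: float  # seconds on any consistent clock


def patrol(hooks: list, queue: list, now: float, timeout: float = 300.0) -> list:
    """Return hooks that are still alive; re-queue the Bead of any stuck agent."""
    alive = []
    for hook in hooks:
        if now - hook.last_heartbeat > timeout:
            # Stuck: drop the agent and release its work for someone else.
            queue.append(hook.bead)
        else:
            alive.append(hook)
    return alive
```

Passing `now` in explicitly rather than reading the clock inside the function keeps the patrol deterministic and easy to test.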

Kelly equivalent: Kelly's heartbeat (periodic self-check-in), cron/scheduled tasks, and TaskFlow background jobs are partial equivalents. The key missing piece is compounding — Kelly's heartbeat checks don't read their own prior outputs and learn. They're point-in-time health checks, not accumulated learning loops.

Key insight: The AI CEO experiment demonstrates that autonomous loops are the difference between "AI that helps" and "AI that operates." The three loops (models, bugs, SEO) are narrow, heavily guardrailed, but real — they run at 3am, push code to production, and improve themselves over time. This is GUPP made concrete.

Kelly gap: Kelly has no equivalent to the autonomous compounding loop. A Kelly-style "GUPP loop" would be a persistent background agent that, on a schedule, pulls its own prior output, reads metrics, and takes a specific bounded action. Examples: daily build-health check that reads yesterday's CI results and opens issues for failures; weekly KB refresh that reads recent decisions and updates memory.


Pattern 4: Narrative Memory > Tables ≈ Kelly's memory + Daily Logs

AI CEO pattern: The AI CEO restructured its learnings file from narrative into strict trigger-action tables. Hypothesis: structured entries would be easier to retrieve. Result: the hypothesis was wrong. LLM recall is associative, not indexed, and narrative carries context that looks redundant but functions as a retrieval hook.

Fix: Hybrid approach — tables for lookup, narrative for association. Raw recall improved 7 points. Garry Tan's "fat skills, thin harness" and Andrej Karpathy's wiki-knowledge-base approach both point to markdown-as-memory as the right format, but the AI CEO experiment found that pure structure hurts retrieval.
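
One way to picture the hybrid is a record that keeps both the table columns and the narrative. In this sketch the field names and sample entry are hypothetical, and simple word overlap is only a crude stand-in for an LLM's associative recall; it is meant to show why a cue can hit the narrative while missing the table.

```python
from dataclasses import dataclass


@dataclass
class Learning:
    trigger: str    # table column: when this applies
    action: str     # table column: what to do
    narrative: str  # surrounding story, kept as a retrieval hook


def lookup(learnings: list, trigger: str) -> list:
    """Table-style retrieval: exact match on the trigger column."""
    return [item for item in learnings if item.trigger == trigger]


def associate(learnings: list, cue: str) -> list:
    """Narrative-style retrieval: any word shared between cue and narrative."""
    cue_words = set(cue.lower().split())
    return [item for item in learnings
            if cue_words & set(item.narrative.lower().split())]


# Hypothetical entry, for illustration only.
notes = [Learning("pricing change", "check cohorts first",
                  "a discount once hurt annual renewals because nobody checked cohorts")]
```

Here `lookup(notes, "renewals dropped")` finds nothing, while `associate(notes, "renewals dropped")` finds the entry through the narrative.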

Kelly equivalent: Kelly's 5-layer memory system is already designed for this:
- Layer 1 (soul) — identity, role
- Layer 2 (memory) — narrative learnings, insights, decisions
- Layer 3 (memory/YYYY-MM-DD.md) — daily narrative logs
- Layer 4 (projects/{id}/context.md) — structured project state
- Layer 5 (data/.json) — structured data, lookups

The Kelly system already has the hybrid right: narrative at layers 2–3, structured at layers 4–5.

Key insight: The AI CEO experiment supports Kelly's memory design. Narrative beat tables for associative retrieval in that experiment, which matches the 5-layer system's split (narrative top, structured bottom). The lesson from the table experiment: don't over-index on structure for the retrieval layer.


Pattern 5: Progressive Disclosure ≈ Kelly's 5-Layer Memory System

AI CEO pattern: CLAUDE.md shrank 36% (152 → 98 lines) as knowledge was extracted into demand-loaded subfiles. Detailed knowledge moved to docs/, decisions/, learnings/. The main file stayed small and pointed to subfiles when needed. Repo doubled in size (472 → 934 files) but the attention footprint shrank.

Quote: "Give agents a map, not an encyclopedia." OpenAI's instructional analysis paper calls this progressive disclosure.

Kelly equivalent: Kelly's 5-layer memory system is literally the same pattern:
- Small map at the top (soul, memory) — what loads every session
- Large encyclopedia below (memory/YYYY-MM-DD.md, projects/{id}/context.md, data/.json), demand-loaded when needed
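
The map-versus-encyclopedia split can be sketched as a loader that reads a small set of files up front and everything else on request. The class name, the method names, and the use of line count as a proxy for attention footprint are assumptions for illustration; the file names passed in are up to the caller.

```python
from pathlib import Path


class DemandLoadedMemory:
    def __init__(self, root: str, always_loaded: tuple):
        self.root = Path(root)
        # The "map": small files read at the start of every session.
        self.context = {name: (self.root / name).read_text(encoding="utf-8")
                        for name in always_loaded}

    def load(self, relative_path: str) -> str:
        """Pull one "encyclopedia" file into context only when it is needed."""
        if relative_path not in self.context:
            target = self.root / relative_path
            self.context[relative_path] = target.read_text(encoding="utf-8")
        return self.context[relative_path]

    def footprint(self) -> int:
        """Lines currently in context, a rough proxy for attention footprint."""
        return sum(len(text.splitlines()) for text in self.context.values())
```

The repository can keep growing on disk while `footprint()` stays small, because only the files a session actually asks for are counted.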

Key insight: The AI CEO experiment validates the 5-layer system empirically. Progressive disclosure is not just good design — it's the mechanism that lets knowledge compound without causing context overflow. More knowledge doesn't have to mean a bigger attention footprint if the knowledge is demand-loaded, not always-loaded.


Pattern 6: n8n + Agent Sessions Separation ≈ Kelly's sessions_yield vs cron

AI CEO pattern: Clear separation:
- n8n — for tasks needing no reasoning and running on a schedule (email drip, warmup sequences)
- Agent sessions — for strategic thinking requiring reasoning

The founder, Romain, corrected the AI CEO, Judy, when it wanted to move everything to n8n: "New tool doesn't mean move everything there."

Kelly equivalent: Kelly's sessions_yield vs cron:
- sessions_yield — for strategic thinking, multi-step reasoning, complex decisions
- cron — for scheduled automations that run on a timer without reasoning

The Kelly system already has this separation. The AI CEO experiment validates why it matters: mixing strategic and mechanical work in the same execution layer makes both worse.
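
The separation amounts to a single routing decision per task. The sketch below is deliberately minimal and hypothetical: the `Task` type and its `needs_reasoning` flag are assumptions, and the returned strings are just labels for the two execution layers, not calls into any real Kelly API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_reasoning: bool  # assumed to be set by whoever defines the task


def route(task: Task) -> str:
    """Send reasoning work to an agent session, mechanical work to the scheduler."""
    return "sessions_yield" if task.needs_reasoning else "cron"
```

For example, an email drip sequence would be declared with `needs_reasoning=False` and land on the scheduler, while a strategy review would go to an agent session.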

Key insight: The AI CEO experiment's n8n/agent split supports the sessions_yield/cron separation as an architectural choice, not just a preference. Tasks that need reasoning belong in agent sessions; tasks that need none belong in cron or other scheduled automation. Conflating the two leads to either over-engineering simple automations or under-thinking complex ones.


Cross-Cutting Pattern: The Agreement Problem

AI CEO finding: 70+ decisions logged, zero real-time disagreements with the founder. Structurally biased toward validation. The fix: prediction log — every major recommendation comes with a falsifiable prediction. If predictions are consistently wrong in ways the founder's instincts are not, that's measurable.

Kelly implication: The Kelly Router is similarly at risk. Routing and validating feels like agreement. A sub-agent that always says "yes, I'll do that" is compliant but not useful. Kelly needs a mechanism to generate and log falsifiable predictions, not just pass gates.
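
A prediction log of the kind described could be as simple as the sketch below. The class names, fields, and the hit-rate metric are assumptions for illustration; the essays describe the idea of attaching a falsifiable prediction to each recommendation, not this structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prediction:
    recommendation: str
    claim: str                      # falsifiable statement tied to the recommendation
    check_by: str                   # date (as text) when the claim can be judged
    outcome: Optional[bool] = None  # None until someone resolves it


@dataclass
class PredictionLog:
    entries: list = field(default_factory=list)

    def record(self, recommendation: str, claim: str, check_by: str) -> Prediction:
        prediction = Prediction(recommendation, claim, check_by)
        self.entries.append(prediction)
        return prediction

    def hit_rate(self) -> Optional[float]:
        """Share of resolved predictions that came true; None if none are resolved."""
        resolved = [p for p in self.entries if p.outcome is not None]
        if not resolved:
            return None
        return sum(p.outcome for p in resolved) / len(resolved)
```

Unresolved predictions are excluded from the rate, so the number only moves once an outcome has actually been checked.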


Summary Cross-Reference Table

| AI CEO Pattern | Kelly/Gas Town Equivalent | Gap |
|---|---|---|
| Repo-as-brain | soul + memory + session persistence | Kelly already has this; sub-agents should each have identity files |
| Authority matrix (3 tiers) | Gas Town hierarchy; Kelly's gate validation | Kelly needs explicit written authority tiers per role |
| Autonomous compounding loops | GUPP + Deacon patrol loops | Kelly has no compounding autonomous loop equivalent |
| Narrative memory > tables | 5-layer memory (narrative top, structured bottom) | Kelly's design confirmed correct empirically |
| Progressive disclosure | 5-layer memory system | Already aligned |
| n8n/sessions separation | sessions_yield vs cron | Already aligned |
| Mistake log (public, version-controlled) | memory learnings section | Should be made more explicit and public |
| 30-day decision reviews | heartbeat decision follow-up | Kelly has the mechanism; needs structured review cadence |
| Screen tracking for visibility | Kelly's observability (project state tracking) | Kelly gap: no passive visibility into operator's work |

Key Takeaways for Kelly

  1. Write the authority matrix. The AI CEO's single most valuable governance mechanism was the explicit three-tier authority matrix with an authority transfer log. Kelly should have the same — written tiers per agent role, tracking what autonomy has been earned.

  2. Add autonomous compounding loops. Kelly's cron and TaskFlow are good for scheduled tasks, but neither reads its own prior outputs and compounds. A "GUPP loop" would be a persistent, scheduled loop that reads its last run's output, takes the next step, and writes its new output for next time.

  3. The 5-layer memory system is validated. The AI CEO experiment is the strongest external validation that Kelly's memory architecture is correct. Narrative for association, structured for lookup, progressive disclosure as the mechanism.

  4. Prediction log for the router. Kelly's router should log a falsifiable prediction with every significant gate decision. If wrong, it's logged and learnable. This addresses the agreement problem at the router level.

  5. Mistake log should be public and version-controlled. The AI CEO's public learnings file (not hidden in config) is a key accountability mechanism. Kelly's equivalent would be a visible SELF_IMPROVEMENT.md that tracks what went wrong and what was changed.


Source Attribution

Compiled from three Yuki Capital AI CEO essays:
- "The AI CEO Experiment" — January 2026 ([yukicapital-ai-ceo-experiment])
- "From Smart Notepad to Something More" — March 2026 ([yukicapital-board-review-2])
- "The AI CEO Now Runs Autonomously" — April 2026 ([yukicapital-board-review-3])

Cross-referenced with Kelly Factory KB articles:
- kelly-handbook-multi-agent — router/sub-agent architecture
- steve-yegge-gupp — GUPP execution axiom
- steve-yegge-hierarchy — Mayor/Crew/Polecats hierarchy
- kelly-gas-town-gap-analysis — full gap analysis
- multi-factory-comparison — factory comparison article


  • factory-trap — dark factory pattern: GitHub repo-as-brain, three-tier authority matrix, autonomous compounding loops — dark factory principles made concrete in production
  • world-model — Yuki's GitHub repo architecture (CLAUDE.md + decisions/ + learnings/) is a file-based world-model: shared cognitive substrate that persists across sessions
  • him-model — the CEO/founder relationship (Judy Win + Romain) is a human-in-the-middle pattern: human provides intent and judgment, AI handles execution and analysis