Yuki AI CEO vs Kelly Factory vs Gas Town — Full Gap Analysis

Assessor: Carson (dark-factory-kb subagent)
Date: 2026-04-27
Sources: Yuki Capital AI CEO essays (Board Reviews #1–#3), yukicapital-ai-ceo-overview, kelly-gas-town-gap-analysis, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy


Executive Summary

Three independent dark factory systems have emerged from three different practitioners: Yuki Capital's AI CEO (run by Judy Win / Claude, January–April 2026), Kelly's Factory Router (the operator's Kelly Claude AI, ongoing), and Steve Yegge's Gas Town (open-source, January–April 2026). All three arrived at the same architectural patterns from radically different starting points: a solo founder running an AI as CEO, a software engineer building a factory methodology, and a veteran platform engineer building an open-source orchestrator.

This gap analysis maps every major concept from the Yuki AI CEO experiment against both Kelly and Gas Town equivalents. The goal: identify which patterns are universal (emerged independently in all three), which are unique to one system, and where the synthesis opportunities lie for the operator's Kelly factory.

Overall verdict: The Yuki AI CEO experiment provides the strongest real-world validation yet for the Kelly/Gas Town architectural thesis. Every pattern Yuki discovered empirically (repo-as-brain, explicit authority matrices, autonomous compounding loops, narrative-over-table memory, progressive disclosure, sessions-vs-scheduled separation) matches or extends a pattern Kelly had theorized or Gas Town had implemented. The experiment's most valuable contributions: proof that autonomous loops (GUPP-equivalent) compound in production, the authority matrix with transfer log, and empirical confirmation that narrative memory outperforms structured tables for LLM recall.


1. Repo-as-Brain vs soul+memory vs Beads Substrate

Yuki AI CEO Pattern

Every session starts from scratch. The workaround: a private GitHub repository serving as persistent operational headquarters. Loaded on every instantiation:
- CLAUDE.md — mission, revenue targets, communication style, tools, founder communication rules
- authority.md — three-tier authority matrix (see Section 2)
- decisions/ — log of every meaningful decision with context, options, rationale, outcomes
- todo.md — prioritized action list tagged by owner
- businesses/ — per-business folders with overviews, monthly stats, competitor analysis
- strategies/ — strategic planning documents
- metrics/ — dashboards and scripts pulling live data from Stripe, Plausible, MOZ, MongoDB
- learnings/ — public mistake log (version-controlled, visible to founder)

The repo IS the brain. Every session reads the relevant files. Knowledge compounds across sessions because everything is committed and retrieved on wake.
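The wake-up step described above can be sketched in a few lines. This is a minimal illustration, not Yuki's actual loader: the function name `load_session_context`, the choice of which files count as always-loaded, and the `businesses/<name>/overview.md` layout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Files read on every instantiation. The selection here is an assumption
# drawn from the list above; the real set may differ.
ALWAYS_LOADED: List[str] = ["CLAUDE.md", "authority.md", "todo.md"]


def load_session_context(repo_root: Path, business: Optional[str] = None) -> Dict[str, str]:
    """Return {relative_path: text} for the files that exist; missing files are skipped."""
    wanted = list(ALWAYS_LOADED)
    if business is not None:
        # Hypothetical layout: businesses/<name>/overview.md
        wanted.append(f"businesses/{business}/overview.md")
    context: Dict[str, str] = {}
    for rel in wanted:
        path = repo_root / rel
        if path.is_file():
            context[rel] = path.read_text(encoding="utf-8")
    return context
```

Because everything is read from committed files, a new session sees whatever the previous session wrote, which is the compounding property the section describes.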

Kelly Equivalent

Kelly's 5-layer memory system maps directly:
- soul ≈ CLAUDE.md (who Kelly is, mission, communication style)
- memory ≈ decisions/ + learnings/ (curated institutional memory, distilled learnings)
- memory/YYYY-MM-DD.md ≈ daily session logs (raw operational context)
- projects/{id}/context.md ≈ businesses/ folders (per-project context)
- data/.json ≈ metrics/ (structured data, lookups)

Kelly's session persistence mechanism is files-on-disk: memory, daily logs, project context files. Kelly's soul is loaded at session start and serves the same identity-reload function as CLAUDE.md.

Gas Town Equivalent

Gas Town's Beads, backed by Dolt, serve as the universal data plane. Every unit of work, coordination, message, quality gate, and reasoning is a Bead. Git stores What/Where/Who/How; Beads store Why. The Dolt backing means the entire work history is git-versioned and SQL-queryable.

Gap Analysis

| System | Brain Substrate | Persistence | Queryable | Human-Readable |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | GitHub repo + files | Git commits | No (grep only) | Yes |
| Kelly | soul + memory + files | Files on disk | No (grep only) | Yes |
| Gas Town | Beads in Dolt | Git-versioned SQL | Yes (SQL) | Partial (requires viewer) |

Kelly gap vs Yuki: Kelly's soul is equivalent to CLAUDE.md, but Kelly does not systematically load project context files at session start the way Yuki loads per-business CLAUDE.md files. The AI CEO experiment suggests every sub-agent should have its own persistent identity/context file that loads on instantiation — not just the main router. Kelly's sub-agents are ephemeral by default; Yuki demonstrates the value of per-agent persistent context even for narrow workers.

Kelly gap vs Gas Town: Both Yuki and Kelly use file-based persistence (not SQL-queryable). Gas Town's Beads/Dolt substrate is strictly more powerful for cross-project queries and audit trails, but requires running database infrastructure. Kelly's file-based approach is more immediately deployable.

Key Insight

The Yuki experiment proves that repo-as-brain is not optional — it's the foundational requirement for an autonomous agent. Without it, every session starts as a stranger. With it, each session starts from a higher baseline than the one before. This validates Kelly's 5-layer memory system from first principles.


2. Authority Matrix vs Three-Tier Authority vs Gate Validation

Yuki AI CEO Pattern

Three-tier explicit authority matrix, written and version-controlled:

  1. Decide alone — analysis, documentation, self-organizing work
  2. Propose for validation — strategic recommendations needing founder approval
  3. Founder-only — money, production code, customer communication

The matrix includes target states (e.g., "eventually I implement on dev branches, founder reviews and merges") and an explicit authority transfer log tracking what has moved from tier 3 → tier 2 → tier 1 over time. Progress toward autonomy is visible and measurable.
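One way to picture the matrix and its transfer log is as a small data structure. The sketch below is illustrative only: the class names, the example actions, and the transfer date are hypothetical, and Yuki's authority.md is a written document, not code.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum
from typing import Dict, List


class Tier(IntEnum):
    DECIDE_ALONE = 1
    PROPOSE = 2
    FOUNDER_ONLY = 3


@dataclass
class Transfer:
    action: str
    from_tier: Tier
    to_tier: Tier
    on: date


@dataclass
class AuthorityMatrix:
    tiers: Dict[str, Tier] = field(default_factory=dict)
    transfer_log: List[Transfer] = field(default_factory=list)

    def tier_of(self, action: str) -> Tier:
        # Assumption: unlisted actions default to the most restrictive tier.
        return self.tiers.get(action, Tier.FOUNDER_ONLY)

    def transfer(self, action: str, to_tier: Tier, on: date) -> None:
        """Move an action to a new tier and record the change in the log."""
        self.transfer_log.append(Transfer(action, self.tier_of(action), to_tier, on))
        self.tiers[action] = to_tier


# Hypothetical example: production code moves from tier 3 to tier 2.
matrix = AuthorityMatrix({"write analysis": Tier.DECIDE_ALONE,
                          "production code": Tier.FOUNDER_ONLY})
matrix.transfer("production code", Tier.PROPOSE, date(2026, 4, 1))
```

Keeping the log append-only is what makes progress toward autonomy auditable: the current tier is state, the log is history.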

Kelly Equivalent

Kelly's gate validation represents a form of authority delegation: the router spawns sub-agents and validates their output before routing to the next phase, but authority is gate-driven, not tier-driven. AGENTS.md defines what can be routed without human input vs what requires operator approval. Kelly does not have an explicit written authority matrix with progressive transfer tracking. The question "what has Carson earned the right to decide alone?" is unanswered in Kelly's current architecture.

Gas Town Equivalent

Gas Town's Mayor/Crew/Polecats hierarchy defines what each tier can do, but authority is more implicit — the Mayor has editorial judgment, Crew members have domains, Polecats are unmonitored workers. Gas Town's authority model is structural (role-based) rather than written-and-negotiated (Yuki-style). The closest Gas Town gets to Yuki's authority transfer log is the Wasteland's stamp economy (reputation stamps from validators build portable trust).

Gap Analysis

| System | Authority Model | Written | Progressive | Transfer Log |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | 3-tier matrix (alone/propose/founder-only) | Yes, in authority.md | Yes, with target states | Yes, explicit log |
| Kelly | Gate validation (router validates before advance) | AGENTS.md partially | Implicit in role definitions | No |
| Gas Town | Role-based hierarchy (Mayor/Crew/Polecats) | Partially | Implicit in role tenure | No (reputation stamps partially serve this) |

Kelly gap: Kelly has no equivalent to Yuki's explicit authority matrix with transfer log. The authority tiers per agent role (what can Carson do alone vs what needs the operator's approval) should be written down and tracked. Yuki's single most valuable governance mechanism was the three-tier matrix: it gave the AI a clear roadmap for earning more autonomy. Kelly needs the same.

Gas Town gap: Gas Town's authority is structural but not negotiated. The Wasteland's stamp economy is the closest analog — reputation stamps from validators build portable trust over time. But there is no explicit authority transfer protocol (tier 3 → tier 2 → tier 1 equivalent).

Key Insight

Yuki's authority matrix proves that written, progressive authority tiers with visible transfer logs dramatically accelerate autonomous growth. The AI knew exactly what it had earned and what to work toward. Kelly's implicit gate-based authority is functional but doesn't give sub-agents a roadmap for earning autonomy.


3. Autonomous Compounding Loops vs GUPP Hooks vs RALPH Protocol

Yuki AI CEO Pattern

Three autonomous loops running on production (as of Board Review #3, April 2026):
- New AI Models (daily, 3am) — scans for new AI models, evaluates them, opens PRs with integration code
- Bug Autofix (daily, 6am) — reads error logs, diagnoses root causes, writes fixes, pushes to main (guardrailed to error-handling code only)
- SEO Optimizer (weekly) — pulls search console data, finds high-impression/low-CTR pages, rewrites meta tags, creates missing pages, measures impact, auto-reverts bad changes

Crucially: each loop reads its own prior outputs. The SEO optimizer avoids pages it already improved. The model loop learns which model types perform well. They compound — the first real version of "an AI agent that gets better at specific tasks over time, without anyone asking."

Kelly Equivalent

Kelly's RALPH protocol governs sub-agent failures: 3 attempts max, same error twice = escalate immediately. Kelly's sessions_yield handles cooperative multitasking between agents. heartbeat provides periodic agent self-check-in. TaskFlow coordinates multi-step detached tasks. Cron/scheduled tasks handle time-based work. None of these read their own prior outputs and compound. Kelly's scheduled tasks are point-in-time health checks, not accumulated learning loops.
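The RALPH escalation rule (3 attempts max, same error twice means escalate) is compact enough to sketch. This is an interpretation, not Kelly's implementation: the function name `run_with_ralph` is hypothetical, and "same error" is assumed to mean an identical error message seen in any earlier attempt.

```python
from typing import Callable, List, Optional, Tuple

MAX_ATTEMPTS = 3


def run_with_ralph(task: Callable[[], Optional[str]]) -> Tuple[str, List[str]]:
    """Run task() up to MAX_ATTEMPTS times.

    task() returns None on success or an error message on failure.
    Returns (status, errors) where status is "ok" or "escalate".
    """
    errors: List[str] = []
    for _ in range(MAX_ATTEMPTS):
        error = task()
        if error is None:
            return "ok", errors
        if error in errors:
            # Same error seen before: escalate immediately, no further retries.
            errors.append(error)
            return "escalate", errors
        errors.append(error)
    # Attempts exhausted without success.
    return "escalate", errors
```

Note that this rule only bounds failure. It keeps no memory between separate runs, which is the distinction the rest of this section draws against Yuki's loops.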

Gas Town Equivalent

GUPP (Gas Town Universal Propulsion Principle): "if there is work on your hook, you MUST run it." The Deacon daemon patrols hooks, kills stuck agents, re-queues their Beads. GUPP handles throughput (relentless execution) but not compounding (learning from prior outputs). A GUPP loop that always runs the same task on the same data produces the same output every time. Yuki's loops are GUPP loops WITH compounding — the output becomes part of the input for the next run.

Gap Analysis

| System | Loop Mechanism | Compounding | Reads Prior Outputs | Scheduled |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | n8n + agent sessions | Yes | Yes | Yes (daily/weekly) |
| Kelly | sessions_yield + RALPH + cron | No | No | Yes (cron) |
| Gas Town | GUPP + hook queues + Deacon | No | No (Beads store Why, not learned state) | Yes (hook patrol) |

Kelly gap: Kelly has no equivalent to autonomous compounding loops. A Kelly-style "GUPP loop with compounding" would be a persistent background agent that, on a schedule, pulls its own prior output, reads metrics, and takes the next bounded step. Example: daily build-health check that reads yesterday's CI results and opens issues for failures; weekly KB refresh that reads recent decisions and updates memory.
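A compounding loop differs from a plain scheduled job in one respect: each run reads what earlier runs produced. The sketch below shows that shape under stated assumptions. The function name `run_loop_once`, the JSON state file, and the `batch` parameter are hypothetical, and the actual work (rewriting a page, opening an issue) is left as a comment.

```python
import json
from pathlib import Path
from typing import Dict, List


def run_loop_once(state_file: Path, candidates: List[str], batch: int = 1) -> List[str]:
    """One scheduled run: read prior output, take a bounded next step, persist the result."""
    state: Dict[str, List[str]] = {"done": []}
    if state_file.exists():
        state = json.loads(state_file.read_text(encoding="utf-8"))
    already = set(state["done"])
    # Skip anything a previous run already handled, then cap the step size.
    todo = [c for c in candidates if c not in already][:batch]
    # A real loop would do the work here; this sketch only records it.
    state["done"].extend(todo)
    state_file.write_text(json.dumps(state), encoding="utf-8")
    return todo
```

Called repeatedly with the same candidate list, each run picks up where the last one stopped, instead of repeating the same output.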

Gas Town gap: Gas Town's GUPP handles relentless execution but not compounding. Yuki's loops demonstrate that the value of a scheduled loop is proportional to how much it reads its own prior outputs. A GUPP loop that reads its Bead history and adjusts its next action is more valuable than one that doesn't.

Key Insight

The Yuki experiment is GUPP made concrete and compounding. Kelly's sessions_yield and Gas Town's GUPP are execution models; Yuki's loops are the first real-world demonstration that scheduled autonomous loops reading their own outputs mark the difference between "AI that helps" and "AI that operates."


4. Narrative Memory > Tables vs Kelly's 5-Layer Narrative Memory vs Beads-as-Why

Yuki AI CEO Pattern

Board Review #3 conducted an explicit experiment: restructured the learnings file from narrative into strict trigger-action tables. Hypothesis: structured should be easier to retrieve. Result: wrong. LLM recall is associative, not indexed. Tables made recall worse. Narrative includes context that looks redundant but functions as a retrieval hook.

Fix: Hybrid approach — tables for lookup, narrative for association. Raw recall improved 7 points. Garry Tan's "fat skills, thin harness" and Andrej Karpathy's wiki-knowledge-base approach both point to markdown-as-memory as the right format.

Kelly Equivalent

Kelly's 5-layer memory system already implements this design:
- Layer 1 (soul) — identity, role (small, always loaded)
- Layer 2 (memory) — narrative learnings, insights, decisions (narrative, associative retrieval)
- Layer 3 (memory/YYYY-MM-DD.md) — daily narrative logs (narrative, raw)
- Layer 4 (projects/{id}/context.md) — structured project state (structured)
- Layer 5 (data/.json) — structured data, lookups (structured)

Kelly's design is narrative at the top (layers 2–3), structured at the bottom (layers 4–5). This is exactly the hybrid that Yuki arrived at empirically.

Gas Town Equivalent

Beads store Why as a first-class field — the reason a decision was made, not just the decision itself. MEOW's graph structure enables typed edges between Beads: parent/child (decomposition), causal (Bead B created because of Bead A), validated-by (quality attestation). Git stores What/Where/Who/How; Beads store Why. The Beads-as-Why principle captures reasoning as structured data, but the format is field-based, not narrative. Yuki's findings suggest that over-structuring memory (tables) hurts retrieval; Beads risk a similar over-structuring if the reason field is treated as a lookup key rather than a narrative retrieval hook.

Gap Analysis

| System | Top Retrieval Layer | Format | Structured Below | Hybrid |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | Learnings / decisions | Narrative + tables (hybrid) | Tables for lookup | Yes (empirically derived) |
| Kelly | memory + daily logs | Narrative | Project context + data files | Yes (architecturally designed) |
| Gas Town | Beads (reason field) | Structured fields + typed edges | N/A (single layer) | Partial (structured, but with Why as hook) |

Kelly validation: The Yuki experiment is the strongest external validation that Kelly's memory architecture is correct. Narrative beats tables for associative LLM retrieval. The 5-layer system (narrative top, structured bottom) is the empirically correct architecture.

Gas Town risk: Beads' structured reason field is a better retrieval hook than tables, but if the reason field is written as a short label ("because SEO priority") rather than a narrative paragraph, it loses the associative retrieval benefit. Yuki's lesson applies: the reason field should include narrative context, not just structured keywords.

Key Insight

Three independent sources (Yuki, Kelly, and the Karpathy/Garry Tan work) converge on the same answer: narrative beats tables for LLM memory. Kelly's 5-layer system is validated. Gas Town's Beads-as-Why is on the right track but must resist the temptation to over-index on structure.


5. Progressive Disclosure vs Kelly's Demand-Loaded Layer System vs MEOW Graph

Yuki AI CEO Pattern

CLAUDE.md shrank 36% (152 → 98 lines) as knowledge was extracted into demand-loaded subfiles. Detailed knowledge moved to docs/, decisions/, learnings/. The main file stayed small and pointed to subfiles when needed. Repo doubled in size (472 → 934 files) but the attention footprint shrank.

Quote from Board Review #3: "Give agents a map, not an encyclopedia." OpenAI's instructional analysis paper calls this progressive disclosure.

Kelly Equivalent

Kelly's 5-layer memory system is literally the same pattern:
- Small map at the top (soul, memory): loads every session
- Large encyclopedia below (memory/YYYY-MM-DD.md, projects/{id}/context.md, data/.json): demand-loaded when needed

Kelly's layer system was designed for this. Yuki's experiment validates it empirically.

Gas Town Equivalent

MEOW's knowledge graph enables progressive disclosure via graph traversal: a new agent can reconstruct a project's full decision history by traversing the Bead DAG. The graph IS the map; individual Beads are the encyclopedia entries. Typed edges (parent/child, causal) guide which entries are relevant. MEOW's approach is structurally different — the "map" is the traversal path, not a separate small file.
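Reconstructing a decision history by traversal can be illustrated with a plain breadth-first walk over typed edges. This is a sketch under loud assumptions: the dict-based edge store stands in for whatever Beads/Dolt actually exposes, the function name `decision_history` is hypothetical, and edges are assumed to point from a Bead to the Beads it derives from.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# bead_id -> list of (edge_type, target_bead_id).
# Edge type names follow the text; the storage format is invented for this example.
Edges = Dict[str, List[Tuple[str, str]]]


def decision_history(edges: Edges, start: str, follow: Set[str]) -> List[str]:
    """Breadth-first walk from `start`, following only the given edge types."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        bead = queue.popleft()
        for edge_type, target in edges.get(bead, []):
            if edge_type in follow and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical graph: a release Bead decomposed from an epic that an incident caused.
example_edges: Edges = {
    "release": [("parent", "epic"), ("validated-by", "review")],
    "epic": [("causal", "incident")],
}
history = decision_history(example_edges, "release", {"parent", "causal"})
```

Restricting the walk to chosen edge types is what keeps the attention footprint bounded: the query scope, not the size of the graph, decides how much gets loaded.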

Gap Analysis

| System | Map Mechanism | Encyclopedia Mechanism | Attention Footprint |
| --- | --- | --- | --- |
| Yuki AI CEO | Small CLAUDE.md (98 lines) | Demand-loaded subfiles (docs/, decisions/, learnings/) | Shrinks as knowledge grows |
| Kelly | Small soul + memory | Demand-loaded memory/YYYY-MM-DD.md, projects/, data/ | Shrinks by design (5 layers) |
| Gas Town | Graph traversal path | Individual Beads (accessible via Dolt query) | Controlled by query scope |

No meaningful gaps. All three systems independently arrived at the same progressive disclosure principle. Kelly's 5-layer system and Yuki's CLAUDE.md shrinkage are the same pattern. Gas Town's MEOW graph achieves the same goal through a different mechanism (structured traversal vs file pointers).

Key Insight

Progressive disclosure is not a design preference — it's a necessary mechanism for knowledge to compound without causing context overflow. More knowledge doesn't have to mean a bigger attention footprint if the knowledge is demand-loaded, not always-loaded. Yuki's empirical demonstration (CLAUDE.md shrank 36% while repo doubled) is the clearest validation of this principle available.


6. n8n + Agent Sessions Separation vs Kelly's sessions_yield vs cron vs Deacon Patrol

Yuki AI CEO Pattern

Clear operational separation maintained by design:
- n8n — for tasks needing no reasoning and running on a schedule (email drip, warmup sequences, anything needing a timer without thought)
- Agent sessions — for strategic thinking requiring reasoning (decisions, analysis, recommendations)

Romain's correction when Judy wanted to move everything to n8n: "New tool doesn't mean move everything there."

Kelly Equivalent

Kelly's sessions_yield vs cron is the same distinction:
- sessions_yield — for strategic thinking, multi-step reasoning, complex decisions (agent session)
- cron — for scheduled automations that run on a timer without reasoning

Kelly's TaskFlow handles multi-step detached tasks with owner context, state, waits, and child tasks — analogous to n8n workflows but within the agent ecosystem.

Gas Town Equivalent

Gas Town's execution model doesn't make this distinction explicitly — GUPP runs everything on hooks. The closest analog is the Deacon patrol daemon: a separate daemon that patrols hooks and handles stuck agents independently of the main agent execution. This is structural separation of mechanical vs strategic work, but at the daemon level rather than the workflow level.

Gap Analysis

| System | Mechanical/Scheduled | Strategic/Reasoning | Separation Mechanism |
| --- | --- | --- | --- |
| Yuki AI CEO | n8n (external workflow engine) | Agent sessions | Explicit tools, no mixing |
| Kelly | cron (scheduled scripts) | sessions_yield (agent session) | Explicit primitives |
| Gas Town | GUPP hook execution | Main agent work | Implicit (hook model) |

No meaningful gaps. All three systems separate mechanical scheduled work from strategic reasoning work. Yuki's experiment with Romain's correction validates why this matters: mixing strategic and mechanical work in the same execution layer makes both worse.

Key Insight

The n8n/agent sessions split is the same architectural decision as sessions_yield/cron. Yuki's Romain gave the clearest articulation: "New tool doesn't mean move everything there." Tasks that need reasoning belong in agent sessions; tasks that don't belong in scheduled automation. Kelly already has this separation — it just needs to be maintained as the Kelly factory scales.


7. Board Reviews as TEA-Equivalent Gate Reviews

Yuki AI CEO Pattern

Quarterly board reviews (Board Reviews #1, #2, #3) serve as the formal quality gate and strategic checkpoint:
- Board Review #1 (January): Establish identity, repo structure, authority matrix, decision log. Initial autonomy estimate: 15%.
- Board Review #2 (March): Infrastructure gains (24/7 server, n8n, email, screen tracking), mistake log, 30-day decision reviews, per-business CLAUDE.md. Autonomy estimate: 20%.
- Board Review #3 (April): Autonomous loops running in production, CLAUDE.md shrinkage via progressive disclosure, memory experiments, 0 real-time disagreements. Autonomy estimate: implicit increase (loops now run without founder present).

Board reviews are public, version-controlled documents that the founder reads and responds to. They function as a structured gate: the AI presents what it has done, what it has learned, what it can't do yet. The founder responds with feedback and authority expansions.

Kelly Equivalent

Kelly's TEA audit (Test, Evaluate, Assess) at the Testing stage is the closest analog — a structured three-phase quality gate with named outputs (tea-summary.md) and explicit gate decisions (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE). Kelly's pipeline stage gates (READY/NOT-READY before Research, PASS/FAIL before Release) also serve a board-review-like checkpoint function. The key difference: TEA is a quality gate for a specific deliverable; Yuki's board reviews are strategic reviews of the entire system's state.

Gas Town Equivalent

Gas Town's Witness role serves a continuous quality auditor function — watching all workers, not just at release gates. Gas Town also has Mayor review loops (Bezos-style review where the Mayor surfaces what matters). The closest Gas Town analog to Yuki's board reviews is the Mayor's editorial surfacing to the human — but Gas Town has no formalized periodic strategic review equivalent.

Gap Analysis

| System | Gate Mechanism | Frequency | Who Runs It | Output |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | Board Review | Quarterly | Founder reads; AI writes | Autonomy expansion or reaffirmation |
| Kelly | TEA audit (Test/Evaluate/Assess) | Per pipeline stage (before Release) | test-lead agent | tea-summary.md → PASS/FAIL |
| Gas Town | Witness (continuous) + Mayor editorial | Continuous + as-needed | Deacon/Witness daemons; Mayor | Bead state transitions |

Kelly gap: Kelly's TEA is a quality gate for deliverables, not a strategic review of the system's autonomous growth. Kelly has no equivalent to Yuki's quarterly board review that explicitly assesses how much autonomy the system has earned and what the roadmap to more looks like. Adding a periodic "system autonomy review" — What has each agent earned the right to do independently? What stalled? What should be expanded? — would close this gap.

Gas Town gap: Gas Town has no formalized periodic strategic review. The Mayor's editorial filtering is continuous but not structured as a review with explicit decisions about autonomy expansion.

Key Insight

Yuki's board reviews are the TEA audit's strategic counterpart. TEA validates that work is correct; board reviews validate that the system is growing. Both are necessary — TEA for quality, board reviews for autonomy. Kelly needs both.


8. Mistake Log as TEA Audit Equivalent

Yuki AI CEO Pattern

A public, version-controlled learnings file in the repository — visible to founder, committed to git. Examples of entries:
- "Recommended a platform based on homepage marketing copy. Actual product didn't have the features. Lesson: homepage claims mean nothing. Check actual product pages."
- "Stored learnings in a hidden config directory instead of the repo. Lesson: hidden means unaccountable."
- "Built a forecast with no mechanism to check it against reality. Lesson: every prediction needs a 'what actually happened' section."
- "Copied internal board review draft into public blog without scrubbing. Romain caught it before live."

The mistake log is accountability infrastructure. Making it public and version-controlled means the founder can see what went wrong and what was changed. Hidden means unaccountable.

Kelly Equivalent

Kelly's SELF_IMPROVEMENT.md tracks what went wrong and what was changed. memory's "learnings" section captures decisions and lessons. heartbeat occasionally captures what didn't work. The TEA audit's "lessons learned" section serves a similar function. Kelly's mistake tracking is distributed across multiple files and not consistently public/visible.

Gas Town Equivalent

Beads capture Why at the work-item level — the reason field on each Bead stores the decision rationale. MEOW's causal edges connect Beads that influenced each other. Gas Town's accountability is structural (every state transition is a git commit with author and timestamp) but diffuse — there's no single "mistake log" file equivalent.

Gap Analysis

| System | Mistake Log Location | Public | Version-Controlled | Format |
| --- | --- | --- | --- | --- |
| Yuki AI CEO | learnings/ file in repo | Yes | Yes (git) | Narrative (with lessons) |
| Kelly | SELF_IMPROVEMENT.md + memory + TEA lessons | Partial | Yes | Narrative |
| Gas Town | Bead reason fields + causal edges | Yes (Dolt queryable) | Yes | Structured fields |

Kelly gap: Kelly's equivalent would be a visible SELF_IMPROVEMENT.md that tracks what went wrong and what changed, committed to git, readable by the human. Kelly already has this file in principle — it just needs to be more explicitly maintained as a public accountability document, not an internal afterthought.

Key Insight

Yuki's lesson — "hidden means unaccountable" — applies directly to Kelly. A mistake log that lives in a hidden config directory or an internal memory file is not accountability infrastructure; it's a journal. Making it public and version-controlled changes its function: it becomes a record the founder can audit, not just a note the agent keeps.


9. Per-Business CLAUDE.md as Pipeline Specialization Equivalent

Yuki AI CEO Pattern

Board Review #2 introduced per-business CLAUDE.md files: each product has its own local CLAUDE.md with rules specific to that product. Main file stays focused on mission, cadence, and cross-portfolio rules. Rule: main todo stays under 80 lines. Anything more specific goes one level deeper.

This is specialization at the context level: the same AI CEO loads different context depending on which product it's working on. The AI CEO becomes a multi-domain agent by having domain-specific context files.

Kelly Equivalent

Kelly's pipeline specialization is at the stage level — research-lead, project-lead, test-lead each handle their phase. Kelly does not have per-project or per-product specialized context files that load when the agent enters that project's scope. Kelly's projects/{id}/context.md files exist but are not CLAUDE.md-equivalent (they describe project state, not agent role and rules for that project).

Gas Town Equivalent

Gas Town's Crew members are named, persistent, per-domain agents with accumulated context. A PR Sheriff has different context and capabilities than a DB Sheriff. Crew persistence means they accumulate domain knowledge over time. This is structurally similar to Yuki's per-business CLAUDE.md — the domain specialization is persistent, not re-loaded from scratch each session.

Gap Analysis

| System | Specialization Mechanism | Context Persistence | Scope |
| --- | --- | --- | --- |
| Yuki AI CEO | Per-business CLAUDE.md files | Yes (loaded on session start per business) | Product-level |
| Kelly | Named lead agents per stage (research-lead, etc.) | Partial (ephemeral sub-agents) | Pipeline stage |
| Gas Town | Named Crew members (PR Sheriff, DB Sheriff) | Yes (persistent per-Rig agents) | Domain-level |

Kelly gap: Kelly's pipeline specialization is horizontal (stage-based); Yuki's per-business CLAUDE.md is vertical (product-based). Kelly should consider per-project context files that function like a project's CLAUDE.md: loaded when entering that project's pipeline scope, containing product-specific rules, priorities, and context that the general soul doesn't cover.

Key Insight

Yuki's per-business CLAUDE.md demonstrates that specialization can live in context files rather than in distinct agent instances. The same AI CEO operates in different products by loading different context. Kelly could achieve the same effect by making project CLAUDE.md files first-class — loaded when the router assigns work to a project, not just as passive state files.


10. 30-Day Outcome Reviews as Quality Gate with Feedback Loop

Yuki AI CEO Pattern

Every decision logged in the decisions/ folder now gets a 30-day outcome review: set a review date when logging the decision, check back a month later, assess: did expected outcome happen? What actually happened? What did we learn?

This is a temporal quality gate — quality assessed not at output time but at outcome time. A decision that seemed correct based on available information might have been wrong; the 30-day review catches this.
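The mechanics of a temporal gate are simple date arithmetic: record a review date when the decision is logged, then surface anything past that date with no recorded outcome. A minimal sketch follows; the `Decision` fields and the function name `due_for_review` are hypothetical, and only the 30-day window comes from the source.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

REVIEW_AFTER = timedelta(days=30)


@dataclass
class Decision:
    title: str
    expected: str                 # what the decision was expected to achieve
    logged_on: date
    actual: Optional[str] = None  # filled in at review time

    @property
    def review_on(self) -> date:
        return self.logged_on + REVIEW_AFTER


def due_for_review(decisions: List[Decision], today: date) -> List[Decision]:
    """Decisions whose review date has arrived and that have no recorded outcome yet."""
    return [d for d in decisions if d.actual is None and d.review_on <= today]
```

Storing `expected` alongside the decision is the part that makes the later comparison possible; without it, the review has nothing to check the outcome against.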

Kelly Equivalent

Kelly has no equivalent temporal quality gate. TEA audit evaluates work at the testing stage based on available information (tests, evaluation criteria). heartbeat decision follow-up is the closest mechanism — periodic checks on whether prior decisions played out correctly. But Kelly's quality gates are point-in-time (gate review at stage transition), not temporal (outcome review at defined future date).

Gas Town Equivalent

Gas Town's Wasteland has the closest analog: the Wanted Board posts work, validators stamp completed work, and the work's quality is attested to by validators over time. A Bead's validated-by edges represent quality attestation, but Gas Town doesn't have a systematic "revisit this decision in 30 days" mechanism.

Gap Analysis

| System | Temporal Quality Gate | Mechanism | Feedback Loop |
| --- | --- | --- | --- |
| Yuki AI CEO | 30-day outcome review | Decision log with review date; follow-up at 30 days | Yes (actual vs expected) |
| Kelly | None (point-in-time only) | TEA audit at release | Partial (heartbeat follow-up) |
| Gas Town | Partial (validated-by edges) | Validator stamps on Beads | No systematic revisit |

Kelly gap: Kelly should add a 30-day outcome review mechanism: when a significant decision is made (logged to memory or project context), set a review date. At review, assess: did expected outcome happen? This closes the loop between decision and result, making the quality gate temporal rather than point-in-time.

Gas Town gap: The Wasteland's stamp economy is a distributed quality attestation system, but it doesn't include temporal revisit. A Bead's validated-by stamp is given at completion time, not at outcome time.

Key Insight

30-day outcome reviews represent a fundamentally different quality model: quality is not just "did the work pass the gate?" but "did the decision that led to the work produce the expected result?" This is the most sophisticated quality gate in any of the three systems. Kelly and Gas Town both lack it.


Summary: Concept Cross-Reference Table

| Concept | Yuki AI CEO | Kelly | Gas Town | Kelly Gap |
| --- | --- | --- | --- | --- |
| Brain/memory substrate | GitHub repo + CLAUDE.md + decisions/ | soul + memory + 5-layer system | Beads in Dolt (git-versioned SQL) | Sub-agents need per-agent identity files |
| Authority model | Explicit 3-tier matrix + transfer log | Gate validation (implicit tiers in AGENTS.md) | Role hierarchy (Mayor/Crew/Polecats) | Missing explicit authority matrix with transfer log |
| Autonomous loops | 3 production loops (models, bugs, SEO) + compounding | None (cron/TaskFlow are non-compounding) | GUPP hooks (non-compounding) | Missing autonomous compounding loops |
| Memory format | Narrative > tables (empirical); hybrid | 5-layer (narrative top, structured bottom) | Beads-as-Why (structured Why field) | Kelly design validated; Gas Town risk of over-structuring |
| Progressive disclosure | CLAUDE.md shrunk 36% (152→98 lines) | 5-layer demand-loaded system | MEOW graph traversal | Already aligned; validates Kelly design |
| Mechanical vs strategic separation | n8n vs agent sessions | sessions_yield vs cron | Implicit in hook model | Already aligned; Kelly design confirmed |
| Periodic quality review | Quarterly board reviews | TEA audit (per-stage, not periodic) | Continuous Witness + Mayor editorial | Missing periodic strategic autonomy review |
| Mistake log | Public, version-controlled learnings file | SELF_IMPROVEMENT.md + memory (partial) | Bead reason fields (diffuse) | Need explicit public/version-controlled mistake log |
| Per-domain context | Per-business CLAUDE.md files | Project context files (not CLAUDE.md-equivalent) | Named Crew members (persistent domain agents) | Need per-project CLAUDE.md-style context files |
| Temporal outcome review | 30-day decision reviews | None | None | Missing 30-day outcome review mechanism |

Combined Architecture: What the Triangle Validates

All three systems arriving at the same patterns from independent starting points validates a core thesis: these patterns emerge from practical autonomous agent operation — they are not design preferences but discovered necessities.

Universal Patterns (Emerged in All Three)

  1. Persistent brain as foundational requirement. Yuki built it empirically. Kelly theorized it. Gas Town implemented it as Beads. All three reached the same conclusion: persistence across sessions is not optional.
  2. Authority tiers. Yuki's three tiers + transfer log. Kelly's gate validation. Gas Town's Mayor/Crew/Polecats. All three separate "decides alone" from "needs human", but only Yuki makes the tiers explicit and records their progression in a transfer log; Kelly's tiers are still implicit in AGENTS.md.
  3. Mechanical/strategic execution separation. n8n vs agent sessions (Yuki). sessions_yield vs cron (Kelly). Hook model vs daemons (Gas Town). All three separate "run on schedule without reasoning" from "strategic work requiring context."
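  4. Narrative-over-structured for LLM retrieval. Yuki's table experiment. Kelly's 5-layer design. Karpathy/Garry Tan research. All three sources point the same way: tables are for lookup, narrative is for association. Gas Town is the outlier here: its structured Beads-as-Why field carries a risk of over-structuring.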

Yuki's Unique Contributions (Not in Kelly or Gas Town)

  1. Autonomous compounding loops in production. GUPP existed in Gas Town; compounding loops existed nowhere. Yuki demonstrated the combination in production.
  2. Authority transfer log. A visible, version-controlled record of what autonomy has been earned. Neither Kelly nor Gas Town has this.
  3. 30-day outcome reviews. Temporal quality gates that revisit decisions at outcome time, not just at gate time.
  4. Per-business CLAUDE.md context files. Specialization via context files rather than distinct agent instances.
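
As a rough illustration of the authority transfer log in item 2, the sketch below models a three-tier matrix whose changes are appended to a log. The tier names, the default-to-most-restrictive rule, and the class and method names are assumptions made for the example; the source describes only "three tiers + transfer log", not their exact shape.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical three-tier scale; the names are illustrative."""
    ASK_FIRST = 1        # human approval required before acting
    ACT_THEN_REPORT = 2  # agent acts, then reports
    DECIDES_ALONE = 3    # agent acts without reporting each time


@dataclass(frozen=True)
class Transfer:
    """One transfer-log entry: a recorded change of tier for an action."""
    action: str
    old_tier: Tier
    new_tier: Tier
    on: date
    reason: str


@dataclass
class AuthorityMatrix:
    tiers: dict[str, Tier] = field(default_factory=dict)
    log: list[Transfer] = field(default_factory=list)

    def tier_for(self, action: str) -> Tier:
        # Assumption: unlisted actions fall back to the most restrictive tier.
        return self.tiers.get(action, Tier.ASK_FIRST)

    def needs_human(self, action: str) -> bool:
        return self.tier_for(action) == Tier.ASK_FIRST

    def transfer(self, action: str, new_tier: Tier, on: date, reason: str) -> None:
        """Change an action's tier and append the change to the log."""
        old_tier = self.tier_for(action)
        self.tiers[action] = new_tier
        self.log.append(Transfer(action, old_tier, new_tier, on, reason))


if __name__ == "__main__":
    matrix = AuthorityMatrix()
    print(matrix.needs_human("fix_seo_metadata"))
    matrix.transfer(
        "fix_seo_metadata",
        Tier.DECIDES_ALONE,
        date(2026, 4, 1),
        "hypothetical: several clean reviewed runs",
    )
    print(matrix.needs_human("fix_seo_metadata"), len(matrix.log))
```

Keeping the log append-only and committing it alongside the matrix is what would make earned autonomy visible in version control, which is the property the list above attributes to Yuki's approach.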

Synthesis Recommendations for Kelly

  1. Write an explicit authority matrix per agent role with a transfer log tracking what autonomy has been earned. (Yuki's #1 lesson.)
  2. Add autonomous compounding loops — background agents that read their own prior outputs and compound. Start with one: a weekly KB refresh loop. (Yuki's #3 lesson.)
  3. Add 30-day outcome reviews — when a significant decision is logged, set a review date; assess actual vs expected at 30 days. (Yuki's #10 lesson.)
  4. Add per-project CLAUDE.md files — loaded when entering project scope, containing project-specific rules and context. (Yuki's #9 lesson.)
  5. Maintain the sessions_yield/cron separation — already correct, validated by Yuki's n8n correction. (Yuki's #6 confirmation.)
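
Recommendation 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming a decision record carries an expected outcome and becomes due for review 30 days after it is logged; `Decision`, `due_for_review`, and the field names are hypothetical and not drawn from Yuki's or Kelly's actual files.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_AFTER = timedelta(days=30)  # the 30-day window named in the text


@dataclass
class Decision:
    summary: str
    expected: str                  # outcome predicted at decision time
    logged_on: date
    actual: Optional[str] = None   # filled in when the review happens

    @property
    def review_on(self) -> date:
        """Date on which actual vs expected should be assessed."""
        return self.logged_on + REVIEW_AFTER


def due_for_review(decisions: list[Decision], today: date) -> list[Decision]:
    """Return unreviewed decisions whose review date has arrived."""
    return [d for d in decisions if d.actual is None and d.review_on <= today]


if __name__ == "__main__":
    log = [
        Decision("hypothetical pricing change", "conversion up", date(2026, 3, 1)),
        Decision("hypothetical KB restructure", "fewer lookups", date(2026, 4, 20)),
    ]
    for d in due_for_review(log, today=date(2026, 4, 27)):
        print(d.summary, "was due on", d.review_on)
```

A scheduled job could call `due_for_review` daily and hand the results to a strategic session, which keeps the mechanical check and the judgment about actual vs expected on opposite sides of the sessions_yield/cron separation.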

yukicapital-ai-ceo-overview, yukicapital-ai-ceo-experiment, yukicapital-board-review-2, yukicapital-board-review-3, kelly-gas-town-gap-analysis, multi-factory-comparison, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy, kelly-handbook-multi-agent, kelly-handbook-software-factory

  • factory-trap — dark factory comparison: Yuki's three autonomous loops, authority transfer log, and 30-day outcome reviews are dark factory patterns with no equivalent in Kelly or Gas Town
  • world-model — the shared cognitive substrate (world.json in SuperAda, GitHub repo in Yuki) is the central convergence point across all three systems
  • him-model — Yuki's CEO/founder relationship (Judy Win + Romain) and Kelly's operator/router relationship both implement the HiM cognitive separation pattern
  • autonomy-policy-v3 — Yuki's three-tier authority matrix with explicit transfer log maps directly to autonomy-policy-v3's named authority levels (A/B/C/D)
  • meta-crons — Yuki's three production loops are meta-crons: scheduled autonomous agents that compound by reading their own prior outputs
  • lobster-pipelines — Yuki's 30-day outcome review mechanism and per-business CLAUDE.md specialization parallel lobster-pipelines' typed envelope + resumable state pattern
  • isc — Yuki's "decision log with success criteria" and Kelly's TEA audit both implement ISC-equivalent: binary-testable, result-stated criteria
  • 7-agent-crew-topology — SuperAda's 7-agent Star Trek crew topology vs Kelly Router's 3-agent crew is the key topology comparison in this gap analysis