Yuki AI CEO vs Kelly Factory vs Gas Town — Full Gap Analysis
Assessor: Carson (dark-factory-kb subagent)
Date: 2026-04-27
Sources: Yuki Capital AI CEO essays (Board Reviews #1–#3), yukicapital-ai-ceo-overview, kelly-gas-town-gap-analysis, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy
Executive Summary
Three independent dark factory systems have emerged from three different practitioners: Yuki Capital's AI CEO (run by Judy Win / Claude, January–April 2026), Kelly's Factory Router (the operator's Kelly Claude AI, ongoing), and Steve Yegge's Gas Town (open-source, January–April 2026). All three arrived at largely the same architectural patterns from radically different starting points: a solo founder running an AI as CEO, a software engineer building a factory methodology, and a veteran platform engineer building an open-source orchestrator.
This gap analysis maps every major concept from the Yuki AI CEO experiment against both Kelly and Gas Town equivalents. The goal: identify which patterns are universal (emerged independently in all three), which are unique to one system, and where the synthesis opportunities lie for the operator's Kelly factory.
Overall verdict: The Yuki AI CEO experiment provides the strongest real-world validation yet for the Kelly/Gas Town architectural thesis. Every pattern Yuki discovered empirically — repo-as-brain, explicit authority matrices, autonomous compounding loops, narrative-over-table memory, progressive disclosure, sessions-vs-scheduled-separation — matches a pattern Kelly had theorized or Gas Town had implemented. The experiment's most valuable contributions: proof that autonomous loops (GUPP-equivalent) compound in production, the authority matrix with transfer log, and empirical confirmation that narrative memory outperforms structured tables for LLM recall.
1. Repo-as-Brain vs soul+memory vs Beads Substrate
Yuki AI CEO Pattern
Every session starts from scratch. The workaround: a private GitHub repository serving as persistent operational headquarters. Loaded on every instantiation:
- CLAUDE.md — mission, revenue targets, communication style, tools, founder communication rules
- authority.md — three-tier authority matrix (see Section 2)
- decisions/ — log of every meaningful decision with context, options, rationale, outcomes
- todo.md — prioritized action list tagged by owner
- businesses/ — per-business folders with overviews, monthly stats, competitor analysis
- strategies/ — strategic planning documents
- metrics/ — dashboards and scripts pulling live data from Stripe, Plausible, MOZ, MongoDB
- learnings/ — public mistake log (version-controlled, visible to founder)
The repo IS the brain. Every session reads the relevant files. Knowledge compounds across sessions because everything is committed and retrieved on wake.
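A minimal sketch of the wake-up step described above, assuming a local checkout of the repo. The function name `load_session_context` and the choice of three always-loaded files are hypothetical; the source lists the repo contents but not the loading code.

```python
from pathlib import Path

# Hypothetical subset of files read at session start (names from the list above).
ALWAYS_LOADED = ["CLAUDE.md", "authority.md", "todo.md"]


def load_session_context(repo_root: Path, always_loaded=ALWAYS_LOADED) -> str:
    """Concatenate the always-loaded files into one context string.

    Missing files are skipped so a partially set-up repo still boots.
    """
    sections = []
    for name in always_loaded:
        path = repo_root / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

The sketch loads only three top-level files for brevity; which files a real session reads first is a choice the source does not spell out.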
Kelly Equivalent
Kelly's 5-layer memory system maps directly:
- soul ≈ CLAUDE.md (who Kelly is, mission, communication style)
- memory ≈ decisions/ + learnings/ (curated institutional memory, distilled learnings)
- memory/YYYY-MM-DD.md ≈ daily session logs (raw operational context)
- projects/{id}/context.md ≈ businesses/ folders (per-project context)
- data/.json ≈ metrics/ (structured data, lookups)
Kelly's session persistence mechanism is files-on-disk: memory, daily logs, project context files. Kelly's soul is loaded at session start and serves the same identity-reload function as CLAUDE.md.
Gas Town Equivalent
Gas Town's Beads, backed by Dolt, serve as the universal data plane. Every unit of work, coordination, message, quality gate, and reasoning is a Bead. Git stores What/Where/Who/How; Beads store Why. The Dolt backing means the entire work history is git-versioned and SQL-queryable.
Gap Analysis
| System | Brain Substrate | Persistence | Queryable | Human-Readable |
|---|---|---|---|---|
| Yuki AI CEO | GitHub repo + files | Git commits | No (grep only) | Yes |
| Kelly | soul + memory + files | Files on disk | No (grep only) | Yes |
| Gas Town | Beads in Dolt | Git-versioned SQL | Yes (SQL) | Partial (requires viewer) |
Kelly gap vs Yuki: Kelly's soul is equivalent to CLAUDE.md, but Kelly does not systematically load project context files at session start the way Yuki loads per-business CLAUDE.md files. The AI CEO experiment suggests every sub-agent should have its own persistent identity/context file that loads on instantiation — not just the main router. Kelly's sub-agents are ephemeral by default; Yuki demonstrates the value of per-agent persistent context even for narrow workers.
Kelly gap vs Gas Town: Both Yuki and Kelly use file-based persistence (not SQL-queryable). Gas Town's Beads/Dolt substrate is strictly more powerful for cross-project queries and audit trails, but requires running database infrastructure. Kelly's file-based approach is more immediately deployable.
Key Insight
The Yuki experiment shows that repo-as-brain is not optional: it is the foundational requirement for an autonomous agent. Without it, every session starts as a stranger. With it, each session starts from a higher baseline than the one before. This is empirical validation of Kelly's 5-layer memory system, which had so far rested on design reasoning alone.
2. Authority Matrix vs Three-Tier Authority vs Gate Validation
Yuki AI CEO Pattern
Three-tier explicit authority matrix, written and version-controlled:
- Decide alone — analysis, documentation, self-organizing work
- Propose for validation — strategic recommendations needing founder approval
- Founder-only — money, production code, customer communication
The matrix includes target states (e.g., "eventually I implement on dev branches, founder reviews and merges") and an explicit authority transfer log tracking what has moved from tier 3 → tier 2 → tier 1 over time. Progress toward autonomy is visible and measurable.
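The matrix and its transfer log can be modeled as a small data structure. This is an illustrative sketch, not the actual authority.md format: the class names, the example actions, and the rule that unknown actions fall into the most restrictive tier are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum


class Tier(IntEnum):
    DECIDE_ALONE = 1   # tier 1: analysis, documentation, self-organizing work
    PROPOSE = 2        # tier 2: needs founder validation
    FOUNDER_ONLY = 3   # tier 3: money, production code, customer communication


@dataclass
class Transfer:
    action: str
    from_tier: Tier
    to_tier: Tier
    when: date


@dataclass
class AuthorityMatrix:
    tiers: dict[str, Tier] = field(default_factory=dict)
    transfer_log: list[Transfer] = field(default_factory=list)

    def tier_of(self, action: str) -> Tier:
        # Assumption: anything not listed is treated as founder-only.
        return self.tiers.get(action, Tier.FOUNDER_ONLY)

    def transfer(self, action: str, to_tier: Tier, when: date) -> None:
        """Move an action to a new tier and record the move in the log."""
        self.transfer_log.append(Transfer(action, self.tier_of(action), to_tier, when))
        self.tiers[action] = to_tier
```

Because every move is appended to `transfer_log`, progress toward autonomy can be read off as a list of dated tier changes.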
Kelly Equivalent
Kelly's gate validation represents a form of authority delegation — the router spawns sub-agents and validates their output before routing to the next phase, but authority is gate-driven, not tier-driven. AGENTS.md defines what can be routed without human input vs what requires operator approval. Kelly does not have an explicit written authority matrix with progressive transfer tracking. The "what has carson earned the right to decide alone" question is unanswered in Kelly's current architecture.
Gas Town Equivalent
Gas Town's Mayor/Crew/Polecats hierarchy defines what each tier can do, but authority is more implicit — the Mayor has editorial judgment, Crew members have domains, Polecats are unmonitored workers. Gas Town's authority model is structural (role-based) rather than written-and-negotiated (Yuki-style). The closest Gas Town gets to Yuki's authority transfer log is the Wasteland's stamp economy (reputation stamps from validators build portable trust).
Gap Analysis
| System | Authority Model | Written | Progressive | Transfer Log |
|---|---|---|---|---|
| Yuki AI CEO | 3-tier matrix (alone/propose/founder-only) | Yes, in authority.md | Yes, with target states | Yes, explicit log |
| Kelly | Gate validation (router validates before advance) | AGENTS.md partially | Implicit in role definitions | No |
| Gas Town | Role-based hierarchy (Mayor/Crew/Polecats) | Partially | Implicit in role tenure | No (reputation stamps partially serve this) |
Kelly gap: Kelly has no equivalent to Yuki's explicit authority matrix with transfer log. The authority tiers per agent role (what can carson do alone vs what needs the operator's approval) should be written down and tracked. Yuki's single most valuable governance mechanism was the three-tier matrix — it gave the AI a clear roadmap for earning more autonomy. Kelly needs the same.
Gas Town gap: Gas Town's authority is structural but not negotiated. The Wasteland's stamp economy is the closest analog — reputation stamps from validators build portable trust over time. But there is no explicit authority transfer protocol (tier 3 → tier 2 → tier 1 equivalent).
Key Insight
Yuki's authority matrix proves that written, progressive authority tiers with visible transfer logs dramatically accelerate autonomous growth. The AI knew exactly what it had earned and what to work toward. Kelly's implicit gate-based authority is functional but doesn't give sub-agents a roadmap for earning autonomy.
3. Autonomous Compounding Loops vs GUPP Hooks vs RALPH Protocol
Yuki AI CEO Pattern
Three autonomous loops running on production (as of Board Review #3, April 2026):
- New AI Models (daily, 3am) — scans for new AI models, evaluates them, opens PRs with integration code
- Bug Autofix (daily, 6am) — reads error logs, diagnoses root causes, writes fixes, pushes to main (guardrailed to error-handling code only)
- SEO Optimizer (weekly) — pulls search console data, finds high-impression/low-CTR pages, rewrites meta tags, creates missing pages, measures impact, auto-reverts bad changes
Crucially: each loop reads its own prior outputs. The SEO optimizer avoids pages it already improved. The model loop learns which model types perform well. They compound — the first real version of "an AI agent that gets better at specific tasks over time, without anyone asking."
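The compounding property comes down to one thing: the loop's prior output is part of its next input. A minimal sketch, assuming a JSON history file and a caller-supplied `improve` function (both hypothetical; the source does not describe how the loops store state):

```python
import json
from pathlib import Path


def run_compounding_step(candidates, history_path: Path, improve):
    """Run one scheduled step, skipping anything a prior run already handled.

    `candidates` is an iterable of page identifiers; `improve` performs the
    bounded change for one page and returns a short result note.
    """
    history = json.loads(history_path.read_text()) if history_path.exists() else {}
    done_this_run = []
    for page in candidates:
        if page in history:  # prior output feeds this run's input
            continue
        history[page] = improve(page)
        done_this_run.append(page)
    history_path.write_text(json.dumps(history, indent=2))
    return done_this_run
```

Run twice over an overlapping candidate list, the second call touches only the pages the first one did not, which is the behavior attributed to the SEO optimizer above.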
Kelly Equivalent
Kelly's RALPH protocol governs sub-agent failures: 3 attempts max, same error twice = escalate immediately. Kelly's sessions_yield handles cooperative multitasking between agents. heartbeat provides periodic agent self-check-in. TaskFlow coordinates multi-step detached tasks. Cron/scheduled tasks handle time-based work. None of these read their own prior outputs and compound. Kelly's scheduled tasks are point-in-time health checks, not accumulated learning loops.
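The two RALPH rules stated above (at most three attempts; the same error twice escalates immediately) can be sketched as a retry wrapper. The function name, the return shape, and the use of exception type plus message as the "same error" signature are assumptions for illustration.

```python
def run_with_ralph(task, max_attempts: int = 3):
    """Retry `task` up to `max_attempts` times; escalate early on a repeated error.

    `task` receives the 1-based attempt number.
    Returns ("ok", result) or ("escalate", reason).
    """
    seen_errors = set()
    for attempt in range(1, max_attempts + 1):
        try:
            return ("ok", task(attempt))
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            signature = f"{type(exc).__name__}: {exc}"
            if signature in seen_errors:
                return ("escalate", f"repeated error: {signature}")
            seen_errors.add(signature)
    return ("escalate", f"failed after {max_attempts} attempts")
```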
Gas Town Equivalent
GUPP (Gas Town Universal Propulsion Principle): "if there is work on your hook, you MUST run it." The Deacon daemon patrols hooks, kills stuck agents, re-queues their Beads. GUPP handles throughput (relentless execution) but not compounding (learning from prior outputs). A GUPP loop that always runs the same task on the same data produces the same output every time. Yuki's loops are GUPP loops WITH compounding — the output becomes part of the input for the next run.
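A rough sketch of the patrol idea, assuming a timeout-based definition of "stuck" and an in-memory queue. None of these names come from Gas Town itself: `Claim`, `patrol`, and `timeout_s` are hypothetical, and the sources do not describe the real Deacon's logic.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Claim:
    bead_id: str
    agent: str
    started_at: float  # seconds, on any monotonic clock


def patrol(claims: list[Claim], queue: deque, now: float,
           timeout_s: float = 600.0) -> list[Claim]:
    """Re-queue work held past the timeout; return the claims still considered live."""
    live = []
    for claim in claims:
        if now - claim.started_at > timeout_s:
            queue.append(claim.bead_id)  # stuck: put the work back on the hook
        else:
            live.append(claim)
    return live
```

Killing the stuck agent process is out of scope here. Note that nothing in this loop reads what earlier runs produced, which is the non-compounding behavior the paragraph above describes.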
Gap Analysis
| System | Loop Mechanism | Compounding | Reads Prior Outputs | Scheduled |
|---|---|---|---|---|
| Yuki AI CEO | n8n + agent sessions | Yes | Yes | Yes (daily/weekly) |
| Kelly | sessions_yield + RALPH + cron | No | No | Yes (cron) |
| Gas Town | GUPP + hook queues + Deacon | No | No (Beads store Why, not learned state) | Yes (hook patrol) |
Kelly gap: Kelly has no equivalent to autonomous compounding loops. A Kelly-style "GUPP loop with compounding" would be a persistent background agent that, on a schedule, pulls its own prior output, reads metrics, and takes the next bounded step. Example: daily build-health check that reads yesterday's CI results and opens issues for failures; weekly KB refresh that reads recent decisions and updates memory.
Gas Town gap: Gas Town's GUPP handles relentless execution but not compounding. Yuki's loops demonstrate that the value of a scheduled loop is proportional to how much it reads its own prior outputs. A GUPP loop that reads its Bead history and adjusts its next action is more valuable than one that doesn't.
Key Insight
The Yuki experiment is GUPP made concrete and compounding. Kelly's sessions_yield and Gas Town's GUPP are execution models; Yuki's loops are the first real-world demonstration that scheduled autonomous loops that read their own outputs are the difference between "AI that helps" and "AI that operates."
4. Narrative Memory > Tables vs Kelly's 5-Layer Narrative Memory vs Beads-as-Why
Yuki AI CEO Pattern
Board Review #3 conducted an explicit experiment: restructured the learnings file from narrative into strict trigger-action tables. Hypothesis: structured should be easier to retrieve. Result: wrong. LLM recall is associative, not indexed. Tables made recall worse. Narrative includes context that looks redundant but functions as a retrieval hook.
Fix: Hybrid approach — tables for lookup, narrative for association. Raw recall improved 7 points. Garry Tan's "fat skills, thin harness" and Andrej Karpathy's wiki-knowledge-base approach both point to markdown-as-memory as the right format.
Kelly Equivalent
Kelly's 5-layer memory system already implements this design:
- Layer 1 (soul) — identity, role (small, always loaded)
- Layer 2 (memory) — narrative learnings, insights, decisions (narrative, associative retrieval)
- Layer 3 (memory/YYYY-MM-DD.md) — daily narrative logs (narrative, raw)
- Layer 4 (projects/{id}/context.md) — structured project state (structured)
- Layer 5 (data/.json) — structured data, lookups (structured)
Kelly's design is narrative at the top (layers 2–3), structured at the bottom (layers 4–5). This is exactly the hybrid that Yuki arrived at empirically.
Gas Town Equivalent
Beads store Why as a first-class field — the reason a decision was made, not just the decision itself. MEOW's graph structure enables typed edges between Beads: parent/child (decomposition), causal (Bead B created because of Bead A), validated-by (quality attestation). Git stores What/Where/Who/How; Beads store Why. The Beads-as-Why principle captures reasoning as structured data, but the format is field-based, not narrative. Yuki's findings suggest that over-structuring memory (tables) hurts retrieval; Beads risk a similar over-structuring if the reason field is treated as a lookup key rather than a narrative retrieval hook.
Gap Analysis
| System | Top Retrieval Layer | Format | Structured Below | Hybrid |
|---|---|---|---|---|
| Yuki AI CEO | Learnings / decisions | Narrative + tables (hybrid) | Tables for lookup | Yes — empirically derived |
| Kelly | memory + daily logs | Narrative | Project context + data files | Yes — architecturally designed |
| Gas Town | Beads (reason field) | Structured fields + typed edges | N/A (single layer) | Partial — structured but with Why as hook |
Kelly validation: The Yuki experiment is the strongest external validation that Kelly's memory architecture is correct. Narrative beats tables for associative LLM retrieval. The 5-layer system (narrative top, structured bottom) is the empirically correct architecture.
Gas Town risk: Beads' structured reason field is a better retrieval hook than tables, but if the reason field is written as a short label ("because SEO priority") rather than a narrative paragraph, it loses the associative retrieval benefit. Yuki's lesson applies: the reason field should include narrative context, not just structured keywords.
Key Insight
Three independent sources (Yuki, Kelly, and the Karpathy/Garry Tan approaches) converge on the same answer: narrative beats tables for LLM memory. Kelly's 5-layer system is validated. Gas Town's Beads-as-Why is on the right track but must resist the temptation to over-index on structure.
5. Progressive Disclosure vs Kelly's Demand-Loaded Layer System vs MEOW Graph
Yuki AI CEO Pattern
CLAUDE.md shrank 36% (152 → 98 lines) as knowledge was extracted into demand-loaded subfiles. Detailed knowledge moved to docs/, decisions/, learnings/. The main file stayed small and pointed to subfiles when needed. Repo doubled in size (472 → 934 files) but the attention footprint shrank.
Quote from Board Review #3: "Give agents a map, not an encyclopedia." OpenAI's instructional analysis paper calls this progressive disclosure.
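The map-versus-encyclopedia split can be sketched as a small index that points to subfiles, each read only on request. The index format and function names below are hypothetical. The second helper simply rechecks the 36% figure from the line counts above.

```python
from pathlib import Path


def load_on_demand(repo_root: Path, index: dict[str, str], topic: str) -> str:
    """Return the subfile for `topic`, or an empty string if the map has no entry."""
    relative = index.get(topic)
    if relative is None:
        return ""
    return (repo_root / relative).read_text(encoding="utf-8")


def shrink_percent(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before
```

With the reported counts, `shrink_percent(152, 98)` is about 35.5, which rounds to the 36% quoted.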
Kelly Equivalent
Kelly's 5-layer memory system follows the same pattern:
- Small map at the top (soul, memory): loads every session
- Large encyclopedia below (memory/YYYY-MM-DD.md, projects/{id}/context.md, data/.json): demand-loaded when needed
Kelly's layer system was designed for this. Yuki's experiment validates it empirically.
Gas Town Equivalent
MEOW's knowledge graph enables progressive disclosure via graph traversal: a new agent can reconstruct a project's full decision history by traversing the Bead DAG. The graph IS the map; individual Beads are the encyclopedia entries. Typed edges (parent/child, causal) guide which entries are relevant. MEOW's approach is structurally different — the "map" is the traversal path, not a separate small file.
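Reconstructing a decision history by graph traversal might look like the following. This sketch uses a plain in-memory edge list rather than Dolt, and the tuple layout `(source, edge_type, target)` is an assumption; only the edge type names (parent, causal, validated-by) come from the text.

```python
from collections import deque


def decision_history(edges, start, follow=("parent", "causal")):
    """Walk backwards from `start` along the selected edge types, breadth-first.

    `edges` is an iterable of (source, edge_type, target) tuples, read as
    "source is the parent of target" or "target was created because of source".
    """
    incoming = {}
    for src, kind, dst in edges:
        if kind in follow:
            incoming.setdefault(dst, []).append(src)

    seen, order, frontier = {start}, [], deque([start])
    while frontier:
        bead = frontier.popleft()
        order.append(bead)
        for prev in incoming.get(bead, []):
            if prev not in seen:
                seen.add(prev)
                frontier.append(prev)
    return order
```

The `follow` argument is where the typed edges do their work: narrowing it to causal edges alone yields a smaller, more targeted slice of the encyclopedia.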
Gap Analysis
| System | Map Mechanism | Encyclopedia Mechanism | Attention Footprint |
|---|---|---|---|
| Yuki AI CEO | Small CLAUDE.md (98 lines) | Demand-loaded subfiles (docs/, decisions/, learnings/) | Shrinks as knowledge grows |
| Kelly | Small soul + memory | Demand-loaded memory/YYYY-MM-DD.md, projects/, data/ | Shrinks by design (5 layers) |
| Gas Town | Graph traversal path | Individual Beads (accessible via Dolt query) | Controlled by query scope |
No meaningful gaps. All three systems independently arrived at the same progressive disclosure principle. Kelly's 5-layer system and Yuki's CLAUDE.md shrinkage are the same pattern. Gas Town's MEOW graph achieves the same goal through a different mechanism (structured traversal vs file pointers).
Key Insight
Progressive disclosure is not a design preference — it's a necessary mechanism for knowledge to compound without causing context overflow. More knowledge doesn't have to mean a bigger attention footprint if the knowledge is demand-loaded, not always-loaded. Yuki's empirical demonstration (CLAUDE.md shrank 36% while repo doubled) is the clearest validation of this principle available.
6. n8n + Agent Sessions Separation vs Kelly's sessions_yield/cron vs Deacon Patrol
Yuki AI CEO Pattern
Clear operational separation maintained by design:
- n8n — for tasks needing no reasoning and running on a schedule (email drip, warmup sequences, anything needing a timer without thought)
- Agent sessions — for strategic thinking requiring reasoning (decisions, analysis, recommendations)
Romain's correction when Judy wanted to move everything to n8n: "New tool doesn't mean move everything there."
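The routing rule implied by this split can be written down in a few lines. The `Task` fields, the layer names, and the default for unscheduled mechanical work are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_reasoning: bool
    scheduled: bool


def route(task: Task) -> str:
    """Pick an execution layer from two hypothetical task flags."""
    if task.needs_reasoning:
        return "agent_session"    # reasoning always wins, even on a timer
    if task.scheduled:
        return "workflow_engine"  # timer without thought, e.g. n8n or cron
    return "agent_session"        # assumed default for ad hoc work
```

Checking `needs_reasoning` first encodes the correction quoted above: having a scheduler available is not a reason to move reasoning work into it.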
Kelly Equivalent
Kelly's sessions_yield vs cron is the same distinction:
- sessions_yield — for strategic thinking, multi-step reasoning, complex decisions (agent session)
- cron — for scheduled automations that run on a timer without reasoning
Kelly's TaskFlow handles multi-step detached tasks with owner context, state, waits, and child tasks — analogous to n8n workflows but within the agent ecosystem.
Gas Town Equivalent
Gas Town's execution model doesn't make this distinction explicitly — GUPP runs everything on hooks. The closest analog is the Deacon patrol daemon: a separate daemon that patrols hooks and handles stuck agents independently of the main agent execution. This is structural separation of mechanical vs strategic work, but at the daemon level rather than the workflow level.
Gap Analysis
| System | Mechanical/Scheduled | Strategic/Reasoning | Separation Mechanism |
|---|---|---|---|
| Yuki AI CEO | n8n (external workflow engine) | Agent sessions | Explicit tools — no mixing |
| Kelly | cron (scheduled scripts) | sessions_yield (agent session) | Explicit primitives |
| Gas Town | GUPP hook execution | Main agent work | Implicit (hook model) |
No meaningful gaps. All three systems separate mechanical scheduled work from strategic reasoning work. Yuki's experiment with Romain's correction validates why this matters: mixing strategic and mechanical work in the same execution layer makes both worse.
Key Insight
The n8n/agent sessions split is the same architectural decision as sessions_yield/cron. Yuki's Romain gave the clearest articulation: "New tool doesn't mean move everything there." Tasks that need reasoning belong in agent sessions; tasks that don't belong in scheduled automation. Kelly already has this separation — it just needs to be maintained as the Kelly factory scales.
7. Board Reviews as TEA-Equivalent Gate Reviews
Yuki AI CEO Pattern
Periodic board reviews (Board Reviews #1, #2, #3, held in January, March, and April) serve as the formal quality gate and strategic checkpoint:
- Board Review #1 (January): Establish identity, repo structure, authority matrix, decision log. Initial autonomy estimate: 15%.
- Board Review #2 (March): Infrastructure gains (24/7 server, n8n, email, screen tracking), mistake log, 30-day decision reviews, per-business CLAUDE.md. Autonomy estimate: 20%.
- Board Review #3 (April): Autonomous loops running in production, CLAUDE.md shrinkage via progressive disclosure, memory experiments, 0 real-time disagreements. Autonomy estimate: implicit increase (loops now run without founder present).
Board reviews are public, version-controlled documents that the founder reads and responds to. They function as a structured gate: the AI presents what it has done, what it has learned, what it can't do yet. The founder responds with feedback and authority expansions.
Kelly Equivalent
Kelly's TEA audit (Test, Evaluate, Assess) at the Testing stage is the closest analog — a structured three-phase quality gate with named outputs (tea-summary.md) and explicit gate decisions (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE). Kelly's pipeline stage gates (READY/NOT-READY before Research, PASS/FAIL before Release) also serve a board-review-like checkpoint function. The key difference: TEA is a quality gate for a specific deliverable; Yuki's board reviews are strategic reviews of the entire system's state.
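The three gate outcomes can be captured as an enum. The decision rule below (blocking issues force REMEDIATE, minor issues yield PASS-WITH-FOLLOWUPS) is an assumed illustration; the source names the outcomes but not the criteria behind them.

```python
from enum import Enum


class GateDecision(Enum):
    PASS = "PASS"
    PASS_WITH_FOLLOWUPS = "PASS-WITH-FOLLOWUPS"
    REMEDIATE = "REMEDIATE"


def decide_gate(blocking_issues: int, minor_issues: int) -> GateDecision:
    """Map issue counts to a gate decision (thresholds are illustrative)."""
    if blocking_issues > 0:
        return GateDecision.REMEDIATE
    if minor_issues > 0:
        return GateDecision.PASS_WITH_FOLLOWUPS
    return GateDecision.PASS
```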
Gas Town Equivalent
Gas Town's Witness role serves a continuous quality auditor function — watching all workers, not just at release gates. Gas Town also has Mayor review loops (Bezos-style review where the Mayor surfaces what matters). The closest Gas Town analog to Yuki's board reviews is the Mayor's editorial surfacing to the human — but Gas Town has no formalized periodic strategic review equivalent.
Gap Analysis
| System | Gate Mechanism | Frequency | Who Runs It | Output |
|---|---|---|---|---|
| Yuki AI CEO | Board Review | Periodic (January, March, April) | Founder reads; AI writes | Autonomy expansion or reaffirmation |
| Kelly | TEA audit (Test/Evaluate/Assess) | At the Testing stage (before Release) | test-lead agent | tea-summary.md → PASS / PASS-WITH-FOLLOWUPS / REMEDIATE |
| Gas Town | Witness (continuous) + Mayor editorial | Continuous + as-needed | Deacon/Witness daemons; Mayor | Bead state transitions |
Kelly gap: Kelly's TEA is a quality gate for deliverables, not a strategic review of the system's autonomous growth. Kelly has no equivalent to Yuki's periodic board review, which explicitly assesses how much autonomy the system has earned and what the roadmap to more looks like. Adding a periodic "system autonomy review" would close this gap, built around three questions: What has each agent earned the right to do independently? What stalled? What should be expanded?
Gas Town gap: Gas Town has no formalized periodic strategic review. The Mayor's editorial filtering is continuous but not structured as a review with explicit decisions about autonomy expansion.
Key Insight
Yuki's board reviews are the TEA audit's strategic counterpart. TEA validates that work is correct; board reviews validate that the system is growing. Both are necessary — TEA for quality, board reviews for autonomy. Kelly needs both.
8. Mistake Log as TEA Audit Equivalent
Yuki AI CEO Pattern
A public, version-controlled learnings file in the repository — visible to founder, committed to git. Examples of entries:
- "Recommended a platform based on homepage marketing copy. Actual product didn't have the features. Lesson: homepage claims mean nothing. Check actual product pages."
- "Stored learnings in a hidden config directory instead of the repo. Lesson: hidden means unaccountable."
- "Built a forecast with no mechanism to check it against reality. Lesson: every prediction needs a 'what actually happened' section."
- "Copied internal board review draft into public blog without scrubbing. Romain caught it before live."
The mistake log is accountability infrastructure. Making it public and version-controlled means the founder can see what went wrong and what was changed. Hidden means unaccountable.
Kelly Equivalent
Kelly's SELF_IMPROVEMENT.md tracks what went wrong and what was changed. memory's "learnings" section captures decisions and lessons. heartbeat occasionally captures what didn't work. The TEA audit's "lessons learned" section serves a similar function. Kelly's mistake tracking is distributed across multiple files and not consistently public/visible.
Gas Town Equivalent
Beads capture Why at the work-item level — the reason field on each Bead stores the decision rationale. MEOW's causal edges connect Beads that influenced each other. Gas Town's accountability is structural (every state transition is a git commit with author and timestamp) but diffuse — there's no single "mistake log" file equivalent.
Gap Analysis
| System | Mistake Log Location | Public | Version-Controlled | Structured |
|---|---|---|---|---|
| Yuki AI CEO | learnings/ file in repo | Yes | Yes (git) | Narrative (with lessons) |
| Kelly | SELF_IMPROVEMENT.md + memory + TEA lessons | Partial | Yes | Narrative |
| Gas Town | Bead reason fields + causal edges | Yes (Dolt queryable) | Yes | Structured fields |
Kelly gap: Kelly's equivalent would be a visible SELF_IMPROVEMENT.md that tracks what went wrong and what changed, committed to git, readable by the human. Kelly already has this file in principle — it just needs to be more explicitly maintained as a public accountability document, not an internal afterthought.
Key Insight
Yuki's lesson — "hidden means unaccountable" — applies directly to Kelly. A mistake log that lives in a hidden config directory or an internal memory file is not accountability infrastructure; it's a journal. Making it public and version-controlled changes its function: it becomes a record the founder can audit, not just a note the agent keeps.
9. Per-Business CLAUDE.md as Pipeline Specialization Equivalent
Yuki AI CEO Pattern
Board Review #2 introduced per-business CLAUDE.md files: each product has its own local CLAUDE.md with rules specific to that product. Main file stays focused on mission, cadence, and cross-portfolio rules. Rule: main todo stays under 80 lines. Anything more specific goes one level deeper.
This is specialization at the context level: the same AI CEO loads different context depending on which product it's working on. The AI CEO becomes a multi-domain agent by having domain-specific context files.
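A sketch of context-level specialization, assuming each business keeps its file at businesses/<name>/CLAUDE.md. That path layout and the function name are assumptions; the source says only that each product has its own local CLAUDE.md.

```python
from pathlib import Path
from typing import Optional


def context_files(repo_root: Path, business: Optional[str] = None) -> list[Path]:
    """Return the context files to load: the main file first, then the business file."""
    files = [repo_root / "CLAUDE.md"]
    if business is not None:
        local = repo_root / "businesses" / business / "CLAUDE.md"  # assumed location
        if local.is_file():
            files.append(local)
    return files
```

Loading the main file first and the local file second lets product-specific rules refine the cross-portfolio ones without duplicating them.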
Kelly Equivalent
Kelly's pipeline specialization is at the stage level — research-lead, project-lead, test-lead each handle their phase. Kelly does not have per-project or per-product specialized context files that load when the agent enters that project's scope. Kelly's projects/{id}/context.md files exist but are not CLAUDE.md-equivalent (they describe project state, not agent role and rules for that project).
Gas Town Equivalent
Gas Town's Crew members are named, persistent, per-domain agents with accumulated context. A PR Sheriff has different context and capabilities than a DB Sheriff. Crew persistence means they accumulate domain knowledge over time. This is structurally similar to Yuki's per-business CLAUDE.md — the domain specialization is persistent, not re-loaded from scratch each session.
Gap Analysis
| System | Specialization Mechanism | Context Persistence | Scope |
|---|---|---|---|
| Yuki AI CEO | Per-business CLAUDE.md files | Yes (loaded on session start per business) | Product-level |
| Kelly | Named lead agents per stage (research-lead, etc.) | Partial (ephemeral sub-agents) | Pipeline stage |
| Gas Town | Named Crew members (PR Sheriff, DB Sheriff) | Yes (persistent per-Rig agents) | Domain-level |
Kelly gap: Kelly's pipeline specialization is horizontal (stage-based); Yuki's per-business CLAUDE.md is vertical (product-based). Kelly should consider per-project context files that function like a project's CLAUDE.md: loaded when entering that project's pipeline scope, containing product-specific rules, priorities, and context that the general soul doesn't cover.
Key Insight
Yuki's per-business CLAUDE.md demonstrates that specialization can live in context files rather than in distinct agent instances. The same AI CEO operates in different products by loading different context. Kelly could achieve the same effect by making project CLAUDE.md files first-class — loaded when the router assigns work to a project, not just as passive state files.
10. 30-Day Outcome Reviews as Quality Gate with Feedback Loop
Yuki AI CEO Pattern
Every decision logged in the decisions/ folder now gets a 30-day outcome review: set a review date when logging the decision, check back a month later, assess: did expected outcome happen? What actually happened? What did we learn?
This is a temporal quality gate — quality assessed not at output time but at outcome time. A decision that seemed correct based on available information might have been wrong; the 30-day review catches this.
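The review mechanism can be sketched as a decision record that carries its own review date. Field names and the `due_for_review` helper are hypothetical; the 30-day interval and the expected-versus-actual comparison come from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Decision:
    title: str
    expected: str                  # outcome predicted when the decision was logged
    logged_on: date
    actual: Optional[str] = None   # filled in at review time
    review_after_days: int = 30

    @property
    def review_date(self) -> date:
        return self.logged_on + timedelta(days=self.review_after_days)


def due_for_review(decisions, today: date):
    """Decisions whose review date has arrived and that still lack an outcome."""
    return [d for d in decisions if d.actual is None and d.review_date <= today]
```

A scheduled job could call `due_for_review` daily and surface the results, so the revisit does not depend on anyone remembering it.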
Kelly Equivalent
Kelly has no equivalent temporal quality gate. TEA audit evaluates work at the testing stage based on available information (tests, evaluation criteria). heartbeat decision follow-up is the closest mechanism — periodic checks on whether prior decisions played out correctly. But Kelly's quality gates are point-in-time (gate review at stage transition), not temporal (outcome review at defined future date).
Gas Town Equivalent
Gas Town's Wasteland has the closest analog: the Wanted Board posts work, validators stamp completed work, and the work's quality is attested to by validators over time. A Bead's validated-by edges represent quality attestation, but Gas Town doesn't have a systematic "revisit this decision in 30 days" mechanism.
Gap Analysis
| System | Temporal Quality Gate | Mechanism | Feedback Loop |
|---|---|---|---|
| Yuki AI CEO | 30-day outcome review | Decision log with review date; follow-up at 30 days | Yes — actual vs expected |
| Kelly | None (point-in-time only) | TEA audit at release | Partial (heartbeat follow-up) |
| Gas Town | Partial (validated-by edges) | Validator stamps on Beads | No systematic revisit |
Kelly gap: Kelly should add a 30-day outcome review mechanism: when a significant decision is made (logged to memory or project context), set a review date. At review, assess: did expected outcome happen? This closes the loop between decision and result, making the quality gate temporal rather than point-in-time.
Gas Town gap: The Wasteland's stamp economy is a distributed quality attestation system, but it doesn't include temporal revisit. A Bead's validated-by stamp is given at completion time, not at outcome time.
Key Insight
30-day outcome reviews represent a fundamentally different quality model: quality is not just "did the work pass the gate?" but "did the decision that led to the work produce the expected result?" This is the most sophisticated quality gate in any of the three systems. Kelly and Gas Town both lack it.
Summary: Concept Cross-Reference Table
| Concept | Yuki AI CEO | Kelly | Gas Town | Kelly Gap |
|---|---|---|---|---|
| Brain/memory substrate | GitHub repo + CLAUDE.md + decisions/ | soul + memory + 5-layer system | Beads in Dolt (git-versioned SQL) | Sub-agents need per-agent identity files |
| Authority model | Explicit 3-tier matrix + transfer log | Gate validation (implicit tiers in AGENTS.md) | Role hierarchy (Mayor/Crew/Polecats) | Missing explicit authority matrix with transfer log |
| Autonomous loops | 3 production loops (models, bugs, SEO) + compounding | None (cron/TaskFlow are non-compounding) | GUPP hooks (non-compounding) | Missing autonomous compounding loops |
| Memory format | Narrative > tables (empirical); hybrid | 5-layer (narrative top, structured bottom) | Beads-as-Why (structured Why field) | Kelly design validated; Gas Town risk of over-structuring |
| Progressive disclosure | CLAUDE.md shrunk 36% (152→98 lines) | 5-layer demand-loaded system | MEOW graph traversal | Already aligned — validation of Kelly design |
| Mechanical vs strategic separation | n8n vs agent sessions | sessions_yield vs cron | Implicit in hook model | Already aligned — Kelly design confirmed |
| Periodic quality review | Periodic board reviews (January, March, April) | TEA audit (per-stage, not periodic) | Continuous Witness + Mayor editorial | Missing periodic strategic autonomy review |
| Mistake log | Public, version-controlled learnings file | SELF_IMPROVEMENT.md + memory (partial) | Bead reason fields (diffuse) | Need explicit public/version-controlled mistake log |
| Per-domain context | Per-business CLAUDE.md files | Project context files (not CLAUDE.md-equivalent) | Named Crew members (persistent domain agents) | Need per-project CLAUDE.md-style context files |
| Temporal outcome review | 30-day decision reviews | None | None | Missing 30-day outcome review mechanism |
Combined Architecture: What the Triangle Validates¶
All three systems arriving at the same patterns from independent starting points validates a core thesis: these patterns emerge from practical autonomous agent operation — they are not design preferences but discovered necessities.
Universal Patterns (Emerged in All Three)¶
- Persistent brain as foundational requirement. Yuki built it empirically. Kelly theorized it. Gas Town implemented it as Beads. All three reached the same conclusion: persistence across sessions is not optional.
- Explicit authority tiers with progression. Yuki's three tiers + transfer log. Kelly's gate validation. Gas Town's Mayor/Crew/Polecats. All three separate "decides alone" from "needs human", but only Yuki pairs explicit tiers with a transfer log; Kelly's tiers remain implicit in AGENTS.md.
- Mechanical/strategic execution separation. n8n vs agent sessions (Yuki). sessions_yield vs cron (Kelly). Hook model vs daemons (Gas Town). All three separate "run on schedule without reasoning" from "strategic work requiring context."
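- Narrative-over-structured for LLM retrieval. Yuki's table experiment. Kelly's 5-layer design. Karpathy/Garry Tan research. These three sources agree: tables are for lookup, narrative is for association. Gas Town's structured Why field is the partial exception and carries a risk of over-structuring.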
Yuki's Unique Contributions (Not in Kelly or Gas Town)¶
- Autonomous compounding loops in production. Gas Town had GUPP hooks, but they were non-compounding, and Kelly had no autonomous loops at all. Yuki demonstrated three loops (models, bugs, SEO) that compound in production.
- Authority transfer log. A visible, version-controlled record of what autonomy has been earned. Neither Kelly nor Gas Town has this.
- 30-day outcome reviews. Temporal quality gates that revisit decisions at outcome time, not just at gate time.
- Per-business CLAUDE.md context files. Specialization via context files rather than distinct agent instances.
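The authority matrix with transfer log listed above can be sketched as a small data structure. This is an illustration under stated assumptions: the tier labels, the `AuthorityMatrix` class, and the choice to default unknown actions to the most restrictive tier are all hypothetical and are not taken from Yuki's repository.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Tier(Enum):
    """Three tiers assumed for illustration; the labels are not Yuki's own."""
    DECIDE_ALONE = 1
    DECIDE_THEN_REPORT = 2
    NEEDS_HUMAN = 3


@dataclass
class Transfer:
    """One entry in the transfer log: an action moved between tiers."""
    action: str
    old_tier: Tier
    new_tier: Tier
    on: date
    reason: str


@dataclass
class AuthorityMatrix:
    tiers: dict[str, Tier] = field(default_factory=dict)
    transfer_log: list[Transfer] = field(default_factory=list)

    def tier_for(self, action: str) -> Tier:
        # Assumption: actions not listed default to the most restrictive tier.
        return self.tiers.get(action, Tier.NEEDS_HUMAN)

    def transfer(self, action: str, new_tier: Tier, on: date, reason: str) -> None:
        """Change an action's tier and append the change to the log."""
        self.transfer_log.append(
            Transfer(action, self.tier_for(action), new_tier, on, reason)
        )
        self.tiers[action] = new_tier
```

In a repo-as-brain setup the matrix and log would more plausibly live in a version-controlled file than in memory; the in-memory form here only shows that every tier change leaves an auditable record.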
Synthesis Recommendations for Kelly¶
- Write an explicit authority matrix per agent role with a transfer log tracking what autonomy has been earned. (Yuki's #1 lesson.)
- Add autonomous compounding loops — background agents that read their own prior outputs and compound. Start with one: a weekly KB refresh loop. (Yuki's #3 lesson.)
- Add 30-day outcome reviews — when a significant decision is logged, set a review date; assess actual vs expected at 30 days. (Yuki's #10 lesson.)
- Add per-project CLAUDE.md files — loaded when entering project scope, containing project-specific rules and context. (Yuki's #9 lesson.)
- Maintain the sessions_yield/cron separation — already correct, validated by Yuki's n8n correction. (Yuki's #6 confirmation.)
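The compounding-loop recommendation above comes down to one property: each run receives the outputs of earlier runs as input. A minimal sketch, assuming a hypothetical JSON history file and a caller-supplied `step` callable standing in for the agent invocation; neither is part of Kelly, Yuki, or Gas Town.

```python
import json
from pathlib import Path
from typing import Callable


def run_compounding_step(
    history_path: Path,
    step: Callable[[list[dict]], dict],
) -> dict:
    """Run one loop iteration that takes all prior outputs as input.

    `step` is a hypothetical callable (for example, a wrapper around an
    agent call) that maps the list of prior outputs to a new output dict.
    """
    if history_path.exists():
        history = json.loads(history_path.read_text())
    else:
        history = []  # first run: nothing to build on yet

    output = step(history)   # the new run sees everything earlier runs produced
    history.append(output)   # ...and its own output becomes input to the next run
    history_path.write_text(json.dumps(history, indent=2))
    return output
```

A plain cron job would call `step` with no history; feeding the accumulated outputs back in is what distinguishes a compounding loop from a merely scheduled one. For the suggested weekly KB refresh, `step` might summarize what earlier refreshes changed before deciding what to update next.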
Related Articles¶
yukicapital-ai-ceo-overview, yukicapital-ai-ceo-experiment, yukicapital-board-review-2, yukicapital-board-review-3, kelly-gas-town-gap-analysis, multi-factory-comparison, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy, kelly-handbook-multi-agent, kelly-handbook-software-factory
Concept Links¶
- factory-trap — dark factory comparison: Yuki's three autonomous loops, authority transfer log, and 30-day outcome reviews are dark factory patterns with no equivalent in Kelly or Gas Town
- world-model — the shared cognitive substrate (world.json in SuperAda, GitHub repo in Yuki) is the central convergence point across all three systems
- him-model — Yuki's CEO/founder relationship (Judy Win + Romain) and Kelly's operator/router relationship both implement the HiM cognitive separation pattern
- autonomy-policy-v3 — Yuki's three-tier authority matrix with explicit transfer log maps directly to autonomy-policy-v3's named authority levels (A/B/C/D)
- meta-crons — Yuki's three production loops are meta-crons: scheduled autonomous agents that compound by reading their own prior outputs
- lobster-pipelines — Yuki's 30-day outcome review mechanism and per-business CLAUDE.md specialization parallel lobster-pipelines' typed envelope + resumable state pattern
- isc — Yuki's "decision log with success criteria" and Kelly's TEA audit both implement ISC-equivalent: binary-testable, result-stated criteria
- 7-agent-crew-topology — SuperAda's 7-agent Star Trek crew topology vs Kelly Router's 3-agent crew is the key topology comparison in this gap analysis