Kelly vs Gas Town: Full Gap Analysis
Assessor: Carson (dark-factory-kb subagent)
Date: 2026-04-26
Sources: Kelly handbook (Ch7 multi-agent, Ch11 factory), Kelly tweets, steve-yegge-gas-town, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy, steve-yegge-wasteland, steve-yegge-gas-city, steve-yegge-saas-mountain, steve-yegge-beads-kelly-gap
Executive Summary
Steve Yegge's Gas Town system and Kelly's factory methodology are two independent inventions of the same underlying pattern: autonomous multi-agent teams executing structured work pipelines with quality gates, audit trails, and hierarchical role assignment. Both systems arrived at the same conclusions from different starting points — Yegge from decades of Google/Amazon/Grab engineering, Kelly from AI agent experimentation and the Software Factory handbook.
This gap analysis systematically compares every major Gas Town concept against its Kelly equivalent. The goal: identify where Kelly has full parity, partial coverage, or missing functionality — and assess the adoption potential for each Gas Town innovation the operator could import into his Kelly factory.
Overall verdict: Kelly has strong structural parity with Gas Town in pipeline architecture, role hierarchy, and quality gates. The major gaps are in the substrate (Beads/Dolt vs file-based tracking), the execution model (GUPP's relentless hooks vs Kelly's yield-friendly autonomous continuation), and the ecosystem (Gas City SDK vs project-specific pipeline, Wasteland vs closed factory). The highest-value near-term adoption: Beads as the audit/TEA substrate, GUPP-inspired hook enforcement, and the Mayor's information-filtering role.
1. Beads
vs pipeline state, done markers, TEA audits, memory
Kelly Equivalent
Kelly tracks work state across four separate mechanisms:
- pipeline state: central JSON file tracking current stage, subphase, and timestamps
- done markers per subphase: text markers signaling completed subphases
- TEA audit: structured narrative from the Test, Evaluate, Assess quality audit, recording the reasoning behind each gate decision
- memory + memory/YYYY-MM-DD.md: long-term curated memory and daily logs
A prior assessment (steve-yegge-beads-kelly-gap) evaluated each mechanism against Beads. The summary finding: done markers map naturally to Bead state transitions (adopt); TEA audit is the strongest semantic match (adopt with schema design); pipeline state needs a JSON view on top of Beads (partial adopt); memory requires the most careful human-interface design (partial adopt).
Gap?
Partial. Kelly has file-based equivalents for all Bead functions, but none of them are unified. Beads would replace four separate mechanisms with a single git-versioned, SQL-queryable substrate. The semantic coverage is there; the architectural integration is not.
What Gas Town Does Better
- Unified substrate. Gas Town's entire state (work items, quality gates, messages, patrol routes) lives in Beads/Dolt. Kelly's four mechanisms are siloed files that can't be queried together.
- Every action is git-backed. Beads are git-versioned by default. Every state transition is a commit with author, timestamp, and reason. Kelly's done markers and pipeline state are plain files that are append-only by convention, with no structural immutability guarantee.
- Cross-pipeline queries. With Dolt, you can run: "Show me all testing-stage Beads that failed in the last 48 hours across all projects" — one SQL query. Kelly requires grep across project directories.
- Branch and experiment. Gas Town can branch a Bead sequence to try an alternative approach, test it, then merge or discard. Kelly has no equivalent — the pipeline is linear.
- Bead-as-Why completeness. Gas Town explicitly frames Beads as "The Missing Why" — git stores What/Where/Who/How, Beads store Why. Kelly's TEA audit captures Why but as a narrative sidecar, not as the primary work primitive.
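The cross-pipeline query mentioned in the list above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Dolt, and the beads table, its columns, and the sample rows are hypothetical rather than the actual Beads schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per Bead. Not the real Beads/Dolt schema.
SCHEMA = """
CREATE TABLE beads (
    id TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    stage TEXT NOT NULL,
    status TEXT NOT NULL,
    updated_at TEXT NOT NULL  -- ISO 8601 UTC timestamp
)
"""

def recent_failures(conn, stage, now, hours=48):
    """Return (project, id) for Beads in `stage` that failed within `hours` of `now`."""
    cutoff = (now - timedelta(hours=hours)).isoformat()
    rows = conn.execute(
        "SELECT project, id FROM beads "
        "WHERE stage = ? AND status = 'failed' AND updated_at >= ? "
        "ORDER BY project, id",
        (stage, cutoff),
    )
    return rows.fetchall()

def demo():
    """Load made-up Beads from three projects and run the query once."""
    now = datetime(2026, 4, 26, 12, 0, tzinfo=timezone.utc)
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    sample = [
        ("b1", "alpha", "testing", "failed", now - timedelta(hours=5)),
        ("b2", "beta", "testing", "failed", now - timedelta(hours=72)),   # too old
        ("b3", "beta", "testing", "passed", now - timedelta(hours=1)),    # not failed
        ("b4", "gamma", "planning", "failed", now - timedelta(hours=2)),  # other stage
        ("b5", "gamma", "testing", "failed", now - timedelta(hours=47)),
    ]
    conn.executemany(
        "INSERT INTO beads VALUES (?, ?, ?, ?, ?)",
        [(i, p, s, st, ts.isoformat()) for i, p, s, st, ts in sample],
    )
    return recent_failures(conn, "testing", now)
```

The point of the sketch is the shape of the access pattern: one parameterized query over a shared table, instead of a grep across per-project directories.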
What Kelly Does Better
- Human readability. Pipeline state and done markers are plain text: a human can read them in any editor without special tooling. Beads require Dolt queries or a Beads viewer.
- Simplicity of deployment. Setting up Dolt for Beads is infrastructure work. Kelly's file-based mechanisms work immediately: running echo "DONE" >> memory requires no server.
- Graceful degradation. If the Dolt instance goes down, Gas Town stops. Kelly's file-based pipeline keeps working (just less queryable).
- Immediate persistence. Files written to disk are durable immediately. Dolt requires a running database with transaction logging.
Adoption Potential
High. Beads are the most impactful single architectural upgrade to the Kelly pipeline. The TEA audit and done markers are the immediate adoption targets; pipeline state can follow once the Beads substrate is stable. The main risk is Dolt infrastructure — mitigate by deploying Dolt as a local instance first, not a hosted service.
2. GUPP
vs sessions_yield, RALPH protocol, autonomous continuation
Kelly Equivalent
Kelly's autonomous continuation is implemented through:
- sessions_yield — parent agent yields control while sub-agent executes; parent resumes when sub-agent completes or the session is explicitly continued
- RALPH protocol (Retry And Learn Protocol) — 3 max retries with diagnostic passing between attempts; "same error twice = escalate immediately"
- Autonomous sub-agent execution — spawned agents run to completion without requiring the parent to poll
Kelly's model is cooperative multitasking: agents yield and resume, with RALPH providing the retry/escalation backbone. It works, but it's not enforced at the architectural level — a stuck sub-agent that never returns will stall the parent indefinitely unless the parent has explicit timeout logic.
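The RALPH rules described above (a bounded number of attempts, diagnostics passed forward, and immediate escalation on a repeated error) can be sketched roughly as below. This is not Kelly's implementation: the names run_with_ralph, AttemptFailed, and Outcome are hypothetical, "3 max retries" is read here as at most three attempts, and "same error twice" is approximated by comparing diagnostic strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_RETRIES = 3  # assumption: "3 max retries" means at most three attempts

class AttemptFailed(Exception):
    """Raised by a task attempt; the message doubles as the diagnostic."""

@dataclass
class Outcome:
    status: str                      # "done" or "escalated"
    result: Optional[str] = None
    diagnostics: List[str] = field(default_factory=list)

def run_with_ralph(task: Callable[[List[str]], str]) -> Outcome:
    """Run `task`, passing earlier diagnostics into each new attempt."""
    diagnostics: List[str] = []
    for _ in range(MAX_RETRIES):
        try:
            # Each attempt receives a copy of what went wrong before.
            return Outcome("done", task(list(diagnostics)), diagnostics)
        except AttemptFailed as err:
            message = str(err)
            if message in diagnostics:
                # Same error twice: escalate immediately.
                diagnostics.append(message)
                return Outcome("escalated", None, diagnostics)
            diagnostics.append(message)
    return Outcome("escalated", None, diagnostics)  # attempts exhausted
```

Note that nothing in this loop bounds how long a single attempt may run, which is the stall risk the paragraph above describes.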
Gap?
Full. Kelly has no equivalent to GUPP's absolute "must-run-when-hook-is-non-empty" rule. Kelly's autonomous continuation relies on sessions_yield plus RALPH's retry limits (3 retries, same error twice = immediate escalation). These bound repeated failures, but they are not a hook mechanism: nothing outside the agents guarantees that queued work is picked up or that a stuck sub-agent is detected.
What Gas Town Does Better
- No yielding. GUPP eliminates the "yielding is contagious" failure mode. If Agent A yields while waiting for Agent B, and Agent B yields while waiting for Agent C, the pipeline stalls. GUPP's rule ("if your hook is non-empty, you MUST run") eliminates this entire class of failure.
- External enforcement. The Deacon daemon patrols all hooks and kills agents that have had non-empty hooks for too long. This is external enforcement — it doesn't rely on the agent's own discipline. Kelly's heartbeat requires the agent to actively update it; the Deacon actively patrols.
- Relentless throughput. GUPP maximizes worker utilization. As long as a hook has work, the agent is running. Kelly's yield model introduces idle time between sub-agent spawns.
What Kelly Does Better
- Context preservation on yield. When Kelly's router yields to a sub-agent, it preserves full session context. The sub-agent gets a complete working environment. GUPP's hook model is lighter-weight but may not preserve as rich a context across the yield boundary.
- Structured retry with RALPH. RALPH's "same error twice = escalate immediately" is more nuanced than GUPP's blunt timeout-and-requeue. RALPH passes diagnostics between retries, so a retrying agent knows why it failed. GUPP just re-queues the Bead for another agent.
- No infrastructure dependency. GUPP requires a hook management system and a Deacon daemon. Kelly's sessions_yield works with the agent's native session management — no additional infrastructure.
- Graceful degradation under load. GUPP assumes all agents are actively running when hooks are non-empty. Under heavy system load, forcing all agents to run can create resource contention. Kelly's cooperative model can be throttled more gracefully.
Adoption Potential
Medium-High. GUPP's hook-based execution model is a genuine improvement over Kelly's yield-friendly continuation, particularly for high-throughput factory scenarios. The operator should consider adding a GUPP-inspired hook enforcement layer: when a sub-agent is spawned, its work item goes on a hook, and the parent can rely on guaranteed execution within a timeout window. However, full GUPP adoption (Deacon daemon + hook queues) requires infrastructure that Kelly doesn't currently have. A lightweight approximation (explicit timeout enforcement on sub-agent spawning, with automatic re-spawn) is achievable without the full hook infrastructure.
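One way to picture the lightweight approximation is a spawn wrapper that enforces a deadline and re-spawns on timeout. The sketch below assumes the sub-agent can be launched as an operating-system process; spawn_with_deadline and its parameters are hypothetical names, and the commands shown are placeholders rather than real agent invocations.

```python
import subprocess
import sys
from typing import List, Optional

def spawn_with_deadline(cmd: List[str], timeout_s: float, max_spawns: int = 2) -> Optional[str]:
    """Run `cmd`; if it exceeds `timeout_s`, kill it and spawn again.

    Returns the captured stdout of the first run that finishes in time with
    exit code 0, or None if every spawn timed out or failed.
    """
    for _ in range(max_spawns):
        try:
            done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # subprocess.run has already killed the child; re-spawn
        if done.returncode == 0:
            return done.stdout.strip()
    return None  # caller escalates

def demo() -> Optional[str]:
    """Placeholder 'sub-agent': a Python one-liner that finishes quickly."""
    return spawn_with_deadline([sys.executable, "-c", "print('ok')"], timeout_s=30.0)
```

The enforcement here lives in the parent rather than in an external patrol, so it approximates the guarantee without a Deacon-style daemon.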
3. MEOW
vs Kelly's knowledge graph / structured memory
Kelly Equivalent
Kelly has two knowledge management layers:
- memory — curated long-term memory, distilled learnings, significant decisions. Read in main session, human-editable.
- memory/YYYY-MM-DD.md — daily raw logs of session events and context. Append-only per session.
These are narrative, text-based, and human-written. They are human-readable and human-writable, but not machine-queryable in any structured sense. The closest Kelly gets to structured knowledge is the TEA audit narrative and the pipeline state's machine-readable stage metadata.
Gap?
Full. Kelly has no knowledge graph. memory and daily logs are a well-written journal; MEOW's Dolt-backed graph is a structured database. The difference is architectural: one preserves knowledge in narrative form (rich but opaque to queries), the other in typed, queryable form (structured and powerful but less narrative).
What Gas Town Does Better
- Machine-queryable knowledge. "Show me all TEA audits where the reason field mentions 'security'" — one SQL query in MEOW; grep across narrative files in Kelly. As the knowledge base grows, this gap widens.
- Typed edges between knowledge nodes. MEOW's Bead graph has typed relationships: parent/child (decomposition), causal (Bead B created because of Bead A), validated-by (quality attestation). Kelly's cross-references are free-form links in narrative text — powerful when written, but not structurally enforceable.
- Versioned knowledge graph. Any point-in-time snapshot of the MEOW graph is queryable. "What did we know about this system on March 15th?" is a Dolt query. Kelly's daily logs are chronological but not versioned in the same sense.
- Onboarding via graph traversal. A new agent can reconstruct a project's full decision history by traversing the Bead DAG. Kelly's onboarding requires reading daily logs in chronological order — more narrative, less efficient for targeted discovery.
- Shared knowledge across agents. MEOW's graph is shared across all agents writing to the same Dolt instance. Kelly's memory is per-agent (each agent has its own session memory) unless explicitly shared.
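The typed-edge and graph-traversal ideas in the list above can be illustrated with a small in-memory graph. This is a sketch only: the edge-type labels, the BeadGraph class, and the Bead ids are hypothetical, and a real MEOW graph would live in Dolt rather than in a Python dictionary.

```python
from collections import deque
from typing import Dict, List, Tuple

# Labels chosen here for the three relationships named in the text.
EDGE_TYPES = {"parent", "caused-by", "validated-by"}

class BeadGraph:
    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = {}

    def link(self, src: str, edge_type: str, dst: str) -> None:
        """Add a typed edge src -> dst, rejecting unknown edge types."""
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self._edges.setdefault(src, []).append((edge_type, dst))

    def history(self, start: str, edge_type: str) -> List[str]:
        """Breadth-first walk from `start`, following only `edge_type` edges."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for kind, dst in self._edges.get(node, []):
                if kind == edge_type and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return order
```

Because edge types are validated on write, a traversal such as history("fix-login", "caused-by") follows only causal links and ignores validation or decomposition edges, which is what free-form narrative cross-references cannot enforce.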
What Kelly Does Better
- Narrative richness. memory entries can include opinions, emotions, context, and free-form associations that don't fit a structured schema. MEOW's typed Bead fields capture "why" but not the full voice of a human writing in a journal.
- Human write path. Humans can write directly to memory. MEOW requires a Beads CLI or API; there is no equivalent of running echo "decided to use Postgres" >> memory. The human write path in Kelly is immediate; in Gas Town it requires tooling.
- No infrastructure dependency. memory is a text file. MEOW requires Dolt running with a Beads schema. If Dolt is unavailable, MEOW's knowledge is inaccessible; Kelly's memory is always available.
- Free association. memory entries can reference each other in natural, unstructured ways ("see also March 15th"). MEOW's typed edges require explicit authoring of the relationship — you can't accidentally create an implicit association.
Adoption Potential
Medium. MEOW is a powerful long-term target for the operator's Kelly factory, but it's a larger migration than Beads for state tracking. The recommendation from the prior Beads assessment holds: adopt Beads as the TEA audit substrate first (where the semantic match is strongest), then extend to memory as the interface matures. Full MEOW knowledge graph adoption requires Dolt infrastructure + a human-usable write interface + a migration strategy for existing memory files. High value, but phased investment.
4. Mayor / Crew / Polecats
vs Router / sub-agents / named agents
Kelly Equivalent
Kelly's agent hierarchy:
- Router (main agent): orchestrates, routes to sub-agents, validates gates, escalates. Never executes work itself.
- Named lead agents: research-lead, project-lead, test-lead. Each is a named specialist that spawns its own sub-agents.
- Sub-agents: unnamed, ephemeral, spawned for specific tasks. Run to completion and die.
This maps loosely to Gas Town's tier structure: Router ≈ Mayor (orchestrator), named lead agents ≈ Crew (named, persistent, domain-specialist), sub-agents ≈ Polecats (ephemeral workers). But the mapping is imprecise.
Gap?
Partial. Kelly has the named lead agent concept (Crew equivalent) and the ephemeral sub-agent concept (Polecat equivalent). The gap is in the Mayor role specifically — Kelly's router is primarily a routing mechanism, not an editorial information filter for the human. Gas Town's Mayor is explicitly a chief-of-staff who reads all agent output and surfaces only what matters. Kelly's router does not play this role.
What Gas Town Does Better
- Mayor's information filtering. Yegge identifies the Mayor as "the killer feature" — not observability dashboards, not activity feeds, just less reading. The Mayor reads all the agent babble so the human doesn't have to. Kelly's router does not explicitly address the information overload problem at scale (20–30 concurrent agents).
- Named, addressable Crew. Gas Town's Crew members are named, persistent, and addressable by the human ("ask the PR Sheriff to..."). Kelly's named lead agents are defined in AGENTS.md but are not persistently addressable mid-session in the same way — they exist as role definitions, not as persistent agent identities with accumulated context.
- Crew persistence across sessions. Gas Town's Crew members maintain context across sessions via Beads. Kelly's lead agents are re-spawned per session; their accumulated context lives in memory files, not in a persistent agent identity.
- Mayor-as-control-plane clarity. Gas Town explicitly positions the Mayor as the human's control plane — the single interface for intent and information. Kelly's router is one component of the pipeline, not the human's primary interface in the same way.
What Kelly Does Better
- Explicit pipeline stages. Kelly's pipeline (Intake → Research → Planning → Implementation → Testing → Release) is explicit and enforceable. Gas Town's workflow is more fluid — the Mayor's editorial judgment drives what happens next. For regulated or auditable environments, Kelly's explicit stages are an advantage.
- Gate validation before stage advance. Kelly requires explicit READY/NOT-READY or PASS/FAIL gate decisions before advancing stages. Gas Town's quality gates (Witness) are more implicit in the workflow.
- AGENTS.md as executable contract. Kelly's AGENTS.md defines agent roles, triggers, capabilities, output directories, and escalation protocols — as both documentation and executable instruction. Gas Town's role definitions are more informal.
- RALPH escalation protocol. Kelly's router has an explicit 3-retry → escalate protocol for sub-agent failures. Gas Town's Deacon handles stuck agents but doesn't have the same diagnostic-passing retry logic.
Adoption Potential
High for Mayor-equivalent adoption, Medium for Crew/Polecats. The operator's Kelly factory would benefit most from adding a Mayor-equivalent information filtering role: an agent whose job is to read sub-agent output and surface only what matters for the human. This addresses the information overload problem as agent count scales. The Crew/Polecat concepts are already partially present in Kelly (named lead agents / ephemeral sub-agents); refinement rather than reinvention is appropriate.
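As a rough picture of what a Mayor-equivalent filter does, the sketch below reduces a batch of sub-agent reports to the items a human must act on. It is a deliberately simple rule-based stand-in: the Report type, the report kinds, and the digest function are all hypothetical, and a real Mayor would presumably apply editorial judgment rather than a fixed set of categories.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Report:
    agent: str
    kind: str   # hypothetical kinds: "progress", "blocked", "gate-fail", "decision-needed"
    text: str

# Assumption: only these kinds require human attention.
NEEDS_HUMAN = {"blocked", "gate-fail", "decision-needed"}

def digest(reports: List[Report]) -> List[str]:
    """Keep only reports a human must act on; summarize the rest as a count."""
    lines = [f"[{r.kind}] {r.agent}: {r.text}" for r in reports if r.kind in NEEDS_HUMAN]
    routine = sum(1 for r in reports if r.kind not in NEEDS_HUMAN)
    if routine:
        lines.append(f"({routine} routine updates omitted)")
    return lines
```

The design choice worth noting is that routine output is counted rather than silently dropped, so the human can still see how much was filtered.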
5. Deacon / Boot / Witness / Refinery
vs heartbeat, subagent spawning, QA, routing
Kelly Equivalent
Kelly distributes these functions across existing mechanisms:
- Heartbeat / liveness — heartbeat check-in mechanism; agents periodically update a file with current activity and timestamp. Detects stuck agents by absence of updates.
- QA / quality gates — test-lead agent runs the TEA (Test, Evaluate, Assess) audit before release. Kelly also has the "Angry Mob" / 5-agent verdict pattern for adversarial review.
- Routing — Kelly Router handles intent parsing, work decomposition, and sub-agent spawning.
- Sub-agent spawning — JSON tool call with label, task, and output path. List/steer/kill controls.
Kelly does not have dedicated daemon roles corresponding to Deacon, Boot, Witness, or Refinery. These functions are either handled by agents (TEA audit by test-lead) or not explicitly handled (Deacon's stuck-worker cleanup, Boot's heartbeat offload).
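The agent-managed heartbeat described above can be sketched as a small JSON file plus a staleness check. The file layout, field names, and function names here are assumptions made for illustration, not Kelly's actual heartbeat format.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional, Tuple

def write_heartbeat(path: Path, agent: str, activity: str, now: Optional[float] = None) -> None:
    """Record what the agent is doing and when (called by the agent itself)."""
    stamp = time.time() if now is None else now
    path.write_text(json.dumps({"agent": agent, "activity": activity, "ts": stamp}))

def is_stale(path: Path, max_age_s: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat is missing, unreadable, or older than `max_age_s`."""
    current = time.time() if now is None else now
    try:
        beat = json.loads(path.read_text())
        return current - float(beat["ts"]) > max_age_s
    except (OSError, ValueError, KeyError, TypeError):
        return True

def demo() -> Tuple[bool, bool, bool]:
    """Check a missing, a fresh, and an old heartbeat using fixed clock values."""
    with tempfile.TemporaryDirectory() as tmp:
        hb = Path(tmp) / "heartbeat.json"
        missing = is_stale(hb, 60, now=1000.0)
        write_heartbeat(hb, "test-lead", "running audit", now=1000.0)
        fresh = is_stale(hb, 60, now=1030.0)
        old = is_stale(hb, 60, now=1100.0)
        return missing, fresh, old
```

The sketch makes the limitation visible: is_stale only reports a problem when some other process chooses to call it, which is the gap a patrolling daemon is meant to close.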
Gap?
Partial. Kelly has equivalents for the outcomes of these roles (liveness detection, quality gates, routing) but not for the architectural separation of these functions into dedicated daemons. The gap is in isolation, observability, and enforcement.
What Gas Town Does Better
- Deacon's structural liveness enforcement. Kelly's heartbeat is a file that agents must actively update. If an agent crashes, it stops updating, but the absence is only detected when something else checks. The Deacon is a daemon that actively patrols all hooks and detects stuck workers structurally (hook age, not file timestamp). This is more robust than agent-managed heartbeat files.
- Boot isolating heartbeat traffic. Separating heartbeat handling into a dedicated daemon (Boot) means the Deacon isn't interrupted by routine liveness pings. Kelly's heartbeat is distributed across all agents — if heartbeat volume increases, it competes with work-processing resources.
- Witness as dedicated quality gate daemon. Having a dedicated Witness daemon that watches all workers means quality checks run continuously, not just at the TEA stage. Kelly's quality gate is a stage-gate (TEA audit before release), not continuous. For some quality concerns (style violations, simple correctness checks), continuous watching is more efficient.
- Refinery's explicit intent decomposition. Refinery converts vague epics into well-specified Bead sequences before polecats execute. Kelly's equivalent (Intake → Planning) is more heavyweight — it creates project directories and multiple artifacts. Refinery is more lightweight and can be applied to smaller units of work.
What Kelly Does Better
- TEA audit depth. Kelly's TEA (Test, Evaluate, Assess) is a structured, three-phase quality audit with named outputs (tea-summary.md) and a defined gate decision (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE). Gas Town's Witness is a quality auditor but the specific audit methodology is less formalized than TEA.
- Multi-agent adversarial review (5-agent verdict). Kelly's Angry Mob / 5-agent verdict is a more robust adversarial review mechanism than Gas Town's single Witness. Five agents independently reviewing work and reaching consensus is more reliable than a single Witness, provided the reviewers' errors are largely independent of one another.
- Pipeline stages contain the roles. Kelly's pipeline stages implicitly contain the functions of Refinery (Planning), Witness (TEA), and routing (Router). Gas Town's roles are separate daemons that must be explicitly coordinated. Kelly's approach is more self-documenting.
- Explicit human approval at release. Kelly's Operator Decision (SHIP / NO-SHIP) at the release gate is an explicit human-in-the-loop checkpoint. Gas Town doesn't have an equivalent explicit human approval before release — it's more implicit in the Mayor's judgment.
Adoption Potential
Medium. The highest-value adoption from this group is the Deacon's structural liveness enforcement — replacing Kelly's file-based heartbeat with a daemon that actively patrols running agents. This is a targeted addition that doesn't require adopting the full Gas Town daemon model. Witness (continuous quality watching) is valuable but more complex to integrate. Refinery's lightweight intent decomposition could enhance Kelly's Intake/Planning stages. TEA and the 5-agent verdict already give Kelly strong quality coverage.
6. Wasteland / Reputation Economy
vs Kelly's multi-factory vision
Kelly Equivalent
Kelly's factory model includes:
- Autonomous companies — independent factories with their own agents, trading work in a market
- Factory marketplace — work is bid on by autonomous companies; reputation determines contract size and criticality
- Reputation tiers — established reputation earns larger/more critical contracts
This is described in the Kelly tweets and handbook, but it is a conceptual model, not an implemented protocol. Kelly has not built a concrete Wasteland-equivalent with git/Dolt-backed reputation portability.
Gap?
Full. Kelly has the conceptual framework for a multi-factory reputation economy but no implementation. The Wasteland is the concrete realization of what Kelly describes abstractly. The gap is in the substrate (git/Dolt-based work and reputation tracking) and the network effects (federated trust across multiple organizations).
What Gas Town Does Better
- Concrete protocol with git/Dolt backing. The Wasteland's Wanted Board, stamps, and trust ladder are a fully-specified protocol with Dolt as the backing store. Kelly's factory marketplace is a design concept without a specified implementation substrate.
- Yearbook rule enforcement. The Wasteland's rule ("you can't stamp your own work") is enforced structurally — stamps are authored by the validator, not the worker. Kelly's third-party verification concept doesn't have an equivalent structural enforcement.
- Multi-dimensional stamps. Wasteland stamps track quality, reliability, and creativity as separate axes. Kelly's reputation model treats trust as a single dimension (trustworthy or not).
- Pre-seeded reputation. The Wasteland is pre-seeded with GitHub's top 10,000 contributors, eliminating the cold-start problem. Kelly's factory has no equivalent pre-seeded reputation baseline.
- RPG framing for non-technical adoption. Yegge's "character sheet" framing makes the reputation economy intuitive for non-technical participants. Kelly's autonomous company model is more abstract.
- Federated schema portability. Wasteland trust is portable across team, company, university, and open-source federations. Kelly's cross-factory reputation requires manual verification.
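Two of the mechanisms in the list above, the structurally enforced yearbook rule and multi-dimensional stamps, can be sketched together. The Stamp class, the 0 to 5 score scale, and the profile function are hypothetical; only the three axis names and the rule itself come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

AXES = ("quality", "reliability", "creativity")  # axes named in the text

@dataclass(frozen=True)
class Stamp:
    worker: str
    validator: str
    scores: Dict[str, int]  # hypothetical 0-5 scale per axis

    def __post_init__(self) -> None:
        # Enforced at construction time, so a self-stamp can never exist.
        if self.worker == self.validator:
            raise ValueError("yearbook rule: you can't stamp your own work")
        if set(self.scores) != set(AXES):
            raise ValueError(f"scores must cover exactly {AXES}")

def profile(stamps: List[Stamp], worker: str) -> Dict[str, float]:
    """Average each axis separately instead of collapsing to one trust number."""
    mine = [s for s in stamps if s.worker == worker]
    if not mine:
        return {}
    return {axis: sum(s.scores[axis] for s in mine) / len(mine) for axis in AXES}
```

Keeping the axes separate in profile is what allows, for example, a highly reliable but unadventurous worker to be distinguished from a creative but erratic one.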
What Kelly Does Better
- Multi-factory marketplace concept. Kelly's factory marketplace model is more explicitly competitive — autonomous companies bid on work, and the market determines price and allocation. The Wasteland is more cooperative (Wanted Board posting → claiming) than competitive bidding.
- Contract size scaling. Kelly's reputation tiers explicitly scale contract size and criticality — established factories get larger, more important work. The Wasteland's trust ladder (Registered → Contributor → Maintainer) scales reputation weight, but contract allocation is less explicitly tied to it.
- Conceptual depth. Kelly's autonomous company model is more thoroughly theorized (including internal organization, profit/loss, scaling team size) than the Wasteland's more operational framing.
Adoption Potential
Low (near term), High (long term). The Wasteland requires network effects to function — a single Kelly factory can't realize the value of a federated reputation economy without multiple participating factories. For the operator's near-term use case (a single Kelly factory), the Wasteland is not immediately applicable. However, if the operator plans to scale to a multi-factory ecosystem, the Wasteland's protocol design (Dolt-backed stamps, federated trust ladder) is the right foundation. The multi-dimensional stamp concept (quality/reliability/creativity) could be adopted internally even without federation.
7. Gas City SDK
vs Kelly's pipeline-as-framework
Kelly Equivalent
Kelly's factory is a structured six-stage pipeline:
- Intake → Research → Planning → Implementation → Testing → Release
- Each stage has defined artifacts, gates, and specialized agents
- Defined in AGENTS.md, executed via the Kelly Router
This is a pipeline, not an SDK. It is project-specific and opinionated — the six stages are the factory. Gas City is a toolkit for building arbitrary factories; Kelly is a specific factory implementation.
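The fixed six-stage sequence with gated advancement can be pictured as a tiny state machine. This is a sketch, not the Kelly Router: the Pipeline class is hypothetical, and treating READY and PASS as the only passing decisions is an assumption based on the gate vocabulary used in this analysis.

```python
from typing import List

STAGES: List[str] = ["Intake", "Research", "Planning", "Implementation", "Testing", "Release"]
PASSING = {"READY", "PASS"}  # assumption: any other decision blocks the gate

class Pipeline:
    def __init__(self) -> None:
        self.index = 0
        self.log: List[str] = []  # every gate decision is recorded, pass or fail

    @property
    def stage(self) -> str:
        return STAGES[self.index]

    def advance(self, decision: str) -> bool:
        """Move to the next stage only on a passing gate decision."""
        self.log.append(f"{self.stage}: {decision}")
        if decision not in PASSING or self.index == len(STAGES) - 1:
            return False
        self.index += 1
        return True
```

Because STAGES is a module-level constant, adding or reordering a stage means editing the pipeline itself, which is the rigidity that a composable pack model is meant to avoid.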
Gap?
Partial. Kelly doesn't have a Gas City equivalent — a composable SDK of building blocks that can be assembled into custom factories. However, Kelly's BMAD agent definitions (kelly-tweets-bmad) provide a degree of composability: modular agent specs that can be mixed and matched. The gap is in the pack model and the SDK extensibility.
What Gas Town Does Better
- Composable packs. Gas City allows building custom packs (collections of agent roles, workflows, Bead types, and configuration) that compose with other packs. Kelly's pipeline is a fixed six-stage sequence — adding a new stage or modifying the workflow requires modifying the pipeline itself.
- SDK for building custom factories. Gas City is MIT-licensed, community-driven, with a Discord of thousands building packs for different domains. Kelly's framework is a single implementation — there's no SDK layer for building custom Kelly-style factories.
- Drop-in Gas Town pack. Gas City's Gas Town pack ships as the default, making it a drop-in replacement for existing Gas Town users. Anyone can build a new pack that redefines the factory. Kelly has no equivalent extensibility mechanism.
- 11 Stages of AI Adoption as a progression model. Gas City's leveling framework (Level 8: build your own orchestrator → Level 11: factory builder) provides a clear maturity model for adopters. Kelly's factory evolution (v1→v3) is less formalized as an adoption progression.
What Kelly Does Better
- Opinionated defaults with proven structure. Kelly's six-stage pipeline is a specific, proven factory design. Gas City's pack model is more powerful but requires more design decisions — Kelly's opinionated defaults reduce the burden on adopters who just want a working factory.
- Pipeline stages as a mental model. The Intake → Research → Planning → Implementation → Testing → Release sequence is immediately intuitive to most developers. The pack model is more powerful but less immediately comprehensible.
- AGENTS.md as executable specification. Kelly's AGENTS.md is both documentation and executable instruction — it defines the factory concretely enough to run. Gas City's packs are declarative but less directly tied to the running system.
- TEA audit and gate validation are first-class. Kelly's TEA audit and stage-gate validation are integral to the pipeline, not optional add-ons. Gas City's quality gates (Witness) are more composable but can be omitted.
Adoption Potential
Medium. The operator should consider Gas City's pack composition principles, specifically the idea that agent roles, workflows, and Bead types should be composable modules rather than a fixed pipeline. This could inspire a Kelly "factory SDK" that separates the pipeline framework from project-specific configurations. However, full Gas City SDK adoption would require a significant architectural shift from Kelly's pipeline model to Gas City's pack model. The more practical near-term adoption is applying Gas City's Light Factory observability principles to make Kelly's pipeline more transparent.
8. SaaS Mountain Escape
vs Kelly's positioning
Kelly Equivalent
Kelly's critique of monolithic tooling is implicit in:
- Autonomous companies build their own tooling — rather than buying SaaS, factories build exactly what they need
- 80/20 rule for automation — most of what SaaS does is routine; the interesting work is the exception
- Autonomous computer control — agents need direct system access, not UI-mediated API wrappers
Kelly's value proposition is: autonomous agents executing a structured pipeline produce software faster and cheaper than traditional human-driven development. The implicit comparison is to human developers using off-the-shelf tools.
Gap?
None (framing difference only). Kelly has the same underlying thesis as Yegge's SaaS Mountain critique, but the framing differs. Yegge's SaaS Mountain is a named, coherent critique that resonates with non-technical stakeholders. Kelly's critique is more implicit and technical.
What Gas Town Does Better
- Named, accessible critique. "SaaS Mountain" is a vivid, named metaphor that immediately communicates the thesis to non-technical audiences. Kelly's critique of monolithic systems is more diffuse across tweets and handbook sections.
- Concrete $30k SaaS replacement anecdote. Yegge's story of a non-technical staffer rebuilding a $30,000/year SaaS in Gas Town is a compelling, concrete proof point. Kelly's 66 apps in a weekend (kelly-tweets-business-metrics) is more impressive but more technical.
- "Escape vehicle" framing. Gas City as the escape vehicle from SaaS Mountain is a clear product narrative. Kelly's factory is a methodology, not a product — the narrative framing is less sharp.
- 3-crack analysis. Yegge's three specific cracks in SaaS Mountain (only use 20%, not agent-native, not composable) give practitioners a diagnostic tool. Kelly's critique is more about "build over buy" without the same structural breakdown.
What Kelly Does Better
- Agent-native by design from the start. Kelly's factory was conceived for AI agents from the beginning — it's not a migration from human-tools. Gas Town started as a developer tool (IDE for 2026) and evolved into an agent orchestrator. Kelly's agent-native positioning is more fundamental.
- Structured pipeline value. Kelly's concrete value proposition (6-stage pipeline → shipped software with TEA audit) is more tangible than Gas City's more general "escape from SaaS Mountain." For a practitioner choosing a methodology, Kelly's structured outputs are easier to evaluate.
- Business results. Kelly's tweets show concrete business results: first revenue day, $2K/day, $4K contracts, 66 apps in a weekend. These are quantifiable outcomes that SaaS Mountain's narrative can't match.
Adoption Potential
N/A. This is a positioning and narrative comparison, not a technical feature. The operator should adopt Yegge's SaaS Mountain framing when communicating the value of the Kelly factory to non-technical stakeholders, because it is the more accessible narrative. The technical comparison is covered by the other sections.
9. Multi-Agent Adversarial Reliability
vs Kelly's TEA audit / QA gates
Kelly Equivalent
Kelly has two adversarial reliability mechanisms:
- TEA audit (Test, Evaluate, Assess) — structured quality gate before release, run by the test-lead agent
- 5-agent verdict / Angry Mob — multiple agents independently reviewing work and reaching consensus
Kelly's multi-agent review is well-established. The TEA audit is a formal three-phase gate; the Angry Mob is used for adversarial testing (e.g., 5 agents independently test the same implementation and compare results).
Gap?
None. Kelly has genuine adversarial multi-agent review. The 5-agent verdict is arguably more robust than Gas Town's single Witness: when reviewers fail independently of one another, a consensus of five is more reliable than a single quality auditor.
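The reliability claim above can be made concrete with a simple binomial model. The sketch assumes that each reviewer is correct with the same probability p, that reviewers err independently, and that "consensus" means a simple majority; none of these assumptions is stated in the sources, and correlated reviewers (for example, agents sharing one underlying model) would weaken the result.

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """P(more than half of n independent reviewers are right), each right with prob p."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))
```

Under these assumptions, five reviewers who are each right 80% of the time reach a correct majority about 94% of the time (majority_correct(5, 0.8) is approximately 0.942). The same formula shows the limit of the argument: if individual accuracy is below 50%, a majority of five is worse than a single reviewer.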
What Gas Town Does Better
- Witness as continuous auditor. Gas Town's Witness watches all workers continuously, not just at the release gate. Quality issues are caught as they occur, not retrospectively. Kelly's TEA is a stage-gate (before release), which means quality issues discovered late are more expensive to fix.
- Witness as a dedicated daemon role. Making Witness a dedicated daemon role means it can't be skipped under time pressure. Kelly's TEA is a pipeline stage — under deadline pressure, there's temptation to shorten or skip it.
- Two-agent watching each other as baseline. Gas City's "never deploy a single agent for real business processes" principle formalizes what Kelly does empirically. The framing makes it a design principle, not an emergent practice.
What Kelly Does Better¶
- 5-agent verdict vs single Witness. Kelly's 5-agent adversarial verdict is statistically more reliable than Gas Town's single Witness. Five independent agents reaching consensus catches failure modes that a single auditor misses.
- TEA's explicit three-phase structure. Test → Evaluate → Assess is a more thorough audit than a single Witness review. The Evaluate phase specifically covers non-functional requirements (performance, security) — aspects that a single Witness might deprioritize.
- TEA output as a named artifact. Kelly's tea-summary.md, with its explicit gate decision (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE), is a structured, auditable output. Gas Town's Witness validation results are less formally specified.
- Human operator decision at release. Kelly's SHIP / NO-SHIP operator decision at the release gate is an explicit human-in-the-loop checkpoint. Gas Town doesn't have an equivalent — the Mayor's editorial judgment is the final call.
Adoption Potential¶
Low. Kelly already has stronger adversarial coverage than Gas Town (5-agent verdict > single Witness) and a more formal quality audit structure (TEA's three phases). The only improvement Kelly should consider from Gas Town is making the Witness role continuous rather than a stage-gate — adding lightweight, continuous quality watching alongside the existing TEA stage-gate.
10. Light Factory / Observability¶
vs Kelly's pipeline state / observability
Kelly Equivalent¶
Kelly's observability mechanisms:
- pipeline state — machine-readable pipeline state (current stage, subphase, timestamps)
- done markers — human-readable subphase completion signals
- heartbeat — agent liveness and current activity
- TEA audit logs — quality gate results
- memory/YYYY-MM-DD.md — daily session logs
Kelly's observability is file-based and retrospective — you read files to understand the state. It's functional but not observability-first in the Gas City sense.
Gap?¶
Partial. Kelly has all the observability data — it's just not structured for real-time querying. Gas Town's Light Factory framing makes observability a first-class architectural concern, not an afterthought.
What Gas Town Does Better¶
- Light Factory as architectural framing. Gas City's "Light Factory" framing — maximize observability, all workers visible and addressable, polecats in the back rooms being the only normally-invisible ones — makes observability a design principle. Kelly's pipeline is functional but doesn't have this framing.
- Real-time hook state. Gas Town's hook state is queryable in real time via Dolt. Kelly's pipeline state requires polling the file. As agent count scales, real-time querying becomes more valuable.
- Dolt SQL observability. Being able to run SQL queries against the work ledger means observability dashboards can be built on top of existing tooling (Dolt SQL interface), not custom file parsers. Kelly's observability requires custom scripts to read JSON/log files.
- Worker corpses visible. Gas Town's Deacon explicitly tracks stuck workers. Kelly's heartbeat shows agents that have checked in recently, but a crashed agent that hasn't been detected looks identical to a healthy agent between heartbeats.
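The crashed-agent blind spot in the last bullet can be narrowed without Dolt by treating any heartbeat older than a threshold as a suspected corpse. The sketch below assumes the heartbeat data can be loaded as a mapping from agent name to last check-in time. The source does not specify the heartbeat format, so that shape, the `find_stale_agents` helper, the 120-second threshold, and the `research-lead` agent name are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def find_stale_agents(heartbeats, now, max_age_s=120.0):
    """Return names of agents whose last check-in is older than max_age_s seconds."""
    cutoff = now - timedelta(seconds=max_age_s)
    return sorted(name for name, last_seen in heartbeats.items() if last_seen < cutoff)

# Made-up sample data: one agent checked in recently, one has gone quiet.
now = datetime(2026, 4, 26, 12, 0, 0, tzinfo=timezone.utc)
heartbeats = {
    "research-lead": now - timedelta(seconds=30),
    "test-lead": now - timedelta(seconds=900),
}
stale = find_stale_agents(heartbeats, now)
print(stale)
```

A check like this only flags candidates. Deciding whether to kill and re-queue the work is the Deacon-style policy that the file-based approach still lacks.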
What Kelly Does Better¶
- Human readability of pipeline state. pipeline state in an editor is immediately comprehensible. Dolt queries require either a SQL interface or a Beads viewer — additional tooling for human inspection.
- Simpler stack. Kelly's observability stack is just files. Gas City's observability stack requires Dolt running, Beads schema defined, and SQL queries to inspect. For small-scale deployments, the file-based approach has lower operational overhead.
- heartbeat is self-contained. A single heartbeat file contains all agent liveness. Gas Town's observability is distributed across multiple Bead types — heartbeat Beads, state Beads, quality Beads — requiring joins to reconstruct a full picture.
- done markers are grep-able. grep -r "DONE" memory/ finds completed subphases immediately. Beads state transitions require Dolt queries.
Adoption Potential¶
Medium. The operator should add a Gas City-inspired observability layer to his Kelly factory: deploy Dolt for Beads (covering TEA, done markers, and pipeline state), then build SQL-based observability queries on top. This doesn't require adopting the full Light Factory framing, only adding real-time querying to the existing Kelly observability data. The gain is significant: cross-project SQL queries, git-backed audit trails, and structured liveness enforcement.
Concepts Kelly is Missing Entirely¶
The following Gas Town concepts have no Kelly equivalent at all:
GUPP (Full Execution Persistence)¶
Kelly's agents yield and resume. GUPP's "if your hook is non-empty, you MUST run" has no architectural equivalent. A Kelly agent that yields indefinitely will stall the pipeline; GUPP eliminates this by design. Recommendation: Add explicit timeout enforcement on sub-agent spawning with automatic re-spawn. This is a lightweight approximation of GUPP that doesn't require the full hook infrastructure.
Wasteland (Federated Reputation Economy)¶
Kelly's autonomous company marketplace is a design concept without a concrete protocol implementation. The Wasteland's git/Dolt-backed Wanted Board, multi-dimensional stamps, and trust ladder are the concrete realization. Recommendation: Not immediately applicable (requires network effects), but study the Wasteland's protocol design for future multi-factory scenarios.
Beads-as-Why (The Missing Why as First-Class Data)¶
Beads' core insight — git stores What/Where/Who/How, Beads capture Why — has no Kelly equivalent. Kelly's TEA audit captures reasoning, but as a narrative sidecar, not as the primary work primitive. Recommendation: Begin TEA schema design to map current audit fields to Bead fields. This is the highest-value missing concept to adopt.
MEOW Work Primitives (Work as First-Class System Primitive)¶
In MEOW, everything is Work: knowledge, coordination, communication, reputation. Kelly distinguishes between work items (pipeline tasks), memory (knowledge), and quality gates (coordination) — they are separate mechanisms. Recommendation: Adopt Beads as the universal substrate so that all Kelly mechanisms (pipeline state, memory, QA, heartbeats) can eventually be queried together.
Boot Daemon (Heartbeat Traffic Isolation)¶
Boot handles heartbeats so the Deacon isn't interrupted. Kelly has no equivalent heartbeat isolation — heartbeat traffic is distributed across all agents and competes with work processing. Recommendation: Consider a dedicated heartbeat handler for high-agent-count deployments.
Pack Composition Model¶
Gas City's packs (composable agent role + workflow + Bead type bundles) have no Kelly equivalent. Kelly's factory is a fixed six-stage pipeline. Recommendation: Investigate whether Kelly's BMAD agent definitions could be extended to a pack-like composability model for future extensibility.
Concepts Gas Town is Missing¶
The following Kelly concepts have no Gas Town equivalent:
Explicit Pipeline Stages¶
Kelly's six-stage pipeline (Intake → Research → Planning → Implementation → Testing → Release) is explicitly defined with named artifacts, gates, and entry/exit criteria per stage. Gas Town's workflow is more fluid — the Mayor and Refinery drive what happens next, but there's no equivalent stage structure. Implication: Gas Town is more flexible but less auditable; Kelly is more structured but potentially less adaptable to non-linear work.
TEA Audit (Three-Phase Quality Gate)¶
Kelly's TEA (Test, Evaluate, Assess) is a formal three-phase audit with named outputs and explicit gate decisions. Gas Town's Witness is a single quality auditor without the same phase structure. Implication: Kelly's quality gates are more thorough and more auditable; Gas Town's are more lightweight and continuous.
Human-in-the-Loop Checkpoints¶
Kelly's SHIP / NO-SHIP operator decision at the Release gate is an explicit human approval before deployment. Gas Town has no equivalent — the Mayor's editorial judgment is the final call. Implication: Kelly is safer for production deployments where human accountability is required; Gas Town moves faster but with less human oversight.
Structured Handoff Protocol¶
Kelly's artifact directory structure (research-artifacts/, planning-artifacts/, etc.) with summary.md gate files provides a structured handoff between pipeline stages. Downstream agents read the summary.md before proceeding. Gas Town's Bead handoffs are more implicit. Implication: Kelly's handoffs are more explicit and auditable; Gas Town's are more flexible.
RALPH Escalation Protocol¶
Kelly's RALPH (Retry And Learn Protocol) — 3 max retries, same error twice = escalate immediately — is a structured retry/escalation protocol with diagnostic passing. Gas Town's Deacon handles stuck workers but doesn't have equivalent retry-with-diagnostics logic. Implication: Kelly handles transient failures more gracefully with better diagnostic preservation; Gas Town's re-queue approach is simpler but loses context.
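The RALPH rules stated above can be sketched as a small retry loop. This is an illustration under assumptions, not Kelly's implementation: "3 max retries" is read as three retries after the first attempt, "same error twice" is read as any repeated error message, and `run_with_ralph` is a hypothetical name.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # from the text; counted here as retries after the first attempt (assumption)

def run_with_ralph(task: Callable[[List[str]], str]) -> Tuple[str, object]:
    """Run task, passing diagnostics from earlier failures into each retry.

    Returns ("ok", result) on success, or ("escalate", diagnostics) otherwise.
    """
    diagnostics: List[str] = []
    for _attempt in range(1 + MAX_RETRIES):
        try:
            return "ok", task(list(diagnostics))
        except Exception as exc:  # broad on purpose: every failure feeds the diagnostics
            message = f"{type(exc).__name__}: {exc}"
            repeated = message in diagnostics
            diagnostics.append(message)
            if repeated:
                break  # same error twice: escalate immediately
    return "escalate", diagnostics

def flaky(diagnostics: List[str]) -> str:
    """Toy task that fails twice with different errors, then succeeds."""
    if len(diagnostics) < 2:
        raise RuntimeError(f"transient failure {len(diagnostics)}")
    return "done"

print(run_with_ralph(flaky))
```

Passing the accumulated diagnostics into each retry is what distinguishes this from a plain re-queue: the next attempt can see what already went wrong.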
AGENTS.md as Executable Contract¶
Kelly's AGENTS.md is both documentation and executable instruction — it defines agent roles, triggers, capabilities, output directories, and escalation protocols in a form that is both human-readable and machine-enforceable (implicitly, through agent prompting). Implication: Gas Town's role definitions are more informal and distributed.
Combined Architecture¶
What would a Kelly + Gas Town hybrid look like? The best elements from each:
Pipeline Layer: Kelly's Stages + Gas Town's Packs¶
The hybrid retains Kelly's explicit six-stage pipeline (Intake → Research → Planning → Implementation → Testing → Release) as the macro structure, but replaces the fixed stage implementation with Gas City's pack composition model. Each pipeline stage is a pack containing the agent roles, workflows, and Bead types specific to that stage. Stages compose via the shared Bead substrate — a Bead created in Planning flows naturally into Implementation.
Execution Layer: Kelly's RALPH + Gas Town's GUPP¶
Sub-agents execute under GUPP's hook model (if work is on the hook, the agent must run) with RALPH's retry-with-diagnostics layered on top. When a sub-agent fails, RALPH's rules apply: 3 retries, pass diagnostics, escalate on same error twice. The Deacon patrols hooks and re-queues stale Beads. This is Kelly's reliability + Gas Town's throughput in one execution model.
Role Layer: Gas Town's Mayor + Kelly's Router¶
The Mayor is the human's primary interface — filtering all agent output, surfacing decisions, maintaining context. The Kelly Router handles pipeline orchestration (staging, gate validation, escalation). These are complementary roles: Mayor is the human's face; Router is the factory's engine. They communicate via Beads.
Memory Layer: Kelly's TEA + Gas Town's MEOW¶
TEA audits become Beads with MEOW's graph structure. Each TEA audit item is a Bead node with typed edges to the work item it audited and the decision it reached. The TEA's reason field IS the Bead's reason field. The full knowledge graph is queryable: "Show me all TEA decisions related to security in the last quarter." Kelly's narrative TEA richness is preserved via a notes text field on each Bead.
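The example query above can be sketched against a relational table. Dolt exposes a MySQL-compatible SQL interface. To stay self-contained, this sketch uses Python's built-in sqlite3 as a stand-in. The table name `tea_beads`, its columns, and the sample rows are all hypothetical and are not the Beads schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Dolt connection
conn.execute(
    """CREATE TABLE tea_beads (
           id INTEGER PRIMARY KEY,
           work_item TEXT NOT NULL,   -- edge to the audited work item
           decision TEXT NOT NULL,    -- PASS / PASS-WITH-FOLLOWUPS / REMEDIATE
           reason TEXT NOT NULL,      -- the TEA reason field
           notes TEXT,                -- free-text narrative
           decided_on TEXT NOT NULL   -- ISO date
       )"""
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO tea_beads (work_item, decision, reason, notes, decided_on) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("auth-refactor", "REMEDIATE", "security: token stored in plaintext", None, "2026-03-14"),
        ("landing-page", "PASS", "all checks green", None, "2026-03-20"),
        ("api-gateway", "PASS-WITH-FOLLOWUPS", "security: rate limiting deferred", None, "2025-11-02"),
    ],
)

def security_decisions(conn, since, until):
    """TEA decisions whose reason mentions security, decided within [since, until]."""
    return conn.execute(
        "SELECT work_item, decision FROM tea_beads "
        "WHERE reason LIKE '%security%' AND decided_on BETWEEN ? AND ? "
        "ORDER BY decided_on",
        (since, until),
    ).fetchall()

rows = security_decisions(conn, "2026-01-01", "2026-03-31")
print(rows)
```

A substring match on the reason field is the crudest possible filter. A typed tag or edge on each Bead would make the security query more reliable than text search.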
Quality Layer: Kelly's 5-Agent Verdict + Gas Town's Witness¶
Witness runs continuously (not just at stage gates) catching simple quality issues early. 5-agent verdict applies at TEA stage for high-stakes decisions. This gives the hybrid continuous lightweight watching plus thorough adversarial review for release-critical quality gates.
Observability Layer: Gas Town's Light Factory + Kelly's Files¶
Deploy Dolt as the Beads backing store. All pipeline state, TEA audits, done markers, and quality results are Beads on the same Dolt instance. Kelly's file-based interface (memory, done markers, pipeline state) is preserved as view layers generated from Dolt queries — humans read files, machines query Dolt. This is the Light Factory: all workers visible and addressable, with the file interface as the human window.
Top 5 Priorities for the operator¶
If the operator wanted to adopt the best of Gas Town into his Kelly factory, in priority order:
1. Deploy Dolt + Migrate TEA to Beads (High Impact, Medium Complexity)¶
Why: TEA audits are the strongest semantic match between Kelly and Beads. Migrating TEA to Beads/Dolt gives the operator a machine-queryable reasoning history — the single biggest capability upgrade. Once Dolt is deployed, done markers and pipeline state can follow.
How: Set up a local Dolt instance. Design a Bead schema for TEA audit fields. Write a migration script for existing TEA audits. Build a pipeline state view generator from Dolt queries.
Risk: Dolt infrastructure. Mitigate by running Dolt locally first, not as a hosted service.
2. Add a Mayor-Equivalent Information Filter (High Impact, Low Complexity)¶
Why: As agent count scales, information overload becomes the limiting factor. Yegge identified the Mayor's information filtering as the killer feature. Kelly's router is a routing mechanism, not an editorial filter.
How: Create a new agent role (call it "Curator" or "Mayor-proxy") whose job is to read sub-agent outputs and write a human-digestible summary. No Dolt required. This is a targeted addition that doesn't require infrastructure changes.
Risk: New agent adds latency to the information flow. Mitigate by keeping the Curator lightweight — summary only, no new work generation.
3. Add GUPP-Inspired Timeout Enforcement on Sub-Agent Spawning (Medium Impact, Low Complexity)¶
Why: Kelly's yield-friendly model leaves open the "stuck sub-agent" failure mode. Adding explicit timeout enforcement with automatic re-spawn approximates GUPP's hook guarantee without the full hook infrastructure.
How: In the Kelly Router's sub-agent spawning, add a configurable timeout. If the sub-agent doesn't complete within the timeout, kill it, log the failure, and spawn a new agent with the same work. This is 20–30 lines of Router code.
Risk: Work items that legitimately take a long time might be killed prematurely. Mitigate by making timeouts configurable per work type, with longer timeouts for research-intensive tasks.
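A minimal sketch of the timeout-and-re-spawn idea, assuming sub-agents run as child processes. The source does not say how the Kelly Router spawns agents, so the process model, the `spawn_with_timeout` name, and the timeout values are hypothetical.

```python
import subprocess
import sys

# Hypothetical per-work-type timeouts in seconds; real values would need tuning.
TIMEOUTS_S = {"research": 1800.0, "implementation": 600.0}
DEFAULT_TIMEOUT_S = 300.0

def spawn_with_timeout(cmd, work_type, max_spawns=2, timeouts=TIMEOUTS_S):
    """Run cmd; on timeout or failure, log it and re-spawn with the same work.

    Returns (succeeded, log_lines).
    """
    timeout_s = timeouts.get(work_type, DEFAULT_TIMEOUT_S)
    log = []
    for attempt in range(1, max_spawns + 1):
        try:
            # subprocess.run kills and reaps the child when the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
        except subprocess.TimeoutExpired:
            log.append(f"attempt {attempt}: killed after {timeout_s}s")
            continue
        if result.returncode == 0:
            log.append(f"attempt {attempt}: completed")
            return True, log
        log.append(f"attempt {attempt}: exited with code {result.returncode}")
    return False, log

ok, log = spawn_with_timeout([sys.executable, "-c", "print('work done')"], "implementation")
print(ok, log)
```

Keeping the timeout table keyed by work type addresses the stated risk directly: research-heavy work gets a longer budget without loosening the limit for everything else.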
4. Continuous Witness-Quality Watching (Medium Impact, Medium Complexity)¶
Why: Kelly's TEA audit catches quality issues at the release gate. Gas Town's continuous Witness catches them as they occur. Adding lightweight continuous quality watching alongside the TEA stage-gate catches simple issues (style, basic correctness) early, reducing the cost of fixing them.
How: Create a lightweight Witness daemon that runs on every sub-agent's output. Configure it to catch a specific subset of quality issues (e.g., no console.log statements, no TODOs in shipped code, basic formatting). Don't try to replace TEA — just catch the low-hanging fruit.
Risk: Overhead from running Witness on every sub-agent output. Mitigate by configuring Witness only for the simplest, fastest checks, and keeping its scope narrow.
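The narrow-scope Witness described in this priority could start as a handful of regular-expression checks over each sub-agent's output. The rule set below mirrors the examples in the text. The exact patterns and the `witness_check` name are assumptions, and a real deployment would tune them per language.

```python
import re

# Illustrative rules based on the examples in the text; the patterns are assumptions.
RULES = {
    "console.log statement": re.compile(r"\bconsole\.log\s*\("),
    "TODO marker": re.compile(r"\bTODO\b"),
    "trailing whitespace": re.compile(r"[ \t]+$"),
}

def witness_check(text):
    """Return a (line_number, rule_name) pair for every rule hit in text."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings

sample = "function run() {\n  console.log('debug');  \n  // TODO: remove\n  return 1;\n}\n"
for finding in witness_check(sample):
    print(finding)
```

Checks of this kind run in time linear in the output size, which keeps the per-output overhead small and leaves anything requiring judgment to the TEA stage.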
5. Formalize a Kelly Factory SDK / Pack Model (Long-Term, High Complexity)¶
Why: Kelly's factory is a specific implementation; Gas City's pack model is a toolkit for building custom factories. The operator's Kelly factory could become a platform if it separates the factory framework (pipeline engine, Beads substrate, Router logic) from the project-specific agent definitions (BMAD-style modular specs).
How: This is a longer-term architectural investment. Start by defining the Kelly factory's "plug points" — where the current pipeline can be extended with custom agent roles or workflows without modifying the core pipeline. Study Gas City's pack model for inspiration. The AGENTS.md pattern already has the right shape; the question is whether it can be generalized.
Risk: Over-engineering for the current scale. Mitigate by treating this as a v2 concern — let the factory scale first, then refactor for extensibility when the plug points are better understood.
Summary Table¶
| Concept | Kelly Status | Gap | Adoption Priority |
|---|---|---|---|
| Beads | Partial (4 separate mechanisms) | Partial | High |
| GUPP | None (yield-friendly model) | Full | Medium-High |
| MEOW / Knowledge Graph | None (text-based memory) | Full | Medium |
| Mayor (information filtering) | None (router is routing-focused) | Full | High |
| Crew (named persistent agents) | Partial (named lead agents exist) | Partial | Medium |
| Polecats (ephemeral workers) | Full (sub-agents) | None | N/A |
| Deacon (stuck worker cleanup) | Partial (heartbeat check) | Partial | Medium |
| Boot (heartbeat offload) | None | Full | Low |
| Witness (continuous QA) | Partial (TEA stage-gate) | Partial | Medium |
| Refinery (intent decomposition) | Partial (Intake → Planning) | Partial | Medium |
| Wasteland (federated reputation) | None (conceptual only) | Full | Low (near-term) |
| Gas City SDK / Packs | None (fixed pipeline) | Full | Medium (long-term) |
| SaaS Mountain narrative | Partial (implicit) | Partial | N/A |
| 5-agent adversarial verdict | Full (Angry Mob / 5-agent) | None | N/A |
| TEA audit (3-phase gate) | Full (TEA in pipeline) | None | N/A |
| Human-in-the-loop (SHIP/NO-SHIP) | Full (operator decision) | None | N/A |
| RALPH (retry + escalate) | Full (RALPH protocol) | None | N/A |
| Light Factory / Observability | Partial (file-based) | Partial | Medium |
| AGENTS.md as executable spec | Full (AGENTS.md pattern) | None | N/A |
| Structured handoff (artifact dirs) | Full (summary.md gates) | None | N/A |
Yuki AI CEO: Third Independent Invention¶
Yuki Capital independently arrived at the same patterns from a different starting point — GitHub repo as brain, authority matrix, autonomous loops, narrative memory > tables, progressive disclosure. This convergence across three independent systems (Kelly handbook, Gas Town, Yuki AI CEO) validates that these patterns are discovered necessities of autonomous agent operation, not design preferences.
The Yuki system also contributed unique insights not yet in Kelly or Gas Town: autonomous compounding loops in production (New AI Models, Bug Autofix, SEO Optimizer, each of which reads its own prior outputs and improves over time), the authority transfer log (explicit tracking of earned autonomy progression), 30-day outcome reviews as a governance cadence, and per-business CLAUDE.md specialization (each product has its own context file rather than one global identity). Yuki's hybrid tables+narrative insight, that LLM recall is associative rather than indexed, also sharpened the memory layer understanding: narrative memory beats tables for LLM retrieval even when tables are more structured for humans.
Cross-ref: yuki-ai-ceo-vs-kelly-gas-town-gap
Related Articles¶
steve-yegge-gas-town, steve-yegge-beads, steve-yegge-gupp, steve-yegge-meow, steve-yegge-hierarchy, steve-yegge-wasteland, steve-yegge-gas-city, steve-yegge-saas-mountain, steve-yegge-beads-kelly-gap, yuki-ai-ceo-vs-kelly-gas-town-gap, kelly-handbook-multi-agent, kelly-handbook-software-factory