Kelly vs Gas Town — Full Gap Analysis

id	kelly-gas-town-gap-analysis
type	article
source	kelly-gas-town-gap-analysis
author	Carson (dark-factory-kb subagent)
date	2026-04-26

Assessor: Carson (dark-factory-kb subagent)

Date: 2026-04-26

Sources: Kelly handbook (Ch7 multi-agent, Ch11 factory), Kelly tweets, [[steve-yegge-gas-town]], [[steve-yegge-beads]], [[steve-yegge-gupp]], [[steve-yegge-meow]], [[steve-yegge-hierarchy]], [[steve-yegge-wasteland]], [[steve-yegge-gas-city]], [[steve-yegge-saas-mountain]], [[steve-yegge-beads-kelly-gap]]

1. Beads

vs pipeline state, done markers, TEA audits, memory

Kelly Equivalent

Kelly tracks work state across four separate mechanisms:

**pipeline state** — central JSON file tracking current stage, subphase, timestamps
**done markers per subphase** — text markers signaling completed subphases
**TEA audit** — structured narrative capturing the reasoning of the Test, Evaluate, Assess audit
**memory + memory/YYYY-MM-DD.md** — long-term curated memory and daily logs

A prior assessment ([[steve-yegge-beads-kelly-gap]]) evaluated each mechanism against Beads. The summary finding: done markers map naturally to Bead state transitions (adopt); TEA audit is the strongest semantic match (adopt with schema design); pipeline state needs a JSON view on top of Beads (partial adopt); memory requires the most careful human-interface design (partial adopt).

Gap?

Partial. Kelly has file-based equivalents for all Bead functions, but none of them are unified. Beads would replace four separate mechanisms with a single git-versioned, SQL-queryable substrate. The semantic coverage is there; the architectural integration is not.

What Gas Town Does Better

**Unified substrate.** Gas Town's entire state — work items, quality gates, messages, patrol routes — lives in Beads/Dolt. Kelly's four mechanisms are siloed files that can't be queried together.
**Git-backed every action.** Beads are git-versioned by default. Every state transition is a commit with author, timestamp, and reason. Kelly's done markers and pipeline state are append-only by convention only: they are ordinary files, so nothing structurally prevents an edit or overwrite.
**Cross-pipeline queries.** With Dolt, you can run: "Show me all testing-stage Beads that failed in the last 48 hours across all projects" — one SQL query. Kelly requires grep across project directories.
**Branch and experiment.** Gas Town can branch a Bead sequence to try an alternative approach, test it, then merge or discard. Kelly has no equivalent — the pipeline is linear.
**Bead-as-Why completeness.** Gas Town explicitly frames Beads as "The Missing Why" — git stores What/Where/Who/How, Beads store Why. Kelly's TEA audit captures Why but as a narrative sidecar, not as the primary work primitive.
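
To make the cross-pipeline query concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for Dolt, which exposes a MySQL-compatible SQL interface. The `beads` table, its columns, and the sample rows are hypothetical illustrations, not the actual Beads schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per Bead. updated_at holds an ISO 8601 UTC string.
SCHEMA = """
CREATE TABLE beads (
    id TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    stage TEXT NOT NULL,
    status TEXT NOT NULL,
    reason TEXT,
    updated_at TEXT NOT NULL
)
"""

def recent_test_failures(conn, now, hours=48):
    """Return (project, id, reason) for testing-stage Beads that failed recently."""
    cutoff = (now - timedelta(hours=hours)).isoformat()
    rows = conn.execute(
        "SELECT project, id, reason FROM beads "
        "WHERE stage = 'testing' AND status = 'failed' AND updated_at >= ? "
        "ORDER BY project, id",
        (cutoff,),
    )
    return rows.fetchall()

def demo():
    """Load illustrative rows for three projects and run the query once."""
    now = datetime(2026, 4, 26, 12, 0, tzinfo=timezone.utc)
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    sample = [
        ("b1", "alpha", "testing", "failed", "flaky login test", now - timedelta(hours=3)),
        ("b2", "beta", "testing", "failed", "timeout in CI", now - timedelta(hours=40)),
        ("b3", "beta", "testing", "failed", "old failure", now - timedelta(hours=72)),
        ("b4", "alpha", "planning", "failed", "wrong stage", now - timedelta(hours=1)),
        ("b5", "gamma", "testing", "passed", "all green", now - timedelta(hours=2)),
    ]
    conn.executemany(
        "INSERT INTO beads VALUES (?, ?, ?, ?, ?, ?)",
        [(i, p, s, st, r, t.isoformat()) for i, p, s, st, r, t in sample],
    )
    return recent_test_failures(conn, now)
```

Calling `demo()` returns only `b1` and `b2`: the 72-hour-old failure, the planning-stage failure, and the passing test are all filtered out by a single statement. The file-based equivalent would need a directory walk plus per-file parsing.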

What Kelly Does Better

**Human readability.** Kelly's pipeline state and done markers are plain text, so a human can read them in any editor without special tooling. Beads require Dolt queries or a Beads viewer.
**Simplicity of deployment.** Setting up Dolt for Beads is infrastructure work. Kelly's file-based mechanisms work immediately — `echo "DONE" >> memory` requires no server.
**Graceful degradation.** If the Dolt instance goes down, Gas Town stops. Kelly's file-based pipeline keeps working (just less queryable).
**Immediate persistence.** Files written to disk are durable immediately. Dolt requires a running database with transaction logging.

Adoption Potential

High. Beads are the most impactful single architectural upgrade to the Kelly pipeline. The TEA audit and done markers are the immediate adoption targets; pipeline state can follow once the Beads substrate is stable. The main risk is Dolt infrastructure — mitigate by deploying Dolt as a local instance first, not a hosted service.

3. MEOW

vs Kelly's knowledge graph / structured memory

Kelly Equivalent

Kelly has two knowledge management layers:

**memory** — curated long-term memory, distilled learnings, significant decisions. Read in main session, human-editable.
**memory/YYYY-MM-DD.md** — daily raw logs of session events and context. Append-only per session.

These are narrative, text-based, and human-written. They are human-readable and human-writable, but not machine-queryable in any structured sense. The closest Kelly gets to structured knowledge is the TEA audit narrative and the pipeline state's machine-readable stage metadata.

Gap?

Full. Kelly has no knowledge graph. Kelly's memory and daily logs amount to a well-written journal; MEOW's Dolt-backed graph is a structured database. The difference is architectural: one preserves knowledge in narrative form (rich but opaque to queries), the other in typed, queryable form (structured and powerful but less narrative).

What Gas Town Does Better

**Machine-queryable knowledge.** "Show me all TEA audits where the reason field mentions 'security'" — one SQL query in MEOW; grep across narrative files in Kelly. As the knowledge base grows, this gap widens.
**Typed edges between knowledge nodes.** MEOW's Bead graph has typed relationships: parent/child (decomposition), causal (Bead B created because of Bead A), validated-by (quality attestation). Kelly's cross-references are free-form links in narrative text — powerful when written, but not structurally enforceable.
**Versioned knowledge graph.** Any point-in-time snapshot of the MEOW graph is queryable. "What did we know about this system on March 15th?" is a Dolt query. Kelly's daily logs are chronological but not versioned in the same sense.
**Onboarding via graph traversal.** A new agent can reconstruct a project's full decision history by traversing the Bead DAG. Kelly's onboarding requires reading daily logs in chronological order — more narrative, less efficient for targeted discovery.
**Shared knowledge across agents.** MEOW's graph is shared across all agents writing to the same Dolt instance. Kelly's memory is per-agent (each agent has its own session memory) unless explicitly shared.
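
The typed-edge and graph-traversal points can be sketched in a few lines. This is an illustration under stated assumptions, not MEOW's data model: the `Bead` class, its fields, and the `decision_history` helper are hypothetical, and only the three edge types named above are taken from the text. For simplicity the traversal follows a single causal parent per Bead.

```python
from dataclasses import dataclass, field

# Edge types named in the text; everything else here is a hypothetical illustration.
EDGE_TYPES = {"parent", "causal", "validated-by"}

@dataclass
class Bead:
    id: str
    reason: str
    edges: list = field(default_factory=list)  # list of (edge_type, target_id)

    def link(self, edge_type, target_id):
        """Add a typed edge; unknown types are rejected rather than stored."""
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges.append((edge_type, target_id))

def decision_history(graph, start_id):
    """Follow 'causal' edges back from a Bead and return reasons, oldest first."""
    history, seen, current = [], set(), start_id
    while current is not None and current not in seen:
        seen.add(current)
        bead = graph[current]
        history.append(bead.reason)
        causes = [target for kind, target in bead.edges if kind == "causal"]
        current = causes[0] if causes else None  # simplification: first cause only
    return list(reversed(history))

def build_example():
    """Three Beads where c was created because of b, and b because of a."""
    graph = {
        "a": Bead("a", "users report slow search"),
        "b": Bead("b", "add an index on the search column"),
        "c": Bead("c", "index migration needs a maintenance window"),
    }
    graph["b"].link("causal", "a")
    graph["c"].link("causal", "b")
    graph["c"].link("validated-by", "b")
    return graph
```

`decision_history(build_example(), "c")` walks from the newest Bead back to the original trigger without reading anything unrelated, which is the targeted-discovery advantage over reading daily logs in order. The `link` check is what "structurally enforceable" means in practice: a free-form note cannot be rejected, a mistyped edge can.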

What Kelly Does Better

**Narrative richness.** Memory entries can include opinions, emotions, context, and free-form associations that don't fit a structured schema. MEOW's typed Bead fields capture "why" but not the full voice of a human writing in a journal.
**Human write path.** Humans can write directly to memory. MEOW requires a Beads CLI or API — there's no equivalent of `echo "decided to use Postgres" >> memory`. The human write path in Kelly is immediate; in Gas Town it requires tooling.
**No infrastructure dependency.** Kelly's memory is a text file. MEOW requires Dolt running with a Beads schema. If Dolt is unavailable, MEOW's knowledge is inaccessible; Kelly's memory is always available.
**Free association.** Memory entries can reference each other in natural, unstructured ways ("see also March 15th"). MEOW's typed edges require each relationship to be authored explicitly, so loose, implicit associations have nowhere to live.

Adoption Potential

Medium. MEOW is a powerful long-term target for the operator's Kelly factory, but it's a larger migration than Beads for state tracking. The recommendation from the prior Beads assessment holds: adopt Beads as the TEA audit substrate first (where the semantic match is strongest), then extend to memory as the interface matures. Full MEOW knowledge graph adoption requires Dolt infrastructure + a human-usable write interface + a migration strategy for existing memory files. High value, but phased investment.

5. Deacon / Boot / Witness / Refinery

vs heartbeat, subagent spawning, QA, routing

Kelly Equivalent

Kelly distributes these functions across existing mechanisms:

**Heartbeat / liveness** — heartbeat check-in mechanism; agents periodically update a file with current activity and timestamp. Detects stuck agents by absence of updates.
**QA / quality gates** — test-lead agent runs the TEA (Test, Evaluate, Assess) audit before release. Kelly also has the "Angry Mob" / 5-agent verdict pattern for adversarial review.
**Routing** — Kelly Router handles intent parsing, work decomposition, and sub-agent spawning.
**Sub-agent spawning** — JSON tool call with label, task, and output path. List/steer/kill controls.

Kelly does not have dedicated daemon roles corresponding to Deacon, Boot, Witness, or Refinery. These functions are either handled by agents (TEA audit by test-lead) or not explicitly handled (Deacon's stuck-worker cleanup, Boot's heartbeat offload).

Gap?

Partial. Kelly has equivalents for the outcomes of these roles (liveness detection, quality gates, routing) but not for the architectural separation of these functions into dedicated daemons. The gap is in isolation, observability, and enforcement.

What Gas Town Does Better

**Deacon's structural liveness enforcement.** Kelly's heartbeat is a file that agents must actively update. If an agent crashes, it stops updating, but the absence is only detected when something else checks. The Deacon is a daemon that *actively patrols* all hooks and detects stuck workers structurally (hook age, not file timestamp). This is more robust than agent-managed heartbeat files.
**Boot isolating heartbeat traffic.** Separating heartbeat handling into a dedicated daemon (Boot) means the Deacon isn't interrupted by routine liveness pings. Kelly's heartbeat is distributed across all agents — if heartbeat volume increases, it competes with work-processing resources.
**Witness as dedicated quality gate daemon.** Having a dedicated Witness daemon that watches all workers means quality checks run continuously, not just at the TEA stage. Kelly's quality gate is a stage-gate (TEA audit before release), not continuous. For some quality concerns (style violations, simple correctness checks), continuous watching is more efficient.
**Refinery's explicit intent decomposition.** Refinery converts vague epics into well-specified Bead sequences before polecats execute. Kelly's equivalent (Intake → Planning) is more heavyweight — it creates project directories and multiple artifacts. Refinery is more lightweight and can be applied to smaller units of work.
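
The difference between a passive heartbeat file and an active patroller can be sketched as follows. This is a hedged illustration, not the Deacon's implementation: the text says the Deacon keys on hook age, whereas this sketch uses a generic "seconds since last check-in" value, and `Heartbeat`, `find_stuck`, `patrol`, and the `requeue` callback are all hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Heartbeat:
    agent: str
    last_seen: float  # seconds since epoch of the agent's last check-in

def find_stuck(heartbeats, now, max_age):
    """Return the names of agents whose last check-in is older than max_age seconds."""
    return sorted(hb.agent for hb in heartbeats if now - hb.last_seen > max_age)

def patrol(heartbeats, now, max_age, requeue):
    """One patrol pass: find stuck agents and hand each one to the requeue callback."""
    stuck = find_stuck(heartbeats, now, max_age)
    for agent in stuck:
        requeue(agent)
    return stuck
```

With agent-managed heartbeats, `find_stuck` only runs if some agent remembers to call it. A dedicated patroller runs `patrol` on a schedule regardless of what the workers are doing, so a crashed agent is noticed within one patrol interval plus `max_age`.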

What Kelly Does Better

**TEA audit depth.** Kelly's TEA (Test, Evaluate, Assess) is a structured, three-phase quality audit with named outputs (tea-summary.md) and a defined gate decision (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE). Gas Town's Witness is a quality auditor but the specific audit methodology is less formalized than TEA.
**Multi-agent adversarial review (5-agent verdict).** Kelly's Angry Mob / 5-agent verdict is a more robust adversarial review mechanism than Gas Town's single Witness. Five agents reviewing independently and reaching consensus are more reliable than one reviewer, provided their errors are not strongly correlated.
**Pipeline stages contain the roles.** Kelly's pipeline stages implicitly contain the functions of Refinery (Planning), Witness (TEA), and routing (Router). Gas Town's roles are separate daemons that must be explicitly coordinated. Kelly's approach is more self-documenting.
**Explicit human approval at release.** Kelly's Operator Decision (SHIP / NO-SHIP) at the release gate is an explicit human-in-the-loop checkpoint. Gas Town doesn't have an equivalent explicit human approval before release — it's more implicit in the Mayor's judgment.

Adoption Potential

Medium. The highest-value adoption from this group is the Deacon's structural liveness enforcement — replacing Kelly's file-based heartbeat with a daemon that actively patrols running agents. This is a targeted addition that doesn't require adopting the full Gas Town daemon model. Witness (continuous quality watching) is valuable but more complex to integrate. Refinery's lightweight intent decomposition could enhance Kelly's Intake/Planning stages. TEA and the 5-agent verdict already give Kelly strong quality coverage.

7. Gas City SDK

vs Kelly's pipeline-as-framework

Kelly Equivalent

Kelly's factory is a structured six-stage pipeline:

**Intake → Research → Planning → Implementation → Testing → Release**
Each stage has defined artifacts, gates, and specialized agents
Defined in AGENTS.md, executed via the Kelly Router

This is a pipeline, not an SDK. It is project-specific and opinionated — the six stages are the factory. Gas City is a toolkit for building arbitrary factories; Kelly is a specific factory implementation.

Gap?

Partial. Kelly doesn't have a Gas City equivalent — a composable SDK of building blocks that can be assembled into custom factories. However, Kelly's BMAD agent definitions ([[kelly-tweets-bmad]]) provide a degree of composability: modular agent specs that can be mixed and matched. The gap is in the pack model and the SDK extensibility.

What Gas Town Does Better

**Composable packs.** Gas City allows building custom packs (collections of agent roles, workflows, Bead types, and configuration) that compose with other packs. Kelly's pipeline is a fixed six-stage sequence — adding a new stage or modifying the workflow requires modifying the pipeline itself.
**SDK for building custom factories.** Gas City is MIT-licensed, community-driven, with a Discord of thousands building packs for different domains. Kelly's framework is a single implementation — there's no SDK layer for building custom Kelly-style factories.
**Drop-in Gas Town pack.** Gas City's Gas Town pack ships as the default, making it a drop-in replacement for existing Gas Town users. Anyone can build a new pack that redefines the factory. Kelly has no equivalent extensibility mechanism.
**11 Stages of AI Adoption as a progression model.** Gas City's leveling framework (Level 8: build your own orchestrator → Level 11: factory builder) provides a clear maturity model for adopters. Kelly's factory evolution (v1→v3) is less formalized as an adoption progression.

What Kelly Does Better

**Opinionated defaults with proven structure.** Kelly's six-stage pipeline is a specific, proven factory design. Gas City's pack model is more powerful but requires more design decisions — Kelly's opinionated defaults reduce the burden on adopters who just want a working factory.
**Pipeline stages as a mental model.** The Intake → Research → Planning → Implementation → Testing → Release sequence is immediately intuitive to most developers. The pack model is more powerful but less immediately comprehensible.
**AGENTS.md as executable specification.** Kelly's AGENTS.md is both documentation and executable instruction — it defines the factory concretely enough to run. Gas City's packs are declarative but less directly tied to the running system.
**TEA audit and gate validation are first-class.** Kelly's TEA audit and stage-gate validation are integral to the pipeline, not optional add-ons. Gas City's quality gates (Witness) are more composable but can be omitted.

Adoption Potential

Medium. The operator should consider Gas City's pack composition principles, specifically the idea that agent roles, workflows, and Bead types should be composable modules rather than a fixed pipeline. This could inspire a Kelly "factory SDK" that separates the pipeline framework from project-specific configurations. However, full Gas City SDK adoption would require a significant architectural shift from Kelly's pipeline model to Gas City's pack model. The more practical near-term adoption is applying Gas City's Light Factory observability principles to make Kelly's pipeline more transparent.

9. Multi-Agent Adversarial Reliability

vs Kelly's TEA audit / QA gates

Kelly Equivalent

Kelly has two adversarial reliability mechanisms:

**TEA audit** (Test, Evaluate, Assess) — structured quality gate before release, run by the test-lead agent
**5-agent verdict / Angry Mob** — multiple agents independently reviewing work and reaching consensus

Kelly's multi-agent review is well-established. The TEA audit is a formal three-phase gate; the Angry Mob is used for adversarial testing (e.g., 5 agents independently test the same implementation and compare results).

Gap?

None. Kelly has genuine adversarial multi-agent review. The 5-agent verdict is arguably more robust than Gas Town's single Witness: a consensus of five independent reviewers is wrong only when a majority of them err, whereas a single quality auditor is wrong whenever it alone errs.
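
The reliability claim can be checked with a short binomial calculation. The sketch below assumes every reviewer is correct with the same probability `p` and that reviewers err independently; both are simplifying assumptions, and agents that share a model or a prompt are likely to violate independence. The accuracy values are illustrative, not measurements from either system.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n independent reviewers is correct,
    when each reviewer is correct with probability p."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))
```

For `p = 0.8`, `majority_correct(5, 0.8)` is about 0.942 against 0.8 for a single reviewer. The effect reverses when reviewers are worse than a coin flip: `majority_correct(5, 0.4)` is about 0.317, below the single-reviewer 0.4. The 5-agent advantage therefore depends on reviewer quality and independence, not on the count alone.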

What Gas Town Does Better

**Witness as continuous auditor.** Gas Town's Witness watches all workers continuously, not just at the release gate. Quality issues are caught as they occur, not retrospectively. Kelly's TEA is a stage-gate (before release), which means quality issues discovered late are more expensive to fix.
**Witness as a dedicated daemon role.** Making Witness a dedicated daemon role means it can't be skipped under time pressure. Kelly's TEA is a pipeline stage — under deadline pressure, there's temptation to shorten or skip it.
**Two agents watching each other as the baseline.** Gas City's "never deploy a single agent for real business processes" principle formalizes what Kelly does empirically. The framing makes it a design principle, not an emergent practice.

What Kelly Does Better

**5-agent verdict vs single Witness.** Kelly's 5-agent adversarial verdict is statistically more reliable than Gas Town's single Witness. Five independent agents reaching consensus catches failure modes that a single auditor misses.
**TEA's explicit three-phase structure.** Test → Evaluate → Assess is a more thorough audit than a single Witness review. The Evaluate phase specifically covers non-functional requirements (performance, security) — aspects that a single Witness might deprioritize.
**TEA output as named artifact.** Kelly's tea-summary.md with explicit gate decision (PASS / PASS-WITH-FOLLOWUPS / REMEDIATE) is a structured, auditable output. Gas Town's Witness validation results are less formally specified.
**Human operator decision at release.** Kelly's SHIP / NO-SHIP operator decision at the release gate is an explicit human-in-the-loop checkpoint. Gas Town doesn't have an equivalent — the Mayor's editorial judgment is the final call.

Adoption Potential

Low. Kelly already has stronger adversarial coverage than Gas Town (5-agent verdict > single Witness) and a more formal quality audit structure (TEA's three phases). The only improvement Kelly should consider from Gas Town is making the Witness role continuous rather than a stage-gate — adding lightweight, continuous quality watching alongside the existing TEA stage-gate.

Concepts Kelly is Missing Entirely

The following Gas Town concepts have no Kelly equivalent at all:

GUPP (Full Execution Persistence)

Kelly's agents yield and resume. GUPP's "if your hook is non-empty, you MUST run" has no architectural equivalent. A Kelly agent that yields indefinitely will stall the pipeline; GUPP eliminates this by design. Recommendation: Add explicit timeout enforcement on sub-agent spawning with automatic re-spawn. This is a lightweight approximation of GUPP that doesn't require the full hook infrastructure.
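
A minimal sketch of the recommended timeout-plus-re-spawn approximation, assuming each sub-agent runs as an operating-system process. The function name, the `max_spawns` parameter, and the example command are hypothetical; Kelly's actual spawning goes through a JSON tool call, which this does not model.

```python
import subprocess
import sys
from typing import List, Optional

def run_with_respawn(cmd: List[str], timeout_s: float, max_spawns: int = 3) -> Optional[int]:
    """Run cmd; if it exceeds timeout_s, kill it and spawn a fresh copy.

    Returns the 1-based spawn number that finished in time, or None if every
    spawn timed out, in which case the caller should escalate.
    """
    for spawn in range(1, max_spawns + 1):
        try:
            subprocess.run(cmd, timeout=timeout_s, check=False)
            return spawn
        except subprocess.TimeoutExpired:
            continue  # subprocess.run kills and reaps the child before raising
    return None

# Illustrative stand-in for an agent that finishes immediately.
QUICK_AGENT = [sys.executable, "-c", "pass"]
```

This bounds how long a stalled sub-agent can hold up the pipeline, but it is weaker than GUPP: it restarts work from scratch rather than resuming from a hook, so it only suits tasks that are safe to repeat.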

Wasteland (Federated Reputation Economy)

Kelly's autonomous company marketplace is a design concept without a concrete protocol implementation. The Wasteland's git/Dolt-backed Wanted Board, multi-dimensional stamps, and trust ladder are the concrete realization. Recommendation: Not immediately applicable (requires network effects), but study the Wasteland's protocol design for future multi-factory scenarios.

Beads-as-Why (The Missing Why as First-Class Data)

Beads' core insight — git stores What/Where/Who/How, Beads capture Why — has no Kelly equivalent. Kelly's TEA audit captures reasoning, but as a narrative sidecar, not as the primary work primitive. Recommendation: Begin TEA schema design to map current audit fields to Bead fields. This is the highest-value missing concept to adopt.

MEOW Work Primitives (Work as First-Class System Primitive)

In MEOW, everything is Work: knowledge, coordination, communication, reputation. Kelly distinguishes between work items (pipeline tasks), memory (knowledge), and quality gates (coordination) — they are separate mechanisms. Recommendation: Adopt Beads as the universal substrate so that all Kelly mechanisms (pipeline state, memory, QA, heartbeats) can eventually be queried together.

Boot Daemon (Heartbeat Traffic Isolation)

Boot handles heartbeats so the Deacon isn't interrupted. Kelly has no equivalent heartbeat isolation — heartbeat traffic is distributed across all agents and competes with work processing. Recommendation: Consider a dedicated heartbeat handler for high-agent-count deployments.

Pack Composition Model

Gas City's packs (composable agent role + workflow + Bead type bundles) have no Kelly equivalent. Kelly's factory is a fixed six-stage pipeline. Recommendation: Investigate whether Kelly's BMAD agent definitions could be extended to a pack-like composability model for future extensibility.

Combined Architecture

What would a Kelly + Gas Town hybrid look like? The best elements from each:

Pipeline Layer: Kelly's Stages + Gas Town's Packs

The hybrid retains Kelly's explicit six-stage pipeline (Intake → Research → Planning → Implementation → Testing → Release) as the macro structure, but replaces the fixed stage implementation with Gas City's pack composition model. Each pipeline stage is a pack containing the agent roles, workflows, and Bead types specific to that stage. Stages compose via the shared Bead substrate — a Bead created in Planning flows naturally into Implementation.
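
One way to picture stages as packs that compose through shared Bead types is sketched below. Everything here is an assumption for illustration: the `Pack` fields, the Bead type names, and most role names are hypothetical, and Gas City's real pack format is not described in the sources for this analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pack:
    """Hypothetical pack: the roles a stage runs and the Bead types it exchanges."""
    name: str
    roles: tuple
    produces: frozenset
    consumes: frozenset = frozenset()

def check_composition(packs):
    """Return problems where a stage consumes a Bead type no earlier stage produces."""
    available = set()
    problems = []
    for pack in packs:
        for bead_type in sorted(pack.consumes - available):
            problems.append(
                f"{pack.name} needs '{bead_type}' but no earlier stage produces it"
            )
        available |= pack.produces
    return problems

# Kelly's six stages expressed as packs; Bead type names are illustrative.
STAGES = [
    Pack("Intake", ("intake-agent",), frozenset({"request"})),
    Pack("Research", ("researcher",), frozenset({"findings"}), frozenset({"request"})),
    Pack("Planning", ("planner",), frozenset({"plan"}), frozenset({"findings"})),
    Pack("Implementation", ("developer",), frozenset({"change"}), frozenset({"plan"})),
    Pack("Testing", ("test-lead",), frozenset({"tea-audit"}), frozenset({"change"})),
    Pack("Release", ("operator",), frozenset({"release"}), frozenset({"tea-audit"})),
]
```

`check_composition(STAGES)` returns an empty list, while dropping or reordering a stage reports exactly which Bead type is left without a producer. That is the practical benefit of composing over a shared substrate: the stage order stays explicit, but each stage becomes replaceable as long as it honours the same inputs and outputs.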

Execution Layer: Kelly's RALPH + Gas Town's GUPP

Sub-agents execute under GUPP's hook model (if there is work on the hook, the agent must run) with RALPH's retry-with-diagnostics layered on top. When a sub-agent fails, RALPH's rules apply: 3 retries, pass diagnostics, escalate on same error twice. The Deacon patrols hooks and re-queues stale Beads. This combines Kelly's reliability with Gas Town's throughput in one execution model.
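
The three RALPH rules can be sketched as a small retry loop. This is one reading of the rules, not the RALPH protocol itself: it assumes "3 retries" means three attempts after the initial one, that "pass diagnostics" means handing all earlier error messages to the next attempt, and that "same error twice" means an error message that has already been seen. `Escalate` and `run_with_retries` are hypothetical names.

```python
class Escalate(Exception):
    """Raised when the retry rules say a human or parent agent should step in."""

def run_with_retries(task, max_retries=3):
    """Run task(diagnostics), retrying on failure with accumulated diagnostics.

    Escalates when an error repeats or when the retries run out.
    """
    diagnostics = []
    for _ in range(max_retries + 1):  # initial attempt plus max_retries retries
        try:
            return task(list(diagnostics))  # pass a copy so the task cannot mutate it
        except Exception as exc:
            message = f"{type(exc).__name__}: {exc}"
            if message in diagnostics:
                raise Escalate(f"same error twice: {message}") from exc
            diagnostics.append(message)
    raise Escalate(f"gave up after {max_retries} retries: {diagnostics}")
```

Escalating on a repeated error avoids spending the remaining retries on a failure that the diagnostics did not help with, which is the case where another automated attempt is least likely to succeed.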

Role Layer: Gas Town's Mayor + Kelly's Router

The Mayor is the human's primary interface — filtering all agent output, surfacing decisions, maintaining context. The Kelly Router handles pipeline orchestration (staging, gate validation, escalation). These are complementary roles: Mayor is the human's face; Router is the factory's engine. They communicate via Beads.

Memory Layer: Kelly's TEA + Gas Town's MEOW

TEA audits become Beads with MEOW's graph structure. Each TEA audit item is a Bead node with typed edges to the work item it audited and the decision it reached. The TEA's reason field IS the Bead's reason field. The full knowledge graph is queryable: "Show me all TEA decisions related to security in the last quarter." Kelly's narrative TEA richness is preserved via a notes text field on each Bead.

Quality Layer: Kelly's 5-Agent Verdict + Gas Town's Witness

Witness runs continuously (not just at stage gates) catching simple quality issues early. 5-agent verdict applies at TEA stage for high-stakes decisions. This gives the hybrid continuous lightweight watching plus thorough adversarial review for release-critical quality gates.

Observability Layer: Gas Town's Light Factory + Kelly's Files

Deploy Dolt as the Beads backing store. All pipeline state, TEA audits, done markers, and quality results are Beads on the same Dolt instance. Kelly's file-based interface (memory, done markers, pipeline state) is preserved as view layers generated from Dolt queries — humans read files, machines query Dolt. This is the Light Factory: all workers visible and addressable, with the file interface as the human window.
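
The "files as view layers" idea can be sketched as a pure function from Bead rows to the JSON a human would open. The row shape (`project`, `stage`, `status`, `updated_at`) and the function name are hypothetical; in the hybrid, the rows would come from a Dolt query rather than an in-memory list.

```python
import json

def render_pipeline_state(beads):
    """Build a JSON pipeline-state view from Bead rows; the latest row per project wins."""
    state = {}
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    for bead in sorted(beads, key=lambda b: b["updated_at"]):
        state[bead["project"]] = {
            "stage": bead["stage"],
            "status": bead["status"],
            "updated_at": bead["updated_at"],
        }
    return json.dumps(state, indent=2, sort_keys=True)
```

Because the file is regenerated from the store, it is a read-only window: edits made directly to the rendered file would be overwritten on the next render, so the human write path needs its own route back into Beads.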

Summary Table

Concept	Kelly Status	Gap	Adoption Priority
Beads	Partial (4 separate mechanisms)	Partial	High
GUPP	None (yield-friendly model)	Full	Medium-High
MEOW / Knowledge Graph	None (text-based memory)	Full	Medium
Mayor (information filtering)	None (router is routing-focused)	Full	High
Crew (named persistent agents)	Partial (named lead agents exist)	Partial	Medium
Polecats (ephemeral workers)	Full (sub-agents)	None	N/A
Deacon (stuck worker cleanup)	Partial (heartbeat check)	Partial	Medium
Boot (heartbeat offload)	None	Full	Low
Witness (continuous QA)	Partial (TEA stage-gate)	Partial	Medium
Refinery (intent decomposition)	Partial (Intake → Planning)	Partial	Medium
Wasteland (federated reputation)	None (conceptual only)	Full	Low (near-term)
Gas City SDK / Packs	None (fixed pipeline)	Full	Medium (long-term)
SaaS Mountain narrative	Partial (implicit)	Partial	N/A
5-agent adversarial verdict	Full (Angry Mob / 5-agent)	None	N/A
TEA audit (3-phase gate)	Full (TEA in pipeline)	None	N/A
Human-in-the-loop (SHIP/NO-SHIP)	Full (operator decision)	None	N/A
RALPH (retry + escalate)	Full (RALPH protocol)	None	N/A
Light Factory / Observability	Partial (file-based)	Partial	Medium
AGENTS.md as executable spec	Full (AGENTS.md pattern)	None	N/A
Structured handoff (artifact dirs)	Full (summary.md gates)	None	N/A