Kelly Factory — Overview¶

Compiled by: Router (subagent)
Date: 2026-04-27
Sources: Kelly Handbook Ch7 (Multi-Agent), Kelly Handbook Ch11 (Software Factory), Kelly Tweets (Factory), Kelly vs Gas Town Gap Analysis, soul, AGENTS.md

What Is the Kelly Factory¶

The Kelly Factory is a dark factory architecture for software development — a multi-agent production line that processes ideas into shipped software with minimal human intervention. It is the operator's implementation of the "relentless cofounder" thesis: an AI agent system that never sleeps, never forgets, and executes structured work pipelines on behalf of its human operator.

Origin: The factory emerges from three converging sources: Kelly's OpenClaw Handbook (Ch7 multi-agent orchestration and Ch11 software factory), KellyClaudeAI's public tweets documenting the factory's evolution from single-agent to v3 multi-stage production line, and the broader gstack corpus on autonomous agent patterns. The operator is synthesizing these into a working system for his own software factory.

Goal: Relentless cofounder for the operator — a system that can take a product idea and drive it through research, design, build, test, and deploy with minimal intervention. The human's role is intent and approval, not execution.

Factory type: Dark factory with human-in-the-loop. The factory runs autonomously between gates; humans approve at key decision points (SHIP/NO-SHIP at release).

For the detailed pipeline stages and gate definitions, see kelly-handbook-ch11-software-factory. For the multi-agent orchestration pattern, see kelly-handbook-ch7-multi-agent.

Architecture¶

Pipeline Stages¶

The factory runs a structured six-stage pipeline:

Idea → Intake → Research → Planning → Implementation → Testing → Release
                (CIS Loop) (PRD+      (Sprint          (TEA      (Operator
                            Arch+      Execution)       Audit)    Decision)
                            UX Design)

Stage	Agent	Key Artifact	Gate
Intake	Router	`intake.md`	—
Research	research-lead	`research-summary.md`	READY / NOT-READY
Planning	project-lead	`prd.md`, `architecture.md`, `ux-design.md`	PASS / FAIL
Implementation	build agents	`implementation-summary.md`	—
Testing	test-lead	`tea-summary.md`	PASS / PASS-WITH-FOLLOWUPS / REMEDIATE
Release	Router	—	SHIP / NO-SHIP

Quick Path: Bug fixes skip Research and go directly to Planning/Implementation. New products run the full pipeline.
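
The stage order and the Quick Path rule can be sketched as a small lookup. This is a minimal illustration, not the factory's actual code: the names `STAGES`, `QUICK_PATH_SKIPS`, and `stages_for` are hypothetical, and the sketch assumes a bug fix still passes through Intake and skips only Research.

```python
# Hypothetical sketch of the stage order described above.
STAGES = ["Intake", "Research", "Planning", "Implementation", "Testing", "Release"]

# Assumption: a bug fix skips only Research and still goes through Intake.
QUICK_PATH_SKIPS = {"Research"}


def stages_for(work_type: str) -> list[str]:
    """Return the ordered stages a work item passes through."""
    if work_type == "bug_fix":
        return [stage for stage in STAGES if stage not in QUICK_PATH_SKIPS]
    return list(STAGES)


print(stages_for("bug_fix"))
print(stages_for("new_product"))
```

Keeping the skip set separate from the stage list means a change to the Quick Path touches one line and leaves the canonical order alone.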

Full Pipeline example (CSV Export Feature): 10:15 AM intake → 10:21 AM planning complete → 11:15 AM implementation done → 11:30 AM TEA audit PASS-WITH-FOLLOWUPS → 11:31 AM release decision. Roughly 76 minutes end-to-end, from intake to release decision.

Sub-Agent Routing Table¶

Task Type	Route To
New project intake	carson
Research	carson, drquinn, mary, victor
Planning	mary
Architecture	winston
Design/UI	sally
Building	amelia
Testing/QA	qa, testlead
DevOps/Scaffold/Deploy	phil
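
One way to picture the routing table is as a plain dictionary lookup. The entries below are copied from the table; the `route` helper and its choice to raise on an unknown task type are assumptions made for illustration, not documented Router behavior.

```python
# Routing entries copied from the table above.
ROUTES: dict[str, list[str]] = {
    "New project intake": ["carson"],
    "Research": ["carson", "drquinn", "mary", "victor"],
    "Planning": ["mary"],
    "Architecture": ["winston"],
    "Design/UI": ["sally"],
    "Building": ["amelia"],
    "Testing/QA": ["qa", "testlead"],
    "DevOps/Scaffold/Deploy": ["phil"],
}


def route(task_type: str) -> list[str]:
    """Return the agents for a task type (hypothetical helper)."""
    try:
        return ROUTES[task_type]
    except KeyError:
        # Assumption: an unknown type is an error, not a cue for the Router to do the work.
        raise ValueError(f"no route for task type: {task_type!r}") from None


print(route("Research"))
```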

BEADS Pipeline State¶

Kelly currently tracks pipeline state across four separate mechanisms:

- pipeline state: machine-readable current stage, subphase, timestamps
- done markers per subphase: text signals for completed subphases
- TEA audit narrative: structured Thought/Event/Action captures per quality gate
- heartbeat: agent liveness and current activity

The Kelly vs Gas Town gap analysis identifies Beads (git-versioned, SQL-queryable work primitives backed by Dolt) as the highest-priority migration target to unify these four separate mechanisms into a single substrate.

Agent Roles¶

Router (Orchestrator)¶

The Router is the main Kelly agent — the orchestrator that never does the work itself. Core responsibilities:
- Route work to the correct sub-agent based on type
- Validate quality gates before advancing stages
- Maintain strategic view of the whole pipeline
- Flag stuck items, keep the flow moving
- Escalate when sub-agents fail per RALPH protocol

Never edit project files while a sub-agent is working on the same project. Use sessions_send or subagents(steer) to redirect instead.

Named Lead Agents¶

- research-lead (Carson): Runs the Context → Information → Synthesis (CIS) loop for the research stage
- project-lead (Mary): Owns planning artifacts (PRD, architecture, UX design)
- test-lead: Runs the TEA audit and manages adversarial review (Angry Mob / 5-agent verdict)
- design-lead (Sally): UI/UX design specifications
- build-lead (Amelia): Sprint execution, implementation
- deploy-lead (Phil): DevOps, scaffolding, deployment

Subagent Patterns¶

- Spawned with a label, task definition, and output directory
- Work in parallel: three 5-minute tasks that take 15 minutes sequentially take about 5 when all three sub-agents are spawned simultaneously
- Ephemeral: run to completion and die
- Parent (Router) maintains context via sessions_yield
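
The parallel speed-up can be illustrated with Python's `asyncio`. This is only a sketch under stated assumptions: `run_subagent` is a stand-in that sleeps instead of doing real work, the labels are hypothetical, and nothing here reflects how sub-agents are actually spawned.

```python
import asyncio
import time


async def run_subagent(label: str, seconds: float) -> str:
    """Stand-in for a sub-agent: sleeps to simulate work, then reports."""
    await asyncio.sleep(seconds)
    return f"{label}: done"


async def run_parallel(labels: list[str], seconds: float) -> list[str]:
    """Start every sub-agent at once and wait for all results, in input order."""
    return await asyncio.gather(*(run_subagent(label, seconds) for label in labels))


start = time.perf_counter()
results = asyncio.run(run_parallel(["research", "design", "scaffold"], 0.2))
elapsed = time.perf_counter() - start

# Three 0.2 s tasks finish in roughly 0.2 s, not 0.6 s, because they overlap.
print(results, f"{elapsed:.2f}s")
```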

Memory Model¶

5-Layer System¶

Kelly's session persistence uses a 5-layer memory system — demand-loaded, narrative-dominant:

Layer	File	Purpose
1	soul	Who Kelly is — identity, role, communication style
2	memory	Curated long-term memory — learnings, decisions, insights
3	memory/YYYY-MM-DD.md	Today's raw session log
4	projects/{id}/context.md	Per-project state and context
5	data/.json	Structured data, lookups

Design principle: Narrative at the top (layers 2–3), structured at the bottom (layers 4–5). Yuki AI CEO experiments support this ordering: LLM recall is associative, not indexed, so tables serve lookup and narrative serves association. Progressive disclosure lets more knowledge compound without expanding the attention footprint.

Load strategy: soul and memory load every session (small map). Project context and daily logs load on demand (large encyclopedia). This is the "map, not encyclopedia" principle, which Yuki's experience supports empirically: its CLAUDE.md shrank 36% while the repo doubled in size.
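
The load strategy can be sketched as a function that always returns the small map and appends the larger layers only when asked. The file names follow the layer table; `files_to_load` and its parameters are hypothetical, and the project id in the usage line is invented for the example.

```python
from datetime import date
from typing import Optional

# Layers 1 and 2 load in every session (the "map").
ALWAYS_LOADED = ["soul", "memory"]


def files_to_load(project_id: Optional[str] = None, day: Optional[date] = None) -> list[str]:
    """Return the always-on files plus any on-demand layers requested."""
    files = list(ALWAYS_LOADED)
    if day is not None:
        # Layer 3: the raw session log for one day.
        files.append(f"memory/{day.isoformat()}.md")
    if project_id is not None:
        # Layer 4: per-project state and context.
        files.append(f"projects/{project_id}/context.md")
    return files


print(files_to_load())
print(files_to_load(project_id="csv-export", day=date(2026, 4, 27)))
```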

Quality Gates¶

TEA Audit¶

The Test → Evaluate → Assess audit is the structured quality gate before release:

- Test: Implementation against requirements. Does it do what it should?
- Evaluate: Non-functional requirements, such as performance, security, and edge cases
- Assess: Overall quality. Is it good enough to ship?

Output: tea-summary.md with gate decision:
- PASS — ready to ship
- PASS-WITH-FOLLOWUPS — ship now, follow-up issues tracked
- REMEDIATE — must fix before release
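
The three gate decisions map naturally onto an enumeration. The sketch below assumes that both PASS variants let the work move on to the release decision and that REMEDIATE blocks it; `TeaGate` and `may_proceed_to_release` are hypothetical names.

```python
from enum import Enum


class TeaGate(Enum):
    """Gate decisions recorded in tea-summary.md."""
    PASS = "PASS"
    PASS_WITH_FOLLOWUPS = "PASS-WITH-FOLLOWUPS"
    REMEDIATE = "REMEDIATE"


def may_proceed_to_release(gate: TeaGate) -> bool:
    """Only REMEDIATE blocks the release decision (assumption drawn from the list above)."""
    return gate is not TeaGate.REMEDIATE


print(may_proceed_to_release(TeaGate("PASS-WITH-FOLLOWUPS")))
```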

done markers¶

Each completed subphase writes a DONE marker to its output directory. The router reads these before spawning the next phase — nothing advances until the gate file exists and passes validation.
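
A minimal sketch of the marker check follows. The source does not say what "passes validation" means, so this example assumes the weakest plausible rule: the marker file exists and is non-empty. The file name `DONE` and the helper `subphase_done` are likewise assumptions.

```python
import tempfile
from pathlib import Path


def subphase_done(output_dir: Path, marker_name: str = "DONE") -> bool:
    """True only if the marker file exists and contains non-whitespace text."""
    marker = output_dir / marker_name
    return marker.is_file() and marker.read_text().strip() != ""


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    before = subphase_done(out)            # no marker yet
    (out / "DONE").write_text("research complete\n")
    after = subphase_done(out)             # marker written

print(before, after)
```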

QA Requirement Rule¶

QA is not optional. Every build/deliverable task must include a verification step before marking complete. QA happens inline as part of the task, not after.

Gate Failure Protocol (PIP-05 / Gate Failure Rule)¶

When a gate fails → fix the agent, not the gate. Default to fixing whatever produced the wrong output. Change the gate only when it rejected output over a term that is genuinely interchangeable with the one it expects. Never change the gate to accept bad output or to lower thresholds.

Autonomy Model¶

sessions_yield for Sub-Agent Execution¶

The Router uses sessions_yield to delegate work to sub-agents:
- Parent yields control while sub-agent executes
- Parent resumes when sub-agent completes or session is explicitly continued
- Sub-agents run to completion without requiring the parent to poll

RALPH Retry Protocol¶

Retry And Learn Protocol:
1. Any sub-agent failure → retry
2. Same failure twice → escalate immediately (don't waste third attempt)
3. Three failures → mandatory escalation with structured diagnostic
4. Unrecoverable → immediate escalation with operator decision requested
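
The four rules above can be sketched as a single decision function. This is an interpretation, not the protocol's reference implementation: it assumes "same failure twice" means two consecutive attempts with an identical error signature, and `ralph_next_action` and its return strings are hypothetical.

```python
def ralph_next_action(failures: list[str], unrecoverable: bool = False) -> str:
    """Decide the next step from one task's failure history (oldest first)."""
    if unrecoverable:
        return "escalate: operator decision requested"   # rule 4
    if len(failures) >= 3:
        return "escalate: structured diagnostic"         # rule 3
    if len(failures) >= 2 and failures[-1] == failures[-2]:
        return "escalate: repeated failure"              # rule 2
    if failures:
        return "retry"                                   # rule 1
    return "continue"                                    # no failure recorded


print(ralph_next_action(["timeout"]))
print(ralph_next_action(["timeout", "timeout"]))
```

Checking the unrecoverable case first keeps rule 4 from being masked by the counters, which matters when the very first failure is already fatal.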

Cron for Scheduling¶

Scheduled automations (health checks, periodic updates, daily syncs) run via cron: they need no reasoning and run on a timer. The separation principle between sessions_yield and cron is that tasks needing reasoning belong in agent sessions, while tasks needing none belong in cron or other scheduled automation. Mixing them makes both worse. Yuki AI CEO confirmed this separation with Romain's correction: "New tool doesn't mean move everything there."

heartbeat for Active Pulse¶

Kelly's heartbeat mechanism: agents periodically update heartbeat with their current activity and a timestamp, and a stuck agent is detected by the absence of updates. This is a file-based approximation of Gas Town's Deacon daemon (which actively patrols hooks structurally).
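
Detection by absence of updates comes down to comparing each agent's last timestamp against a silence threshold. The sketch assumes the heartbeat data has already been parsed into a dictionary; the 10-minute threshold, the sample timestamps, and the `stuck_agents` helper are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stuck_agents(
    heartbeats: dict[str, datetime], now: datetime, max_silence: timedelta
) -> list[str]:
    """Return agents whose last heartbeat is older than max_silence, sorted by name."""
    return sorted(
        agent for agent, last_seen in heartbeats.items() if now - last_seen > max_silence
    )


now = datetime(2026, 4, 27, 12, 0, tzinfo=timezone.utc)
beats = {
    "amelia": now - timedelta(minutes=2),   # recently active
    "phil": now - timedelta(minutes=45),    # silent past the threshold
}
stuck = stuck_agents(beats, now, max_silence=timedelta(minutes=10))
print(stuck)
```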

Key Patterns¶

Full Content in Prompt (Story-by-Story)¶

Kelly's factory produces narrative-rich artifacts — research summaries, TEA audits, decision logs — not just code. The "story" of each decision (context, options, rationale, outcome) is preserved alongside the output. This serves the memory model: narrative beats tables for LLM recall.

Queue Protocol¶

When the operator says "queue this", "add to queue", or similar → immediately update TODO.md with the queued item and its status. Keep heartbeat for active pulse only.

Operational Safety Rules¶

Revert path before breaking changes. Before modifying critical infrastructure (gateway bind, network exposure, auth settings, service mode), always know how to revert. If you can't revert safely in <5 min without outside help, you need a plan before touching it.

Never exfiltrate private data.

trash > rm (recoverable beats gone forever).

Commit Conventions¶

Router commits use the router: prefix so agent vs router changes are distinguishable:

git commit -m "router: soul: add no-edit-while-agent-working rule"
git commit -m "router: AGENTS.md: add office-hours subphase mapping"

Gaps vs Gas Town¶

The Kelly vs Gas Town gap analysis (Carson, 2026-04-26) identified these missing elements:

Missing Beads Unified Substrate¶

Kelly has 4 separate state mechanisms (pipeline state, done markers, TEA audit, heartbeat) that Beads would unify into a single git-versioned, SQL-queryable substrate. Adoption priority: High.

Missing GUPP Hook Enforcement¶

Kelly's yield-friendly model has no equivalent to GUPP's absolute "if hook is non-empty, you MUST run" rule. Nothing in the architecture enforces that contract, so a sub-agent that defers work is never flagged as violating it. Adoption priority: Medium-High.

Missing Autonomous Compounding Loops¶

Kelly's cron/TaskFlow handle scheduled tasks but none read their own prior outputs and compound. Yuki AI CEO's three production loops (New AI Models, Bug Autofix, SEO Optimizer) demonstrate the pattern: each reads its last run's output, takes the next step, and writes new output for next time. Adoption priority: Medium.
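
The compounding pattern (read the last run's output, take the next step, write output for next time) can be sketched with a JSON state file. This is an illustration only: the state layout, the file name, and `run_compounding_step` are assumptions, and the "finding" is a placeholder for whatever real work a loop would do.

```python
import json
import tempfile
from pathlib import Path


def run_compounding_step(state_file: Path) -> dict:
    """Read the previous run's output, build on it, and persist the new output."""
    if state_file.exists():
        previous = json.loads(state_file.read_text())
    else:
        previous = {"run": 0, "findings": []}   # first run starts from nothing
    run_number = previous["run"] + 1
    current = {
        "run": run_number,
        # Each run keeps everything learned so far and adds its own result.
        "findings": previous["findings"] + [f"finding from run {run_number}"],
    }
    state_file.write_text(json.dumps(current))
    return current


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last_run.json"
    first = run_compounding_step(state)
    second = run_compounding_step(state)

print(second)
```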

Missing Explicit Authority Matrix¶

Kelly's authority is gate-driven (PASS/FAIL before advancing) but has no written per-agent authority tiers with progressive transfer tracking. Gas Town's Mayor/Crew/Polecats and Yuki's three-tier authority matrix provide the model. Adoption priority: High.

Missing 30-Day Outcome Reviews¶

Kelly's quality gates are point-in-time (gate at stage transition). Yuki AI CEO adds temporal quality gates: every significant decision sets a 30-day review date, then assesses actual vs expected outcome. Adoption priority: Medium.
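
A temporal gate of this kind reduces to simple date arithmetic. The sketch below assumes a fixed 30-day offset from the decision date; `review_date` and `review_due` are hypothetical helpers, not part of either system.

```python
from datetime import date, timedelta


def review_date(decided_on: date, days: int = 30) -> date:
    """Date on which a decision's actual outcome should be compared with the expected one."""
    return decided_on + timedelta(days=days)


def review_due(decided_on: date, today: date) -> bool:
    """True once the review date has arrived or passed."""
    return today >= review_date(decided_on)


print(review_date(date(2026, 4, 27)))
```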

Gaps vs Yuki AI CEO¶

The Yuki AI CEO vs Kelly/Gas Town gap analysis (Carson, 2026-04-27) identified these Kelly-specific missing elements:

Missing Per-Project CLAUDE.md Files¶

Yuki's per-business CLAUDE.md files demonstrate that specialization lives in context files, not in distinct agent instances. The same AI CEO loads different context depending on which product it's working on. Kelly has projects/{id}/context.md files but they describe project state, not agent role and rules for that project. Missing: per-project CLAUDE.md-style context files that load when entering project scope.

Missing Autonomous Compounding Loops¶

Yuki's three production loops (daily 3am model scanner, daily 6am bug autofix, weekly SEO optimizer) are GUPP loops with compounding — each reads its own prior outputs. Kelly has no equivalent. Missing: persistent background agents that read their own prior outputs and compound over time.

Missing Authority Transfer Log¶

Yuki maintains a visible, version-controlled record of what autonomy has been earned (tier 3 → tier 2 → tier 1 movements). Kelly has no equivalent. Missing: written authority progression log per agent role.

Related¶

- kelly-gas-town-gap-analysis: full Gas Town comparison with adoption priorities
- kelly-handbook-multi-agent: Ch7 router/sub-agent architecture, RALPH protocol
- kelly-handbook-software-factory: Ch11 factory pipeline, TEA audit
- kelly-tweets-factory: Kelly's public tweets on factory evolution
- yuki-ai-ceo-vs-kelly-gas-town-gap: Yuki AI CEO cross-reference and synthesis recommendations
- yukicapital-ai-ceo-overview: Yuki Capital AI CEO patterns mapped to Kelly equivalents

Memory & Architecture Concepts¶

- five-layer-memory: the 5-layer memory system used for session persistence
- gateway-daemon: the Gateway daemon architecture
- session-isolation: session isolation model
- workspace-boot: workspace boot and context restoration
- exec-tool: the exec tool for shell command execution
- subagent-spawning: sub-agent spawn patterns
