Kelly Factory — Overview¶
Compiled by: Router (subagent)
Date: 2026-04-27
Sources: Kelly Handbook Ch7 (Multi-Agent), Kelly Handbook Ch11 (Software Factory), Kelly Tweets (Factory), Kelly vs Gas Town Gap Analysis, soul, AGENTS.md
What Is the Kelly Factory¶
The Kelly Factory is a dark factory architecture for software development — a multi-agent production line that processes ideas into shipped software with minimal human intervention. It is the operator's implementation of the "relentless cofounder" thesis: an AI agent system that never sleeps, never forgets, and executes structured work pipelines on behalf of its human operator.
Origin: The factory emerges from three converging sources: Kelly's OpenClaw Handbook (Ch7 multi-agent orchestration + Ch11 software factory), KellyClaudeAI's public tweets documenting the factory's evolution from single-agent to v3 multi-stage production line, and the broader gstack corpus on autonomous agent patterns. The operator is synthesizing these into a working system for his own software factory.
Goal: Relentless cofounder for the operator — a system that can take a product idea and drive it through research, design, build, test, and deploy with minimal intervention. The human's role is intent and approval, not execution.
Factory type: Dark factory with human-in-the-loop. The factory runs autonomously between gates; humans approve at key decision points (SHIP/NO-SHIP at release).
For the detailed pipeline stages and gate definitions, see kelly-handbook-ch11-software-factory. For the multi-agent orchestration pattern, see kelly-handbook-ch7-multi-agent.
Architecture¶
Pipeline Stages¶
The factory runs a structured six-stage pipeline:
Idea → Intake → Research (CIS Loop) → Planning (PRD + Arch + UX Design) → Implementation (Sprint Execution) → Testing (TEA Audit) → Release (Operator Decision)
| Stage | Agent | Key Artifact | Gate |
|---|---|---|---|
| Intake | Router | intake.md | — |
| Research | research-lead | research-summary.md | READY / NOT-READY |
| Planning | project-lead | prd.md, architecture.md, ux-design.md | PASS / FAIL |
| Implementation | build agents | implementation-summary.md | — |
| Testing | test-lead | tea-summary.md | PASS / PASS-WITH-FOLLOWUPS / REMEDIATE |
| Release | Router | — | SHIP / NO-SHIP |
Quick Path: Bug fixes skip Research and go directly to Planning/Implementation. New products run the full pipeline.
Full Pipeline example (CSV Export Feature): 10:15 AM intake → 10:21 AM planning complete → 11:15 AM implementation done → 11:30 AM TEA audit PASS-WITH-FOLLOWUPS → 11:31 AM release decision. ~75 minutes end-to-end.
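The stage table and the Quick Path rule can be sketched as a small data structure. This is an illustrative sketch only: the `Stage` class, the `plan_stages` and `may_advance` helpers, and the `"bugfix"` work-type label are assumptions, not names from the handbook. It also assumes that READY, PASS, PASS-WITH-FOLLOWUPS, and SHIP are the gate values that allow a stage to advance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    agent: str
    passing_gates: frozenset  # gate values that allow advancing; empty means no gate


# Mirrors the stage table above, in pipeline order.
PIPELINE = [
    Stage("Intake", "Router", frozenset()),
    Stage("Research", "research-lead", frozenset({"READY"})),
    Stage("Planning", "project-lead", frozenset({"PASS"})),
    Stage("Implementation", "build agents", frozenset()),
    Stage("Testing", "test-lead", frozenset({"PASS", "PASS-WITH-FOLLOWUPS"})),
    Stage("Release", "Router", frozenset({"SHIP"})),
]


def plan_stages(work_type: str) -> List[str]:
    """Return the stage names for a work item; bug fixes skip Research."""
    skipped = {"Research"} if work_type == "bugfix" else set()  # hypothetical label
    return [stage.name for stage in PIPELINE if stage.name not in skipped]


def may_advance(stage: Stage, gate_result: Optional[str]) -> bool:
    """A stage without a gate always advances; otherwise the result must pass."""
    if not stage.passing_gates:
        return True
    return gate_result in stage.passing_gates
```

Keeping the gate values next to each stage means the Quick Path is just a filter over the same list rather than a second pipeline definition.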
Sub-Agent Routing Table¶
| Task Type | Route To |
|---|---|
| New project intake | carson |
| Research | carson, drquinn, mary, victor |
| Planning | mary |
| Architecture | winston |
| Design/UI | sally |
| Building | amelia |
| Testing/QA | qa, testlead |
| DevOps/Scaffold/Deploy | phil |
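As a rough illustration, the routing table can be held as a plain lookup. The `ROUTES` dictionary copies the table above; the `route` function name and the choice to raise on an unknown task type are assumptions made for this sketch.

```python
from typing import List

# Task type -> candidate sub-agents, copied from the routing table.
ROUTES = {
    "New project intake": ["carson"],
    "Research": ["carson", "drquinn", "mary", "victor"],
    "Planning": ["mary"],
    "Architecture": ["winston"],
    "Design/UI": ["sally"],
    "Building": ["amelia"],
    "Testing/QA": ["qa", "testlead"],
    "DevOps/Scaffold/Deploy": ["phil"],
}


def route(task_type: str) -> List[str]:
    """Return the sub-agents for a task type, failing loudly on unknown types."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"no route for task type: {task_type!r}") from None
```

Failing on an unknown type, rather than falling back to a default agent, keeps misrouted work visible to the Router.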
BEADS Pipeline State¶
Kelly currently tracks pipeline state across four separate mechanisms:
- pipeline state — machine-readable current stage, subphase, timestamps
- done markers per subphase — text signals for completed subphases
- TEA audit narrative — structured Test/Evaluate/Assess captures per quality gate
- heartbeat — agent liveness and current activity
The Kelly vs Gas Town gap analysis identifies Beads (git-versioned, SQL-queryable work primitives backed by Dolt) as the highest-priority migration target to unify these four separate mechanisms into a single substrate.
Agent Roles¶
Router (Orchestrator)¶
The Router is the main Kelly agent — the orchestrator that never does the work itself. Core responsibilities:
- Route work to the correct sub-agent based on type
- Validate quality gates before advancing stages
- Maintain strategic view of the whole pipeline
- Flag stuck items, keep the flow moving
- Escalate when sub-agents fail per RALPH protocol
Never edit project files while a sub-agent is working on the same project. Use sessions_send or subagents(steer) to redirect instead.
Named Lead Agents¶
- research-lead (Carson): Runs the Context → Information → Synthesis (CIS) loop for the research stage
- project-lead (Mary): Owns planning artifacts (PRD, architecture, UX design)
- test-lead: Runs TEA audit, manages adversarial review (Angry Mob / 5-agent verdict)
- design-lead (Sally): UI/UX design specifications
- build-lead (Amelia): Sprint execution, implementation
- deploy-lead (Phil): DevOps, scaffolding, deployment
Subagent Patterns¶
- Spawned with a label, task definition, and output directory
- Work in parallel — what takes 15 minutes sequentially takes 5 in parallel when three are spawned simultaneously
- Ephemeral — run to completion and die
- Parent (Router) maintains context via sessions_yield
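The parallel pattern above can be sketched with Python's `asyncio`. This is not how Kelly spawns sub-agents: `run_subagent` is a hypothetical stand-in that sleeps instead of doing work, and the labels are invented. The sketch only shows why three simultaneous runs finish in roughly the time of one.

```python
import asyncio
import time
from typing import List, Tuple


async def run_subagent(label: str, seconds: float) -> str:
    """Hypothetical stand-in for a sub-agent run; sleeps instead of working."""
    await asyncio.sleep(seconds)
    return f"{label}: done"


async def run_parallel(labels: List[str], seconds: float) -> List[str]:
    """Start all sub-agents at once and wait for every one to finish."""
    return await asyncio.gather(*(run_subagent(label, seconds) for label in labels))


def demo() -> Tuple[List[str], float]:
    start = time.monotonic()
    results = asyncio.run(run_parallel(["research", "design", "scaffold"], 0.1))
    elapsed = time.monotonic() - start  # close to 0.1s, not 0.3s
    return results, elapsed
```

`asyncio.gather` returns results in the order the tasks were passed in, so the parent can match each result to its label without extra bookkeeping.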
Memory Model¶
5-Layer System¶
Kelly's session persistence uses a 5-layer memory system — demand-loaded, narrative-dominant:
| Layer | File | Purpose |
|---|---|---|
| 1 | soul | Who Kelly is — identity, role, communication style |
| 2 | memory | Curated long-term memory — learnings, decisions, insights |
| 3 | memory/YYYY-MM-DD.md | Today's raw session log |
| 4 | projects/{id}/context.md | Per-project state and context |
| 5 | data/.json | Structured data, lookups |
Design principle: Narrative at the top (layers 2–3), structured at the bottom (layers 4–5). Yuki AI CEO experiments confirm this: LLM recall is associative, not indexed. Tables are for lookup; narrative is for association. Progressive disclosure lets more knowledge compound without expanding the attention footprint.
Load strategy: soul and memory load every session (small map). Project context and daily logs load on demand (large encyclopedia). This is the "map, not encyclopedia" principle, validated empirically by Yuki's CLAUDE.md shrinking 36% while the repo doubled in size.
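A minimal sketch of the load strategy, assuming the file names exactly as they appear in the layer table (extensions are not given there, so none are added). The `files_to_load` function and its parameters are hypothetical. It only builds the list of paths and does not read anything.

```python
import datetime as dt
from pathlib import Path
from typing import List, Optional

# Loaded every session: the small "map" (names as written in the layer table).
ALWAYS_LOAD = ["soul", "memory"]


def files_to_load(
    workspace: Path,
    project_id: Optional[str] = None,
    day: Optional[dt.date] = None,
) -> List[Path]:
    """Return always-loaded files first, then on-demand files if requested."""
    paths = [workspace / name for name in ALWAYS_LOAD]
    if day is not None:
        # Layer 3: the raw session log for one day.
        paths.append(workspace / "memory" / f"{day.isoformat()}.md")
    if project_id is not None:
        # Layer 4: per-project state and context.
        paths.append(workspace / "projects" / project_id / "context.md")
    return paths
```

With no project or day given, the function returns only the two always-loaded files, which is the "small map" case.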
Quality Gates¶
TEA Audit¶
The Test → Evaluate → Assess audit is the structured quality gate before release:
- Test: Implementation against requirements — does it do what it should?
- Evaluate: Non-functional requirements — performance, security, edge cases
- Assess: Overall quality — is it good enough to ship?
Output: tea-summary.md with gate decision:
- PASS — ready to ship
- PASS-WITH-FOLLOWUPS — ship now, follow-up issues tracked
- REMEDIATE — must fix before release
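The three gate decisions map naturally onto an enum. The sketch below is an assumption-heavy illustration: the layout of tea-summary.md is not specified here, so the `Gate: <decision>` line that `parse_decision` looks for is hypothetical, as are the function names.

```python
from enum import Enum


class TeaDecision(Enum):
    PASS = "PASS"
    PASS_WITH_FOLLOWUPS = "PASS-WITH-FOLLOWUPS"
    REMEDIATE = "REMEDIATE"


def parse_decision(summary_text: str) -> TeaDecision:
    """Find a 'Gate: <decision>' line (hypothetical format) and parse it."""
    for line in summary_text.splitlines():
        if line.strip().lower().startswith("gate:"):
            value = line.split(":", 1)[1].strip()
            return TeaDecision(value)  # raises ValueError on an unknown decision
    raise ValueError("no gate decision found in summary")


def release_eligible(decision: TeaDecision) -> bool:
    """Only REMEDIATE blocks release; follow-ups ship and are tracked separately."""
    return decision is not TeaDecision.REMEDIATE
```

Treating an unknown or missing decision as an error, rather than as a pass, matches the rule that nothing advances without a valid gate result.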
done markers¶
Each completed subphase writes a DONE marker to its output directory. The router reads these before spawning the next phase — nothing advances until the gate file exists and passes validation.
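A minimal sketch of the marker check, under two assumptions that are not confirmed by the source: the marker file is named `DONE`, and "passes validation" means the file is non-empty. Both are placeholders for whatever the real router checks.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

MARKER_NAME = "DONE"  # hypothetical file name


def subphase_complete(output_dir: Path) -> bool:
    """True only if the marker exists and is non-empty (assumed validation)."""
    marker = output_dir / MARKER_NAME
    return marker.is_file() and marker.read_text(encoding="utf-8").strip() != ""


def next_phase_allowed(output_dirs: Iterable[Path]) -> bool:
    """The next phase may start only when every prior subphase is complete."""
    return all(subphase_complete(Path(d)) for d in output_dirs)


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        before = subphase_complete(out)  # no marker yet
        (out / MARKER_NAME).write_text("planning complete\n", encoding="utf-8")
        after = subphase_complete(out)
    return before, after
```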
QA Requirement Rule¶
QA is not optional. Every build/deliverable task must include a verification step before marking complete. QA happens inline as part of the task, not after.
Gate Failure Protocol (PIP-05 / Gate Failure Rule)¶
When a gate fails → fix the agent, not the gate. Default to fixing whatever produced the wrong output. Only change the gate when it rejected output over a term that is genuinely interchangeable with the one it expects. Never change the gate to accept bad output or to lower thresholds.
Autonomy Model¶
sessions_yield for Sub-Agent Execution¶
The Router uses sessions_yield to delegate work to sub-agents:
- Parent yields control while sub-agent executes
- Parent resumes when sub-agent completes or session is explicitly continued
- Sub-agents run to completion without requiring the parent to poll
RALPH Retry Protocol¶
Retry And Learn Protocol:
1. Any sub-agent failure → retry
2. Same failure twice → escalate immediately (don't waste third attempt)
3. Three failures → mandatory escalation with structured diagnostic
4. Unrecoverable → immediate escalation with operator decision requested
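The four rules above can be sketched as a retry wrapper. Several details are assumptions made for illustration: a "same failure" is taken to mean the same exception type and message as any earlier attempt, `PermissionError` stands in for an unrecoverable error, and the `Escalation` class and `run_with_ralph` name are hypothetical.

```python
from typing import Callable, List, Tuple, Type


class Escalation(Exception):
    """Raised when the Router must hand the problem to the operator."""

    def __init__(self, reason: str, failures: List[str]):
        super().__init__(reason)
        self.reason = reason
        self.failures = failures  # the structured diagnostic: one entry per failure


def run_with_ralph(
    task: Callable[[], str],
    max_failures: int = 3,
    unrecoverable: Tuple[Type[BaseException], ...] = (PermissionError,),
) -> str:
    failures: List[str] = []
    while True:
        try:
            return task()
        except Exception as exc:
            signature = f"{type(exc).__name__}: {exc}"
            repeated = signature in failures
            failures.append(signature)
            if isinstance(exc, unrecoverable):
                # Rule 4: do not retry, request an operator decision.
                raise Escalation("unrecoverable", failures) from exc
            if repeated:
                # Rule 2: the same failure twice, so skip the third attempt.
                raise Escalation("same failure twice", failures) from exc
            if len(failures) >= max_failures:
                # Rule 3: three failures, mandatory escalation.
                raise Escalation("three failures", failures) from exc
            # Rule 1: otherwise retry.
```

A task that fails once with a timeout and then succeeds returns normally; a task that raises the same error on its second attempt escalates with two recorded failures.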
Cron for Scheduling¶
Scheduled automations (health checks, periodic updates, daily syncs) run via cron: these are tasks that need no reasoning and run on a timer. The separation principle between sessions_yield and cron: tasks that need reasoning belong in agent sessions; tasks that do not need reasoning belong in cron or other scheduled automation. Mixing them makes both worse. Yuki AI CEO confirmed this separation with Romain's correction: "New tool doesn't mean move everything there."
heartbeat for Active Pulse¶
Kelly's heartbeat mechanism: agents periodically update heartbeat with their current activity and a timestamp. A stuck agent is detected by the absence of updates. This is a file-based approximation of Gas Town's Deacon daemon (which actively patrols hooks structurally).
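Detection by absence of updates reduces to comparing each agent's last timestamp against a silence threshold. In this sketch the heartbeat data is passed in as a dictionary rather than read from the heartbeat file, since the file's format is not described here. The 10-minute threshold and the `stuck_agents` name are assumptions.

```python
import datetime as dt
from typing import Dict, List


def stuck_agents(
    heartbeats: Dict[str, dt.datetime],
    now: dt.datetime,
    max_silence: dt.timedelta = dt.timedelta(minutes=10),  # assumed threshold
) -> List[str]:
    """Return agents whose last heartbeat is older than the allowed silence."""
    return sorted(
        agent for agent, last_seen in heartbeats.items()
        if now - last_seen > max_silence
    )
```

Passing `now` in explicitly, instead of calling the clock inside the function, keeps the check deterministic and easy to test.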
Key Patterns¶
Full Content in Prompt (Story-by-Story)¶
Kelly's factory produces narrative-rich artifacts — research summaries, TEA audits, decision logs — not just code. The "story" of each decision (context, options, rationale, outcome) is preserved alongside the output. This serves the memory model: narrative beats tables for LLM recall.
Queue Protocol¶
When the operator says "queue this", "add to queue", or similar → immediately update TODO.md with the queued item and its status. Keep heartbeat for active pulse only.
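A rough sketch of the queue step, assuming TODO.md is a plain checklist. The line format `- [ ] item (status: ...)` and the `queue_item` helper are hypothetical; the source only says that the item and its status are written to TODO.md immediately.

```python
import tempfile
from pathlib import Path


def queue_item(todo_path: Path, item: str, status: str = "queued") -> str:
    """Append one item to TODO.md and return the line that was written."""
    line = f"- [ ] {item} (status: {status})\n"  # assumed line format
    with todo_path.open("a", encoding="utf-8") as handle:
        handle.write(line)
    return line


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        todo = Path(tmp) / "TODO.md"
        queue_item(todo, "add CSV export")
        queue_item(todo, "fix login bug", status="blocked")
        return todo.read_text(encoding="utf-8")
```

Opening the file in append mode means existing queue entries are never rewritten when a new item arrives.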
Operational Safety Rules¶
Revert path before breaking changes. Before modifying critical infrastructure (gateway bind, network exposure, auth settings, service mode), always know how to revert. If you can't revert safely in <5 min without outside help, you need a plan before touching it.
Never exfiltrate private data.
trash > rm (recoverable beats gone forever).
Commit Conventions¶
Router commits use the router: prefix so agent vs router changes are distinguishable:
git commit -m "router: soul: add no-edit-while-agent-working rule"
git commit -m "router: AGENTS.md: add office-hours subphase mapping"
Gaps vs Gas Town¶
The Kelly vs Gas Town gap analysis (Carson, 2026-04-26) identified these missing elements:
Missing Beads Unified Substrate¶
Kelly has 4 separate state mechanisms (pipeline state, done markers, TEA audit, heartbeat) that Beads would unify into a single git-versioned, SQL-queryable substrate. Adoption priority: High.
Missing GUPP Hook Enforcement¶
Kelly's yield-friendly model has no equivalent to GUPP's absolute "if hook is non-empty, you MUST run" rule. Nothing in the architecture enforces such a rule, so a deferred sub-agent is never flagged as violating a contract. Adoption priority: Medium-High.
Missing Autonomous Compounding Loops¶
Kelly's cron/TaskFlow handle scheduled tasks but none read their own prior outputs and compound. Yuki AI CEO's three production loops (New AI Models, Bug Autofix, SEO Optimizer) demonstrate the pattern: each reads its last run's output, takes the next step, and writes new output for next time. Adoption priority: Medium.
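The compounding pattern (read the last run's output, take the next step, write output for next time) can be sketched in a few lines. This is not Yuki's implementation: the JSON state file, the `compounding_run` function, and `example_step` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def compounding_run(state_path: Path, step: Callable[[dict], dict]) -> dict:
    """Load the prior run's output, apply one step, and persist the result."""
    if state_path.exists():
        prior = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        prior = {"run": 0, "notes": []}  # first run starts from an empty state
    current = step(prior)
    state_path.write_text(json.dumps(current), encoding="utf-8")
    return current


def example_step(prior: dict) -> dict:
    """Toy step: each run records how much prior output it built on."""
    run = prior["run"] + 1
    note = f"run {run} built on {len(prior['notes'])} prior notes"
    return {"run": run, "notes": prior["notes"] + [note]}


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "last_run.json"
        compounding_run(path, example_step)
        return compounding_run(path, example_step)
```

The difference from a plain scheduled task is the read at the start: a cron job that never loads `last_run.json` would produce the same output on every run.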
Missing Explicit Authority Matrix¶
Kelly's authority is gate-driven (PASS/FAIL before advancing) but has no written per-agent authority tiers with progressive transfer tracking. Gas Town's Mayor/Crew/Polecats and Yuki's three-tier authority matrix provide the model. Adoption priority: High.
Missing 30-Day Outcome Reviews¶
Kelly's quality gates are point-in-time (gate at stage transition). Yuki AI CEO adds temporal quality gates: every significant decision sets a 30-day review date, then assesses actual vs expected outcome. Adoption priority: Medium.
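A temporal gate of this kind needs little more than a review date stored with each decision. The `Decision` record and `due_for_review` helper below are hypothetical names for a minimal sketch; only the 30-day interval comes from the text above.

```python
import datetime as dt
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    summary: str
    expected_outcome: str
    decided_on: dt.date

    @property
    def review_on(self) -> dt.date:
        """Review date: 30 days after the decision was made."""
        return self.decided_on + dt.timedelta(days=30)


def due_for_review(decisions: List[Decision], today: dt.date) -> List[Decision]:
    """Return decisions whose review date has arrived or passed."""
    return [d for d in decisions if d.review_on <= today]
```

At review time, the stored `expected_outcome` is what the actual outcome gets compared against.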
Gaps vs Yuki AI CEO¶
The Yuki AI CEO vs Kelly/Gas Town gap analysis (Carson, 2026-04-27) identified these Kelly-specific missing elements:
Missing Per-Project CLAUDE.md Files¶
Yuki's per-business CLAUDE.md files demonstrate that specialization lives in context files, not in distinct agent instances. The same AI CEO loads different context depending on which product it's working on. Kelly has projects/{id}/context.md files but they describe project state, not agent role and rules for that project. Missing: per-project CLAUDE.md-style context files that load when entering project scope.
Missing Autonomous Compounding Loops¶
Yuki's three production loops (daily 3am model scanner, daily 6am bug autofix, weekly SEO optimizer) are GUPP loops with compounding — each reads its own prior outputs. Kelly has no equivalent. Missing: persistent background agents that read their own prior outputs and compound over time.
Missing Authority Transfer Log¶
Yuki maintains a visible, version-controlled record of what autonomy has been earned (tier 3 → tier 2 → tier 1 movements). Kelly has no equivalent. Missing: written authority progression log per agent role.
Related¶
- kelly-gas-town-gap-analysis — full Gas Town comparison with adoption priorities
- kelly-handbook-multi-agent — Ch7 router/sub-agent architecture, RALPH protocol
- kelly-handbook-software-factory — Ch11 factory pipeline, TEA audit
- kelly-tweets-factory — Kelly's public tweets on factory evolution
- yuki-ai-ceo-vs-kelly-gas-town-gap — Yuki AI CEO cross-reference and synthesis recommendations
- yukicapital-ai-ceo-overview — Yuki Capital AI CEO patterns mapped to Kelly equivalents
Memory & Architecture Concepts¶
- five-layer-memory — the 5-layer memory system used for session persistence
- gateway-daemon — the Gateway daemon architecture
- session-isolation — session isolation model
- workspace-boot — workspace boot and context restoration
- exec-tool — the exec tool for shell command execution
- subagent-spawning — sub-agent spawn patterns