Kelly Handbook: Multi-Agent Orchestration

Date Compiled: 2026-04-27
Summary: Chapter 7 of the OpenClaw Handbook introduces the Kelly Router architecture — a pattern where a central orchestrator delegates all work to specialized sub-agents rather than doing the work itself. The architecture solves single-agent limitations (context overflow, sequential bottlenecks, lack of specialization) through parallel sub-agent execution, structured handoffs, and quality gates between phases.

Key Concepts

Kelly Router (Main Agent): Never does the work — only routes, validates gates, and communicates with the operator
Sub-agents: Spawned workers, each specialized for one kind of task; several can run in parallel
Named Lead Agents: Research Lead, Project Lead, Test Lead — each orchestrating their own sub-agents
RALPH Protocol: Retry And Learn Protocol — 3 attempts max per task, escalate on repeated failures
Artifact Pattern: Structured project directories with gate files (READY/NOT-READY, PASS/FAIL)
Quality Gates: Checkpoint files between phases that must pass before progression

Notable Patterns

The Router Architecture

Operator → Kelly Router (main agent)
                   │
        ┌──────────┼──────────┐
        │          │          │
    Research    Project     Test
      Lead       Lead       Lead
        │          │          │
    Research   Planning    Testing
      Sub-       Sub-       Sub-
     agents     agents     agents

The main agent keeps the strategic view: it tracks the status of each piece of work, flags stuck items, and keeps work flowing between phases. It reads the workflow files and spawns the appropriate agent for each subphase.
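
A minimal sketch of the routing rule, assuming each work item carries a `kind` field and each lead is identified by a label string. `ROUTES`, `WorkItem`, `Handoff`, `route`, and the kind values are hypothetical names chosen for illustration; the handbook does not define this API. The point it shows is that the router only decides who gets the work and returns a handoff; it never performs the task itself, and anything it cannot route becomes an escalation.

```python
from dataclasses import dataclass

# Hypothetical routing table: work type -> named lead agent.
# Kinds and lead labels are illustrative, not taken from the handbook.
ROUTES = {
    "research": "research-lead",
    "planning": "project-lead",
    "implementation": "project-lead",
    "testing": "test-lead",
}


@dataclass(frozen=True)
class WorkItem:
    project_id: str
    kind: str  # e.g. "research", "testing"
    description: str


@dataclass(frozen=True)
class Handoff:
    lead: str
    item: WorkItem


class UnroutableWork(Exception):
    """Raised when no lead owns this kind of work; the router escalates to the operator."""


def route(item: WorkItem) -> Handoff:
    """Pick the lead for a work item. The router does no task work itself."""
    lead = ROUTES.get(item.kind)
    if lead is None:
        raise UnroutableWork(f"no lead for kind {item.kind!r} in project {item.project_id}")
    return Handoff(lead=lead, item=item)


if __name__ == "__main__":
    handoff = route(WorkItem("p-001", "testing", "Run the regression suite"))
    print(handoff.lead)  # test-lead
```

Keeping the table as plain data mirrors the "routing rules" section of AGENTS.md: changing who handles what is a configuration edit, not a change to the router's logic.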

Sub-agent Spawning

Sub-agents are spawned with a label, a task definition, and an output directory. Because they can work in parallel, three independent tasks of about five minutes each finish in roughly 5 minutes when the three sub-agents are spawned simultaneously, instead of 15 minutes in sequence.

The subagents tool supports four operations: spawn, list, steer (send a message to a running agent), and kill.
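
The timing argument can be illustrated with `asyncio`: three simulated sub-agents of equal length finish in about one task's duration when run concurrently, versus three durations when run one after another. `SubagentPool` is a hypothetical in-memory stand-in for the four operations (spawn, list, steer, kill), not the real subagents tool, and durations are scaled down from minutes to fractions of a second.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subagent:
    label: str
    task: str
    output_dir: str
    messages: List[str] = field(default_factory=list)  # steering messages received
    handle: Optional[asyncio.Task] = None


class SubagentPool:
    """Hypothetical stand-in for the subagents tool (not its real API)."""

    def __init__(self) -> None:
        self.agents: Dict[str, Subagent] = {}

    def spawn(self, label: str, task: str, output_dir: str, duration: float) -> Subagent:
        agent = Subagent(label, task, output_dir)
        agent.handle = asyncio.create_task(self._run(agent, duration))
        self.agents[label] = agent
        return agent

    def list_running(self) -> List[str]:
        """Corresponds to 'list': labels of agents still working."""
        return [label for label, a in self.agents.items() if a.handle and not a.handle.done()]

    def steer(self, label: str, message: str) -> None:
        # This simulation only records the message; a real worker would read it mid-task.
        self.agents[label].messages.append(message)

    def kill(self, label: str) -> None:
        handle = self.agents[label].handle
        if handle is not None:
            handle.cancel()

    async def _run(self, agent: Subagent, duration: float) -> str:
        await asyncio.sleep(duration)  # simulated work
        return f"{agent.output_dir}/{agent.label}.md"  # path a real agent would write


async def run_three(parallel: bool, step: float = 0.05) -> float:
    """Run three equal-length tasks; return elapsed wall-clock seconds."""
    pool = SubagentPool()
    tasks = [("research-a", "Survey prior art"),
             ("research-b", "Collect benchmarks"),
             ("research-c", "Summarize risks")]
    start = time.perf_counter()
    if parallel:
        agents = [pool.spawn(label, task, "research-artifacts", step) for label, task in tasks]
        await asyncio.gather(*(a.handle for a in agents))
    else:
        for label, task in tasks:
            agent = pool.spawn(label, task, "research-artifacts", step)
            await agent.handle  # wait before spawning the next one
    return time.perf_counter() - start


if __name__ == "__main__":
    seq = asyncio.run(run_three(parallel=False))
    par = asyncio.run(run_three(parallel=True))
    print(f"sequential {seq:.2f}s, parallel {par:.2f}s")
```

The parallel speedup only holds when the tasks are independent; if one sub-agent needs another's output, it belongs in a later phase behind a gate.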

AGENTS.md Structure

Every project needs an AGENTS.md defining:
1. Role — what is the main agent's job?
2. Named agents — what specialized agents exist?
3. Intake procedures — how is new work handled?
4. Routing rules — which agent gets what type of work?
5. Quality gates — what checks between phases?
6. Escalation protocol — what happens when things break?
7. Memory protocol — what gets written where?
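
One way to keep this checklist enforceable is a small lint step that confirms an AGENTS.md covers all seven topics before a project starts. The sketch assumes each topic appears as a markdown heading containing its name (for example `## Routing rules`); that heading convention and the `missing_sections` helper are assumptions, not something the handbook prescribes.

```python
import re
from typing import List

# The seven topics every AGENTS.md should cover, per the checklist above.
REQUIRED_SECTIONS = [
    "Role",
    "Named agents",
    "Intake procedures",
    "Routing rules",
    "Quality gates",
    "Escalation protocol",
    "Memory protocol",
]


def missing_sections(agents_md: str) -> List[str]:
    """Return required topics with no matching markdown heading (case-insensitive, whole words)."""
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", agents_md, flags=re.MULTILINE)
    ]
    missing = []
    for name in REQUIRED_SECTIONS:
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if not any(re.search(pattern, h) for h in headings):
            missing.append(name)
    return missing


if __name__ == "__main__":
    draft = "# AGENTS.md\n## Role\nThe router only routes.\n## Named agents\n## Routing rules\n"
    print(missing_sections(draft))
```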

Gate Validation Pattern

/projects/{id}/
├── intake.md
├── research-artifacts/research-summary.md  → READY / NOT-READY
├── planning-artifacts/planning-summary.md  → PASS / FAIL
└── implementation-artifacts/

Before routing to the next phase, the router confirms that the gate file exists and reads its decision. It proceeds only if the gate passes.
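
A sketch of that check, assuming each gate file records its decision on a line of the form `Decision: READY` (or `NOT-READY`, `PASS`, `FAIL`). The handbook names the gate files and their values but not the line format, so the `Decision:` convention, `GATES`, `read_gate`, and `gate_passes` are hypothetical. A missing file or unrecognized decision counts as not passing, so the router fails closed.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Phase -> (gate file relative to /projects/{id}/, decision value that lets work proceed).
GATES = {
    "research": ("research-artifacts/research-summary.md", "READY"),
    "planning": ("planning-artifacts/planning-summary.md", "PASS"),
}
# Assumed format: a line such as "Decision: READY". NOT-READY is listed first so it wins.
DECISION_LINE = re.compile(r"^Decision:\s*(NOT-READY|READY|PASS|FAIL)\s*$", re.MULTILINE)


def read_gate(project_dir: Path, phase: str) -> Optional[str]:
    """Return the decision recorded in a phase's gate file, or None if absent or unparseable."""
    rel_path, _ = GATES[phase]
    gate = project_dir / rel_path
    if not gate.is_file():
        return None
    match = DECISION_LINE.search(gate.read_text(encoding="utf-8"))
    return match.group(1) if match else None


def gate_passes(project_dir: Path, phase: str) -> bool:
    """Fail closed: proceed only if the file exists and records the passing value."""
    _, passing = GATES[phase]
    return read_gate(project_dir, phase) == passing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)  # stands in for /projects/{id}/
        print(gate_passes(project, "research"))  # False: no gate file yet
        summary = project / "research-artifacts" / "research-summary.md"
        summary.parent.mkdir(parents=True)
        summary.write_text("# Research summary\n\nDecision: READY\n", encoding="utf-8")
        print(gate_passes(project, "research"))  # True
```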

Failure Patterns and Fixes

Pattern	Signs	Fix
Context Overflow	Repetitive responses; forgets earlier instructions	Break the task into smaller chunks
Going Off-Script	Writes output files that were not requested	Make task instructions more prescriptive
Lost Results	Reports "Done" but the files are missing	Explicitly validate the expected artifacts
Infinite Tool Loops	Excessive tool calls	Add explicit stopping conditions
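
The last two fixes lend themselves to code. The sketch shows (a) checking that every expected artifact exists and is non-empty before accepting a sub-agent's "done", and (b) a hard cap on tool calls as an explicit stopping condition. `validate_artifacts`, `ToolBudget`, and the default cap of 50 are illustrative assumptions, not values from the handbook.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def validate_artifacts(output_dir: Path, expected: Iterable[str]) -> List[str]:
    """Return expected files that are missing or empty; an empty list means 'done' is verified."""
    problems = []
    for name in expected:
        path = output_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


class ToolBudgetExceeded(RuntimeError):
    """Raised when an agent hits its tool-call cap; treat it as a failure for retry/escalation."""


class ToolBudget:
    """Explicit stopping condition: refuse further tool calls once the cap is reached."""

    def __init__(self, max_calls: int = 50) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def charge(self, tool_name: str) -> None:
        """Call before every tool invocation."""
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"{tool_name}: exceeded {self.max_calls} tool calls")
        self.calls += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "research-summary.md").write_text("Decision: READY\n", encoding="utf-8")
        print(validate_artifacts(out, ["research-summary.md", "sources.md"]))  # ['sources.md']
```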

RALPH Escalation

Any sub-agent failure → retry
Same failure twice → escalate (don't waste a third attempt)
Three failures → mandatory escalation with a structured diagnostic
Unrecoverable failure → immediate escalation, requesting an operator decision
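
These four rules can be read as a small decision function. In this sketch each failure is summarized by a short signature string, so "same failure" means an identical signature anywhere in the task's history; that representation, the `Action` names, `decide`, and the diagnostic fields are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Union

MAX_ATTEMPTS = 3  # RALPH: at most three attempts per task


class Action(Enum):
    RETRY = "retry"
    # Same failure seen twice: escalate rather than spend the third attempt.
    ESCALATE = "escalate"
    # Third failure: mandatory escalation with a structured diagnostic.
    ESCALATE_WITH_DIAGNOSTIC = "escalate_with_diagnostic"
    # Unrecoverable: escalate immediately and ask the operator to decide.
    ASK_OPERATOR = "ask_operator"


def decide(failures: List[str], unrecoverable: bool = False) -> Action:
    """Choose the next step after the latest failure.

    failures: signatures of every failed attempt so far, oldest first.
    """
    if not failures:
        raise ValueError("decide() is called only after at least one failure")
    if unrecoverable:
        return Action.ASK_OPERATOR
    if len(failures) >= MAX_ATTEMPTS:
        return Action.ESCALATE_WITH_DIAGNOSTIC
    if failures.count(failures[-1]) >= 2:
        return Action.ESCALATE
    return Action.RETRY


def diagnostic(task_label: str, failures: List[str]) -> Dict[str, Union[str, int, List[str]]]:
    """Structured summary attached to a mandatory escalation (field names are illustrative)."""
    return {
        "task": task_label,
        "attempts": len(failures),
        "distinct_failures": sorted(set(failures)),
        "last_failure": failures[-1],
    }


if __name__ == "__main__":
    print(decide(["timeout"]).value)                          # retry
    print(decide(["timeout", "timeout"]).value)               # escalate
    print(decide(["timeout", "parse-error", "oom"]).value)    # escalate_with_diagnostic
```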

kelly-handbook-software-factory, kelly-tweets-agents, kelly-tweets-factory