Kelly Handbook: Multi-Agent Orchestration
Date Compiled: 2026-04-27
Summary: Chapter 7 of the OpenClaw Handbook introduces the Kelly Router architecture — a pattern where a central orchestrator delegates all work to specialized sub-agents rather than doing the work itself. The architecture solves single-agent limitations (context overflow, sequential bottlenecks, lack of specialization) through parallel sub-agent execution, structured handoffs, and quality gates between phases.
Key Concepts
- Kelly Router (Main Agent): Never does the work — only routes, validates gates, and communicates with the operator
- Sub-agents: Spawned workers that execute specific tasks in parallel, each specialized for its task
- Named Lead Agents: Research Lead, Project Lead, Test Lead — each orchestrating their own sub-agents
- RALPH Protocol: Retry And Learn Protocol — 3 attempts max per task, escalate on repeated failures
- Artifact Pattern: Structured project directories with gate files (READY/NOT-READY, PASS/FAIL)
- Quality Gates: Checkpoint files between phases that must pass before progression
Notable Patterns
The Router Architecture
Operator → Kelly Router (main agent)
                │
       ┌────────┼────────┐
       │        │        │
   Research  Project   Test
     Lead     Lead     Lead
       │        │        │
   Research Planning  Testing
     Sub-     Sub-     Sub-
    agents   agents   agents
The main agent maintains the strategic view — tracking where things are, flagging stuck items, keeping the flow moving. It reads workflow files and spawns the correct agent for each subphase.
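The routing step can be sketched as a lookup from phase to lead agent. This is a minimal illustration, not the handbook's implementation: the `PHASE_LEADS` mapping, the `Router` class, and the idea that a workflow file reduces to phase names are all assumptions made for the example. Only the three lead names come from the diagram above.

```python
from dataclasses import dataclass, field

# Hypothetical phase-to-lead mapping. The handbook names the three leads
# but does not specify how a workflow file encodes phases.
PHASE_LEADS = {
    "research": "Research Lead",
    "planning": "Project Lead",
    "testing": "Test Lead",
}


@dataclass
class Router:
    """Tracks where each project is and picks a lead; it does no task work itself."""

    status: dict = field(default_factory=dict)  # project id -> current phase

    def route(self, project_id: str, phase: str) -> str:
        """Record the project's phase and return the lead that should handle it."""
        if phase not in PHASE_LEADS:
            raise ValueError(f"no lead agent registered for phase {phase!r}")
        self.status[project_id] = phase
        return PHASE_LEADS[phase]


router = Router()
lead = router.route("demo-project", "research")
```

Keeping the router to a lookup plus a status record mirrors the rule that the main agent only routes and tracks progress.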
Sub-agent Spawning
Sub-agents are spawned with a label, a task definition, and an output directory. Because they run in parallel, three 5-minute tasks that would take 15 minutes sequentially finish in about 5 minutes when all three are spawned at once.
The subagents tool supports: spawn, list, steer (send messages to running agents), and kill.
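To illustrate why parallel spawning shortens wall-clock time, here is a small sketch using Python's `asyncio`. It does not call the real subagents tool: `SubAgentSpec`, `run_sub_agent`, and `spawn_all` are hypothetical names, and a short sleep stands in for the actual work.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class SubAgentSpec:
    """The three things a sub-agent is spawned with, per the text above."""

    label: str
    task: str
    output_dir: str


async def run_sub_agent(spec: SubAgentSpec, seconds: float) -> str:
    # Stand-in for real work: sleep instead of calling a model or tool.
    await asyncio.sleep(seconds)
    return f"{spec.label}: wrote results to {spec.output_dir}"


async def spawn_all(specs, seconds: float):
    # gather() runs the coroutines concurrently, so total time is roughly
    # that of the slowest agent rather than the sum of all agents.
    return await asyncio.gather(*(run_sub_agent(s, seconds) for s in specs))


def demo(seconds: float = 0.1):
    """Spawn three simulated sub-agents and return (results, elapsed seconds)."""
    specs = [
        SubAgentSpec(f"research-{i}", "summarize sources", f"research-artifacts/{i}")
        for i in range(3)
    ]
    start = time.perf_counter()
    results = asyncio.run(spawn_all(specs, seconds))
    return results, time.perf_counter() - start
```

Calling `demo()` should take close to one task's duration rather than three, which is the same ratio as the 15-minute versus 5-minute example.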
AGENTS.md Structure
Every project needs an AGENTS.md defining:
1. Role: what is the main agent's job?
2. Named agents: what specialized agents exist?
3. Intake procedures: how is new work handled?
4. Routing rules: which agent gets what type of work?
5. Quality gates: what checks run between phases?
6. Escalation protocol: what happens when things break?
7. Memory protocol: what gets written where?
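One way to keep an AGENTS.md honest is to check that all seven sections are present. The sketch below assumes each section appears as a markdown heading whose text matches the names in the list above. That heading convention is an assumption for illustration, and `missing_sections` is a hypothetical helper.

```python
# Section names taken from the list above; matching them against markdown
# headings is an assumed convention, not something the handbook specifies.
REQUIRED_SECTIONS = [
    "Role",
    "Named agents",
    "Intake procedures",
    "Routing rules",
    "Quality gates",
    "Escalation protocol",
    "Memory protocol",
]


def missing_sections(agents_md: str) -> list[str]:
    """Return the required sections that have no matching heading line."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in agents_md.splitlines()
        if line.startswith("#")
    ]
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


draft = "# Role\nRoute work, never do it.\n\n## Named agents\nResearch Lead\n"
gaps = missing_sections(draft)
```

Here `gaps` lists the five sections the draft still lacks, in the order they appear in the checklist.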
Gate Validation Pattern
/projects/{id}/
├── intake.md
├── research-artifacts/research-summary.md → READY / NOT-READY
├── planning-artifacts/planning-summary.md → PASS / FAIL
└── implementation-artifacts/
Before routing to the next phase, the router confirms that the gate file exists and reads its decision. It proceeds only if the gate passes.
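The gate check can be sketched as two small functions. The handbook does not say how the decision is written inside a gate file, so this example assumes a line of the form `Decision: READY`. That line format and the helper names `read_gate` and `may_proceed` are hypothetical.

```python
from pathlib import Path

PASSING = {"READY", "PASS"}
FAILING = {"NOT-READY", "FAIL"}


def read_gate(gate_file: Path) -> str:
    """Return the decision token from a gate file, or raise if it is unusable."""
    if not gate_file.is_file():
        raise FileNotFoundError(f"gate file missing: {gate_file}")
    for line in gate_file.read_text(encoding="utf-8").splitlines():
        # Assumed format: a line such as "Decision: READY".
        if line.lower().startswith("decision:"):
            decision = line.split(":", 1)[1].strip().upper()
            if decision in PASSING | FAILING:
                return decision
            raise ValueError(f"unknown gate decision {decision!r}")
    raise ValueError(f"no decision line in {gate_file}")


def may_proceed(gate_file: Path) -> bool:
    """True only when the gate file exists and records a passing decision."""
    try:
        return read_gate(gate_file) in PASSING
    except (FileNotFoundError, ValueError):
        # A missing or unreadable gate is treated as a closed gate.
        return False
```

Treating a missing or malformed gate file as a failure keeps the default closed, so a phase cannot advance just because its predecessor never wrote a summary.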
Failure Patterns and Fixes
| Pattern | Signs | Fix |
|---|---|---|
| Context Overflow | Repetitive responses, forgets instructions | Break into smaller chunks |
| Going Off-Script | Produces output files that were not requested | Be more prescriptive |
| Lost Results | Reports "done" but files are missing | Explicitly validate artifacts |
| Infinite Tool Loops | Excessive tool calls | Add explicit stopping conditions |
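Two of the fixes in the table lend themselves to short sketches: validating artifacts on disk instead of trusting a "done" message, and enforcing an explicit stopping condition on tool calls. Both helpers below, `missing_artifacts` and `ToolBudget`, are hypothetical and only illustrate the idea.

```python
from pathlib import Path


def missing_artifacts(output_dir: Path, expected: list[str]) -> list[str]:
    """List expected files that are absent or empty, whatever the agent reported."""
    return [
        name
        for name in expected
        if not (output_dir / name).is_file()
        or (output_dir / name).stat().st_size == 0
    ]


class ToolBudget:
    """Explicit stopping condition: refuse tool calls past a fixed limit."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def charge(self) -> None:
        """Count one tool call, raising once the budget is spent."""
        if self.calls >= self.max_calls:
            raise RuntimeError(f"tool budget of {self.max_calls} calls exhausted")
        self.calls += 1
```

A router could call `missing_artifacts` after a sub-agent finishes and treat any non-empty result as a failure, while a sub-agent loop would call `charge()` before each tool invocation.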
RALPH Escalation
- Any sub-agent failure → retry
- Same failure twice → escalate immediately instead of spending the third attempt
- Three failures → mandatory escalation with a structured diagnostic
- Unrecoverable failure → immediate escalation, with an operator decision requested
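The escalation rules above can be read as a small retry loop. The sketch below is one interpretation, not the handbook's code: `Outcome`, `Unrecoverable`, and `run_with_ralph` are hypothetical names, and "same failure" is assumed to mean an identical exception type and message.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # RALPH allows at most three attempts per task


class Unrecoverable(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


@dataclass
class Outcome:
    ok: bool
    value: object = None
    reason: str = ""  # why the task was escalated, empty on success
    errors: list = field(default_factory=list)  # diagnostic trail for the operator


def run_with_ralph(task) -> Outcome:
    """Run task(attempt) under the retry and escalation rules listed above."""
    errors = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return Outcome(ok=True, value=task(attempt))
        except Unrecoverable as exc:
            # Unrecoverable: escalate at once, no retry.
            errors.append(repr(exc))
            return Outcome(False, reason="unrecoverable", errors=errors)
        except Exception as exc:
            errors.append(repr(exc))
            # Same failure seen twice: escalate instead of spending another attempt.
            if errors.count(errors[-1]) >= 2:
                return Outcome(False, reason="repeated-failure", errors=errors)
    # Three distinct failures: mandatory escalation with the collected errors.
    return Outcome(False, reason="max-attempts", errors=errors)
```

Returning an `Outcome` with the error trail, rather than raising, gives the router a structured diagnostic to pass on to the operator.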
Related
kelly-handbook-software-factory, kelly-tweets-agents, kelly-tweets-factory