Kelly Handbook: Multi-Agent Orchestration
Summary: Chapter 7 of the OpenClaw Handbook introduces the Kelly Router architecture — a pattern where a central orchestrator delegates all work to specialized sub-agents rather than doing the work itself. The architecture solves single-agent limitations (context overflow, sequential bottlenecks, lack of specialization) through parallel sub-agent execution, structured handoffs, and quality gates between phases.
Key Concepts
- **Kelly Router (Main Agent):** Never does the work — only routes, validates gates, and communicates with the operator
- **Sub-agents:** Spawned workers that execute specific tasks in parallel, each specialized for its task
- **Named Lead Agents:** Research Lead, Project Lead, Test Lead — each orchestrating their own sub-agents
- **RALPH Protocol:** Retry And Learn Protocol — 3 attempts max per task, escalate on repeated failures
- **Artifact Pattern:** Structured project directories with gate files (READY/NOT-READY, PASS/FAIL)
- **Quality Gates:** Checkpoint files between phases that must pass before progression
Notable Patterns
The Router Architecture
    Operator → Kelly Router (main agent)
                     │
            ┌────────┼────────┐
            │        │        │
        Research  Project    Test
          Lead      Lead     Lead
            │        │        │
        Research Planning  Testing
          Sub-      Sub-     Sub-
         agents    agents   agents
The main agent maintains the strategic view — tracking where things are, flagging stuck items, keeping the flow moving. It reads workflow files and spawns the correct agent for each subphase.
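The routing step can be sketched roughly as follows. This is a minimal illustration, assuming the workflow is an ordered list of subphases that each name their lead agent; the `Subphase` class, the `next_subphase` function, and the sample workflow are hypothetical and do not come from the handbook.

```python
from dataclasses import dataclass


@dataclass
class Subphase:
    name: str           # e.g. "research"
    lead: str           # named lead agent that owns this subphase
    done: bool = False  # set once the subphase has finished


def next_subphase(workflow):
    """Return the first unfinished subphase, or None when everything is done."""
    for phase in workflow:
        if not phase.done:
            return phase
    return None


# Hypothetical workflow state: research is finished, planning is next.
workflow = [
    Subphase("research", "Research Lead", done=True),
    Subphase("planning", "Project Lead"),
    Subphase("testing", "Test Lead"),
]

phase = next_subphase(workflow)
print(f"route '{phase.name}' to {phase.lead}")
```

The router only decides who gets the work; the sketch deliberately contains no logic for doing the work itself.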
Sub-agent Spawning
Sub-agents are spawned with a label, task definition, and output directory. They can work in parallel: three independent tasks that take 15 minutes when run one after another take about 5 when all three sub-agents are spawned simultaneously.
The subagents tool supports: spawn, list, steer (send messages to running agents), and kill.
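The parallel speed-up can be illustrated with plain `asyncio`. This is only a sketch: `spawn_subagent` is a hypothetical stand-in that sleeps instead of calling the real subagents tool, and the labels, tasks, and directories are invented for the example.

```python
import asyncio
import time


async def spawn_subagent(label, task, output_dir, seconds):
    """Hypothetical stand-in for a real spawn call: sleeps to simulate work."""
    await asyncio.sleep(seconds)
    return {"label": label, "task": task, "output_dir": output_dir}


async def run_parallel(specs):
    """Start every sub-agent at once and wait until all have finished."""
    return await asyncio.gather(*(spawn_subagent(**spec) for spec in specs))


# Three independent tasks of equal length (0.1 s each stands in for 5 minutes).
specs = [
    {"label": "research-a", "task": "survey prior art", "output_dir": "out/a", "seconds": 0.1},
    {"label": "research-b", "task": "collect requirements", "output_dir": "out/b", "seconds": 0.1},
    {"label": "research-c", "task": "list open risks", "output_dir": "out/c", "seconds": 0.1},
]

start = time.perf_counter()
results = asyncio.run(run_parallel(specs))
elapsed = time.perf_counter() - start

# Sequentially this would take about 0.3 s; in parallel it takes about 0.1 s.
print(f"{len(results)} sub-agents finished in {elapsed:.2f}s")
```

The speed-up holds only when the tasks are independent; a task that needs another task's output still has to wait for it.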
AGENTS.md Structure
Every project needs an AGENTS.md defining:
- Role: what is the main agent's job?
- Named agents: which specialized agents exist?
- Intake procedures: how is new work handled?
- Routing rules: which agent gets which type of work?
- Quality gates: which checks run between phases?
- Escalation protocol: what happens when things break?
- Memory protocol: what gets written where?
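One way to keep an AGENTS.md honest is to check it against this list. The sketch below assumes each topic appears as a Markdown heading with exactly these names; that layout, the `REQUIRED_SECTIONS` constant, and the `missing_sections` helper are assumptions made for illustration, not a format the handbook prescribes.

```python
import re

# Topics from the list above; the heading names are an assumed convention.
REQUIRED_SECTIONS = [
    "Role",
    "Named agents",
    "Intake procedures",
    "Routing rules",
    "Quality gates",
    "Escalation protocol",
    "Memory protocol",
]


def missing_sections(agents_md: str) -> list[str]:
    """Return the required topics that have no matching Markdown heading."""
    headings = {
        match.group(1).lower()
        for match in re.finditer(r"^#+[ \t]*(.+?)[ \t]*$", agents_md, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Hypothetical, incomplete AGENTS.md used only to exercise the check.
sample = """# Role
Route work, never do it.

## Named agents
Research Lead, Project Lead, Test Lead

## Routing rules
Research questions go to the Research Lead.
"""

print(missing_sections(sample))
```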
Gate Validation Pattern
    /projects/{id}/
    ├── intake.md
    ├── research-artifacts/research-summary.md → READY / NOT-READY
    ├── planning-artifacts/planning-summary.md → PASS / FAIL
    └── implementation-artifacts/
Before routing to the next phase, the router confirms that the gate file exists and reads its decision. It proceeds only if the gate passes.
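A gate check along these lines might look like the sketch below. It assumes the gate file records its verdict on a line of the form `Decision: READY`; the source only says the file carries a READY / NOT-READY or PASS / FAIL decision, so that line format and the `gate_passes` helper are hypothetical.

```python
import re
import tempfile
from pathlib import Path

PASSING = {"READY", "PASS"}  # any other decision value blocks progression


def gate_passes(gate_file: Path) -> bool:
    """True only if the gate file exists and its decision is a passing one."""
    if not gate_file.is_file():
        return False  # a missing gate file counts as a failed gate
    for line in gate_file.read_text(encoding="utf-8").splitlines():
        match = re.match(r"\s*Decision:\s*(\S+)\s*$", line, flags=re.IGNORECASE)
        if match:
            return match.group(1).upper() in PASSING
    return False  # no decision line found


# Demo in a throwaway directory that mirrors the project layout above.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "projects" / "demo"
    research = project / "research-artifacts"
    research.mkdir(parents=True)
    gate = research / "research-summary.md"

    gate.write_text("# Research summary\nDecision: NOT-READY\n", encoding="utf-8")
    blocked = gate_passes(gate)

    gate.write_text("# Research summary\nDecision: READY\n", encoding="utf-8")
    cleared = gate_passes(gate)

    # The planning gate file was never written, so that gate fails too.
    absent = gate_passes(project / "planning-artifacts" / "planning-summary.md")

print(blocked, cleared, absent)
```

Treating a missing file or a missing decision line as a failure keeps the router from advancing on incomplete work.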
Failure Patterns and Fixes
| Pattern | Signs | Fix |
|---|---|---|
| Context Overflow | Repetitive responses, forgets instructions | Break into smaller chunks |
| Going Off-Script | Output files not requested | Be more prescriptive |
| Lost Results | "Done" but files missing | Explicitly validate artifacts |
| Infinite Tool Loops | Excessive tool calls | Add explicit stopping conditions |
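The "Lost Results" and "Going Off-Script" rows both come down to comparing what a sub-agent actually wrote with what it was asked to write. A minimal sketch of that comparison follows; the `audit_outputs` helper and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def audit_outputs(output_dir: Path, expected: set[str]):
    """Compare files actually written against the files the task asked for.

    Returns (missing, unexpected): requested files that are absent or empty,
    and files that were written without being requested.
    """
    present = {
        path.relative_to(output_dir).as_posix()
        for path in output_dir.rglob("*")
        if path.is_file()
    }
    nonempty = {name for name in present if (output_dir / name).stat().st_size > 0}
    missing = sorted(expected - nonempty)
    unexpected = sorted(present - expected)
    return missing, unexpected


# Demo: the sub-agent reported "done" but skipped one file and added another.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "summary.md").write_text("findings", encoding="utf-8")
    (out / "notes.txt").write_text("scratch", encoding="utf-8")
    missing, unexpected = audit_outputs(out, {"summary.md", "sources.md"})

print("missing:", missing)
print("unexpected:", unexpected)
```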
RALPH Escalation
- Any sub-agent failure → retry
- Same failure twice → escalate without spending the third attempt
- Three failures → mandatory escalation with a structured diagnostic
- Unrecoverable failure → escalate immediately and request an operator decision
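These rules can be sketched as a small decision function. This is one reading of the rules, not the handbook's implementation. It assumes each failure is reduced to a short signature string so that repeats can be detected; the `ralph_decision` function and its return strings are hypothetical.

```python
MAX_ATTEMPTS = 3  # RALPH allows at most three attempts per task


def ralph_decision(failures, unrecoverable=False):
    """Decide what to do after a failure.

    `failures` lists the failure signatures seen so far for one task,
    oldest first (for example ["timeout", "bad-json"]).
    """
    if unrecoverable:
        return "escalate: operator decision requested"
    if len(failures) >= MAX_ATTEMPTS:
        return "escalate: structured diagnostic"
    if len(set(failures)) < len(failures):
        # The same failure has occurred twice: skip the remaining attempt.
        return "escalate: repeated failure"
    return "retry"


print(ralph_decision(["timeout"]))
print(ralph_decision(["timeout", "timeout"]))
print(ralph_decision(["timeout", "bad-json", "missing-file"]))
print(ralph_decision(["disk-full"], unrecoverable=True))
```

Checking the unrecoverable case first means a fatal error never consumes a retry, which matches the "immediate escalation" rule.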
Related
[[kelly-handbook-software-factory]], [[kelly-tweets-agents]], [[kelly-tweets-factory]]