Kelly Handbook: Multi-Agent Orchestration

Summary: Chapter 7 of the OpenClaw Handbook introduces the Kelly Router architecture — a pattern where a central orchestrator delegates all work to specialized sub-agents rather than doing the work itself. The architecture solves single-agent limitations (context overflow, sequential bottlenecks, lack of specialization) through parallel sub-agent execution, structured handoffs, and quality gates between phases.

Key Concepts

**Kelly Router (Main Agent):** Never does the work — only routes, validates gates, and communicates with the operator
**Sub-agents:** Spawned workers that execute specific tasks in parallel; each specialized
**Named Lead Agents:** Research Lead, Project Lead, Test Lead — each orchestrating their own sub-agents
**RALPH Protocol:** Retry And Learn Protocol — 3 attempts max per task, escalate on repeated failures
**Artifact Pattern:** Structured project directories with gate files (READY/NOT-READY, PASS/FAIL)
**Quality Gates:** Checkpoint files between phases that must pass before progression

Notable Patterns

The Router Architecture

Operator → Kelly Router (main agent)
               │
      ┌────────┼────────┐
      │        │        │
  Research  Project    Test
    Lead      Lead     Lead
      │        │        │
  Research  Planning  Testing
    Sub-      Sub-      Sub-
   agents    agents    agents

The main agent maintains the strategic view — tracking where things are, flagging stuck items, keeping the flow moving. It reads workflow files and spawns the correct agent for each subphase.

Sub-agent Spawning

Sub-agents are spawned with a label, a task definition, and an output directory. They can run in parallel: three tasks that take 15 minutes back to back finish in about 5 when all three sub-agents are spawned simultaneously.
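The timing claim can be illustrated with a minimal sketch. It does not call the real subagents tool: `run_subagent`, `spawn_parallel`, and `demo` are hypothetical names, each worker only sleeps, and the sleep is scaled down to a fraction of a second in place of 5 minutes.

```python
import asyncio
import time


async def run_subagent(label: str, task: str, output_dir: str, seconds: float) -> dict:
    # Hypothetical stand-in for a real sub-agent: it sleeps instead of working.
    await asyncio.sleep(seconds)
    return {"label": label, "task": task, "output_dir": output_dir}


async def spawn_parallel(specs: list[dict]) -> list[dict]:
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(run_subagent(**spec) for spec in specs))


def demo(seconds_per_task: float = 0.2) -> tuple[list[dict], float]:
    # Three workers, each with a label, a task definition, and an output directory.
    specs = [
        {
            "label": f"research-{i}",
            "task": f"topic {i}",
            "output_dir": f"research-artifacts/part-{i}",
            "seconds": seconds_per_task,
        }
        for i in range(1, 4)
    ]
    start = time.perf_counter()
    results = asyncio.run(spawn_parallel(specs))
    return results, time.perf_counter() - start
```

Calling `demo()` should return three results after roughly one task's duration rather than three, which is the same ratio as 5 minutes versus 15.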

The subagents tool supports four actions: spawn, list, steer (send messages to running agents), and kill.

AGENTS.md Structure

Every project needs an AGENTS.md defining:

Role — what is the main agent's job?
Named agents — what specialized agents exist?
Intake procedures — how is new work handled?
Routing rules — which agent gets what type of work?
Quality gates — what checks run between phases?
Escalation protocol — what happens when things break?
Memory protocol — what gets written where?
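One way to keep an AGENTS.md honest is to check that all seven sections are present. The sketch below assumes each section appears as a markdown heading whose text matches the names in the list above; the handbook summary does not specify the heading format, so that is an assumption, and `missing_sections` is a hypothetical helper.

```python
import re

# Section names taken from the list above.
REQUIRED_SECTIONS = [
    "Role",
    "Named agents",
    "Intake procedures",
    "Routing rules",
    "Quality gates",
    "Escalation protocol",
    "Memory protocol",
]

# Assumption: sections are written as markdown headings such as "## Role".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(agents_md_text: str) -> list[str]:
    """Return the required sections that have no matching heading."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(agents_md_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

An empty return value means every section has a heading; it says nothing about whether the content under each heading is adequate.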

Gate Validation Pattern

/projects/{id}/
├── intake.md
├── research-artifacts/research-summary.md  → READY / NOT-READY
├── planning-artifacts/planning-summary.md  → PASS / FAIL
└── implementation-artifacts/

Before routing to the next phase, the router confirms that the gate file exists and reads its decision. It proceeds only if the gate passes (READY for research, PASS for planning).
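A minimal sketch of that check follows. The file paths and decision tokens come from the tree above; how the decision is written inside the file is not specified in the source, so the `Decision: <TOKEN>` line format is an assumption, and `gate_passes` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Gate file and passing token per phase, taken from the project tree.
GATES = {
    "research": ("research-artifacts/research-summary.md", "READY"),
    "planning": ("planning-artifacts/planning-summary.md", "PASS"),
}

# Assumption: the gate file records its verdict on a line like "Decision: READY".
DECISION_LINE = re.compile(r"^Decision:[ \t]*(\S+)", re.MULTILINE | re.IGNORECASE)


def gate_passes(project_dir: str, phase: str) -> tuple[bool, str]:
    """Return (passed, reason) for the gate that closes the given phase."""
    rel_path, pass_token = GATES[phase]
    gate_file = Path(project_dir) / rel_path
    if not gate_file.is_file():
        return False, f"gate file missing: {rel_path}"
    match = DECISION_LINE.search(gate_file.read_text(encoding="utf-8"))
    if match is None:
        return False, f"no decision line in {rel_path}"
    decision = match.group(1).upper()
    # Exact comparison, so "NOT-READY" is never mistaken for "READY".
    return decision == pass_token, f"{phase} gate decision: {decision}"
```

Treating a missing file or a missing decision line as a failed gate keeps the router from advancing on incomplete work.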

Failure Patterns and Fixes

| Pattern | Signs | Fix |
| --- | --- | --- |
| Context Overflow | Repetitive responses, forgets instructions | Break into smaller chunks |
| Going Off-Script | Output files not requested | Be more prescriptive |
| Lost Results | "Done" but files missing | Explicitly validate artifacts |
| Infinite Tool Loops | Excessive tool calls | Add explicit stopping conditions |
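The fix for Lost Results can be as small as a file check run before a sub-agent's "done" is accepted. The sketch below is one possible shape: `missing_artifacts` is a hypothetical helper, and treating an empty file as missing is an assumption, not something the handbook states.

```python
from pathlib import Path


def missing_artifacts(output_dir: str, expected: list[str]) -> list[str]:
    """Return the expected files that are absent or empty under output_dir."""
    root = Path(output_dir)
    problems = []
    for rel_path in expected:
        candidate = root / rel_path
        # Assumption: a zero-byte file counts as a lost result too.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            problems.append(rel_path)
    return problems
```

A non-empty return value is the signal to reject the completion report and retry or escalate.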

RALPH Escalation

Any sub-agent failure → retry
Same failure twice → escalate (don't waste third attempt)
Three failures → mandatory escalation with structured diagnostic
Unrecoverable → immediate escalation with operator decision requested
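The four rules above can be sketched as a small retry loop. Everything named here is hypothetical (`run_with_ralph`, `Outcome`, `UnrecoverableError`), and "same failure" is taken to mean an identical exception type and message, which is an assumption about how failures are compared.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class UnrecoverableError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


@dataclass
class Outcome:
    status: str                                         # "ok" or "escalated"
    reason: str = ""                                    # why escalation happened
    attempts: int = 0
    failures: List[str] = field(default_factory=list)   # diagnostic trail
    result: object = None


def run_with_ralph(task: Callable[[], object], max_attempts: int = 3) -> Outcome:
    failures: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = task()
            return Outcome("ok", attempts=attempt, failures=failures, result=result)
        except UnrecoverableError as exc:
            # Unrecoverable: escalate immediately, no retry.
            failures.append(f"{type(exc).__name__}: {exc}")
            return Outcome("escalated", "unrecoverable", attempt, failures)
        except Exception as exc:
            signature = f"{type(exc).__name__}: {exc}"
            repeated = signature in failures
            failures.append(signature)
            if repeated:
                # Same failure twice: escalate instead of spending another attempt.
                return Outcome("escalated", "same failure twice", attempt, failures)
            # Otherwise fall through and retry.
    # Attempt limit reached with distinct failures: escalate with the full trail.
    return Outcome("escalated", "attempt limit reached", max_attempts, failures)
```

The `failures` list doubles as the raw material for the structured diagnostic: it records every failure in order, so the escalation message can show what was tried and how each attempt ended.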

[[kelly-handbook-software-factory]], [[kelly-tweets-agents]], [[kelly-tweets-factory]]