RALPH Refinements — Lessons from Production Use
Type: Operational pattern
Related: kelly-handbook-ch7-multi-agent, kelly-handbook-ch15-troubleshooting
What RALPH Is
RALPH is the factory's error-handling protocol: Retry → Ask → Log → Pause → Handoff. When a sub-agent fails, the Router follows this escalation ladder:
- Retry — up to 3 attempts, with diagnostics passed between retries
- Ask — if retries exhaust, escalate to the operator
- Log — record the failure with log-failure.sh
- Pause — stop the pipeline at the failed step
- Handoff — let the operator decide next steps
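The ladder above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the Router's actual implementation: `run_with_ralph`, `StepOutcome`, and `RalphResult` are hypothetical names, and the Ask, Log, Pause, and Handoff stages are only recorded as labels rather than performed (the source does not say how log-failure.sh is invoked, so it is not called here).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # "up to 3 attempts" from the ladder


@dataclass
class StepOutcome:
    ok: bool
    error: Optional[str] = None


@dataclass
class RalphResult:
    status: str                                      # "done" or "handed_off"
    attempts: int
    trail: List[str] = field(default_factory=list)   # ladder stages reached


def run_with_ralph(step: Callable[[List[str]], StepOutcome]) -> RalphResult:
    """Run one step under the Retry -> Ask -> Log -> Pause -> Handoff ladder."""
    diagnostics: List[str] = []
    trail: List[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        trail.append(f"attempt:{attempt}")
        # Each attempt receives the diagnostics gathered from earlier failures.
        outcome = step(list(diagnostics))
        if outcome.ok:
            return RalphResult("done", attempt, trail)
        diagnostics.append(outcome.error or "unknown error")
    # Retries exhausted: walk the rest of the ladder in order.
    # A real Router would contact the operator and run log-failure.sh here;
    # this sketch only records that each stage was reached.
    trail += ["ask", "log", "pause", "handoff"]
    return RalphResult("handed_off", MAX_ATTEMPTS, trail)
```

A step that succeeds on its second attempt returns with `status == "done"` and `attempts == 2`; a step that never succeeds ends with the four escalation stages at the tail of `trail`.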
The protocol itself hasn't changed. These refinements are about how RALPH is applied — the edge cases that production use exposed.
Refinement 1: QA Agents Must Test CRUD, Not Just READ
The Problem
QA agents were testing whether pages loaded (READ) but never testing whether data could be created, updated, or deleted (CREATE, UPDATE, DELETE). The QA gate passed, but the application was broken for any write operation.
This went undetected because:
- QA agents defaulted to GET requests — the easiest thing to test
- No workflow template specified "test all CRUD operations"
- The quality gate checked "did QA run?" not "did QA test writes?"
The Fix
Every QA workflow now explicitly requires testing all CRUD operations:
- CREATE — submit a form, verify data appears
- READ — load the page, verify data displays
- UPDATE — edit an existing record, verify changes persist
- DELETE — remove a record, verify it's gone
The QA agent's task description must include: "Test all CRUD operations — not just page loads."
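To make the difference between a READ-only check and a full CRUD check concrete, here is a minimal sketch. Everything in it is hypothetical: `InMemoryStore` stands in for the application under test, `SilentlyReadOnlyStore` simulates an app whose writes are broken, and `check_crud` is an illustrative helper rather than the factory's QA tooling.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for the application under test."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}
        self._next_id = 1

    def create(self, value: str) -> int:
        row_id = self._next_id
        self._rows[row_id] = value
        self._next_id += 1
        return row_id

    def read(self, row_id: int) -> str:
        return self._rows[row_id]

    def update(self, row_id: int, value: str) -> None:
        self._rows[row_id] = value

    def delete(self, row_id: int) -> None:
        del self._rows[row_id]

    def exists(self, row_id: int) -> bool:
        return row_id in self._rows


class SilentlyReadOnlyStore(InMemoryStore):
    """Simulates a broken app: pages load, but edits and deletes do nothing."""

    def update(self, row_id: int, value: str) -> None:
        pass

    def delete(self, row_id: int) -> None:
        pass


def check_crud(store: InMemoryStore) -> List[str]:
    """Return the CRUD operations that failed (an empty list means all passed)."""
    try:
        row_id = store.create("draft")
    except Exception:
        return ["CREATE"]  # without a record, nothing else can be verified
    failures: List[str] = []
    try:
        if store.read(row_id) != "draft":
            failures.append("READ")
    except Exception:
        failures.append("READ")
    try:
        store.update(row_id, "final")
        if store.read(row_id) != "final":  # verify the change persisted
            failures.append("UPDATE")
    except Exception:
        failures.append("UPDATE")
    try:
        store.delete(row_id)
        if store.exists(row_id):  # verify the record is gone
            failures.append("DELETE")
    except Exception:
        failures.append("DELETE")
    return failures
```

Against `SilentlyReadOnlyStore`, a READ-only check would pass, while `check_crud` reports `["UPDATE", "DELETE"]`.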
Why This Matters for RALPH
If QA only tests READ, it will pass broken applications. RALPH will never trigger because the failure isn't caught until the user tries to write data. By the time the user reports it, the pipeline has already moved on and the context is lost.
Lesson: RALPH can only catch failures that QA actually tests for. Expand QA scope before relying on RALPH to catch bugs.
Refinement 2: Lessons Must Go into Agent Skill Files
The Problem
When the Router learned a lesson (e.g., "always test CRUD"), it wrote it to SELF_IMPROVEMENT.md or memory files. But sub-agents don't read the Router's memory. They read their own AGENTS.md, their skill files, and the task description. Lessons in the Router's notes never propagated to the agents that needed them.
A lesson the Router learned in session 1 was never seen by the sub-agent spawned in session 3, so the same mistake recurred.
The Fix
Lessons that apply to specific agents must go into those agents' skill files or AGENTS.md. The Router's memory is for the Router. Agent-specific knowledge belongs in agent-specific files.
Before: Router writes "QA must test CRUD" to memory. Next QA spawn doesn't test CRUD.
After: "QA must test CRUD" goes into the QA agent's AGENTS.md or task template. Every QA spawn reads it.
The Propagation Path
| Knowledge Type | Where It Goes |
|---|---|
| Router operational lessons | Router's AGENTS.md or memory |
| Agent-specific behavior | That agent's AGENTS.md |
| Workflow-level rules | Workflow markdown file (in factory/workflows/) |
| Cross-agent patterns | Factory-level AGENTS.md or this KB |
Lesson: RALPH fixes the immediate failure. Skill file updates prevent the next one. Both are required.
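The propagation table can be read as a routing rule from knowledge type to target file. The sketch below is illustrative only: apart from factory/workflows/, which the table names, the directory layout (`router/`, `agents/<name>/`, `factory/AGENTS.md`) and both function names are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def lesson_target(kind: str, root: Path, name: Optional[str] = None) -> Path:
    """Pick the file a lesson belongs in. Layout is hypothetical."""
    if kind == "router":
        return root / "router" / "AGENTS.md"
    if kind == "cross-agent":
        return root / "factory" / "AGENTS.md"
    if name is None:
        raise ValueError(f"{kind!r} lessons need an agent or workflow name")
    if kind == "agent":
        return root / "agents" / name / "AGENTS.md"
    if kind == "workflow":
        return root / "factory" / "workflows" / f"{name}.md"
    raise ValueError(f"unknown knowledge type: {kind!r}")


def record_lesson(path: Path, lesson: str) -> bool:
    """Append a lesson as a bullet unless it is already present.

    Returns True if the file was changed.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if lesson in existing:
        return False
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {lesson}\n")
    return True
```

Making the write idempotent means the Router can record the same lesson after every failure without bloating the agent's file.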
Refinement 3: Parallel Pipelines Confirmed Working
The Question
Does RALPH work when multiple pipelines run concurrently? If two pipelines both hit failures at the same time, does the Router handle both correctly?
The Answer: Yes
The April 2026 validation run tested this directly — test-web-run and factory-dashboard-rebuild ran simultaneously with three agents (amelia, testlead, phil). Both pipelines had steps that needed retries. RALPH handled them independently:
- Each pipeline has its own bead tracking — failures are pipeline-scoped
- RALPH retries are step-scoped — failing step 4.2 in pipeline A doesn't affect step 3.1 in pipeline B
- The Router processes completion events sequentially, so simultaneous failures are handled one at a time rather than racing each other
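The three properties above can be illustrated with a small sketch: retry counters keyed by pipeline and step, drained from a single queue. This is an assumption about how such scoping could be modeled, not a description of the Router's internals; `process_events` and the event tuple shape are hypothetical, and bead tracking is not modeled.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

MAX_ATTEMPTS = 3

StepKey = Tuple[str, str]       # (pipeline, step)
Event = Tuple[str, str, bool]   # (pipeline, step, succeeded)


def process_events(events: Iterable[Event]) -> Tuple[Dict[StepKey, int], List[StepKey]]:
    """Drain completion events one at a time, counting attempts per (pipeline, step)."""
    queue: Deque[Event] = deque(events)
    attempts: Dict[StepKey, int] = {}
    escalated: List[StepKey] = []
    while queue:
        pipeline, step, succeeded = queue.popleft()  # strictly sequential
        key = (pipeline, step)
        attempts[key] = attempts.get(key, 0) + 1
        # A failure only counts against its own pipeline and step.
        if not succeeded and attempts[key] >= MAX_ATTEMPTS:
            escalated.append(key)
    return attempts, escalated
```

With interleaved events from pipelines A and B, three failures of step 4.2 in A escalate that step only; B's step 3.1 keeps its own independent count.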
What We Learned
- Token usage varies wildly by step type — Python scripts use ~15-30K tokens per step; TypeScript security scans use ~270K. Budget accordingly for parallel runs.
- Total runtime was ~1.5 hours for two full pipelines (SCAFFOLD through DEPLOY). Parallel execution saved roughly 40% wall-clock time vs sequential.
- QA catches what build misses — in factory-dashboard-rebuild, QA added 22 new tests and caught a CSS class name bug (in_progress/closed vs active/complete/pending). The build agent never noticed.
Refinement 4: Never Spin on the Same Error
The Rule
From the factory's AGENTS.md: "Never spin if same error 2x in a row, escalate immediately."
This is a RALPH refinement that prevents wasted retries. If attempt 1 fails with error X, and attempt 2 fails with the same error X, don't try attempt 3 with the same inputs. Escalate.
How It Works
Between retries, the Router collects diagnostics:
- The error message
- The output produced so far
- Any partial artifacts
These diagnostics are passed to the next retry attempt so it can learn from the failure. But if the diagnostics show the same error twice, the Router skips retry 3 and goes straight to escalation.
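A minimal sketch of the same-error check follows. It assumes errors are compared as exact strings, which is a simplification: real error messages often carry timestamps or IDs and would need normalizing first. The name `retry_until_repeat` and the returned dictionary shape are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Union

MAX_ATTEMPTS = 3

Result = Dict[str, Union[str, int]]


def retry_until_repeat(step: Callable[[List[str]], Optional[str]]) -> Result:
    """Retry a step, escalating early if the same error occurs twice in a row.

    `step` receives the errors seen so far and returns None on success
    or an error message on failure.
    """
    errors: List[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        error = step(list(errors))
        if error is None:
            return {"status": "done", "attempts": attempt}
        if errors and errors[-1] == error:
            # Same error twice in a row: another retry would only spin.
            return {"status": "escalated", "attempts": attempt,
                    "reason": "same error twice"}
        errors.append(error)
    return {"status": "escalated", "attempts": MAX_ATTEMPTS,
            "reason": "retries exhausted"}
```

A step that fails identically every time is escalated after attempt 2; a step that fails with a different error each time still gets all three attempts.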
Refinement 5: Failure Type Matters
Not all failures are equal. RALPH handles them differently:
| Failure Type | RALPH Response |
|---|---|
| Context overflow | Break work into smaller chunks, retry |
| Lost results (agent said done but no output) | Respawn — don't ask, just respawn |
| Infinite tool loop | Add stopping conditions, retry |
| Agent bug (wrong output) | Log + escalate — the agent needs fixing |
| Timeout | Respawn immediately — don't wait |
The key insight: respawn-first for transient failures, escalate-first for persistent failures. RALPH's job is to distinguish between the two.
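The routing table can be expressed as a plain lookup. This is a sketch under the assumption that a failure has already been classified; how the Router detects each failure type is not covered by the source and is not modeled here. The enum and function names are hypothetical.

```python
from enum import Enum


class Failure(Enum):
    CONTEXT_OVERFLOW = "context overflow"
    LOST_RESULTS = "lost results"
    TOOL_LOOP = "infinite tool loop"
    AGENT_BUG = "agent bug"
    TIMEOUT = "timeout"


class Action(Enum):
    CHUNK_AND_RETRY = "break work into smaller chunks, retry"
    RESPAWN = "respawn without asking"
    ADD_STOP_AND_RETRY = "add stopping conditions, retry"
    LOG_AND_ESCALATE = "log and escalate"


# One entry per row of the failure-type table.
ROUTES = {
    Failure.CONTEXT_OVERFLOW: Action.CHUNK_AND_RETRY,
    Failure.LOST_RESULTS: Action.RESPAWN,
    Failure.TOOL_LOOP: Action.ADD_STOP_AND_RETRY,
    Failure.AGENT_BUG: Action.LOG_AND_ESCALATE,
    Failure.TIMEOUT: Action.RESPAWN,
}


def route(failure: Failure) -> Action:
    """Look up the response for an already-classified failure."""
    return ROUTES[failure]


def needs_operator(failure: Failure) -> bool:
    """True when the failure goes straight to the operator."""
    return route(failure) is Action.LOG_AND_ESCALATE
```

Keeping the mapping in one table makes it easy to confirm that every failure type has a response, and to add a row when a new type shows up in production.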
Summary of Refinements
| Refinement | What Changed |
|---|---|
| CRUD testing | QA must test all operations, not just READ |
| Lesson propagation | Agent-specific knowledge → agent-specific files |
| Parallel pipelines | Confirmed RALPH works across concurrent runs |
| Same-error detection | Skip retry 3 if error 2 matches error 1 |
| Failure type routing | Different failure types get different RALPH responses |
None of these change RALPH's core protocol. They're all about making the protocol work correctly in the messy reality of production pipelines.
These refinements were discovered between March and May 2026, across 10+ pipeline runs. The CRUD gap alone was responsible for two production bugs before it was identified and fixed.
Related
- kelly-handbook-ch7-multi-agent — base RALPH protocol
- kelly-handbook-ch15-troubleshooting — troubleshooting patterns
Concept Cross-References
- agents/ralph-protocol — The base Retry/Ask/Log/Pause/Handoff protocol
- agents/quality-gates — Quality gates and failure type routing
- tea-agent — TEA agent that runs RALPH on sub-agent failures
- tea-audit — TEA 3-phase audit that RALPH escalates to
- code-review/advisory-code-review — Advisory code review (complementary to QA testing refinement)
- code-review/regression-provenance — Blame tracking for production bugs RALPH catches