RALPH Refinements: Lessons from Production Use¶

Type: Operational pattern
Related: kelly-handbook-ch7-multi-agent, kelly-handbook-ch15-troubleshooting

What RALPH Is¶

RALPH is the factory's error-handling protocol: Retry → Ask → Log → Pause → Handoff. When a sub-agent fails, the Router follows this escalation ladder:

- Retry: up to 3 attempts, with diagnostics passed between retries
- Ask: if retries are exhausted, escalate to the operator
- Log: record the failure with `log-failure.sh`
- Pause: stop the pipeline at the failed step
- Handoff: let the operator decide next steps
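
The ladder above can be sketched in a few lines of Python. This is a minimal illustration, not the Router's actual implementation: `run_with_ralph`, `RalphOutcome`, and the stage names are hypothetical, and the Ask, Log, Pause, and Handoff stages are only recorded rather than performed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_RETRIES = 3  # "up to 3 attempts" in the ladder


@dataclass
class RalphOutcome:
    succeeded: bool
    attempts: int
    stages: List[str] = field(default_factory=list)       # ladder stages visited
    diagnostics: List[str] = field(default_factory=list)  # one entry per failed attempt
    result: Optional[str] = None


def run_with_ralph(step: Callable[[List[str]], str]) -> RalphOutcome:
    """Run one step under a Retry -> Ask -> Log -> Pause -> Handoff ladder.

    `step` receives the diagnostics gathered from earlier failed attempts,
    so each retry can see what went wrong before it.
    """
    diagnostics: List[str] = []
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            result = step(diagnostics)
        except Exception as exc:  # any failure counts against the retry budget
            diagnostics.append(f"attempt {attempt}: {exc}")
            continue
        return RalphOutcome(True, attempt, ["retry"], diagnostics, result)

    # Retries exhausted: walk the rest of the ladder (recorded only, in this sketch).
    stages = ["retry", "ask", "log", "pause", "handoff"]
    return RalphOutcome(False, MAX_RETRIES, stages, diagnostics)
```

A real Router would replace the recorded stage names with actual side effects, such as notifying the operator and calling the logging script.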

The protocol itself hasn't changed. These refinements are about how RALPH is applied — the edge cases that production use exposed.

Refinement 1: QA Agents Must Test CRUD, Not Just READ¶

The Problem¶

QA agents were testing whether pages loaded (READ) but never testing whether data could be created, updated, or deleted (CREATE, UPDATE, DELETE). The QA gate passed, but the application was broken for any write operation.

This went undetected because:
- QA agents defaulted to GET requests — the easiest thing to test
- No workflow template specified "test all CRUD operations"
- The quality gate checked "did QA run?" not "did QA test writes?"

The Fix¶

Every QA workflow now explicitly requires testing all CRUD operations:

- CREATE: submit a form, verify data appears
- READ: load the page, verify data displays
- UPDATE: edit an existing record, verify changes persist
- DELETE: remove a record, verify it's gone

The QA agent's task description must include: "Test all CRUD operations — not just page loads."
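
One way to make the quality gate check "did QA test writes?" rather than "did QA run?" is to compare the operations a QA run reports against the required set. The sketch below assumes the QA agent reports the operations it exercised as a list of strings; that report format and the function names are hypothetical.

```python
from typing import Iterable, Set

REQUIRED_OPERATIONS = ("CREATE", "READ", "UPDATE", "DELETE")


def missing_crud_operations(tested: Iterable[str]) -> Set[str]:
    """Return the CRUD operations that a QA run did not exercise."""
    covered = {op.strip().upper() for op in tested}
    return set(REQUIRED_OPERATIONS) - covered


def qa_gate_passes(tested: Iterable[str]) -> bool:
    """Pass only when every CRUD operation was tested, not merely when QA ran."""
    return not missing_crud_operations(tested)
```

With a gate like this, a run that only loaded pages would be reported as missing CREATE, UPDATE, and DELETE instead of passing silently.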

Why This Matters for RALPH¶

If QA only tests READ, it will pass broken applications. RALPH will never trigger because the failure isn't caught until the user tries to write data. By the time the user reports it, the pipeline has already moved on and the context is lost.

Lesson: RALPH can only catch failures that QA actually tests for. Expand QA scope before relying on RALPH to catch bugs.

Refinement 2: Lessons Must Go into Agent Skill Files¶

The Problem¶

When the Router learned a lesson (e.g., "always test CRUD"), it wrote it to SELF_IMPROVEMENT.md or memory files. But sub-agents don't read the Router's memory. They read their own AGENTS.md, their skill files, and the task description. Lessons in the Router's notes never propagated to the agents that needed them.

A lesson learned by the Router in session 1 was forgotten by session 3's sub-agent.

The Fix¶

Lessons that apply to specific agents must go into those agents' skill files or AGENTS.md. The Router's memory is for the Router. Agent-specific knowledge belongs in agent-specific files.

Before: Router writes "QA must test CRUD" to memory. Next QA spawn doesn't test CRUD.

After: "QA must test CRUD" goes into the QA agent's AGENTS.md or task template. Every QA spawn reads it.

The Propagation Path¶

| Knowledge Type | Where It Goes |
| --- | --- |
| Router operational lessons | Router's AGENTS.md or memory |
| Agent-specific behavior | That agent's AGENTS.md |
| Workflow-level rules | Workflow markdown file (in `factory/workflows/`) |
| Cross-agent patterns | Factory-level AGENTS.md or this KB |

Lesson: RALPH fixes the immediate failure. Skill file updates prevent the next one. Both are required.
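
The propagation path can be expressed as a small lookup. In this sketch only `factory/workflows/` comes from the table above; the other paths (`router/AGENTS.md`, `agents/<name>/AGENTS.md`, `factory/AGENTS.md`) and all function and enum names are assumptions made for illustration.

```python
from enum import Enum


class KnowledgeType(Enum):
    ROUTER_OPERATIONAL = "router operational lesson"
    AGENT_SPECIFIC = "agent-specific behavior"
    WORKFLOW_RULE = "workflow-level rule"
    CROSS_AGENT = "cross-agent pattern"


def lesson_destination(kind: KnowledgeType, agent: str = "", workflow: str = "") -> str:
    """Return the file a lesson should be written to (paths are hypothetical)."""
    if kind is KnowledgeType.ROUTER_OPERATIONAL:
        return "router/AGENTS.md"
    if kind is KnowledgeType.AGENT_SPECIFIC:
        if not agent:
            raise ValueError("agent-specific lessons need an agent name")
        return f"agents/{agent}/AGENTS.md"
    if kind is KnowledgeType.WORKFLOW_RULE:
        if not workflow:
            raise ValueError("workflow-level rules need a workflow name")
        return f"factory/workflows/{workflow}.md"
    return "factory/AGENTS.md"  # cross-agent patterns
```

Requiring an agent or workflow name forces the writer to decide who actually needs the lesson, which is the point of this refinement.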

Refinement 3: Parallel Pipelines Confirmed Working¶

The Question¶

Does RALPH work when multiple pipelines run concurrently? If two pipelines both hit failures at the same time, does the Router handle both correctly?

The Answer: Yes¶

The April 2026 validation run tested this directly — test-web-run and factory-dashboard-rebuild ran simultaneously with three agents (amelia, testlead, phil). Both pipelines had steps that needed retries. RALPH handled them independently:

- Each pipeline has its own bead tracking, so failures are pipeline-scoped
- RALPH retries are step-scoped: failing step 4.2 in pipeline A doesn't affect step 3.1 in pipeline B
- The Router processes completion events sequentially, so there are no race conditions
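
The three properties above can be illustrated with a single-threaded event loop whose failure counts are keyed by pipeline and step. This is a sketch under assumptions: `CompletionEvent`, `process_events`, and the status strings are hypothetical, and real bead tracking is not modeled.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, NamedTuple, Tuple

StepKey = Tuple[str, str]  # (pipeline, step)


class CompletionEvent(NamedTuple):
    pipeline: str
    step: str
    ok: bool


def process_events(events: List[CompletionEvent], max_retries: int = 3) -> Dict[StepKey, str]:
    """Drain events one at a time; failure counts never cross pipelines or steps."""
    queue: Deque[CompletionEvent] = deque(events)
    failures: Dict[StepKey, int] = defaultdict(int)
    status: Dict[StepKey, str] = {}
    while queue:  # sequential processing: one event is fully handled before the next
        event = queue.popleft()
        key = (event.pipeline, event.step)
        if event.ok:
            status[key] = "done"
        else:
            failures[key] += 1
            status[key] = "escalate" if failures[key] >= max_retries else "retry"
    return status
```

Because each counter is looked up by `(pipeline, step)`, three failures in one pipeline leave the retry budget of every other step untouched.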

What We Learned¶

- Token usage varies by roughly an order of magnitude across step types: Python scripts use ~15-30K tokens per step, while TypeScript security scans use ~270K. Budget accordingly for parallel runs.
- Total runtime was ~1.5 hours for two full pipelines (SCAFFOLD through DEPLOY). Parallel execution saved roughly 40% of wall-clock time compared with sequential execution.
- QA catches what build misses: in factory-dashboard-rebuild, QA added 22 new tests and caught a CSS class name bug (in_progress/closed vs active/complete/pending) that the build agent never noticed.
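
For budgeting, the per-step figures above can be turned into a rough range. In the sketch below the per-step token numbers come from the validation run, while the step-type keys, function names, and any step counts passed in are hypothetical. The sequential-runtime helper only inverts the "roughly 40% saved" figure; it is an inference, not a measured value.

```python
from typing import Dict, Tuple

# (low, high) tokens per step, from the validation-run figures
TOKENS_PER_STEP: Dict[str, Tuple[int, int]] = {
    "python_script": (15_000, 30_000),
    "typescript_security_scan": (270_000, 270_000),
}


def token_budget(step_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return a (low, high) token estimate for a planned set of steps."""
    low = sum(TOKENS_PER_STEP[kind][0] * n for kind, n in step_counts.items())
    high = sum(TOKENS_PER_STEP[kind][1] * n for kind, n in step_counts.items())
    return low, high


def implied_sequential_hours(parallel_hours: float, saving: float) -> float:
    """Back out the sequential runtime implied by a fractional wall-clock saving."""
    return parallel_hours / (1.0 - saving)
```

For example, a hypothetical plan of four Python script steps and one TypeScript security scan would land between 330K and 390K tokens, and a 1.5-hour parallel run with a 40% saving implies about 2.5 hours sequentially.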

Refinement 4: Never Spin on the Same Error¶

The Rule¶

From the factory's AGENTS.md: "Never spin if same error 2x in a row, escalate immediately."

This is a RALPH refinement that prevents wasted retries. If attempt 1 fails with error X, and attempt 2 fails with the same error X, don't try attempt 3 with the same inputs. Escalate.

How It Works¶

Between retries, the Router collects diagnostics:
- The error message
- The output produced so far
- Any partial artifacts

These diagnostics are passed to the next retry attempt so it can learn from the failure. But if the diagnostics show the same error twice, the Router skips retry 3 and goes straight to escalation.
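
The decision between retrying and escalating can be sketched as a function of the diagnostics collected so far. The `Diagnostics` fields mirror the three items listed above; the names are hypothetical, and comparing errors by normalized message text is an assumption, since the source does not say how "same error" is determined.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_RETRIES = 3


@dataclass
class Diagnostics:
    error: str                              # the error message
    output_so_far: str = ""                 # the output produced so far
    partial_artifacts: Tuple[str, ...] = () # any partial artifacts


def normalize(error: str) -> str:
    """Collapse case and whitespace so trivially different messages compare equal."""
    return " ".join(error.lower().split())


def next_action(history: List[Diagnostics]) -> str:
    """Return 'retry' or 'escalate' given diagnostics from the failed attempts so far."""
    if len(history) >= MAX_RETRIES:
        return "escalate"  # retry budget exhausted
    if len(history) >= 2 and normalize(history[-1].error) == normalize(history[-2].error):
        return "escalate"  # same error twice in a row: skip the remaining retry
    return "retry"
```

A stricter or looser notion of "same error" (error codes, stack-trace fingerprints) would slot into `normalize` without changing the rule itself.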

Refinement 5: Failure Type Matters¶

Not all failures are equal. RALPH handles them differently:

| Failure Type | RALPH Response |
| --- | --- |
| Context overflow | Break work into smaller chunks, retry |
| Lost results (agent said done but no output) | Respawn without asking |
| Infinite tool loop | Add stopping conditions, retry |
| Agent bug (wrong output) | Log and escalate; the agent needs fixing |
| Timeout | Respawn immediately; don't wait |
The key insight: respawn-first for transient failures, escalate-first for persistent failures. RALPH's job is to distinguish between the two.
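
The routing can be written as a lookup from failure type to response. The failure types and responses below follow the table; the enum, the action strings, and the helper names are hypothetical, and how a failure gets classified in the first place is outside this sketch.

```python
from enum import Enum, auto


class FailureType(Enum):
    CONTEXT_OVERFLOW = auto()
    LOST_RESULTS = auto()        # agent said done but produced no output
    INFINITE_TOOL_LOOP = auto()
    AGENT_BUG = auto()           # wrong output
    TIMEOUT = auto()


# Action names are illustrative labels for the responses in the table.
RESPONSES = {
    FailureType.CONTEXT_OVERFLOW: "chunk_and_retry",
    FailureType.LOST_RESULTS: "respawn",
    FailureType.INFINITE_TOOL_LOOP: "add_stop_conditions_and_retry",
    FailureType.AGENT_BUG: "log_and_escalate",
    FailureType.TIMEOUT: "respawn",
}


def route(failure: FailureType) -> str:
    """Return the RALPH response for a classified failure."""
    return RESPONSES[failure]


def needs_operator(failure: FailureType) -> bool:
    """True when the response goes straight to escalation instead of another attempt."""
    return route(failure) == "log_and_escalate"
```

Keeping the mapping in one table makes it easy to audit which failure types consume retries and which go directly to the operator.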

Summary of Refinements¶

| Refinement | What Changed |
| --- | --- |
| CRUD testing | QA must test all operations, not just READ |
| Lesson propagation | Agent-specific knowledge → agent-specific files |
| Parallel pipelines | Confirmed RALPH works across concurrent runs |
| Same-error detection | Skip retry 3 if error 2 matches error 1 |
| Failure type routing | Different failure types get different RALPH responses |
None of these change RALPH's core protocol. They're all about making the protocol work correctly in the messy reality of production pipelines.

These refinements were discovered between March and May 2026, across 10+ pipeline runs. The CRUD gap alone was responsible for two production bugs before it was identified and fixed.

Related¶

- kelly-handbook-ch7-multi-agent: base RALPH protocol
- kelly-handbook-ch15-troubleshooting: troubleshooting patterns

Concept Cross-References¶

- agents/ralph-protocol: the base Retry/Ask/Log/Pause/Handoff protocol
- agents/quality-gates: quality gates and failure type routing
- tea-agent: TEA agent that runs RALPH on sub-agent failures
- tea-audit: TEA 3-phase audit that RALPH escalates to
- code-review/advisory-code-review: advisory code review (complementary to the QA testing refinement)
- code-review/regression-provenance: blame tracking for production bugs RALPH catches
