DONE Marker Protocol: Why Dual-Write Matters
Type: Operational pattern
Related: beads-adoption, kelly-handbook-ch15-troubleshooting
The Requirement
Every subphase must complete both:
- DONE marker file: written to the filesystem at `{done_dir}/DONE`
- Bead closure: `bd update --status in_review` or `bddone <bead-id> <agent>`
Neither alone is sufficient. This is the dual-write requirement, and BUG-01 is the reason it's mandatory.
What Each Tracks
| Mechanism | What It Tracks | Where It Lives |
|---|---|---|
| DONE marker | Artifact output — did the agent produce files? | Filesystem (done_dir/DONE) |
| Bead closure | Work state — is the task complete? | Beads database (.beads/) |
These are different questions. An agent can mark a bead as closed without writing any files (bug, crash, premature completion). An agent can write files without closing the bead (forgot, script error, pre-Beads workflow). The dual-write catches both failure modes.
BUG-01: The DONE Marker Skip
The Symptom
Agents were completing work — producing correct artifacts, closing their beads — but skipping the DONE marker file. The pipeline continued, but prerequisite checks on subsequent spawns failed silently.
"Silently" is the key word. The spawn script checks for DONE markers from prerequisite steps. If a DONE marker is missing, the spawn doesn't fail loudly — it just doesn't find the prerequisite, and behavior depends on how the check is implemented.
The Root Cause
spawn-subphase.sh was extracting the Agent: field from workflow YAML frontmatter to determine who writes the DONE marker. But the extraction was case-sensitive, and some workflow files used agent: (lowercase) while others used Agent: (uppercase).
The script looked for Agent: — if the YAML used agent:, the extraction returned empty. With no agent name, the DONE marker write was skipped.
The Fix
A single-line case correction in spawn-subphase.sh:
# Before (broken):
agent=$(grep '^Agent:' "$workflow" | head -1 | cut -d: -f2 | xargs)
# After (fixed):
agent=$(grep -i '^agent:' "$workflow" | head -1 | cut -d: -f2 | xargs)
The -i flag makes the grep case-insensitive. Both Agent: and agent: now match.
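The difference is easy to reproduce in isolation. The sketch below is not taken from spawn-subphase.sh: the function names `extract_agent_strict` and `extract_agent_fixed` and the two throwaway files are hypothetical, and only the two pipelines themselves come from the fix shown above.

```bash
# Reproduction sketch: run both extraction pipelines against two files
# that differ only in the case of the key. All names here are hypothetical.

extract_agent_strict() {   # original, case-sensitive form
  grep '^Agent:' "$1" | head -1 | cut -d: -f2 | xargs
}

extract_agent_fixed() {    # corrected, case-insensitive form
  grep -i '^agent:' "$1" | head -1 | cut -d: -f2 | xargs
}

tmp=$(mktemp -d)
printf '%s\n' '---' 'Agent: carson' '---' > "$tmp/upper.md"
printf '%s\n' '---' 'agent: carson' '---' > "$tmp/lower.md"

for f in upper lower; do
  echo "$f strict=[$(extract_agent_strict "$tmp/$f.md")] fixed=[$(extract_agent_fixed "$tmp/$f.md")]"
done

rm -rf "$tmp"
```

Running it prints `upper strict=[carson] fixed=[carson]` followed by `lower strict=[] fixed=[carson]`: the strict form yields an empty agent name for the lowercase key, which is the condition under which the marker write was skipped.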
Verification
After the fix: 14/14 subphases completed with DONE markers. Confirmed across two parallel pipeline runs.
The Dual-Write Checklist
Every workflow template now includes this in the completion instructions:
## Completion
1. Write your artifacts to the specified output directory
2. Write the DONE marker file:
   - Path: `{done_dir}/DONE`
   - Content: agent name, timestamp, brief summary
3. Close the bead:
   ```bash
   bddone <bead-id> <agent-name>
   ```
4. Report STATUS: COMPLETE
Both the DONE marker AND the bead closure are required. Do not skip either.
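For agents that finish from a script, the two writes can be wrapped in one helper so that neither is forgotten. This is a minimal sketch under stated assumptions: `complete_subphase` and the `BDDONE_CMD` override are hypothetical names, the marker fields follow the format documented on this page, and refusing to run with an empty agent name is a design choice of the sketch rather than documented pipeline behavior.

```bash
# Sketch of a completion helper that performs both writes and fails loudly
# if either one fails. complete_subphase and BDDONE_CMD are hypothetical.

complete_subphase() {   # usage: complete_subphase <done_dir> <bead-id> <agent> <phase>
  local done_dir="$1" bead_id="$2" agent="$3" phase="$4"

  # An empty agent name is the condition behind BUG-01, so stop here.
  if [ -z "$agent" ]; then
    echo "STATUS: FAILED (empty agent name)" >&2
    return 1
  fi

  # Write 1: the DONE marker.
  mkdir -p "$done_dir" || return 1
  {
    echo "Agent: $agent"
    echo "Phase: $phase"
    echo "Completed: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$done_dir/DONE" || return 1

  # Write 2: the bead closure, via the bddone wrapper by default.
  if ! "${BDDONE_CMD:-bddone}" "$bead_id" "$agent"; then
    echo "STATUS: FAILED (bead $bead_id not closed)" >&2
    return 1
  fi

  echo "STATUS: COMPLETE"
}
```

A call might look like `complete_subphase planning/2.1-prd workspace-mol-abc carson 2.1-prd`. If the bead closure fails after the marker has been written, the step is left with a marker and an open bead, which is one of the states the dual-write check is meant to catch.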
DONE Marker Format
The DONE marker is a plain text file at {done_dir}/DONE:
Agent: carson
Phase: 1.1-intake
Completed: 2026-05-28T10:30:00Z
Some agents include additional context (summary, file count, test results), but the minimum is agent name and timestamp.
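A marker can be checked against that minimum with a few lines of shell. The sketch below assumes the `Key: value` layout shown above; `marker_field` and `validate_done_marker` are hypothetical helpers, not part of the pipeline scripts.

```bash
# Sketch: verify that a DONE marker carries the minimum fields
# (agent name and timestamp). Helper names are hypothetical.

marker_field() {   # usage: marker_field <Key> <file>
  # Strip only the key and its colon, so values that contain colons
  # (such as ISO timestamps) are returned whole.
  sed -n "s/^$1:[[:space:]]*//p" "$2" | head -1
}

validate_done_marker() {   # usage: validate_done_marker <path-to-DONE>
  local marker="$1" agent completed
  [ -f "$marker" ] || { echo "missing: $marker" >&2; return 1; }
  agent=$(marker_field Agent "$marker")
  completed=$(marker_field Completed "$marker")
  [ -n "$agent" ]     || { echo "no Agent value in $marker" >&2; return 1; }
  [ -n "$completed" ] || { echo "no Completed value in $marker" >&2; return 1; }
  echo "ok agent=$agent completed=$completed"
}
```

The `sed` form is used instead of `cut -d: -f2` because splitting the `Completed` line on every colon would truncate the timestamp at the first colon inside the time.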
Bead Closure Format
The bddone wrapper handles bead closure:
bddone <bead-id> [agent-name]
# e.g. bddone workspace-mol-abc carson
This calls bd update <bead-id> --status closed with a timestamp and agent note. The bead's status changes from in_progress to closed, and the change is recorded in bead history.
Why Not Just One?
Why Not Beads Only?
Beads track state, not artifacts. A bead can be closed with bd update --status closed even if the agent produced no files. The beads system doesn't verify that output exists — it only records that someone said the work was done.
If you rely on beads alone, you can have "completed" steps with no actual output. Downstream agents that depend on those artifacts will fail.
Why Not DONE Markers Only?
DONE markers track artifacts, not workflow state. A DONE marker can exist from a previous run, a manual file creation, or a corrupted write. The filesystem doesn't know if the work was actually done — it only knows that a file exists at a path.
If you rely on DONE markers alone, you can have "completed" steps that were never actually executed by the pipeline. The beads system provides the audit trail of what actually happened.
Dual-Write Catches Both
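In the table, ✅ means the check would pass and ❌ means it would fail. A failing dual-write check is the intended outcome whenever something is wrong.
| Scenario | Beads-only check | DONE-only check | Dual-write check |
|---|---|---|---|
| Agent completes correctly | ✅ Closed | ✅ File exists | ✅ Both present |
| Agent closes bead but writes no files | ✅ Closed (false pass) | ❌ Missing | ❌ Caught (marker missing) |
| Files exist but bead never closed | ❌ Still open | ✅ Exists (false pass) | ❌ Caught (bead open) |
| Previous run's DONE marker exists | ❌ Not tracked | ✅ Exists (stale) | ❌ Caught (bead audit) |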
Prerequisite Checking
Downstream steps check prerequisites by looking for DONE markers:
# In workflow YAML frontmatter:
prerequisites:
- "scaffold/1.1-intake/DONE"
- "planning/2.1-prd/DONE"
spawn-subphase.sh verifies these exist before spawning. If a DONE marker is missing, the spawn fails — preventing downstream work from starting on incomplete prerequisites.
This is why the DONE marker path must be exact. The formula's output_path must match where the marker actually lands. A mismatch (like the artifacts/ prefix bug from April 2026) causes prerequisite checks to fail on correct work or pass on missing work.
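The check described above can be sketched as follows. This is an illustration only: the real logic lives in spawn-subphase.sh and may differ, `list_prerequisites` and `check_prerequisites` are hypothetical names, and the sketch assumes the prerequisites are written as a simple block list of quoted paths relative to a project root.

```bash
# Sketch of a prerequisite check over workflow frontmatter.
# Function names and the <root> argument are hypothetical.

list_prerequisites() {   # print each entry listed under "prerequisites:"
  awk '
    /^prerequisites:/ { in_list = 1; next }
    in_list && /^[ \t]*-[ \t]+/ {
      sub(/^[ \t]*-[ \t]+/, "")   # drop the list dash
      gsub(/"/, "")               # drop surrounding quotes
      print
      next
    }
    in_list { in_list = 0 }       # any other line ends the list
  ' "$1"
}

check_prerequisites() {   # usage: check_prerequisites <workflow-file> <root>
  local workflow="$1" root="$2"
  list_prerequisites "$workflow" | {
    missing=0
    while IFS= read -r prereq; do
      if [ ! -f "$root/$prereq" ]; then
        echo "missing prerequisite: $prereq" >&2
        missing=1
      fi
    done
    # The exit status of this group becomes the function's status.
    [ "$missing" -eq 0 ]
  }
}
```

Because the comparison is a plain file test on the literal path, a marker written one directory away from the listed path is indistinguishable from a marker that was never written.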
The artifacts/ Prefix Bug
In April 2026, the formula TOML files used output_path = "artifacts/planning/2.1-prd/" but actual DONE files went to planning/2.1-prd/DONE (no artifacts/ prefix). All workflow YAML files had the same bug in done_dir and prerequisites fields.
Impact: Prerequisite checks passed when they should have failed (stale DONE markers from the other path convention were found) or failed when they should have passed (valid DONE markers existed, but not at the path being checked).
Fix: 30 YAML frontmatters updated to remove the artifacts/ prefix. Formula TOML output_path values corrected.
Lesson: The dual-write is only as good as the path consistency. If DONE markers land at unexpected paths, prerequisite checking breaks.
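A small lint can guard against the prefix creeping back in. The sketch below is an assumption-laden illustration: `find_prefixed_paths` and `check_no_artifacts_prefix` are hypothetical names, and the pattern only covers the three places named above (`done_dir`, `output_path`, and list entries such as prerequisites).

```bash
# Sketch of a path-consistency lint: flag lines that still carry the
# artifacts/ prefix. Function names are hypothetical.

find_prefixed_paths() {   # usage: find_prefixed_paths <directory>
  # Matches done_dir values, output_path values, and list entries that
  # start with artifacts/. Any list entry qualifies, not only prerequisites.
  grep -rnE '^[[:space:]]*(done_dir:|output_path[[:space:]]*=|-)[[:space:]]*"?artifacts/' "$1"
}

check_no_artifacts_prefix() {   # usage: check_no_artifacts_prefix <directory>
  if find_prefixed_paths "$1"; then
    echo "artifacts/ prefix found in the files listed above" >&2
    return 1
  fi
}
```

Running `check_no_artifacts_prefix` over the workflow and formula directories before a pipeline run would surface a mismatch as a loud failure instead of a prerequisite check that quietly looks in the wrong place.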
Enforcement
The dual-write is now mandatory in all workflow templates. The checklist appears in every workflow's completion section. The Router verifies both conditions before considering a step complete:
- DONE marker exists at the expected path
- Bead status is closed
If either is missing, the Router respawns the step (RALPH retry) rather than advancing.
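The gate amounts to a two-condition check. The sketch below is hypothetical rather than the Router's actual code: `router_gate` is an invented name, and the bead status is passed in as a plain string because how the Router queries Beads is not described here.

```bash
# Sketch of the Router's dual-write gate. router_gate is a hypothetical
# name; the caller supplies the bead status it obtained from Beads.

router_gate() {   # usage: router_gate <done_dir> <bead_status>
  local done_dir="$1" bead_status="$2" reasons=""

  # Condition 1: the DONE marker exists at the expected path.
  [ -f "$done_dir/DONE" ] || reasons="DONE marker missing"

  # Condition 2: the bead is closed.
  if [ "$bead_status" != "closed" ]; then
    reasons="${reasons:+$reasons; }bead status is ${bead_status:-unknown}"
  fi

  if [ -z "$reasons" ]; then
    echo "advance"
  else
    echo "respawn ($reasons)"
    return 1
  fi
}
```

For example, `router_gate planning/2.1-prd in_progress` with a marker present prints `respawn (bead status is in_progress)` and returns non-zero, while a present marker with status `closed` prints `advance`.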
BUG-01 was discovered on April 6, 2026, when a pipeline completed 8/14 steps but only 6/14 had DONE markers. The root cause was a case-sensitivity issue in a single grep command. The fix was one line. The lesson was permanent: always dual-write.
Related
- beads-adoption — bead closure in the dual-write
- kelly-handbook-ch15-troubleshooting — BUG-01 debugging
Concept Cross-References
- agents/ralph-protocol — Router respawns step (RALPH retry) if dual-write is missing
- agents/quality-gates — Quality gates that verify DONE + bead closed before advancing
- tea-audit — TEA 3-phase audit that consumes DONE/bead state
- pipeline-state — Pipeline state machine that the dual-write drives