Story-by-Story Build — Context-Bounded Implementation

Type: Operational pattern
Related: kelly-handbook-ch7-multi-agent, kelly-handbook-ch8-memory


The Problem: Context Overflow in Long BUILD Phases

When a BUILD phase has 20+ stories, handing the agent the entire sprint backlog in a single session doesn't work. The agent's context window fills up with:

  • The full task description (all stories)
  • Every file it reads to understand the codebase
  • Every file it writes or edits
  • Tool call history (exec output, file contents)
  • Its own reasoning

By story 5 or 6, the agent is spending most of its context re-reading files it already read. By story 10, it starts looping — reading the same files repeatedly without producing output. By story 15, it may hallucinate or produce duplicate code.

The Solution: One Story Per Agent Session

The factory spawns a subagent for one story at a time. Each story gets its own session with:

  • Bounded context — only the files relevant to that story
  • Fresh state — no accumulated tool history from previous stories
  • Clear scope — the agent knows exactly what to build

The Pattern

For each story in the sprint backlog:
    1. spawn-subphase.sh <project> <step> <pipeline>
    2. Task description includes ONE story + relevant context files
    3. sessions_yield — wait for completion
    4. Verify DONE marker + bead closure
    5. Next story

Stories are sequential within a build step. The agent finishes story N before story N+1 starts.
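
The loop above can be sketched in Python. This is a minimal sketch, not the factory's driver: `spawn_story_session`, `wait_for_completion` and `verify_story` are hypothetical hooks standing in for `spawn-subphase.sh`, `sessions_yield` and the DONE marker + bead checks. They are passed in as callables so the control flow can be exercised without the real tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Story:
    story_id: str
    context_files: List[str] = field(default_factory=list)


def run_backlog(
    stories: List[Story],
    spawn_story_session: Callable[[Story], str],  # hypothetical: wraps spawn-subphase.sh
    wait_for_completion: Callable[[str], None],   # hypothetical: wraps sessions_yield
    verify_story: Callable[[Story], bool],        # hypothetical: DONE marker + bead check
) -> List[str]:
    """Run stories strictly one at a time; stop at the first unverified story."""
    completed: List[str] = []
    for story in stories:
        # Steps 1-2: one fresh session per story, carrying only that story's context.
        session_id = spawn_story_session(story)
        # Step 3: block until this session finishes; story N+1 has not started yet.
        wait_for_completion(session_id)
        # Step 4: verify before moving on; an unverified story halts the chain here.
        if not verify_story(story):
            break
        # Step 5: record success and continue with the next story.
        completed.append(story.story_id)
    return completed
```

Stopping at the first failed verification leaves the retry scoped to that single story instead of pressing on with later stories that may depend on it.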

What the Task Description Looks Like

<!-- BEAD: workspace-123 -->

## Task: Implement STORY-003 — User Authentication

**Project:** my-project
**Story:** implementation-artifacts/stories/STORY-003-auth-login.md
**Context files:**
- implementation-artifacts/stories/STORY-001-models.md (completed — reference only)
- implementation-artifacts/SPEC.md

**Requirements:**
- Implement login/logout endpoints
- Add session middleware
- Write tests for auth flow

**Output:** Code in implementation-artifacts/, tests passing

The agent gets ONE story, the minimum context needed, and a clear output expectation.
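
Generating the task description from a template makes it hard to paste the whole backlog into one task by accident. The renderer below is a hypothetical helper (`render_task` and its parameters are assumptions, not factory tooling); it accepts exactly one story plus an explicit list of context files and produces roughly the layout shown above.

```python
from typing import List, Sequence, Tuple


def render_task(
    bead_id: str,
    project: str,
    story_id: str,
    title: str,
    story_path: str,
    context_files: Sequence[Tuple[str, str]],  # (path, note); note may be ""
    requirements: Sequence[str],
    output: str,
) -> str:
    """Render a one-story task description (hypothetical helper)."""
    lines: List[str] = [
        f"<!-- BEAD: {bead_id} -->",
        "",
        f"## Task: Implement {story_id} - {title}",
        "",
        f"**Project:** {project}",
        f"**Story:** {story_path}",
        "**Context files:**",
    ]
    # Only the context files passed in are listed; nothing else from the backlog.
    for path, note in context_files:
        lines.append(f"- {path} ({note})" if note else f"- {path}")
    lines += ["", "**Requirements:**"]
    lines += [f"- {req}" for req in requirements]
    lines += ["", f"**Output:** {output}"]
    return "\n".join(lines)
```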

The Anti-Pattern: Multi-Story Spawns

Spawning multiple stories in one session is the most common cause of build failures in the factory.

What Happens

  1. Agent receives 5 stories in one task description
  2. Reads all relevant files for story 1 — context used: 20%
  3. Implements story 1 — context used: 35%
  4. Reads files for story 2 — context used: 50%
  5. Implements story 2 — context used: 65%
  6. By story 3, context is tight — agent re-reads files it already loaded
  7. By story 4, agent is looping — reading models.py for the third time
  8. Story 5 never gets written
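
The arithmetic behind this sequence is plain accumulation: nothing leaves the window between stories. The sketch below replays the illustrative percentages from the steps above (story 1 costs 20% to read and 15% to implement; later stories about 15% each) against a 100% budget. The cost model is an assumption, and it leaves out re-reads entirely, so it is optimistic about the shared session.

```python
from typing import List, Tuple

# Each story is (read_cost, write_cost) in percent of the context window (assumed model).
Costs = List[Tuple[int, int]]


def simulate_shared_session(story_costs: Costs, budget: int = 100) -> int:
    """Count stories that fit when all of them share one window.

    Usage only accumulates; nothing is evicted between stories. Re-reads are
    ignored, so this is an upper bound on what one session can finish.
    """
    used = 0
    finished = 0
    for read_cost, write_cost in story_costs:
        used += read_cost + write_cost
        if used > budget:
            break
        finished += 1
    return finished


def simulate_fresh_sessions(story_costs: Costs, budget: int = 100) -> int:
    """Same stories, but each one starts from an empty window."""
    return sum(1 for read_cost, write_cost in story_costs if read_cost + write_cost <= budget)
```

With those numbers the shared session reaches 35%, 65%, 95% and then overflows on story 4, so even this optimistic model finishes only three of five stories, while five fresh sessions each peak at 35% or less.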

Real Example: Sally's Multi-Storyboard Failure

Sally (the UI/UX agent) was given a multi-storyboard spawn — implement 9 storyboard frames in one session. The result:

  • Frames 1-3: implemented correctly
  • Frames 4-6: implemented with copy-paste errors from frames 1-3
  • Frames 7-9: never produced; Sally looped reading the spec file

The fix: spawn Sally once per storyboard frame. All 9 frames completed successfully in separate sessions.

Real Example: Amelia's Success with Separate Sessions

Amelia (the build agent) completed 24+ stories across test-web-run and factory-dashboard-rebuild — all in separate sessions. Each session produced clean, working code. No loops, no context overflow, no duplicate work.

Context Window Budget

The practical limit depends on the model, but the factory's rule of thumb is:

| Stories per session | Outcome |
|---|---|
| 1 | Reliable: agent finishes with context to spare |
| 2-3 | Usually works, but watch for re-reads |
| 4-5 | Risky: agent may loop on later stories |
| 6+ | Almost certain to fail: context overflow |

For complex stories (many files, large codebase), even 2 stories may be too many. When in doubt, use 1.
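
The table can be encoded as a guard so that a planner never silently produces a risky batch. A minimal sketch: the thresholds come straight from the rule of thumb, while `session_risk` and `plan_sessions` are hypothetical names, and refusing anything above 3 reflects the "usually works" row rather than any enforced factory limit. The default batch size is 1, matching "when in doubt, use 1".

```python
from typing import List, Sequence


def session_risk(stories_per_session: int) -> str:
    """Map a batch size to the rule-of-thumb outcome from the table."""
    if stories_per_session < 1:
        raise ValueError("a session needs at least one story")
    if stories_per_session == 1:
        return "reliable"
    if stories_per_session <= 3:
        return "usually works"
    if stories_per_session <= 5:
        return "risky"
    return "almost certain to fail"


def plan_sessions(stories: Sequence[str], batch_size: int = 1) -> List[List[str]]:
    """Chunk the backlog into sessions, refusing batch sizes the table marks risky."""
    risk = session_risk(batch_size)
    if risk not in ("reliable", "usually works"):
        raise ValueError(f"batch size {batch_size} is {risk}")
    return [list(stories[i:i + batch_size]) for i in range(0, len(stories), batch_size)]
```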

Implementation Details

Step Ordering

Stories are assigned to steps in the pipeline formula. Each build step maps to one or more stories:

4.1 → Sprint Planning (assigns stories to steps)
4.2 → STORY-001 implementation
4.3 → STORY-002 implementation
4.4 → STORY-003 implementation

The auto-spawn-chain handles chaining: 4.2 completes → 4.3 spawns; 4.3 completes → 4.4 spawns; 4.4 completes → done.
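
The step table and the chaining rule can be sketched together. This assumes, hypothetically, that step IDs are plain strings and that chaining simply walks them in assignment order; the real pipeline formula format and auto-spawn-chain internals are not described here, and `assign_steps` / `next_step` are illustrative names.

```python
from typing import Dict, List, Optional


def assign_steps(stories: List[str], phase: int = 4) -> Dict[str, str]:
    """<phase>.1 is sprint planning; each later step gets exactly one story."""
    steps = {f"{phase}.1": "Sprint Planning"}
    for offset, story in enumerate(stories, start=2):
        steps[f"{phase}.{offset}"] = f"{story} implementation"
    return steps


def next_step(steps: Dict[str, str], completed: str) -> Optional[str]:
    """Return the step to spawn after `completed`, or None when the chain is done."""
    order = list(steps)  # dicts preserve insertion order (Python 3.7+)
    idx = order.index(completed)
    return order[idx + 1] if idx + 1 < len(order) else None
```

Walking insertion order, rather than sorting the IDs as strings, matters once step numbers reach 4.10: sorted as text, "4.10" would land before "4.2".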

Context Files

Each story's task description includes only the context files it needs:
- The story file itself
- The SPEC (if it exists)
- Previously completed stories that this story depends on
- Relevant design artifacts

The agent does not get the full backlog, other stories' files, or unrelated artifacts.
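
The selection rule above is small enough to write down directly. A sketch under stated assumptions: `select_context` is hypothetical, completed stories are tracked as a story-ID-to-file mapping, and the `exists` parameter (defaulting to `os.path.exists`) lets the SPEC check be stubbed in tests. Raising on an unfinished dependency is a design choice of this sketch: under sequential builds it signals a step-ordering mistake.

```python
import os
from typing import Callable, Dict, List, Sequence


def select_context(
    story_file: str,
    spec_file: str,
    depends_on: Sequence[str],
    completed: Dict[str, str],  # story ID -> story file, for finished stories (assumed)
    design_artifacts: Sequence[str] = (),
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the minimal context file list for one story; nothing else is included."""
    files = [story_file]
    # The SPEC is optional: include it only if it is actually present.
    if exists(spec_file):
        files.append(spec_file)
    # Only completed stories this story depends on, never the whole backlog.
    for dep in depends_on:
        if dep not in completed:
            raise ValueError(f"dependency {dep} has not been completed yet")
        files.append(completed[dep])
    files.extend(design_artifacts)
    return files
```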

Verification

After each story:
1. DONE marker exists at the expected path
2. Bead is closed
3. Code compiles / tests pass (if a QA step follows)

If verification fails, RALPH handles the retry — but only for that one story, not the entire backlog.
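
The three checks and a per-story retry can be sketched as below. This is an assumption-laden stand-in: how RALPH actually decides to retry, where the DONE marker lives, and the attempt limit (`max_attempts=3`) are not specified in this pattern, and `verify_story` / `retry_story` are hypothetical. What it illustrates is scope: the retry callable receives a single story ID, so a failure never re-runs the rest of the backlog.

```python
import os
from typing import Callable, Optional


def verify_story(
    done_marker: str,
    bead_closed: bool,
    tests_pass: Optional[bool] = None,  # None when no QA step follows
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Post-story checks: DONE marker present, bead closed, tests (if any) passing."""
    if not exists(done_marker):
        return False
    if not bead_closed:
        return False
    return tests_pass is None or tests_pass


def retry_story(
    story_id: str,
    attempt_story: Callable[[str], bool],  # hypothetical: re-spawns and re-verifies one story
    max_attempts: int = 3,                 # assumed limit, not a documented RALPH setting
) -> int:
    """Re-run only this story until it verifies; return the attempt that passed."""
    for attempt in range(1, max_attempts + 1):
        if attempt_story(story_id):
            return attempt
    raise RuntimeError(f"{story_id} failed verification {max_attempts} times")
```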

When to Break This Rule

One story per session is the default. Exceptions:

  • Trivial stories — if two stories are 5-line changes with no overlap, they can share a session. But this is rare.
  • Story dependencies — if story B depends on story A's exact implementation, running them in the same session ensures consistency. But the better fix is to make story B's task description include story A's output as context.
  • Token budget pressure — if running many small stories and token cost is a concern, batching 2-3 may save money. But watch for the looping signs.
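
The first exception is concrete enough to check mechanically. A minimal sketch, assuming each story comes with an estimated changed-line count and the set of files it touches (neither is a documented field); `StoryEstimate`, `can_share_session` and the 5-line threshold are illustrative, with the threshold taken from the example in the first bullet.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class StoryEstimate:
    story_id: str
    changed_lines: int  # assumed estimate, not a factory field
    files: FrozenSet[str] = field(default_factory=frozenset)


def can_share_session(a: StoryEstimate, b: StoryEstimate, trivial_lines: int = 5) -> bool:
    """True only for the 'trivial stories' exception: both tiny and no file overlap."""
    both_trivial = a.changed_lines <= trivial_lines and b.changed_lines <= trivial_lines
    no_overlap = not (a.files & b.files)
    return both_trivial and no_overlap
```

Any pair that fails this check falls back to the default of one story per session.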

The Core Lesson

Context windows are finite. The factory's job is to bound the work to match. One story per session is the simplest bound — and the most reliable.

The pattern scales: 20 stories = 20 sessions = 20 fresh context windows. Each one reliable. The auto-spawn-chain makes this hands-free.


This pattern was validated in April 2026 when Sally's multi-storyboard spawn failed and Amelia's single-story spawns succeeded across 24+ stories. The lesson was immediate and clear: bound the work, bound the context.

Concept Cross-References

  • world-model — story-by-story builds maintain bounded context windows, which is the memory management equivalent of world-model: keeping the cognitive substrate small and demand-loaded
  • lobster-pipelines — the sequential one-story-per-session pattern mirrors lobster-pipelines: each story is a checkpointed, verifiable unit that feeds into the next step

Based on operational experience building upwork-online-school (24+ stories), upwork-rv-chatbot, and camping-checklist pipelines. Pattern validated across Amelia (build agent), Sally (design agent), and Carson (research agent).