Closed Feedback Loop

Type: Architectural principle
Referenced from: closed-loop-agent-control

Definition

A closed feedback loop in agent architecture means the agent has direct, programmatic access to the target system's state and can act on it without human intermediation. The loop is: agent acts → state updates → agent reads state → agent acts again. This contrasts with human-in-the-loop designs, in which the agent produces output, a human observes and interprets it, and the human then feeds directions back. Each of those handoffs adds latency and loses information.
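The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: ToySystem, apply, read_state, and run_closed_loop are hypothetical names invented here, and the "agent" is a trivial rule rather than a model.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical target system: a single value the agent must drive to a goal."""
    value: int = 0

    def apply(self, delta: int) -> None:
        # Acting on the system updates its state.
        self.value += delta

    def read_state(self) -> dict:
        # Programmatic, structured read of the current state.
        return {"value": self.value}


def run_closed_loop(system: ToySystem, goal: int, max_steps: int = 100) -> int:
    """Read state, act, read again, with no human in between.

    Returns the number of actions taken before the goal was reached.
    """
    for step in range(max_steps):
        state = system.read_state()
        error = goal - state["value"]
        if error == 0:
            return step
        # Stand-in for the agent's decision: move one unit toward the goal.
        system.apply(1 if error > 0 else -1)
    return max_steps
```

Calling run_closed_loop(ToySystem(), goal=5) drives the value to 5 using only programmatic reads and writes; in a human-in-the-loop design, each read_state call would instead be a person describing what they saw.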

Nikita M.'s formulation: "If I give the agent the right tools and close the feedback loop, even a smaller model with a closed loop can outperform a stronger model that relies on human-observed feedback."

Three Mechanisms That Make It Work

  1. Latency Collapse: Human-in-the-loop feedback passes through an inherently slow chain (agent → human observes → human interprets → human articulates → agent receives). With tools, the loop shrinks to agent acts → state updates → agent reads state, so cycle time is bounded only by the tool-call round trip.

  2. Information Fidelity: Human feedback is lossy ("the bird hit the ceiling" loses velocity, position, and timing). Structured state access returns complete, parseable data that needs no interpretation.

  3. Temporal Control: The agent decides when the world advances (via clock control), which turns it from a passive observer into an active experimenter.
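To make the second and third mechanisms concrete, here is a minimal sketch of a world whose clock moves only when the agent steps it, and whose state comes back as structured data. Everything in it is an assumption made for illustration: SteppedWorld, BirdState, the constants, and the physics are hypothetical and are not taken from the experiment.

```python
from dataclasses import dataclass, asdict

CEILING = 10.0   # hypothetical upper bound of the world
GRAVITY = -1.0   # hypothetical velocity change per tick


@dataclass
class BirdState:
    tick: int = 0
    y: float = 5.0
    velocity: float = 0.0
    hit_ceiling: bool = False


class SteppedWorld:
    """World that advances only when step() is called (temporal control)."""

    def __init__(self) -> None:
        self.state = BirdState()

    def flap(self, impulse: float) -> None:
        self.state.velocity = impulse

    def step(self, ticks: int = 1) -> dict:
        for _ in range(ticks):
            s = self.state
            s.velocity += GRAVITY
            s.y += s.velocity
            s.tick += 1
            if s.y >= CEILING:
                s.y = CEILING
                s.hit_ceiling = True
                break  # stop early so the agent can inspect the collision
        # Structured state: every field survives, nothing is paraphrased.
        return asdict(self.state)


def human_summary(state: dict) -> str:
    """Lossy feedback: keeps the event, drops velocity, position, and timing."""
    return "the bird hit the ceiling" if state["hit_ceiling"] else "nothing happened"
```

After world.flap(4.0) and world.step(5), the returned dictionary records the tick of the collision and the velocity at impact, while human_summary reduces the same event to a single sentence.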

Design Principles (from the experiment)

  1. Give agents structured state access — not screenshots, not logs, structured data they can parse programmatically
  2. Let agents control execution flow — clock control, pause/resume, step-through
  3. Minimize the observation-to-action cycle — every tool call should return immediately actionable information
  4. Embed the endpoint in the target system — not a sidecar, not a proxy, the actual binary
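One possible reading of these four principles, sketched as code: the tool handler is a method of the target system itself, the clock advances only through a tool call, and every response carries the full structured state. The Game class and the tool names (get_state, press, step) are hypothetical; the source does not name the three tools used in the experiment.

```python
import json


class Game:
    """Hypothetical target system standing in for the real game binary."""

    def __init__(self) -> None:
        self.tick = 0
        self.pending = []   # inputs queued but not yet applied
        self.applied = []   # (tick, key) pairs already consumed

    def snapshot(self) -> dict:
        return {
            "tick": self.tick,
            "pending": list(self.pending),
            "applied": [list(pair) for pair in self.applied],
        }

    def handle_tool_call(self, request: str) -> str:
        """Tool endpoint embedded in the system: JSON request in, JSON state out."""
        call = json.loads(request)
        name = call["tool"]
        args = call.get("args", {})
        if name == "get_state":
            pass  # read-only; the snapshot below is the answer
        elif name == "press":
            self.pending.append(args["key"])
        elif name == "step":
            # The clock moves only here, one queued input per tick.
            for _ in range(int(args.get("ticks", 1))):
                if self.pending:
                    self.applied.append((self.tick, self.pending.pop(0)))
                self.tick += 1
        else:
            return json.dumps(
                {"ok": False, "error": f"unknown tool: {name}", "state": self.snapshot()}
            )
        # Every response includes the full state, so no follow-up read is needed.
        return json.dumps({"ok": True, "state": self.snapshot()})
```

Returning the snapshot from every call, including failed ones, is a design choice made for this sketch: it keeps the observation-to-action cycle to a single round trip.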

Open Questions

  • Does this generalize beyond games (deterministic, machine-readable state) to web apps, APIs, and business processes?
  • What's the minimum viable toolset? (Nikita used 3 tools; would 2 suffice?)
  • At what model quality does the loop advantage disappear?