Closed Feedback Loop

Type: Architectural principle
Referenced from: closed-loop-agent-control

Definition

A closed feedback loop in agent architecture means the agent has direct, programmatic access to the target system's state and can act on it without human intermediation. The loop is: agent acts → state updates → agent reads state → agent acts again. This contrasts with human-in-the-loop designs, in which the agent produces output, a human observes and interprets it, and the human then feeds directions back to the agent. Each of those hops adds latency and loses information.
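The act → read → act cycle can be sketched with a toy system. This is a minimal sketch, assuming a hypothetical `ToySystem` with `read_state()` and `act()` methods and a trivial stand-in policy in place of a model; none of these names come from the original experiment.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical target system: the agent must drive `position` to `target`."""
    position: int = 0
    target: int = 7

    def read_state(self) -> dict:
        # Structured, programmatic state access with no human in between.
        return {"position": self.position, "target": self.target}

    def act(self, delta: int) -> None:
        # The agent's action updates the system state directly.
        self.position += delta


def closed_loop(system: ToySystem, max_steps: int = 20) -> int:
    """Run act -> state updates -> read state -> act again until the goal is met.

    Returns the number of actions taken.
    """
    steps = 0
    state = system.read_state()
    while state["position"] != state["target"] and steps < max_steps:
        # Stand-in policy (assumption): move one unit toward the target.
        delta = 1 if state["position"] < state["target"] else -1
        system.act(delta)
        state = system.read_state()  # the agent reads the updated state itself
        steps += 1
    return steps


if __name__ == "__main__":
    system = ToySystem()
    print(closed_loop(system), system.read_state())
```

In a real deployment the stand-in policy would be the model choosing its next tool call; the shape of the loop stays the same.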

Nikita M.'s formulation: "If I give the agent the right tools and close the feedback loop, even a smaller model with a closed loop can outperform a stronger model that relies on human-observed feedback."

Three Mechanisms That Make It Work

Latency Collapse: Human-in-the-loop designs have inherent delay: agent → human observes → human interprets → human articulates → agent receives. With tools, the loop is: agent acts → state updates → agent reads state. Cycle time is bounded only by the tool-call round trip.
Information Fidelity: Human feedback is lossy ("the bird hit the ceiling" loses velocity, position, and timing). Structured state access returns complete, parseable data that needs no interpretation.
Temporal Control: The agent decides when the world advances (via clock control), which turns it from a passive observer into an active experimenter.
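Information fidelity and temporal control can be made concrete with a toy simulation. This is a hypothetical sketch: `BirdSim`, `human_report`, and `first_hit_tick` are illustrative names, not part of the original experiment. The human-style report collapses the outcome into one sentence, while the structured snapshot keeps position, velocity, and timing. Because the world advances only when the agent calls `step`, the agent can move tick by tick and pinpoint exactly when the collision happens.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BirdSim:
    """Hypothetical toy world: a 'bird' rising toward a ceiling."""
    y: float = 0.0
    vy: float = 3.0
    ceiling: float = 10.0
    tick: int = 0
    hit_ceiling: bool = False

    def step(self, ticks: int = 1) -> None:
        # Clock control: the world advances only when the agent calls this.
        for _ in range(ticks):
            self.y += self.vy
            self.tick += 1
            if self.y >= self.ceiling:
                self.y = self.ceiling
                self.vy = 0.0
                self.hit_ceiling = True

    def snapshot(self) -> dict:
        # Full structured state: position, velocity, and timing.
        return asdict(self)


def human_report(sim: BirdSim) -> str:
    # What a human observer might relay back: lossy and unstructured.
    return "the bird hit the ceiling" if sim.hit_ceiling else "the bird is flying"


def first_hit_tick(sim: BirdSim, max_ticks: int = 100) -> Optional[int]:
    # Agent as experimenter: advance one tick at a time and inspect exact state.
    for _ in range(max_ticks):
        sim.step(1)
        if sim.snapshot()["hit_ceiling"]:
            return sim.tick
    return None


if __name__ == "__main__":
    sim = BirdSim()
    print(first_hit_tick(sim))
    print(human_report(sim))
    print(sim.snapshot())
```

Between `step` calls the world is effectively paused, so the agent can read state as often as it likes without anything changing underneath it.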

Design Principles (from the experiment)

Give agents structured state access: not screenshots or logs, but structured data they can parse programmatically
Let agents control execution flow: clock control, pause/resume, step-through
Minimize the observation-to-action cycle: every tool call should return immediately actionable information
Embed the endpoint in the target system: not a sidecar, not a proxy, but the actual binary
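The four principles can be combined in one small sketch. This assumes a hypothetical `Game` class that hosts its own tool endpoint in-process, with three illustrative tools (`get_state`, `apply_input`, `step`) dispatched from JSON tool calls; the names, the JSON call shape, and the game rules are assumptions, not the tools used in the experiment.

```python
import json
from typing import Any, Callable, Dict, Optional


class Game:
    """Hypothetical target system whose tool endpoint lives inside it."""

    def __init__(self) -> None:
        self.tick = 0
        self.score = 0
        self.pending_input: Optional[str] = None
        # Tool registry: the endpoint is part of the game object itself,
        # not a sidecar or proxy that scrapes it from outside.
        self.tools: Dict[str, Callable[..., Dict[str, Any]]] = {
            "get_state": self.get_state,
            "apply_input": self.apply_input,
            "step": self.step,
        }

    def get_state(self) -> Dict[str, Any]:
        # Structured data, not a screenshot or a log line.
        return {"tick": self.tick, "score": self.score, "pending_input": self.pending_input}

    def apply_input(self, key: str) -> Dict[str, Any]:
        # Queue an input; it takes effect on the next tick.
        self.pending_input = key
        return self.get_state()

    def step(self, ticks: int = 1) -> Dict[str, Any]:
        # The world advances only here, so it is paused between agent calls.
        for _ in range(ticks):
            if self.pending_input == "jump":
                self.score += 1
            self.pending_input = None
            self.tick += 1
        return self.get_state()  # return actionable state, not just "ok"

    def dispatch(self, call_json: str) -> str:
        # Accept a JSON tool call such as {"tool": "step", "args": {"ticks": 3}}.
        call = json.loads(call_json)
        result = self.tools[call["tool"]](**call.get("args", {}))
        return json.dumps(result)


if __name__ == "__main__":
    game = Game()
    game.dispatch('{"tool": "apply_input", "args": {"key": "jump"}}')
    print(game.dispatch('{"tool": "step", "args": {"ticks": 3}}'))
```

Returning the full state from every tool, rather than a bare acknowledgement, is one way to satisfy the "immediately actionable" principle: the agent never needs a separate read before choosing its next action.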

Related Concepts

tool-mediated-game-state-control: the specific tool implementation of this principle
time-as-knob: temporal control as the most novel mechanism in the loop
model-size-agnostic-iteration: the implication that loop quality matters more than model quality
operator-control-patterns: factory's human-in-the-loop version of the same principle

Open Questions

Does this generalize beyond games (deterministic, machine-readable state) to web apps, APIs, and business processes?
What's the minimum viable toolset? (Nikita used 3 tools; would 2 suffice?)
At what model quality does the loop advantage disappear?