Failure Radius

Type: Risk management concept
Referenced from: audit-test-automate-ai-delegation

Definition

The failure radius is the worst-case downside when AI output cannot be directly verified. Instead of trying to verify the unverifiable, you bound the damage.

"When verification is structurally impossible, the answer isn't 'verify harder' — it's 'bound the downside.'"

Two Cases of Evaluation

Case 1: Verifiable Output

If you can judge the work (code if you're a programmer, design if you're a designer):
- Rule: Never spend more time reviewing AI work than you'd spend reviewing human work
- Your time is the main cost, not the AI's: if review takes longer than it would for equivalent human work, delegating has increased cost instead of decreasing it

Case 2: Unverifiable Output

When you can't evaluate the output (like judging your CPA's work or your doctor's prescription):
- Traditional solution: operate from trust (known people, known brands)
- An AI isn't a known person or brand, so that kind of trust doesn't transfer
- New approach: Define the failure radius — what's the worst that can happen?

Examples

Task | Verification | Failure Radius | Action
Vibe coding (non-coder) | Can't judge code | Run locally, test behavior, keep away from production | Bound by isolation
Comcast credit negotiation | Can evaluate (money back) | Near zero (worst case: Comcast says no) | Safe to automate
Health provider report | Can evaluate (documents) | Too large (expulsion risk) | Add human review step
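
One way to read the three examples is as a small decision rule: check the worst case first, then fall back on whether you can verify the output. The sketch below is only an illustration of that reading. The names `WorstCase` and `choose_action`, the three-level scale, and the final fallback branch are assumptions, not part of the original concept.

```python
from enum import Enum


class WorstCase(Enum):
    """Hypothetical three-level scale for the failure radius."""
    NEAR_ZERO = "near zero"      # e.g. the other party simply says no
    CONTAINABLE = "containable"  # damage can be bounded by isolation
    TOO_LARGE = "too large"      # e.g. expulsion risk


def choose_action(can_verify: bool, worst_case: WorstCase) -> str:
    """Pick an action from verifiability and worst-case downside."""
    # A worst case that is too large needs a human, verifiable or not.
    if worst_case is WorstCase.TOO_LARGE:
        return "add human review step"
    # If almost nothing can go wrong, automation is safe.
    if worst_case is WorstCase.NEAR_ZERO:
        return "safe to automate"
    # Real but containable downside: isolate what you cannot judge.
    if not can_verify:
        return "bound by isolation"
    # Assumed fallback: Case 1 applies, so review as you would human work.
    return "review within your normal review budget"


print(choose_action(False, WorstCase.CONTAINABLE))  # vibe coding
print(choose_action(True, WorstCase.NEAR_ZERO))     # Comcast credit
print(choose_action(True, WorstCase.TOO_LARGE))     # health provider report
```

Checking the worst case before verifiability reflects the health provider example, where the output can be evaluated but the downside is still too large to automate.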

Application to Agent Design

  • Quality gates = failure radius controls (catch bad output before it propagates)
  • Human review steps = explicit failure radius bounds for high-stakes tasks
  • Permission boundaries = scope limits that contain failure radius
  • The pipeline's NOT-READY gate decision IS a failure radius mechanism
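
As a rough sketch of how these controls might combine into a single gate, assuming a hypothetical pipeline step that reports its own check results and requested permissions: `StepResult`, `ALLOWED_SCOPES`, the scope names, and the `READY` label are all illustrative assumptions. Only the NOT-READY decision is named in the text above.

```python
from dataclasses import dataclass

# Hypothetical permission boundary: scopes a step may use without escalation.
ALLOWED_SCOPES = frozenset({"read_local", "write_sandbox"})


@dataclass
class StepResult:
    """Hypothetical summary of one pipeline step's output."""
    checks_passed: bool                  # quality gate: automated checks
    high_stakes: bool                    # does failure have a large radius?
    human_approved: bool = False         # human review step outcome
    requested_scopes: frozenset = frozenset()


def gate(result: StepResult) -> str:
    """Return "READY" only if every failure radius control passes."""
    # Quality gate: stop bad output before it propagates.
    if not result.checks_passed:
        return "NOT-READY"
    # Permission boundary: refuse anything outside the allowed scope.
    if not result.requested_scopes <= ALLOWED_SCOPES:
        return "NOT-READY"
    # Human review: high-stakes work needs explicit approval.
    if result.high_stakes and not result.human_approved:
        return "NOT-READY"
    return "READY"


print(gate(StepResult(checks_passed=True, high_stakes=True)))
print(gate(StepResult(checks_passed=True, high_stakes=False,
                      requested_scopes=frozenset({"write_sandbox"}))))
```

The gate defaults to NOT-READY whenever any control fails, so an unreviewed or over-scoped result is held back rather than passed along.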