Audit, Test, Automate: How We Decide What AI Can Own

Type: Field note / framework
Source: deviantabstraction.com — 2026-06-02
Author: Compiler startup founder (two-person, pre-revenue)


Core Framework: The 3-Questions Test

A task is suitable for AI delegation when all three conditions hold:

  1. Publicly documented: The domain is learnable from public sources (books, standards, accredited courses). Frontier labs train on nearly everything public, so if the material is even somewhat public, the LLM has likely seen it.

  2. Average result is fine: The task doesn't need outlier quality, so average AI output is acceptable. LinkedIn writing fails here (it needs above-average quality to cut through the noise).

  3. Can eval or bound failure radius: Either you can directly judge the output, or you can define the worst-case downside and keep it contained.
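
As a rough illustration, the three conditions can be written as a small checklist function. This is a minimal sketch, not the author's tooling: the class name, field names, and the example values are assumptions made for illustration. The only detail taken from the note is that the third question is an either/or.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAssessment:
    """Answers to the three questions for one task (hypothetical field names)."""
    name: str
    publicly_documented: bool  # question 1
    average_is_fine: bool      # question 2
    can_evaluate: bool         # question 3, first half: output can be judged directly
    failure_bounded: bool      # question 3, second half: worst case is contained


def failed_questions(task: TaskAssessment) -> list[str]:
    """Return the questions the task fails; an empty list means it passes."""
    failed = []
    if not task.publicly_documented:
        failed.append("publicly documented")
    if not task.average_is_fine:
        failed.append("average result is fine")
    # Question 3 is satisfied by either evaluating the output or bounding the downside.
    if not (task.can_evaluate or task.failure_bounded):
        failed.append("can eval or bound failure radius")
    return failed


def is_delegable(task: TaskAssessment) -> bool:
    """A task is delegable only when all three conditions hold."""
    return not failed_questions(task)


# Illustrative values only: the note says LinkedIn writing fails question 2;
# the other answers here are assumed for the sake of the example.
linkedin_post = TaskAssessment(
    name="LinkedIn post",
    publicly_documented=True,
    average_is_fine=False,
    can_evaluate=True,
    failure_bounded=True,
)

print(linkedin_post.name, is_delegable(linkedin_post), failed_questions(linkedin_post))
```

Returning the list of failed questions, rather than a bare yes/no, keeps the result usable as input to a later redesign step.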

Key Nuances

  • Translation problems are a sweet spot: converting between well-documented formalisms (code↔docs, spec↔tests, prose↔structured data). The transformer architecture was originally built for translation, and translating into a formal target forces tacit assumptions to surface: vernacular can hide assumptions, a formal target language can't.

  • "Publicly documented" doesn't mean "there are blog posts." It means an outsider could learn it from public sources.

  • Review cost principle: Never spend more time reviewing AI work than you'd spend reviewing human work. You're the main cost, not the AI.

  • Failure radius: When verification is structurally impossible, the answer isn't "verify harder" — it's "bound the downside." Vibe coding is the canonical example: run locally, test behavior, keep away from production.

  • Delegation redesign: Most tasks fail the test initially. That's a signal to redesign the process — add checklists, tests, human review steps, narrower scope, permission boundaries.
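
One possible reading of the review-cost, failure-radius, and redesign bullets above, combined into a single decision helper. This is a sketch under stated assumptions: the function name, parameters, and returned labels are hypothetical, and the ordering of the checks is an interpretation rather than something the note specifies.

```python
def review_plan(
    can_verify: bool,
    downside_bounded: bool,
    ai_review_minutes: float,
    human_review_minutes: float,
) -> str:
    """Pick a delegation mode from the nuances above (hypothetical labels)."""
    if ai_review_minutes < 0 or human_review_minutes < 0:
        raise ValueError("review times must be non-negative")

    # Review cost principle: reviewing AI work should not cost more of your
    # time than reviewing the same work done by a person.
    review_is_affordable = ai_review_minutes <= human_review_minutes

    if can_verify and review_is_affordable:
        return "delegate with review"
    if downside_bounded:
        # Verification is impossible or too expensive, so contain the damage
        # instead (for example: run locally, keep away from production).
        return "delegate inside bounded sandbox"
    # Neither verifiable at acceptable cost nor bounded: change the process
    # first (checklists, tests, narrower scope, permission boundaries).
    return "redesign before delegating"


print(review_plan(True, False, ai_review_minutes=10, human_review_minutes=15))
print(review_plan(False, True, ai_review_minutes=60, human_review_minutes=15))
print(review_plan(True, False, ai_review_minutes=60, human_review_minutes=15))
```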

Process: Audit → Log → Test → Redesign

Step 1: Task Audit (1 month)

Log every task you do, how long it takes, and everything you wanted to do but didn't. This becomes the automation backlog.

Step 2: Group & Analyze

Dump the logs into an LLM and ask it to group them. Create a spreadsheet of internal processes and their durations.
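
Once the log entries carry a group label, the spreadsheet of processes and durations is a simple aggregation. The sketch below assumes the grouping has already been done (the labels stand in for whatever the LLM returns); the sample entries, the tuple layout, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log entries: (group label, task description, minutes spent).
task_log = [
    ("support", "answer customer email", 20),
    ("billing", "reconcile invoices", 45),
    ("support", "triage bug report", 30),
    ("billing", "chase late payment", 15),
    ("docs", "update README", 10),
]


def summarize_log(log):
    """Return (group, task count, total minutes) rows, largest total first."""
    totals = defaultdict(lambda: {"count": 0, "minutes": 0})
    for group, _description, minutes in log:
        totals[group]["count"] += 1
        totals[group]["minutes"] += minutes
    rows = [(group, t["count"], t["minutes"]) for group, t in totals.items()]
    # Sort by total time so the most expensive processes head the backlog.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Print comma-separated rows that can be pasted into a spreadsheet.
print("process,tasks,total_minutes")
for group, count, minutes in summarize_log(task_log):
    print(f"{group},{count},{minutes}")
```

Sorting by total minutes is a design choice for this sketch; sorting by task count or by how often a task was skipped would be equally reasonable ways to order the backlog.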

Step 3: Apply 3-Questions Test

Run each task through the three questions. A task that fails usually points to a process that is still too vague to delegate.

Step 4: Redesign Before Delegating

Add checklists, tests, human review steps, narrower scope, permission boundaries. Make work inspectable → delegable → automatable.

Examples

Task | Marks (Publicly Doc'd, Average OK, Eval/Failure Radius) | Result
Comcast outage credit | ✓ (experiment), ✓ (near-zero: worst case = no) | ✅ Automated
LinkedIn post | ✗ (needs above-avg) | ❌ Not automated
Patent filing (routine) | ✓ (for small co), ✓ (bounded: alternative = don't file) | ✅ 90% AI
Health provider report | ✗ (failure radius too large: expulsion risk) | ⚠️ Added human review

Key Quotes

"AI is a forcing function for better delegation. It rewards clear inputs, bounded tasks, explicit review, and honest judgment about risk."

"We do not win by asking AI to 'do the work.' We win by redesigning the work so that humans keep what 'only humans can do', while AI handles everything else."

Relevance to dark-factory-kb

  • 3-Questions Test can be applied to decide which factory pipeline tasks to delegate to agents
  • Failure radius concept maps to quality gates and human review steps in pipelines
  • Task audit → automation backlog is exactly what the KB pipeline does (log every step, find what to automate)
  • Translation problems as AI sweet spot aligns with the KB's core function (converting source articles → structured concepts → HTML pages)
  • Delegation redesign principle reinforces the factory's approach of explicit task decomposition before agent delegation

Concept Cross-References