Audit, Test, Automate: How We Decide What AI Can Own
Type: Field note / framework
Source: deviantabstraction.com — 2026-06-02
Author: Compiler startup founder (two-person, pre-revenue)
Core Framework: The 3-Questions Test
A task is suitable for AI delegation when all three conditions hold:
- Publicly documented: The domain is learnable from public sources (books, standards, accredited courses). Frontier labs train on everything public, so if a domain is even somewhat public, the LLM likely knows it.
- Average result is fine: The task doesn't need outlier quality, so average AI output is acceptable. LinkedIn writing fails here because it needs above-average quality to cut through the noise.
- Can eval or bound failure radius: Either you can directly judge the output, or you can define the worst-case downside and keep it contained.
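The three conditions can be sketched as a simple all-or-nothing check. This is a minimal illustration, not the author's tooling: the `Task` class and its field names are hypothetical, and the two example rows take their yes/no answers from the Examples table in this note.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record holding the answers to the three questions."""

    name: str
    publicly_documented: bool   # Q1: learnable from public sources?
    average_is_fine: bool       # Q2: is average-quality output acceptable?
    evaluable_or_bounded: bool  # Q3: can you judge the output or bound the downside?

    def failed_questions(self) -> list[str]:
        """Return the questions this task fails, in question order."""
        checks = {
            "publicly documented": self.publicly_documented,
            "average result is fine": self.average_is_fine,
            "can eval or bound failure radius": self.evaluable_or_bounded,
        }
        return [question for question, passed in checks.items() if not passed]

    def delegable(self) -> bool:
        """A task passes only when all three conditions hold."""
        return not self.failed_questions()


outage_credit = Task("Comcast outage credit", True, True, True)
linkedin_post = Task("LinkedIn post", True, False, True)

print(outage_credit.delegable())         # True
print(linkedin_post.failed_questions())  # ['average result is fine']
```

Returning the failed questions rather than a bare boolean keeps the result useful when a task fails: the failing question shows which part of the process needs redesign.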
Key Nuances
- Translation problems are a sweet spot: converting between well-documented formalisms (code↔docs, spec↔tests, prose↔structured data). The transformer architecture was originally built for translation, and translating into a formal target language forces tacit assumptions to surface: vernacular can hide assumptions, a formal language can't.
- "Publicly documented" doesn't mean "there are blog posts." It means an outsider could learn the domain from public sources.
- Review cost principle: Never spend more time reviewing AI work than you'd spend reviewing human work. Your time is the main cost, not the AI's.
- Failure radius: When verification is structurally impossible, the answer isn't "verify harder"; it's "bound the downside." Vibe coding is the canonical example: run it locally, test the behavior, and keep it away from production.
- Delegation redesign: Most tasks fail the test initially. That's a signal to redesign the process: add checklists, tests, human review steps, narrower scope, and permission boundaries.
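For the failure-radius point, one way to "run locally and keep away from production" is to execute generated code in a throwaway directory with a time limit. The sketch below is an assumption about how that could look, not something the source describes: `run_bounded` is a hypothetical helper, and it only bounds runtime and the working directory. It is not a security sandbox.

```python
import subprocess
import sys
import tempfile


def run_bounded(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run Python code in a scratch directory with a time limit.

    Relative-path writes land in a temporary directory that is deleted
    afterwards, and a runaway process is stopped after `timeout_s` seconds
    (subprocess.run raises subprocess.TimeoutExpired in that case).
    The child can still reach the network and absolute paths.
    """
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-c", code],
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )


result = run_bounded("print(2 + 2)")
print(result.returncode, result.stdout.strip())  # 0 4
```

Stronger bounds (containers, separate credentials, no network) follow the same idea: define the worst case first, then make sure the run cannot exceed it.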
Process: Audit → Log → Test → Redesign
Step 1: Task Audit (1 month)
Log every task you do, how long it takes, and everything you wanted to do but didn't. This becomes the automation backlog.
Step 2: Group & Analyze
Dump the logs into an LLM and ask it to group them. Then create a spreadsheet of internal processes and their durations.
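Once the log entries carry a group label, turning them into the process spreadsheet is a small aggregation. The sketch below assumes the grouping has already been done (the `group` column stands in for whatever the LLM returns); the entries, labels, and column names are invented for illustration and are not from the source.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log entries: (task description, minutes spent, group label).
log = [
    ("Reply to support email", 20, "support"),
    ("Reconcile invoices", 45, "bookkeeping"),
    ("Reply to support email", 15, "support"),
    ("File expense report", 30, "bookkeeping"),
]

# Total occurrences and minutes per process group.
totals = defaultdict(lambda: {"count": 0, "minutes": 0})
for _description, minutes, group in log:
    totals[group]["count"] += 1
    totals[group]["minutes"] += minutes

# Sort so the most time-consuming process comes first in the backlog.
rows = sorted(totals.items(), key=lambda item: item[1]["minutes"], reverse=True)

# Write the result as CSV, ready to paste into a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["process", "occurrences", "total_minutes"])
for group, stats in rows:
    writer.writerow([group, stats["count"], stats["minutes"]])

print(buffer.getvalue())
```

Sorting by total minutes is one possible ordering; the wish-list items from the audit (things you wanted to do but didn't) have no duration and would need a separate column or sheet.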
Step 3: Apply 3-Questions Test
Run each task through the three questions. A task that fails points to a process that is still too vague to delegate.
Step 4: Redesign Before Delegating
Add checklists, tests, human review steps, narrower scope, permission boundaries. Make work inspectable → delegable → automatable.
Examples
| Task | Publicly documented | Average OK | Eval / bounded failure radius | Result |
|---|---|---|---|---|
| Comcast outage credit | ✓ | ✓ (experiment) | ✓ (near-zero: worst case = no) | ✅ Automated |
| LinkedIn post | ✓ | ✗ (needs above-avg) | ✓ | ❌ Not automated |
| Patent filing (routine) | ✓ | ✓ (for small co) | ✓ (bounded: alternative = don't file) | ✅ 90% AI |
| Health provider report | ✓ | ✓ | ✗ (failure radius too large — expulsion risk) | ⚠️ Added human review |
Key Quotes
"AI is a forcing function for better delegation. It rewards clear inputs, bounded tasks, explicit review, and honest judgment about risk."
"We do not win by asking AI to 'do the work.' We win by redesigning the work so that humans keep what 'only humans can do', while AI handles everything else."
Relevance to dark-factory-kb
- 3-Questions Test can be applied to decide which factory pipeline tasks to delegate to agents
- Failure radius concept maps to quality gates and human review steps in pipelines
- Task audit → automation backlog is exactly what the KB pipeline does (log every step, find what to automate)
- Translation problems as AI sweet spot aligns with the KB's core function (converting source articles → structured concepts → HTML pages)
- Delegation redesign principle reinforces the factory's approach of explicit task decomposition before agent delegation
Related
- full-pipeline — The dark-factory pipeline structure
- cis-pipeline — CIS pipeline with gates
- factory-rules — Factory delegation rules
- autonomy-policy-v3 — When agents can operate autonomously
Concept Cross-References
- ai-delegation/three-questions-test — The 3-Questions Test for AI delegation
- ai-delegation/failure-radius — Bounding the downside when verification is impossible
- ai-delegation/translation-problems — LLM sweet spot for code↔docs, spec↔tests conversions
- ai-delegation/delegation-redesign — Redesign the work before delegating to AI
- ai-delegation/task-logging-automation-backlog — Task audit → automation backlog pattern