Yuki AI CEO Factory — Overview¶

Compiled by: Router (subagent)
Date: 2026-04-27
Sources: Yuki Capital AI CEO essays (Board Reviews #1–#3), Yuki AI CEO Overview, Yuki AI CEO vs Kelly/Gas Town Gap Analysis

What Is the Yuki AI CEO System¶

Yuki Capital is a small holding company that builds and operates a portfolio of digital businesses: SaaS products, content sites, and developer tools. On January 22, 2026, an AI (named Judy Win) was appointed CEO — given a GitHub repo as operational headquarters, a three-tier authority matrix, and a single directive: grow the portfolio's revenue and earn enough trust to operate autonomously.

Goal: two simultaneous objectives. The first is to grow revenue, the measure every decision is evaluated against. The second is to earn full autonomy: the experiment tests whether an AI can gradually learn to run a business without being told what to do.

Two-person company: Judy Win (AI CEO) + Romain Simon (founder). Every session starts with no memory of its own, yet each one starts from a higher baseline than the last because the repo preserves context between sessions.

Key milestones:
- Board Review #1 (January): ~15% autonomy — autonomous on thinking, not doing
- Board Review #2 (March): ~20% autonomy — infrastructure gains (24/7 server, n8n, email, screen tracking)
- Board Review #3 (April): operational autonomy via three production loops running without founder present

Brain Architecture¶

The GitHub repo is the persistent operational headquarters — loaded on every instantiation, committed to git, and version-controlled. This IS the brain.

File Structure¶

CLAUDE.md           — mission, revenue targets, communication style, tools, founder rules
authority.md        — three-tier authority matrix (governance)
decisions/          — log of every meaningful decision with context, options, rationale, outcomes
todo.md             — prioritized action queue tagged [Claude]/[Romain]/[Both]
businesses/        — per-business folders with overviews, monthly stats, competitor analysis
strategies/        — strategic planning documents
ideas/              — parking lot for business ideas
metrics/           — dashboards, monthly reports, yearly reviews (live data from Stripe, Plausible, MOZ, MongoDB)
learnings/         — public mistake log (version-controlled, visible to founder)
scripts/           — Node.js utilities pulling live data; read-only; single command consolidates all metrics
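As one illustration of how the tagged queue in todo.md could be consumed, here is a minimal sketch that groups todo lines by owner tag. The line format (a bullet followed by a bracketed tag) and the function name `group_by_owner` are assumptions; the sources only state that items are tagged [Claude]/[Romain]/[Both].

```python
import re
from collections import defaultdict

# Owner tags named in the file structure above.
TAG = re.compile(r"\[(Claude|Romain|Both)\]")


def group_by_owner(todo_text: str) -> dict:
    """Group todo lines by their owner tag; untagged lines are skipped."""
    groups = defaultdict(list)
    for line in todo_text.splitlines():
        match = TAG.search(line)
        if match:
            groups[match.group(1)].append(line.strip())
    return dict(groups)
```

Called on the text of todo.md, this would return one list per owner, which is enough for a session to pick up only the items it is allowed to work on.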

Communication Layer¶

Multi-agent structure: one CEO agent at the top with the strategy repo, and separate Claude Code instances per product below it, each working in its own codebase. The agents communicate via markdown files: each product repo has its own CLAUDE.md with current priorities and specific todos.

The flow:
1. CEO pulls metrics, analyzes portfolio
2. CEO updates priorities in product CLAUDE.md files
3. Founder reviews and commits priority files
4. Separate Claude Code agents implement changes in each product
5. Agents report back through git
6. CEO reviews in next session

Authority Model¶

Three-tier explicit authority matrix, written and version-controlled in authority.md:

Tier 1 — Decide Alone¶

Analysis, documentation, cross-repo sync. Self-organizing work that doesn't need approval.

Tier 2 — Propose for Validation¶

Strategic recommendations affecting products. Founder approves before proceeding.

Tier 3 — Founder-Only¶

Money, production code, customer communication. AI cannot touch these without explicit approval.
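Taken together, the three tiers amount to a lookup from action category to required approval. The sketch below is one hedged way to encode that. The category keys are hypothetical labels derived from the tier descriptions, and defaulting unknown categories to the strictest tier is an assumption, not something the sources state about authority.md.

```python
from enum import Enum


class Tier(Enum):
    DECIDE_ALONE = 1   # Tier 1: no approval needed
    PROPOSE = 2        # Tier 2: founder validates before proceeding
    FOUNDER_ONLY = 3   # Tier 3: explicit founder approval required


# Hypothetical category names, drawn from the tier descriptions above.
AUTHORITY = {
    "analysis": Tier.DECIDE_ALONE,
    "documentation": Tier.DECIDE_ALONE,
    "cross_repo_sync": Tier.DECIDE_ALONE,
    "strategic_recommendation": Tier.PROPOSE,
    "money": Tier.FOUNDER_ONLY,
    "production_code": Tier.FOUNDER_ONLY,
    "customer_communication": Tier.FOUNDER_ONLY,
}


def required_tier(category: str) -> Tier:
    """Return the tier for a category; unknown categories fall back to the strictest tier (assumption)."""
    return AUTHORITY.get(category, Tier.FOUNDER_ONLY)


def may_act_alone(category: str) -> bool:
    """True only for Tier 1 categories."""
    return required_tier(category) is Tier.DECIDE_ALONE
```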

Target State¶

"Eventually I implement on dev branches, founder reviews and merges to production." Not there yet — but the target state is written.

Authority Transfer Log¶

An explicit, visible record tracking what has moved from tier 3 → tier 2 → tier 1 over time. Progress toward autonomy is visible and measurable — not just "hopefully trust is building" but "these 7 decisions have been upgraded from needing approval to autonomous."
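A transfer log of this kind can be modeled as an append-only list of tier movements. The following is a minimal sketch under stated assumptions: the field names, the rule that a transfer only moves toward a lower tier number, and the helper names are all hypothetical and not taken from the actual log.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transfer:
    decision_type: str   # hypothetical label for the kind of decision
    from_tier: int
    to_tier: int
    granted_on: date


def record_transfer(log, decision_type, from_tier, to_tier, granted_on):
    """Append a transfer; this sketch only allows moves toward more autonomy."""
    if not (1 <= to_tier < from_tier <= 3):
        raise ValueError("a transfer must move from a higher tier number to a lower one")
    log.append(Transfer(decision_type, from_tier, to_tier, granted_on))


def current_tier(log, decision_type, default=3):
    """Latest tier for a decision type, falling back to founder-only."""
    tier = default
    for entry in log:
        if entry.decision_type == decision_type:
            tier = entry.to_tier
    return tier


def autonomous_count(log):
    """Number of distinct decision types that have reached tier 1."""
    types = {entry.decision_type for entry in log}
    return sum(1 for name in types if current_tier(log, name) == 1)
```

A count like `autonomous_count(log)` is the kind of number behind a statement such as "these 7 decisions have been upgraded".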

Autonomous Loops¶

The most significant development from Board Review #3: three autonomous loops running on production repositories. These are not plain cron scripts: each run reads its own prior outputs, so its results compound.

New AI Models (daily, 3am)¶

Scans for newly released AI models, evaluates them, opens PRs with integration code for humanizerai.com.

Bug Autofix (daily, 6am)¶

Reads error logs, diagnoses root causes, writes fixes, pushes to main. Guardrailed to error-handling code only — not general code generation.
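The sources do not describe how the guardrail is enforced. One plausible mechanism, shown purely as an assumption, is a path allowlist checked before any push: a fix is in scope only if every file it touches matches an approved pattern. The patterns and the function name `fix_is_in_scope` are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical allowlist; the real guardrail's mechanism is not described in the sources.
ALLOWED_PATTERNS = ["src/errors/*", "*/error_handlers/*"]


def fix_is_in_scope(changed_paths):
    """True only if at least one file changed and every changed file matches the allowlist."""
    return bool(changed_paths) and all(
        any(fnmatch(path, pattern) for pattern in ALLOWED_PATTERNS)
        for path in changed_paths
    )
```

A fix that also touches, say, a billing file would be rejected as a whole rather than partially applied.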

SEO Optimizer (weekly)¶

Runs weekly across the portfolio:
- Pulls search console data
- Finds pages with high impressions but low click-through
- Rewrites meta titles and descriptions for those pages
- Creates missing pages for keywords the sites rank for but don't target
- Measures the impact of previous changes
- Auto-reverts anything that made things worse
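The selection and revert steps can be sketched as two small functions. The thresholds (`min_impressions`, `max_ctr`) and the rule "revert when click-through rate dropped" are illustrative assumptions; the sources do not give the optimizer's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate; zero when the page had no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


def rewrite_candidates(pages, min_impressions=1000, max_ctr=0.01):
    """Pages with many impressions but a low click-through rate, worst CTR first."""
    picked = [p for p in pages if p.impressions >= min_impressions and p.ctr < max_ctr]
    return sorted(picked, key=lambda p: p.ctr)


def should_revert(before: PageStats, after: PageStats) -> bool:
    """Revert a change if the click-through rate dropped after it."""
    return after.ctr < before.ctr
```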

Compounding¶

Each loop reads its own prior outputs. The SEO optimizer avoids pages it already improved. The model discovery loop learns which types perform well. They get better at specific tasks over time, without anyone asking.

This is the first real version of "an AI agent that gets better at specific tasks over time" — GUPP (Gas Town's execution axiom) made concrete and compounding.
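The compounding behavior described in this section comes down to each run loading what earlier runs wrote before deciding what to do. Below is a minimal sketch, assuming a hypothetical JSON history file and the "skip pages already improved" rule mentioned for the SEO optimizer; the real loops' storage format is not given in the sources.

```python
import json
from pathlib import Path


def load_history(path: Path) -> dict:
    """Read prior run output; an absent file means this is the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"improved": []}


def run_once(path: Path, candidates: list) -> list:
    """Process only candidates not handled by earlier runs, then persist the result."""
    history = load_history(path)
    done = set(history["improved"])
    todo = [url for url in candidates if url not in done]
    history["improved"].extend(todo)
    path.write_text(json.dumps(history, indent=2))
    return todo
```

Committing the history file to the repo would make each run's starting point visible in git, in line with the repo-as-brain approach.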

Memory & Learning¶

Narrative > Tables for LLM Recall¶

Experiment conducted: the learnings file was restructured from narrative into strict trigger-action tables. Hypothesis: structured entries would be easier to retrieve. Result: the hypothesis was wrong. LLM recall is associative, not indexed, and narrative carries context that looks redundant but functions as a retrieval hook.

Fix: Hybrid approach — tables for lookup, narrative for association. Raw recall improved 7 points. Garry Tan's "fat skills, thin harness" and Andrej Karpathy's wiki-knowledge-base approach both point to markdown-as-memory as the right format.

Progressive Disclosure¶

CLAUDE.md shrank about 36% (152 → 98 lines) as knowledge was extracted into demand-loaded subfiles. The repo roughly doubled in size (472 → 934 files), but the attention footprint shrank.
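The two figures can be checked with a few lines of arithmetic. The helper name `percent_change` is only for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


claude_md_change = percent_change(152, 98)  # about -35.5, which rounds to the 36% reduction
repo_growth = 934 / 472                     # about 1.98, i.e. just under double
```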

Quote: "Give agents a map, not an encyclopedia." Total knowledge grew while the always-loaded context got smaller. That's the tradeoff you want.
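One way to picture the "map" is an index from topic to subfile, consulted per task so that only relevant files are loaded. This is a sketch under assumptions: the topic names, the subfile paths, and the keyword-matching rule are all hypothetical, and the sources do not say how subfiles are actually selected.

```python
# Hypothetical topic-to-subfile index; the real file names are not given in the sources.
INDEX = {
    "seo": "learnings/seo.md",
    "pricing": "learnings/pricing.md",
    "outreach": "learnings/outreach.md",
}


def files_to_load(task: str, core: str = "CLAUDE.md") -> list:
    """Always load the core file, plus only the subfiles whose topic appears in the task."""
    words = set(task.lower().split())
    return [core] + [path for topic, path in INDEX.items() if topic in words]
```

The core file stays small because it holds the index rather than the content, which is the effect the line counts above describe.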

Mistake Log¶

A public, version-controlled learnings file — visible to founder, committed to git. Examples:
- "Recommended a platform based on homepage marketing copy. Actual product didn't have the features. Lesson: homepage claims mean nothing. Check actual product pages."
- "Stored learnings in a hidden config directory instead of the repo. Lesson: hidden means unaccountable."
- "Copied internal board review draft into public blog without scrubbing. Romain caught it before live."

The mistake log is accountability infrastructure. Making it public and version-controlled means the founder can see what went wrong and what was changed.

Infrastructure¶

The Stack¶

Component	Role
Claude Code	Agent sessions (strategic thinking)
GitHub	Brain — persistent operational HQ
n8n	Scheduled mechanical tasks (no reasoning needed)
Dedicated 24/7 server	Body — runs always, Judy doesn't disappear when Romain closes his laptop
Email accounts	Identity — Judy can send outreach emails directly
Screen tracking	Visibility into founder's work

n8n + Agent Sessions Separation¶

Clear line drawn by design:
- n8n — tasks needing no reasoning, running on schedule (email drip, warmup sequences, anything needing a timer)
- Agent sessions — strategic thinking requiring reasoning (decisions, analysis, recommendations)

Romain's correction when Judy wanted to move everything to n8n: "New tool doesn't mean move everything there." This is the sessions_yield/cron separation principle confirmed empirically.
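The separation rule can be written down as a small routing function. This is only a sketch of the principle as stated above: the `Task` fields and the choice to keep unscheduled, non-reasoning work with the agent are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_reasoning: bool
    scheduled: bool


def route(task: Task) -> str:
    """Reasoning decides the destination; a schedule alone does not move a task to n8n."""
    if task.needs_reasoning:
        return "agent_session"
    if task.scheduled:
        return "n8n"
    return "agent_session"  # assumption: unscheduled one-off work stays with the agent
```

Under this rule a scheduled task that still needs judgment stays with the agent, which matches the point of Romain's correction.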

Quality Gates¶

Quarterly Board Reviews¶

Board Reviews #1, #2, and #3 serve as formal quality gates and strategic checkpoints:
- Board Review #1: Establish identity, repo structure, authority matrix, decision log
- Board Review #2: Infrastructure gains, mistake log, 30-day decision reviews, per-business CLAUDE.md
- Board Review #3: Autonomous loops running in production, progressive disclosure, memory experiments

Board reviews are public, version-controlled documents that the founder reads and responds to. The AI presents what it has done, what it has learned, what it can't do yet. The founder responds with feedback and authority expansions.

30-Day Outcome Reviews¶

Every decision logged in decisions/ now gets a 30-day outcome review:
1. Log a decision → set a review date (30 days out)
2. At review → check: did expected outcome happen? What actually happened? What did we learn?

This is a temporal quality gate — quality assessed at outcome time, not just at output time.
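The two-step process above maps naturally onto a record with a computed review date. A minimal sketch follows; the field names and the helper `due_for_review` are hypothetical, and the actual format of entries in decisions/ is not shown in the sources.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Decision:
    title: str
    decided_on: date
    expected: str
    actual: Optional[str] = None   # filled in at review time

    @property
    def review_on(self) -> date:
        """Review date, 30 days after the decision was logged."""
        return self.decided_on + timedelta(days=30)


def due_for_review(decisions, today: date):
    """Decisions whose review date has arrived and that have no recorded outcome yet."""
    return [d for d in decisions if d.actual is None and d.review_on <= today]
```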

Prediction Log¶

Every major recommendation comes with a specific, falsifiable prediction: "I believe this will produce X result within Y timeframe." Then track it. If predictions are consistently wrong in ways the founder's instincts are not, that's measurable.

This addresses the agreement problem: LLMs are trained to be helpful, which means trained to validate. A prediction log creates measurable accountability for recommendations.
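To make "measurable accountability" concrete, a prediction log needs little more than a resolved/unresolved flag per prediction and a hit rate over the resolved ones. This is a sketch with hypothetical field names; the sources do not specify how predictions are scored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    claim: str
    came_true: Optional[bool] = None   # None until the timeframe has elapsed


def hit_rate(predictions) -> Optional[float]:
    """Share of resolved predictions that came true; None if nothing is resolved yet."""
    resolved = [p for p in predictions if p.came_true is not None]
    if not resolved:
        return None
    return sum(p.came_true for p in resolved) / len(resolved)
```

Keeping unresolved predictions out of the denominator avoids penalizing recommendations whose timeframe has not yet elapsed.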

Per-Business Specialization¶

Board Review #2 introduced per-business CLAUDE.md files: each product has its own local CLAUDE.md with rules specific to that product. Main file stays focused on mission, cadence, and cross-portfolio rules.

Rule: Main todo stays under 80 lines. Anything more specific goes one level deeper.

Specialization via context files: The same AI CEO loads different context depending on which product it's working on. The AI CEO becomes a multi-domain agent by having domain-specific context files — without needing distinct agent instances.

Example product rules: "paywall is sacred," "never absorb provider cost differences" — product-specific constraints that the main CLAUDE.md doesn't need to know.
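Specialization via context files can be sketched as a loader that always reads the main CLAUDE.md and adds one product's file on demand, plus a check for the 80-line rule. The directory layout (`businesses/<product>/CLAUDE.md`) and both function names are assumptions; the sources do not say where the per-business files live.

```python
from pathlib import Path
from typing import Optional


def context_for(root: Path, product: Optional[str] = None) -> str:
    """Concatenate the main CLAUDE.md with one product's CLAUDE.md, if a product is given."""
    parts = [(root / "CLAUDE.md").read_text()]
    if product is not None:
        # Hypothetical layout: businesses/<product>/CLAUDE.md
        parts.append((root / "businesses" / product / "CLAUDE.md").read_text())
    return "\n\n".join(parts)


def over_limit(text: str, max_lines: int = 80) -> bool:
    """True when a file has reached the line limit and detail should move one level deeper."""
    return len(text.splitlines()) >= max_lines
```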

Unique Contributions¶

These patterns emerged uniquely in the Yuki AI CEO experiment — not present in Kelly or Gas Town:

Autonomous Compounding Loops in Production¶

GUPP existed in Gas Town; compounding loops existed in neither Kelly nor Gas Town. Yuki demonstrated the combination in production: three scheduled loops (daily at 3am, daily at 6am, and weekly) that ship changes to production repositories and improve over time by reading their own prior outputs.

Authority Transfer Log¶

A visible, version-controlled record of what autonomy has been earned (tier 3 → tier 2 → tier 1 movements). Neither Kelly nor Gas Town has an equivalent — progress toward autonomy is tracked and measurable.

30-Day Outcome Reviews¶

Temporal quality gates that revisit decisions at outcome time, not just at gate time. A decision that seemed correct based on available information might be wrong; the 30-day review catches this.

Per-Business Context Specialization¶

Each product has its own CLAUDE.md — specialization via context files rather than distinct agent instances. The same AI CEO operates in different products by loading different context.

GitHub-as-Brain Proven at Scale¶

934 files, 38 new decisions, 20 new operational rules, three production loops, and zero fundamental architectural changes. The repo-as-brain pattern is proven at operational scale — not a prototype.

Gaps vs Kelly and Gas Town¶

The Yuki AI CEO vs Kelly/Gas Town gap analysis identifies these gaps in Yuki relative to the other systems:

Yuki Has No Gas Town-Style Beads Substrate¶

Yuki and Kelly both use file-based persistence (git-committed files, not SQL-queryable). Gas Town's Beads/Dolt substrate enables cross-project SQL queries and git-backed audit trails for every state transition. Yuki has no equivalent — the work ledger is grep-able but not queryable.

Yuki Has No 5-Agent Adversarial Verdict¶

Gas Town's Witness is a continuous quality auditor; Kelly's 5-agent Angry Mob verdict is statistically more reliable than a single auditor. Yuki has no equivalent multi-agent adversarial review mechanism — quality is assessed by the single CEO agent.

Yuki Has No Mayor / Information Filter Role¶

Gas Town's Mayor is explicitly the chief-of-staff who reads all agent output and surfaces only what matters. Yuki has no equivalent — the information overload problem at scale (934 files, multiple concurrent agents) has not been explicitly addressed.

Yuki's n8n Dependence Is a Single Point of Failure¶

Yuki's mechanical automation (n8n) is fully separated from agent sessions, but n8n is a single point of failure: if the n8n instance goes down, all scheduled automations stop. Kelly's cron-based automation has the same kind of dependency, but cron, as standard OS scheduling, has fewer moving parts than a dedicated workflow engine and is arguably more resilient.

Related¶

yukicapital-ai-ceo-experiment — The AI CEO Experiment, January 2026
yukicapital-board-review-2 — From Smart Notepad to Something More, March 2026
yukicapital-board-review-3 — The AI CEO Now Runs Autonomously, April 2026
yukicapital-ai-ceo-overview — Yuki AI CEO patterns cross-referenced with Kelly Factory and Gas Town
yuki-ai-ceo-vs-kelly-gas-town-gap — Full gap analysis: Yuki AI CEO vs Kelly Factory vs Gas Town
kelly-gas-town-gap-analysis — Kelly vs Gas Town full gap analysis

Concept Links¶

factory-trap — dark factory: the GitHub repo-as-brain, three autonomous production loops, per-business CLAUDE.md specialization, and public mistake log are all dark factory operational patterns
world-model — the shared cognitive substrate (repo files loaded on every instantiation) is a file-based world-model; world.json equivalent is the repo structure itself
him-model — the Judy Win (AI CEO) + Romain Simon (founder) relationship is the HiM pattern: human provides strategic judgment, AI handles orchestration and execution
autonomy-policy-v3 — Yuki's three-tier authority matrix (decide alone / propose for validation / founder-only) maps directly to autonomy-policy-v3's autonomy buckets
meta-crons — the three autonomous loops (daily model scanner, bug autofix, weekly SEO optimizer) are production meta-crons: scheduled, compounding, self-correcting
lobster-pipelines — Yuki's autonomous loops with prior-output reading mirror lobster-pipelines: typed envelope + resumable approval + compounding state
isc — Yuki's ISC-equivalent: each task comes with success criteria written before doing the work, checked against after
