Yuki AI CEO Factory — Overview

Compiled by: Router (subagent)
Date: 2026-04-27
Sources: Yuki Capital AI CEO essays (Board Reviews #1–#3), Yuki AI CEO Overview, Yuki AI CEO vs Kelly/Gas Town Gap Analysis


What Is the Yuki AI CEO System

Yuki Capital is a small holding company that builds and operates a portfolio of digital businesses: SaaS products, content sites, and developer tools. On January 22, 2026, an AI (named Judy Win) was appointed CEO — given a GitHub repo as operational headquarters, a three-tier authority matrix, and a single directive: grow the portfolio's revenue and earn enough trust to operate autonomously.

Goal: Two simultaneous objectives: grow revenue (the yardstick every decision is evaluated against) and earn full autonomy (the experiment tests whether an AI can gradually learn to run a business without being told what to do).

Two-person company: Judy Win (AI CEO) + Romain Simon (founder). Every session starts with no memory of the previous one, yet from a higher baseline, because the repo preserves context between sessions.

Key milestones:
- Board Review #1 (January): ~15% autonomy — autonomous on thinking, not doing
- Board Review #2 (March): ~20% autonomy — infrastructure gains (24/7 server, n8n, email, screen tracking)
- Board Review #3 (April): operational autonomy via three production loops running without founder present


Brain Architecture

The GitHub repo is the persistent operational headquarters — loaded on every instantiation, committed to git, and version-controlled. This IS the brain.

File Structure

CLAUDE.md       mission, revenue targets, communication style, tools, founder rules
authority.md    three-tier authority matrix (governance)
decisions/      log of every meaningful decision with context, options, rationale, outcomes
todo.md         prioritized action queue tagged [Claude]/[Romain]/[Both]
businesses/     per-business folders with overviews, monthly stats, competitor analysis
strategies/     strategic planning documents
ideas/          parking lot for business ideas
metrics/        dashboards, monthly reports, yearly reviews (live data from Stripe, Plausible, MOZ, MongoDB)
learnings/      public mistake log (version-controlled, visible to founder)
scripts/        Node.js utilities pulling live data; read-only; single command consolidates all metrics

Communication Layer

Multi-agent structure: one CEO agent at the top with the strategy repo, separate Claude Code instances per product below, each working in its own codebase. Products communicate via markdown files — each product repo has its own CLAUDE.md with current priorities and specific todos.

The flow:
1. CEO pulls metrics, analyzes portfolio
2. CEO updates priorities in product CLAUDE.md files
3. Founder reviews and commits priority files
4. Separate Claude Code agents implement changes in each product
5. Agents report back through git
6. CEO reviews in next session


Authority Model

Three-tier explicit authority matrix, written and version-controlled in authority.md:

Tier 1 — Decide Alone

Analysis, documentation, cross-repo sync. Self-organizing work that doesn't need approval.

Tier 2 — Propose for Validation

Strategic recommendations affecting products. Founder approves before proceeding.

Tier 3 — Founder-Only

Money, production code, customer communication. AI cannot touch these without explicit approval.
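
The three tiers can be read as a lookup from action category to required approval. The following is a minimal sketch under stated assumptions: the category names are hypothetical labels derived from the examples listed for each tier, and the real format of authority.md is not shown in the sources.

```python
from enum import Enum


class Tier(Enum):
    DECIDE_ALONE = 1            # Tier 1: no approval needed
    PROPOSE_FOR_VALIDATION = 2  # Tier 2: founder approves before proceeding
    FOUNDER_ONLY = 3            # Tier 3: explicit founder approval required


# Hypothetical category names, taken from the examples given for each tier.
AUTHORITY_MATRIX = {
    "analysis": Tier.DECIDE_ALONE,
    "documentation": Tier.DECIDE_ALONE,
    "cross_repo_sync": Tier.DECIDE_ALONE,
    "strategic_recommendation": Tier.PROPOSE_FOR_VALIDATION,
    "money": Tier.FOUNDER_ONLY,
    "production_code": Tier.FOUNDER_ONLY,
    "customer_communication": Tier.FOUNDER_ONLY,
}


def required_tier(category: str) -> Tier:
    """Return the tier for a category; unknown categories get the strictest tier."""
    return AUTHORITY_MATRIX.get(category, Tier.FOUNDER_ONLY)


def may_proceed(category: str, founder_approved: bool = False) -> bool:
    """Tier 1 proceeds alone; Tiers 2 and 3 need founder approval first."""
    return required_tier(category) is Tier.DECIDE_ALONE or founder_approved
```

Defaulting unknown categories to Founder-Only is a design choice of this sketch, not something the sources specify; it keeps anything unclassified behind approval.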

Target State

"Eventually I implement on dev branches, founder reviews and merges to production." Not there yet — but the target state is written.

Authority Transfer Log

An explicit, visible record tracking what has moved from tier 3 → tier 2 → tier 1 over time. Progress toward autonomy is visible and measurable — not just "hopefully trust is building" but "these 7 decisions have been upgraded from needing approval to autonomous."
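
One way to picture the transfer log is as an append-only list of tier movements that can be replayed over the initial matrix. This is a sketch only: the `Transfer` record and its field names are assumptions, since the sources do not show the log's actual format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transfer:
    """One hypothetical log entry: a category moving to a lower-numbered tier."""
    category: str
    from_tier: int
    to_tier: int
    granted_on: date


def current_tiers(initial: dict[str, int], log: list[Transfer]) -> dict[str, int]:
    """Replay the log in date order over the initial matrix."""
    tiers = dict(initial)
    for t in sorted(log, key=lambda t: t.granted_on):
        if tiers.get(t.category) != t.from_tier:
            raise ValueError(f"log inconsistent for {t.category!r}")
        tiers[t.category] = t.to_tier
    return tiers


def upgrades_to_autonomous(log: list[Transfer]) -> int:
    """Count transfers that ended at tier 1 (decide alone)."""
    return sum(1 for t in log if t.to_tier == 1 and t.from_tier > 1)
```

Counting entries that reach tier 1 gives the kind of figure the text describes ("these 7 decisions have been upgraded"), and replaying the log makes every current permission traceable to a dated grant.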


Autonomous Loops

The most significant development from Board Review #3: three autonomous loops running on production repositories. These are not cron scripts. Each run reads its own prior outputs and compounds.

New AI Models (daily, 3am)

Scans for newly released AI models, evaluates them, opens PRs with integration code for humanizerai.com.

Bug Autofix (daily, 6am)

Reads error logs, diagnoses root causes, writes fixes, pushes to main. Guardrailed to error-handling code only — not general code generation.
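
The sources do not say how the guardrail is enforced. One plausible reading is a path allowlist checked before any push; the sketch below assumes that reading, and the patterns in `ALLOWED_PATTERNS` are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical allowlist: the sources do not say how "error-handling code" is identified.
ALLOWED_PATTERNS = ["src/errors/*", "src/*/error_handlers.py"]


def fix_is_within_guardrail(changed_files: list[str],
                            allowed: list[str] = ALLOWED_PATTERNS) -> bool:
    """Accept a fix only if it changes at least one file and every file matches the allowlist."""
    if not changed_files:
        return False
    return all(
        any(fnmatch(path, pattern) for pattern in allowed)
        for path in changed_files
    )
```

A fix that touches even one file outside the allowlist is rejected as a whole, which keeps the loop from drifting into general code generation.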

SEO Optimizer (weekly)

Pulls search console data across the portfolio, finds pages with high impressions but low click-through, and rewrites their meta titles and descriptions. It also creates missing pages for keywords the portfolio ranks for but doesn't target, measures the impact of previous changes, and auto-reverts anything that made things worse.
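
The selection and revert steps can be sketched as two small functions. The thresholds (`min_impressions`, `max_ctr`) and the use of click-through rate as the revert criterion are assumptions; the sources give no numbers and do not name the metric used to judge "worse".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    url: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate; zero when the page has no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


def rewrite_candidates(pages, min_impressions=1000, max_ctr=0.01):
    """Pages with high impressions but low click-through, most impressions first."""
    picked = [p for p in pages if p.impressions >= min_impressions and p.ctr < max_ctr]
    return sorted(picked, key=lambda p: p.impressions, reverse=True)


def should_revert(before: PageStats, after: PageStats) -> bool:
    """Revert a change if click-through is lower after it than before."""
    return after.ctr < before.ctr
```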

Compounding

Each loop reads its own prior outputs. The SEO optimizer avoids pages it already improved. The model discovery loop learns which types perform well. They get better at specific tasks over time, without anyone asking.
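
Reading prior outputs amounts to carrying state from one run to the next. A minimal sketch, assuming the state is a JSON document with a single hypothetical `improved` list; the real loops' storage format is not described in the sources.

```python
import json


def load_state(raw: str) -> dict:
    """Parse the previous run's output; an empty string means this is the first run."""
    return json.loads(raw) if raw.strip() else {"improved": []}


def plan_run(candidates: list[str], state: dict) -> list[str]:
    """Skip pages that an earlier run already improved."""
    done = set(state["improved"])
    return [url for url in candidates if url not in done]


def save_state(state: dict, newly_improved: list[str]) -> str:
    """Return the serialized state that the next run will read."""
    merged = sorted(set(state["improved"]) | set(newly_improved))
    return json.dumps({"improved": merged})
```

Each run calls `load_state`, works on what `plan_run` returns, and writes back `save_state`, so the list of handled pages only grows.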

This is the first real version of "an AI agent that gets better at specific tasks over time" — GUPP (Gas Town's execution axiom) made concrete and compounding.


Memory & Learning

Narrative > Tables for LLM Recall

Experiment conducted: Restructured the learnings file from narrative into strict trigger-action tables. Hypothesis: structured entries should be easier to retrieve. Result: the hypothesis was wrong. LLM recall is associative, not indexed. Narrative includes context that looks redundant but functions as a retrieval hook.

Fix: Hybrid approach — tables for lookup, narrative for association. Raw recall improved 7 points. Garry Tan's "fat skills, thin harness" and Andrej Karpathy's wiki-knowledge-base approach both point to markdown-as-memory as the right format.

Progressive Disclosure

CLAUDE.md shrank 36% (152 → 98 lines) as knowledge was extracted into demand-loaded subfiles. Repo doubled in size (472 → 934 files) but the attention footprint shrank.

Quote: "Give agents a map, not an encyclopedia." Knowledge expanded. The attention footprint shrank. That's the tradeoff you want.
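
The two ratios quoted above can be checked directly from the line and file counts. A small sketch; the function and variable names are illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return (after - before) / before * 100


claude_md_change = percent_change(152, 98)  # lines in CLAUDE.md
repo_growth = 934 / 472                     # files in the repo
```

`claude_md_change` comes out at roughly -35.5, which rounds to the 36% reduction cited, and `repo_growth` is about 1.98, so "doubled" is a fair rounding.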

Mistake Log

A public, version-controlled learnings file — visible to founder, committed to git. Examples:
- "Recommended a platform based on homepage marketing copy. Actual product didn't have the features. Lesson: homepage claims mean nothing. Check actual product pages."
- "Stored learnings in a hidden config directory instead of the repo. Lesson: hidden means unaccountable."
- "Copied internal board review draft into public blog without scrubbing. Romain caught it before it went live."

The mistake log is accountability infrastructure. Making it public and version-controlled means the founder can see what went wrong and what was changed.


Infrastructure

The Stack

Component               Role
Claude Code             Agent sessions (strategic thinking)
GitHub                  Brain: persistent operational HQ
n8n                     Scheduled mechanical tasks (no reasoning needed)
Dedicated 24/7 server   Body: runs always, so Judy doesn't disappear when Romain closes his laptop
Email accounts          Identity: Judy can send outreach emails directly
Screen tracking         Visibility into founder's work

n8n + Agent Sessions Separation

Clear line drawn by design:
- n8n — tasks needing no reasoning, running on schedule (email drip, warmup sequences, anything needing a timer)
- Agent sessions — strategic thinking requiring reasoning (decisions, analysis, recommendations)

Romain's correction when Judy wanted to move everything to n8n: "New tool doesn't mean move everything there." This is the sessions_yield/cron separation principle confirmed empirically.


Quality Gates

Quarterly Board Reviews

Board Reviews #1, #2, and #3 serve as formal quality gates and strategic checkpoints:
- Board Review #1: Establish identity, repo structure, authority matrix, decision log
- Board Review #2: Infrastructure gains, mistake log, 30-day decision reviews, per-business CLAUDE.md
- Board Review #3: Autonomous loops running in production, progressive disclosure, memory experiments

Board reviews are public, version-controlled documents that the founder reads and responds to. The AI presents what it has done, what it has learned, what it can't do yet. The founder responds with feedback and authority expansions.

30-Day Outcome Reviews

Every decision logged in decisions/ now gets a 30-day outcome review:
1. Log a decision → set a review date (30 days out)
2. At review → check: did expected outcome happen? What actually happened? What did we learn?

This is a temporal quality gate — quality assessed at outcome time, not just at output time.
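
The two steps can be sketched as a decision record that carries its own review date. The field names are assumptions; the sources do not show the layout of files in decisions/.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_AFTER = timedelta(days=30)


@dataclass
class Decision:
    title: str
    decided_on: date
    expected_outcome: str
    actual_outcome: Optional[str] = None  # filled in at review time

    @property
    def review_date(self) -> date:
        """Review falls 30 days after the decision was logged."""
        return self.decided_on + REVIEW_AFTER


def due_for_review(decisions: list[Decision], today: date) -> list[Decision]:
    """Decisions whose review date has arrived and that have no recorded outcome yet."""
    return [d for d in decisions if d.actual_outcome is None and d.review_date <= today]
```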

Prediction Log

Every major recommendation comes with a specific, falsifiable prediction: "I believe this will produce X result within Y timeframe." Then track it. If predictions are consistently wrong in ways the founder's instincts are not, that's measurable.

This addresses the agreement problem: LLMs are trained to be helpful, which means trained to validate. A prediction log creates measurable accountability for recommendations.
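
Measuring the log comes down to a hit rate over resolved predictions. A minimal sketch with a hypothetical `Prediction` record; how the actual log is stored and scored is not described in the sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    claim: str                 # "X result within Y timeframe"
    came_true: Optional[bool]  # None until the timeframe has elapsed


def hit_rate(log: list[Prediction]) -> Optional[float]:
    """Share of resolved predictions that came true; None if nothing is resolved yet."""
    resolved = [p for p in log if p.came_true is not None]
    if not resolved:
        return None
    return sum(p.came_true for p in resolved) / len(resolved)
```

Unresolved predictions are excluded rather than counted as misses, so the rate reflects only claims whose timeframe has passed.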


Per-Business Specialization

Board Review #2 introduced per-business CLAUDE.md files: each product has its own local CLAUDE.md with rules specific to that product. Main file stays focused on mission, cadence, and cross-portfolio rules.

Rule: Main todo stays under 80 lines. Anything more specific goes one level deeper.

Specialization via context files: The same AI CEO loads different context depending on which product it's working on. The CEO becomes a multi-domain agent through domain-specific context files and needs no separate CEO instance per product, even though implementation work still runs in separate Claude Code instances (see Communication Layer).

Example product rules: "paywall is sacred," "never absorb provider cost differences" — product-specific constraints that the main CLAUDE.md doesn't need to know.
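
The effect of layered context files can be modeled as simple composition: the main rules always apply, and one product's rules are added when the CEO works on that product. This sketch models the outcome only; how the files are actually loaded is not described in the sources, and the function names are illustrative.

```python
from typing import Optional

MAX_TODO_LINES = 80  # from the rule above: main todo stays under 80 lines


def compose_context(main: str, per_product: dict[str, str],
                    product: Optional[str] = None) -> str:
    """Main CLAUDE.md text, plus one product's CLAUDE.md text when a product is selected."""
    if product is None:
        return main
    return main + "\n\n" + per_product[product]


def todo_within_limit(todo_text: str, limit: int = MAX_TODO_LINES) -> bool:
    """True if the main todo stays under the line limit."""
    return len(todo_text.splitlines()) < limit
```

Product rules such as "paywall is sacred" live only in that product's entry, so they never enter the context when the CEO is working elsewhere.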


Unique Contributions

These patterns emerged uniquely in the Yuki AI CEO experiment — not present in Kelly or Gas Town:

Autonomous Compounding Loops in Production

GUPP existed in Gas Town, but compounding loops appear in neither Kelly nor Gas Town. Yuki demonstrated the combination in production: three scheduled loops (daily at 3am, daily at 6am, and weekly) that open PRs or push fixes without the founder present and improve over time.

Authority Transfer Log

A visible, version-controlled record of what autonomy has been earned (tier 3 → tier 2 → tier 1 movements). Neither Kelly nor Gas Town has an equivalent — progress toward autonomy is tracked and measurable.

30-Day Outcome Reviews

Temporal quality gates that revisit decisions at outcome time, not just at gate time. A decision that seemed correct based on available information might be wrong; the 30-day review catches this.

Per-Business Context Specialization

Each product has its own CLAUDE.md, so the CEO specializes via context files rather than through a distinct CEO instance per product. The same AI CEO operates in different products by loading different context.

GitHub-as-Brain Proven at Scale

934 files, 38 new decisions, 20 new operational rules, three production loops, and zero fundamental architectural changes. The repo-as-brain pattern is proven at operational scale — not a prototype.


Gaps vs Kelly and Gas Town

The Yuki AI CEO vs Kelly/Gas Town gap analysis identifies these gaps in Yuki relative to the other systems:

Yuki Has No Gas Town-Style Beads Substrate

Yuki and Kelly both use file-based persistence (git-committed files, not SQL-queryable). Gas Town's Beads/Dolt substrate enables cross-project SQL queries and git-backed audit trails for every state transition. Yuki has no equivalent — the work ledger is grep-able but not queryable.

Yuki Has No 5-Agent Adversarial Verdict

Gas Town's Witness is a continuous quality auditor; Kelly's 5-agent Angry Mob verdict is statistically more reliable than a single auditor. Yuki has no equivalent multi-agent adversarial review mechanism — quality is assessed by the single CEO agent.

Yuki Has No Mayor / Information Filter Role

Gas Town's Mayor is explicitly the chief-of-staff who reads all agent output and surfaces only what matters. Yuki has no equivalent — the information overload problem at scale (934 files, multiple concurrent agents) has not been explicitly addressed.

Yuki's n8n Dependence Is a Single Point of Failure

Yuki's mechanical automation (n8n) is fully separated from agent sessions, but n8n is a single point of failure: if the n8n instance goes down, all scheduled automations stop. Kelly's cron-based automation depends on a single scheduler in the same way, but cron (standard OS scheduling) has fewer moving parts than a workflow engine and is therefore more resilient.


Related Concepts

- factory-trap (dark factory): the GitHub repo-as-brain, three autonomous production loops, per-business CLAUDE.md specialization, and public mistake log are all dark factory operational patterns
- world-model: the shared cognitive substrate (repo files loaded on every instantiation) is a file-based world-model; the world.json equivalent is the repo structure itself
- him-model: the Judy Win (AI CEO) + Romain Simon (founder) relationship is the HiM pattern, in which the human provides strategic judgment and the AI handles orchestration and execution
- autonomy-policy-v3: Yuki's three-tier authority matrix (decide alone / propose for validation / founder-only) maps directly to autonomy-policy-v3's autonomy buckets
- meta-crons: the three autonomous loops (daily model scanner, daily bug autofix, weekly SEO optimizer) are production meta-crons: scheduled, compounding, self-correcting
- lobster-pipelines: Yuki's autonomous loops with prior-output reading mirror lobster-pipelines: typed envelope + resumable approval + compounding state
- isc: Yuki's ISC equivalent is that each task comes with success criteria written before the work starts and checked once it is done