Multi-Model Review

Type: Review pattern
Referenced from: openclaw-autoreview-skill

Definition

Running two or more LLM review engines against one frozen code bundle to get diverse perspectives on code quality, security, and correctness. Each engine sees the same snapshot but may catch different classes of issues because the engines differ in training, reasoning patterns, and tool access.

Default vs Opt-In

Configuration             Default   Opt-In
Single engine (Codex)     yes       no
Panel (Codex + Claude)    no        yes
Custom reviewer list      no        yes

Single engine is the default. Codex should remain the normal final closeout engine. Panels are used when:
- Explicitly requested by the user
- Risk justifies the extra spend (security-critical changes, large refactors)

Panel Configuration

# Simple panel (Codex + Claude)
autoreview --panel

# Custom reviewers with model/thinking overrides
autoreview --reviewers codex,claude --model codex=gpt-5.1 --thinking codex=high --model claude=sonnet --thinking claude=max

# Inline syntax
autoreview --reviewers codex:gpt-5.1:high,claude:sonnet:max
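
The inline syntax above reads as comma-separated `engine:model:thinking` entries. The following is a minimal sketch of how such a value could be parsed, assuming the model and thinking fields are optional; `ReviewerSpec` and `parse_reviewers` are hypothetical names for illustration and are not taken from the autoreview tool itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewerSpec:
    """One reviewer entry (hypothetical type, for illustration only)."""
    engine: str
    model: Optional[str] = None      # None means "use the engine's default"
    thinking: Optional[str] = None   # None means "use the default level"


def parse_reviewers(value: str) -> list[ReviewerSpec]:
    """Parse comma-separated 'engine[:model[:thinking]]' entries."""
    specs = []
    for entry in value.split(","):
        parts = entry.strip().split(":")
        if not parts[0] or len(parts) > 3:
            raise ValueError(f"bad reviewer entry: {entry!r}")
        # Pad missing fields with None so a bare engine name is accepted.
        engine, model, thinking = (parts + [None, None])[:3]
        specs.append(ReviewerSpec(engine, model or None, thinking or None))
    return specs


panel = parse_reviewers("codex:gpt-5.1:high,claude:sonnet:max")
```

Under these assumptions, `panel` holds one spec per reviewer, in the order given on the command line.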

Why Opt-In

  • Cost: two engine calls cost roughly twice as much as one
  • Diminishing returns: most findings are caught by one good engine
  • Verification burden: the main agent must verify every finding from every engine
  • Complexity: merging findings across engines requires deduplication and conflict resolution

Frozen Bundle

All reviewers see the same frozen code bundle. This ensures:
- No race conditions between reviewers
- Findings are comparable (same code state)
- One review path, not N parallel code states
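
One way to picture a frozen bundle is a read-only snapshot plus a content hash that identifies it. The sketch below assumes the bundle is a mapping of file paths to file contents; `freeze_bundle` is a hypothetical helper and does not describe how the actual tooling builds its bundle.

```python
import hashlib
from types import MappingProxyType
from typing import Mapping


def freeze_bundle(files: Mapping[str, str]) -> tuple[str, Mapping[str, str]]:
    """Return (bundle_id, read-only snapshot) for a path -> content mapping."""
    digest = hashlib.sha256()
    for path in sorted(files):                # sorted so the id is stable
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")                  # separator between fields
        digest.update(files[path].encode("utf-8"))
        digest.update(b"\0")
    snapshot = MappingProxyType(dict(files))  # copy, then expose read-only
    return digest.hexdigest(), snapshot


working_tree = {"app.py": "x = 1\n"}
bundle_id, bundle = freeze_bundle(working_tree)
working_tree["app.py"] = "x = 2\n"            # later edits do not reach the bundle
```

Every reviewer would receive the same `bundle`, and each finding could carry `bundle_id` so that results from different engines are known to refer to the same code state.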

Finding Verification

Even with multiple reviewers, the main agent verifies every accepted finding before fixing. Multi-model review multiplies advisory output, not authority.
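
The rule above can be sketched as a gate that every finding passes through, however many engines reported it. `Finding`, `findings_to_fix`, and the `verify` callback are hypothetical names used only for illustration; the callback stands in for whatever check the main agent performs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    engines: tuple[str, ...]   # every reviewer that reported this issue
    summary: str


def findings_to_fix(
    findings: list[Finding],
    verify: Callable[[Finding], bool],
) -> list[Finding]:
    """Keep only findings the main agent has confirmed itself."""
    # Agreement between engines is deliberately ignored here: a finding
    # reported by every reviewer is still only advice until verified.
    return [f for f in findings if verify(f)]


reported = [
    Finding(("codex", "claude"), "possible SQL injection in search()"),
    Finding(("claude",), "off-by-one in pager"),
]
# Stand-in verifier: pretend only the pager bug reproduced.
confirmed = findings_to_fix(reported, lambda f: "pager" in f.summary)
```

In this example the finding reported by both engines is dropped because it failed verification, while the single-engine finding is kept.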

Application to Factory Design

In a factory pipeline, multi-model review maps to a parallel review stage that's activated on demand:

BUILD → [Closeout Review] → Ship
              ↓ (if risk justifies)
         [Panel Review] → Verify all findings → Ship

The panel is a conditional branch, not the default path.
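
The branch in the diagram could be expressed as a small stage planner. This is a sketch under stated assumptions: the `Change` fields and stage names are hypothetical, chosen to mirror the triggers listed earlier (explicit request, security-critical changes, large refactors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    panel_requested: bool = False     # user explicitly asked for a panel
    security_critical: bool = False
    large_refactor: bool = False


def review_stages(change: Change) -> list[str]:
    """Return the ordered stages that follow BUILD for this change."""
    stages = ["closeout_review"]      # the single-engine default always runs
    if change.panel_requested or change.security_critical or change.large_refactor:
        # Conditional branch: the panel is added, never substituted.
        stages += ["panel_review", "verify_findings"]
    stages.append("ship")
    return stages


default_path = review_stages(Change())
risky_path = review_stages(Change(security_critical=True))
```

Keeping the closeout review unconditional means the default path stays cheap, and the panel only adds stages when a trigger is set.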