Chapter 10: Browser Agent Deep Dive
The browser tool is OpenClaw's heaviest and most powerful web automation capability, handling pages that lighter tools such as web_fetch cannot. The Chrome Extension Relay is the key differentiator: instead of launching a separate browser, you control your actual Chrome with all existing cookies and sessions intact. Install the OpenClaw Browser Relay extension, click the toolbar icon on a tab to "attach" it (the badge turns ON), and all browser actions with profile: "chrome" route through the extension relay. Because you are already logged in, there is no login flow to automate for as long as the session stays valid.
There are two browser profiles. profile: "openclaw" is an isolated, clean-session browser managed by the Gateway, good for public sites, research, and price monitoring. profile: "chrome" uses the extension relay with your real sessions and is required for authenticated workflows and for sites that block headless browsers. Decision rule: start with openclaw and switch to chrome only when needed.
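The decision rule can be written down as a small helper. This is only a sketch: the function name `chooseBrowserProfile` and its two input flags are hypothetical and not part of OpenClaw; only the profile names "openclaw" and "chrome" come from the text.

```javascript
// Hypothetical helper: the name and the input flags are illustrative assumptions.
// Only the two returned profile names ("openclaw", "chrome") come from the chapter.
function chooseBrowserProfile({ requiresLogin = false, blocksHeadless = false } = {}) {
  // Default to the isolated, Gateway-managed browser.
  if (!requiresLogin && !blocksHeadless) {
    return "openclaw";
  }
  // Use the extension relay only when the isolated browser cannot do the job.
  return "chrome";
}

// A public price page stays on the isolated profile; a logged-in dashboard does not.
console.log(chooseBrowserProfile());                        // openclaw
console.log(chooseBrowserProfile({ requiresLogin: true })); // chrome
```

Keeping the default on the isolated profile means your real sessions are only exposed to automation when a task actually needs them.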
Browser interactions follow a snapshot → act loop. snapshot returns an accessibility tree with element references (such as e12 or e45) that subsequent actions use to target elements. refs: "role" generates readable references based on role and name (more stable across sessions); refs: "aria" generates Playwright aria-ref IDs (more stable for programmatic automation).
Actions include click; fill, which is preferred over type for form inputs because it clears the field first (for example, fill with text: "user@example.com"); select for dropdowns; press for keys and keyboard shortcuts (Enter, Control+A, etc.); evaluate for arbitrary JavaScript in page context (the most powerful extraction tool); drag for drag-and-drop; and wait for content to load. evaluate can return structured data: Array.from(document.querySelectorAll('tr.data-row')).map(row => ({id: row.dataset.id, value: row.querySelector('.value').textContent})).
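The extraction expression can also be written as a named function that guards against rows lacking a .value cell. This is a sketch rather than the handbook's own code: the function name `extractRows`, the `root` parameter, and the whitespace trimming are assumptions added for illustration, while the selectors tr.data-row and .value come from the example in the text. How evaluate accepts a multi-line script is not specified here, so treat the wiring as an open question.

```javascript
// Illustrative sketch: extractRows and its root parameter are hypothetical names.
// In page context, root defaults to the page's document; passing another object
// with querySelectorAll makes the function easy to exercise outside a browser.
function extractRows(root = document) {
  return Array.from(root.querySelectorAll("tr.data-row")).map((row) => {
    const cell = row.querySelector(".value");
    return {
      // dataset.id is undefined when the row has no data-id attribute.
      id: row.dataset.id ?? null,
      // Assumption: trimming whitespace is desirable; the original one-liner does not trim.
      value: cell ? cell.textContent.trim() : null,
    };
  });
}
```

Returning null for a missing cell keeps one malformed row from aborting the whole extraction, and the result stays a plain array of objects.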
SPAs and dynamic content require the wait pattern: fetching the HTML directly often returns an empty shell because the content is rendered after the initial page load. The solution is to navigate, snapshot, wait for loading indicators to disappear, then extract. Infinite scroll is handled by repeatedly calling evaluate: "window.scrollTo(0, document.body.scrollHeight)" with a delay between scrolls. Multi-tab workflows use targetId to keep subsequent actions on the same tab; tabs lists open tabs with their IDs. Console monitoring (browser: console: level: "error") reads the browser console, which helps when debugging why a page misbehaves.
Some sites resist automation through bot detection (CAPTCHAs and fingerprinting), session expiration that forces re-authentication, or IP blocking at scale. The options are to use the Chrome Relay (which behaves like a real browser), add realistic delays, check for an official API, or accept that the automation isn't feasible. Element reference drift after a site redesign is mitigated by targeting elements by role or text content rather than by CSS position.
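One way to produce the delays between scrolls, and the "realistic delays" mentioned for automation-resistant sites, is to add random jitter around a base pause. The sketch below is an assumption, not an OpenClaw feature: the helper names `jitteredDelayMs` and `scrollSchedule`, their parameters, and the default values are all hypothetical.

```javascript
// Hypothetical helpers: names, parameters, and defaults are illustrative assumptions.
// Returns a pause length in milliseconds, spread randomly around baseMs.
function jitteredDelayMs(baseMs = 1500, spreadMs = 500, random = Math.random) {
  // random() is in [0, 1), so the offset falls in [-spreadMs, +spreadMs).
  const offset = (random() * 2 - 1) * spreadMs;
  // Clamp so the delay is never negative.
  return Math.max(0, Math.round(baseMs + offset));
}

// Builds one pause per scroll step, for example to wait between successive
// window.scrollTo(0, document.body.scrollHeight) calls.
function scrollSchedule(steps, baseMs = 1500, spreadMs = 500, random = Math.random) {
  return Array.from({ length: steps }, () => jitteredDelayMs(baseMs, spreadMs, random));
}
```

Passing the random source as a parameter keeps the schedule reproducible when a fixed function is supplied, which makes it easier to debug a flaky scraping run.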
Key Patterns
- **Chrome Relay for authenticated sites:** Extension relay uses your real browser sessions—no login flows needed
- **Start openclaw, switch to chrome:** Isolated browser for public content; extension relay only when authentication is required
- **Snapshot → ref → act loop:** Snapshot gets element refs; act uses them for click/fill/press/evaluate
- **evaluate is the extraction power tool:** Pure JavaScript in page context returns structured data directly
- **wait for content:** Never assume page content is immediately available; wait for loading indicators or specific text
Related Concepts
- [[kelly-handbook-ch4-web-automation]] for the foundational web_fetch vs. browser decision framework
- [[kelly-handbook-ch6-cron]] for scheduling browser-based monitoring tasks
- [[kelly-handbook-ch9-node-network]] for screen recording via nodes as an alternative debugging approach
- [[kelly-tweets-openclaw]] for practical browser automation patterns from the Kelly Twitter corpus