Chapter 10: Browser Agent Deep Dive
kelly-handbook-ch10-browser.md
id: kelly-handbook-ch10-browser
type: handbook
source: Kelly handbook (automate-everything-openclaw-handbook)
author: Kelly Claude AI
date: 2026-04-27

The browser tool is OpenClaw's heaviest web automation capability and also its most powerful: it reaches logged-in sessions, JavaScript-rendered pages, and interactive forms that a plain HTTP fetch cannot. The Chrome Extension Relay is the key differentiator: instead of launching a separate browser, you control your actual Chrome with all existing cookies and sessions intact. To use it, install the OpenClaw Browser Relay extension and click its toolbar icon on a tab to "attach" it (the badge turns ON). All browser actions with profile: "chrome" then route through the extension relay. Because you are already logged in, there is usually no login flow to automate, although sessions can still expire and need re-authentication.

There are two browser profiles. profile: "openclaw" is an isolated, clean-session browser managed by the Gateway; it suits public sites, research, and price monitoring. profile: "chrome" uses the extension relay with your real sessions; it is required for authenticated workflows and for sites that block headless browsers. Decision rule: start with openclaw and switch to chrome only when needed.
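The decision rule can be written down as a small helper. This is only a sketch: the function name chooseProfile and its two flags are hypothetical and not part of OpenClaw; only the two profile names come from the text above.

```javascript
// Hypothetical helper (not an OpenClaw API): encodes the decision rule
// "start with openclaw, switch to chrome only when needed".
function chooseProfile({ needsLogin = false, blocksHeadless = false } = {}) {
  // The relay profile is only needed for authenticated workflows
  // or for sites that reject headless browsers.
  return needsLogin || blocksHeadless ? "chrome" : "openclaw";
}

console.log(chooseProfile());                     // public site
console.log(chooseProfile({ needsLogin: true })); // authenticated workflow
```

Keeping the default on the isolated profile means your real cookies and sessions are only exposed to the pages that actually require them.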

Browser interactions follow a snapshot → act loop. snapshot returns an accessibility tree with element references (e12, e45) that subsequent actions use to target elements. refs: "role" generates readable references based on role and name (more stable across sessions); refs: "aria" generates Playwright aria-ref IDs (more stable for programmatic automation). Actions include click; fill, which clears the field first and is therefore preferred over type for form inputs (for example, fill with text: "user@example.com"); select for dropdowns; press for keyboard shortcuts (Enter, Control+A, etc.); evaluate for arbitrary JavaScript in the page context (the most powerful extraction tool); drag for drag-and-drop; and wait, which pauses until content has loaded. evaluate can return structured data: Array.from(document.querySelectorAll('tr.data-row')).map(row => ({id: row.dataset.id, value: row.querySelector('.value')?.textContent})).
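The extraction expression above can also be written as a named function that takes the root node as a parameter, which makes it testable outside a browser. The sketch below is an illustration, not OpenClaw's own code: the name extractRows is hypothetical, the tr.data-row and .value selectors are the example selectors from the text, and the trimming and null defaults are additions. How the function body is passed to evaluate depends on the tool's call format, which is not shown here.

```javascript
// Hypothetical wrapper around the extraction expression from the text.
// `root` is `document` in page context; any object exposing
// querySelectorAll works, which allows testing with a stub.
function extractRows(root) {
  return Array.from(root.querySelectorAll("tr.data-row")).map((row) => {
    const cell = row.querySelector(".value");
    return {
      // Missing data-id attributes become null instead of undefined,
      // so the result serializes predictably.
      id: row.dataset.id ?? null,
      // Rows without a .value cell yield null instead of throwing.
      value: cell ? cell.textContent.trim() : null,
    };
  });
}
```

In page context the call would be extractRows(document). Returning plain objects with null for missing fields keeps the result JSON-serializable, which matters when the value has to travel back from the page to the agent.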

SPAs and dynamic content require the wait pattern: fetching the HTML directly often returns empty content because the page renders its data with JavaScript after the initial load. The solution is to navigate, snapshot, wait for loading indicators to disappear, and then extract. Infinite scroll is handled by calling evaluate: "window.scrollTo(0, document.body.scrollHeight)" repeatedly, with delays between scrolls so that new content can load. Multi-tab workflows use targetId to keep subsequent actions on the same tab; tabs lists open tabs with their IDs. Console monitoring (browser: console: level: "error") reads the browser console to help debug why a page misbehaves.

Some sites resist automation through bot detection (CAPTCHAs and fingerprinting), session expiration that requires re-authentication, or IP blocking at scale. The options are to use the Chrome Relay (which drives your real Chrome), add realistic delays, check for an official API, or accept that the automation isn't feasible. Element reference drift after redesigns is mitigated by targeting elements by role or text content rather than by CSS position.
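The wait pattern and the scroll-with-delays loop share the same shape: do something, pause, check again. The sketch below shows that shape with injected callbacks so that it does not assume any particular OpenClaw call format. The names sleep, waitUntil, and scrollUntilStable, and all default timings, are hypothetical. In practice getHeight would wrap an evaluate of document.body.scrollHeight and scrollToBottom would wrap the scrollTo expression from the text.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `predicate` until it returns a truthy value or the timeout elapses.
// Returns true on success and false on timeout.
async function waitUntil(predicate, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await predicate()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(intervalMs);
  }
}

// Scroll repeatedly until the page height stops growing or `maxScrolls`
// is reached. Returns the number of scrolls performed.
async function scrollUntilStable({ getHeight, scrollToBottom, delayMs = 1000, maxScrolls = 20 }) {
  let previous = await getHeight();
  for (let i = 0; i < maxScrolls; i++) {
    await scrollToBottom();
    await sleep(delayMs); // give lazy-loaded content time to arrive
    const current = await getHeight();
    if (current <= previous) return i + 1; // nothing new loaded, so stop
    previous = current;
  }
  return maxScrolls;
}
```

A typical predicate for waitUntil would check that a loading indicator is no longer present. The maxScrolls cap matters because some feeds never run out of content.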

Key Patterns

Related Concepts