🐾 claw-stack

Last updated: March 2026 · Source: chrome-devtools-mcp

Web Automation Operator

An MCP server that exposes the Chrome DevTools Protocol to AI agents, enabling reliable browser automation, network inspection, console debugging, screenshot capture, and performance tracing from within any MCP-compatible client.


What

chrome-devtools-mcp launches as an MCP server and connects to a Chrome browser instance. It provides 26 tools across 6 categories that an AI agent can call to interact with a live browser: clicking elements, filling forms, navigating pages, reading console output, capturing performance traces, and more. Automation uses Puppeteer under the hood to wait for page state after each action.

Privacy note

This server exposes browser contents to the MCP client. Performance tools may send trace URLs to the Google CrUX API for real-user field data. Usage statistics are collected by default; both can be disabled via server flags.

Why

Browser automation for AI agents has two common failure modes. Selenium and Playwright are designed for deterministic test scripts: you know exactly which element to click and in what order. An AI agent needs to observe, reason, and adapt; it doesn't know the DOM structure in advance and must discover it dynamically. Screenshot-only approaches address this by letting the agent "see" the page visually, but screenshots don't give precise element references and can't express intent as tool calls.

The Chrome DevTools Protocol (CDP) exposes the same primitives developers use in DevTools: the accessibility tree, JavaScript execution, network traffic, console output, and performance traces. An agent working with CDP can inspect the page structure precisely, identify elements by their accessibility roles, execute arbitrary JavaScript, and read exactly what the browser logged, all through a single MCP interface.

Architecture

Three layers collaborate to handle a single agent tool call:

Agent
  → MCP tool call (e.g. click, fill, navigate_page)
  → chrome-devtools-mcp server
    → Puppeteer (Chrome management + wait-for-action-result)
      → Chrome DevTools Protocol (browser control)
        → Chrome instance
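
To make the first hop in that stack concrete: MCP carries tool calls as JSON-RPC 2.0 requests with the method tools/call. The sketch below builds one such request for click. The helper name build_tool_call, the request id, and the uid value "1_5" are placeholders invented for illustration; in practice an MCP client SDK would assemble this message for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "1_5" is a placeholder uid; real values come from take_snapshot.
request = build_tool_call(1, "click", {"uid": "1_5"})
print(request)
```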

26 tools across 6 categories:

Input Automation (8 tools)

Tool           Description
click          Click an element by uid (single or double click)
drag           Drag one element onto another
fill           Type text into an input or select an option
fill_form      Fill multiple form elements in one call
handle_dialog  Accept or dismiss browser dialogs
hover          Hover over an element
press_key      Press a key or key combination
upload_file    Upload a local file via a file input element

Navigation (6 tools)

Tool           Description
navigate_page  Navigate to a URL
new_page       Open a new browser tab
close_page     Close a tab by page ID
list_pages     List all open tabs
select_page    Switch focus to a tab by page ID
wait_for       Wait for a condition before proceeding

Debugging (5 tools)

Tool                   Description
take_screenshot        Capture a screenshot of the current page
take_snapshot          Capture the page accessibility tree snapshot (returns element UIDs)
evaluate_script        Execute JavaScript in the page context
get_console_message    Retrieve a specific console message with source-mapped stack trace
list_console_messages  List all console messages from the current page

Network (2), Performance (3), Emulation (2)

Tool                         Category     Description
list_network_requests        Network      List all network requests made by the page
get_network_request          Network      Get details of a specific request including headers and body
performance_start_trace      Performance  Start a DevTools performance trace
performance_stop_trace       Performance  Stop the trace and return raw data
performance_analyze_insight  Performance  Extract actionable insights from a trace (optionally includes CrUX field data)
emulate                      Emulation    Emulate a device (mobile, tablet, etc.)
resize_page                  Emulation    Resize the browser viewport

Key Design Decisions

Accessibility tree UIDs: snapshot first, then act

Tools that interact with elements take a uid parameter. UIDs come from the accessibility tree returned by take_snapshot. The agent always calls take_snapshot first to get current UIDs, then passes the target UID to an action tool. This avoids brittle CSS selectors and XPath expressions that break when the DOM changes.
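
From the agent's side, the snapshot-then-act sequence can be sketched as follows. Everything here is a stand-in: FakeClient is a hypothetical client that records calls instead of driving Chrome, and the snapshot text format (one node per line as uid, role, quoted name) is an assumption for illustration, so check the real take_snapshot output before relying on the regular expression.

```python
import re

# Assumed snapshot text format (one node per line); the real output may differ.
SAMPLE_SNAPSHOT = '''uid=1_0 RootWebArea "Login"
  uid=1_1 textbox "Email"
  uid=1_2 button "Sign in"'''

NODE = re.compile(r'uid=(\S+)\s+(\S+)\s+"([^"]*)"')


class FakeClient:
    """Hypothetical stand-in for an MCP client; records calls only."""

    def __init__(self):
        self.calls = []

    def call_tool(self, name, arguments=None):
        self.calls.append((name, arguments or {}))
        return SAMPLE_SNAPSHOT if name == "take_snapshot" else "ok"


def find_uid(snapshot, role, name):
    """Return the uid of the first node matching role and accessible name."""
    for uid, node_role, node_name in NODE.findall(snapshot):
        if node_role == role and node_name == name:
            return uid
    raise LookupError(f"no {role} named {name!r} in snapshot")


def click_by_name(client, role, name):
    snapshot = client.call_tool("take_snapshot")    # 1. get fresh UIDs
    uid = find_uid(snapshot, role, name)            # 2. resolve by role + name
    return client.call_tool("click", {"uid": uid})  # 3. act on that uid


client = FakeClient()
click_by_name(client, "button", "Sign in")
print(client.calls)
```

Resolving by role and accessible name keeps the agent's intent ("the Sign in button") separate from the uid, which is only valid for the snapshot it came from.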

Puppeteer wait-for-action-result: no polling loops

After every interaction (click, fill, navigate), Puppeteer waits for the page to settle before returning control to the agent. The agent doesn't need to explicitly poll for page readiness. This eliminates a common class of timing bugs where the agent acts on a page before JavaScript has finished updating it.

CDP over Selenium/Playwright

CDP gives lower-level access than Playwright. The agent can read console messages with source-mapped stack traces, inspect network requests, execute arbitrary JavaScript, and record DevTools-level performance traces, none of which are easily accessible through Playwright's abstraction layer.

Managed Chrome vs. connecting to an existing instance

By default the server launches its own Chrome with a dedicated profile. For cases where the agent needs to maintain session state (logged-in accounts) or work alongside manual testing, it can connect to an existing Chrome instance running with remote debugging enabled.
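
The two modes differ only in how the server is started. The sketch below builds MCP client configuration entries for both. Treat the details as assumptions to verify against the server's own help output and your client's documentation: the mcpServers layout varies by client, the npx invocation and the --browserUrl flag are quoted from memory of the project's README, and port 9222 presumes Chrome was started with --remote-debugging-port=9222.

```python
import json


def mcp_server_entry(browser_url=None):
    """Build an MCP client config entry for chrome-devtools-mcp."""
    args = ["-y", "chrome-devtools-mcp@latest"]
    if browser_url is not None:
        # Assumed flag: attach to an already-running Chrome instead of launching one.
        args.append(f"--browserUrl={browser_url}")
    return {"command": "npx", "args": args}


config = {
    "mcpServers": {
        # Default: the server launches and manages its own Chrome.
        "chrome-managed": mcp_server_entry(),
        # Attach to a Chrome that already holds the logged-in session.
        "chrome-attached": mcp_server_entry("http://127.0.0.1:9222"),
    }
}
print(json.dumps(config, indent=2))
```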

How to Build Your Own

1. The snapshot-then-act pattern is fundamental

Every interaction sequence starts with a fresh snapshot. UIDs are page-state-specific; a UID from a previous snapshot may be stale after navigation or a DOM mutation. Always snapshot before acting, especially after any navigation or form submission.
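
If you are building the server side, one way to make stale UIDs fail loudly is to scope every uid to the snapshot that produced it. The registry below is a minimal sketch of that idea; the UidRegistry class and the generation_index uid scheme are assumptions for illustration, not a description of how chrome-devtools-mcp assigns UIDs.

```python
class UidRegistry:
    """Maps snapshot-scoped uids to elements and rejects stale ones."""

    def __init__(self):
        self._generation = 0
        self._elements = {}

    def snapshot(self, elements):
        """Register a fresh snapshot; uids from earlier snapshots become stale."""
        self._generation += 1
        self._elements = {
            f"{self._generation}_{index}": element
            for index, element in enumerate(elements)
        }
        return list(self._elements)

    def resolve(self, uid):
        """Return the element for uid, or raise if it is not from the latest snapshot."""
        if uid not in self._elements:
            raise KeyError(f"stale or unknown uid {uid!r}; take a new snapshot")
        return self._elements[uid]


registry = UidRegistry()
first = registry.snapshot(["email-input", "submit-button"])
registry.snapshot(["dashboard-heading"])  # e.g. after a navigation
try:
    registry.resolve(first[1])            # uid from the old snapshot
except KeyError as error:
    print("rejected:", error)
```

Returning an explicit error that tells the agent to re-snapshot is more useful than silently acting on whatever element now occupies the old position.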

2. Implement wait-for-action-result for every interaction

Return from a tool call only after the page has settled, not immediately after the DOM event fires. Puppeteer's waitForNavigation and waitForSelector are the right primitives. Without this, the agent will act on a page mid-transition and get inconsistent results.
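
The underlying idea is independent of Puppeteer: after an action, hold the tool call open until the page has been quiet for a short window, with a hard timeout. The Python sketch below illustrates that logic only. The wait_until_settled helper and its is_busy probe are hypothetical; a real server would derive "busy" from navigation, network, or DOM mutation signals rather than a simulated iterator.

```python
import time


def wait_until_settled(is_busy, quiet_period=0.5, timeout=10.0,
                       poll_interval=0.05, clock=time.monotonic,
                       sleep=time.sleep):
    """Block until is_busy() has stayed False for quiet_period seconds.

    Raises TimeoutError if the page never goes quiet within timeout seconds.
    """
    deadline = clock() + timeout
    quiet_since = None
    while True:
        now = clock()
        if is_busy():
            quiet_since = None          # any activity resets the quiet window
        elif quiet_since is None:
            quiet_since = now           # first idle probe starts the window
        elif now - quiet_since >= quiet_period:
            return                      # idle for long enough: settled
        if now >= deadline:
            raise TimeoutError("page did not settle before the timeout")
        sleep(poll_interval)


# Simulated page: busy for the first three probes, then idle.
probes = iter([True, True, True])
wait_until_settled(lambda: next(probes, False),
                   quiet_period=0.1, poll_interval=0.01)
print("settled")
```

Keeping this loop inside the server is what lets the agent skip polling: each tool call returns only once the wait has finished or timed out.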

3. Use accessibility tree for elements, screenshots for visual verification

The accessibility tree gives precise element references (role, name, state, uid). Screenshots give visual context, which lets the agent verify that a page looks correct. Use both: the snapshot for acting, the screenshot for confirming the visual result.

4. Isolate user data when handling sensitive sites

CDP exposes the full contents of the browser session to the MCP client. If the agent is browsing authenticated pages or handling credentials, use the isolated mode (temporary user data directory cleaned up after the session) to prevent cross-contamination between tasks.
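
What isolation amounts to can be shown without a browser: give each task a throwaway profile directory and delete it when the task ends. In the sketch below, run_isolated and fake_task are hypothetical names, and the task only writes a file; a real implementation would launch Chrome with its --user-data-dir pointed at the temporary directory.

```python
import tempfile
from pathlib import Path


def run_isolated(task):
    """Run task(profile_dir) with a throwaway user data directory."""
    with tempfile.TemporaryDirectory(prefix="chrome-profile-") as profile:
        profile_dir = Path(profile)
        result = task(profile_dir)  # a real task would launch Chrome here
    # The directory and everything in it is deleted when the block exits.
    return result, profile_dir


def fake_task(profile_dir):
    # Stand-in for a browsing session that leaves cookies in the profile.
    (profile_dir / "Cookies").write_text("session=secret")
    return "done"


result, used_dir = run_isolated(fake_task)
print(result, used_dir.exists())
```

Because the profile never outlives the task, cookies and cached credentials from one task cannot leak into the next.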

5. Combine lab traces with CrUX field data for performance analysis

A single lab trace shows one page load under one set of conditions. CrUX field data shows percentile distributions across real users. The performance_analyze_insight tool combines both when the URL is publicly accessible, giving the agent a fuller picture of actual user experience vs. lab conditions.
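
A small worked example of why the two views matter together, using Largest Contentful Paint (LCP). The thresholds are the published Core Web Vitals ones (good up to 2.5 s, poor above 4 s, assessed at the 75th percentile). The helper names and the sample numbers are made up for illustration and do not reflect the output format of performance_analyze_insight.

```python
def rate_lcp(milliseconds):
    """Classify an LCP value using the Core Web Vitals thresholds."""
    if milliseconds <= 2500:
        return "good"
    if milliseconds <= 4000:
        return "needs improvement"
    return "poor"


def compare_lab_to_field(lab_lcp_ms, field_p75_lcp_ms):
    """Summarize how one lab run relates to the field 75th percentile."""
    return {
        "lab": rate_lcp(lab_lcp_ms),
        "field_p75": rate_lcp(field_p75_lcp_ms),
        "gap_ms": field_p75_lcp_ms - lab_lcp_ms,
    }


# Made-up numbers for illustration only.
summary = compare_lab_to_field(lab_lcp_ms=1800, field_p75_lcp_ms=3200)
print(summary)
```

A lab run that rates "good" while the field p75 rates "needs improvement" is a signal that real users are on slower devices or networks than the trace environment.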

Frequently Asked Questions

Does the server need Chrome to be running before it starts?

No. By default the server launches its own managed Chrome instance via Puppeteer when a tool that requires a browser is first called. Chrome does not start on server startup, only when actually needed.

How do tools identify page elements?

Tools that interact with elements (click, fill, hover) take a uid parameter. UIDs come from the page snapshot returned by take_snapshot. The agent calls take_snapshot, identifies the target element by uid, then passes that uid to the action tool.

What is the CrUX API and when does it send data externally?

The Chrome User Experience Report (CrUX) API provides real-user performance field data for public URLs. It is used by performance_analyze_insight alongside lab trace data. The CrUX lookup can be disabled via a server flag so that no URL is sent to Google's API.

Authors: Qiushi Wu & Orange 🍊