Last updated: March 2026 · Source: chrome-devtools-mcp
Web Automation Operator
An MCP server that exposes the Chrome DevTools Protocol to AI agents, enabling reliable browser automation, network inspection, console debugging, screenshot capture, and performance tracing from within any MCP-compatible client.
What
chrome-devtools-mcp runs as an MCP server and connects to a Chrome browser instance. It provides 26 tools across 6 categories that an AI agent can call to interact with a live browser: clicking elements, filling forms, navigating pages, reading console output, capturing performance traces, and more. Under the hood, Puppeteer drives the browser and waits for the page to settle after each action.
Privacy note
This server exposes browser contents to the MCP client. Performance tools may send trace URLs to the Google CrUX API for real-user field data. Usage statistics are collected by default; both can be disabled via server flags.
Why
Browser automation for AI agents has two common failure modes. Selenium and Playwright are designed for deterministic test scripts: you know exactly which element to click and in what order. An AI agent needs to observe, reason, and adapt: it doesn't know the DOM structure in advance and must discover it dynamically. Screenshot-only approaches address this by letting the agent "see" the page visually, but screenshots don't give precise element references and can't express intent as tool calls.
The Chrome DevTools Protocol (CDP) exposes the same primitives developers use in DevTools: the accessibility tree, JavaScript execution, network traffic, console output, and performance traces. An agent working with CDP can inspect the page structure precisely, identify elements by their accessibility roles, execute arbitrary JavaScript, and read exactly what the browser logged, all through a single MCP interface.
Architecture
Three layers collaborate to handle a single agent tool call:
Agent
↓ MCP tool call (e.g. click, fill, navigate_page)
↓ chrome-devtools-mcp server
↓ Puppeteer (Chrome management + wait-for-action-result)
↓ Chrome DevTools Protocol (browser control)
↓ Chrome instance
26 tools across 6 categories:
Input Automation (8 tools)
| Tool | Description |
|---|---|
| click | Click an element by uid (single or double click) |
| drag | Drag one element onto another |
| fill | Type text into an input or select an option |
| fill_form | Fill multiple form elements in one call |
| handle_dialog | Accept or dismiss browser dialogs |
| hover | Hover over an element |
| press_key | Press a key or key combination |
| upload_file | Upload a local file via a file input element |
Navigation (6 tools)
| Tool | Description |
|---|---|
| navigate_page | Navigate to a URL |
| new_page | Open a new browser tab |
| close_page | Close a tab by page ID |
| list_pages | List all open tabs |
| select_page | Switch focus to a tab by page ID |
| wait_for | Wait for a condition before proceeding |
Debugging (5 tools)
| Tool | Description |
|---|---|
| take_screenshot | Capture a screenshot of the current page |
| take_snapshot | Capture the page accessibility tree snapshot (returns element UIDs) |
| evaluate_script | Execute JavaScript in the page context |
| get_console_message | Retrieve a specific console message with source-mapped stack trace |
| list_console_messages | List all console messages from the current page |
Network (2), Performance (3), Emulation (2)
| Category | Tool | Description |
|---|---|---|
| Network | list_network_requests | List all network requests made by the page |
| Network | get_network_request | Get details of a specific request including headers and body |
| Performance | performance_start_trace | Start a DevTools performance trace |
| Performance | performance_stop_trace | Stop the trace and return raw data |
| Performance | performance_analyze_insight | Extract actionable insights from a trace (optionally includes CrUX field data) |
| Emulation | emulate | Emulate a device (mobile, tablet, etc.) |
| Emulation | resize_page | Resize the browser viewport |
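To make the tool tables concrete: MCP clients invoke a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and carries its arguments. The sketch below builds such a request in Python. The helper name `build_tool_call` and the uid value `1_4` are hypothetical; the `uid` argument name follows the table descriptions above.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id within a session


def build_tool_call(tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request envelope (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical uid, as if taken from an earlier take_snapshot result.
request = build_tool_call("click", {"uid": "1_4"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this envelope for you; the point is only that every tool in the tables reduces to a name plus an arguments object.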
Key Design Decisions
Accessibility tree UIDs: snapshot first, then act
Tools that interact with elements take a uid parameter. UIDs come from the accessibility tree returned by take_snapshot. The agent always calls take_snapshot first to get current UIDs, then passes the target UID to an action tool. This avoids brittle CSS selectors and XPath expressions that break when the DOM changes.
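A minimal sketch of the snapshot-then-act flow from the agent's side. It assumes the snapshot has already been parsed into a list of nodes with `uid`, `role`, and `name` keys; the real `take_snapshot` output format may differ, and `call_tool`, `find_uid`, and `click_by_role` are hypothetical names used only for illustration.

```python
from typing import Callable, Optional

# Hypothetical, simplified snapshot shape; the server's real format may differ.
Snapshot = list[dict]


def find_uid(snapshot: Snapshot, role: str, name: str) -> Optional[str]:
    """Return the uid of the first node matching role and accessible name."""
    for node in snapshot:
        if node["role"] == role and node["name"] == name:
            return node["uid"]
    return None


def click_by_role(call_tool: Callable[[str, dict], object], role: str, name: str) -> str:
    """Snapshot first, then act on a uid taken from that same snapshot."""
    snapshot = call_tool("take_snapshot", {})
    uid = find_uid(snapshot, role, name)
    if uid is None:
        raise LookupError(f"no {role} named {name!r} in the current snapshot")
    call_tool("click", {"uid": uid})
    return uid
```

Matching on role and accessible name rather than on a selector is what keeps this resilient to markup changes: a restyled button is still a button named "Sign in".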
Puppeteer wait-for-action-result: no polling loops
After every interaction (click, fill, navigate), Puppeteer waits for the page to settle before returning control to the agent. The agent doesn't need to explicitly poll for page readiness. This eliminates a common class of timing bugs where the agent acts on a page before JavaScript has finished updating it.
CDP over Selenium/Playwright
CDP gives lower-level access than Playwright. The agent can read console messages with source-mapped stack traces, intercept network requests, execute arbitrary JavaScript, and record DevTools-level performance traces, none of which are easily accessible through Playwright's abstraction layer.
Managed Chrome vs. connecting to an existing instance
By default the server launches its own Chrome with a dedicated profile. For cases where the agent needs to maintain session state (logged-in accounts) or work alongside manual testing, it can connect to an existing Chrome instance running with remote debugging enabled.
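The two modes can be illustrated by the client configuration that starts the server. This sketch generates the `mcpServers` entry that many MCP clients accept. The `--browserUrl` flag name is an assumption to verify against the server's `--help` output, and `http://127.0.0.1:9222` presumes Chrome was started with `--remote-debugging-port=9222`.

```python
import json
from typing import Optional


def client_config(browser_url: Optional[str] = None) -> dict:
    """Build an MCP client entry that starts chrome-devtools-mcp through npx."""
    args = ["-y", "chrome-devtools-mcp@latest"]
    if browser_url is not None:
        # Assumed flag name; confirm it in the server's own documentation.
        args.append(f"--browserUrl={browser_url}")
    return {"mcpServers": {"chrome-devtools": {"command": "npx", "args": args}}}


# Managed Chrome (default) versus attaching to an already running instance.
print(json.dumps(client_config(), indent=2))
print(json.dumps(client_config("http://127.0.0.1:9222"), indent=2))
```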
How to Build Your Own
1. The snapshot-then-act pattern is fundamental
Every interaction sequence starts with a fresh snapshot. UIDs are page-state-specific; a UID from a previous snapshot may be stale after navigation or a DOM mutation. Always snapshot before acting, especially after any navigation or form submission.
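One way to enforce this rule in your own server or agent loop is to track which UIDs belong to the latest snapshot and reject anything else. The `UidRegistry` class below is a hypothetical helper, not part of chrome-devtools-mcp.

```python
class UidRegistry:
    """Tracks which uids belong to the most recent snapshot."""

    def __init__(self) -> None:
        self._valid: set[str] = set()

    def record_snapshot(self, uids: list[str]) -> None:
        """Replace the known uids with those from a fresh snapshot."""
        self._valid = set(uids)

    def invalidate(self) -> None:
        """Call after navigation, form submission, or any known DOM mutation."""
        self._valid.clear()

    def require_fresh(self, uid: str) -> str:
        """Return the uid unchanged, or fail loudly if it may be stale."""
        if uid not in self._valid:
            raise LookupError(f"uid {uid!r} is stale; take a new snapshot first")
        return uid
```

Failing with an explicit "take a new snapshot" message gives the agent something it can act on, instead of a click that silently lands on the wrong element.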
2. Implement wait-for-action-result for every interaction
Return from a tool call only after the page has settled, not immediately after the DOM event fires. Puppeteer's waitForNavigation and waitForSelector are the right primitives. Without this, the agent will act on a page mid-transition and get inconsistent results.
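If you are not building on Puppeteer, "settled" needs a definition of your own. The sketch below uses one possible definition: no pending work for a continuous quiet period, bounded by a timeout. It is not how Puppeteer implements its waits. `pending_work` is a hypothetical callback (for example, a count of in-flight navigations or requests), and the clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable, Optional


def wait_until_settled(
    pending_work: Callable[[], int],
    quiet_period: float = 0.1,
    timeout: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once pending_work() has stayed at 0 for quiet_period seconds."""
    deadline = clock() + timeout
    quiet_since: Optional[float] = None
    while clock() < deadline:
        if pending_work() == 0:
            if quiet_since is None:
                quiet_since = clock()
            if clock() - quiet_since >= quiet_period:
                return True
        else:
            quiet_since = None  # activity resumed, restart the quiet window
        sleep(0.01)
    return False  # timed out; the caller decides whether to report or retry
```

The wait lives inside the tool implementation, so the agent still sees a single call that returns when the page is ready.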
3. Use accessibility tree for elements, screenshots for visual verification
The accessibility tree gives precise element references (role, name, state, uid). Screenshots give visual context, which helps the agent verify that a page looks correct. Use both: the snapshot for acting, the screenshot for confirming the result.
4. Isolate user data when handling sensitive sites
CDP exposes the full contents of the browser session to the MCP client. If the agent is browsing authenticated pages or handling credentials, use the isolated mode (temporary user data directory cleaned up after the session) to prevent cross-contamination between tasks.
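If you are building your own launcher, the isolation described above can be approximated with a throwaway profile directory. This sketch uses only the Python standard library; `isolated_profile` is a hypothetical helper name, and `--user-data-dir` is the Chrome switch that selects the profile directory.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_profile(prefix: str = "agent-profile-") -> Iterator[Path]:
    """Yield a throwaway user data directory and delete it afterwards."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Remove cookies, storage, and cache even if the task raised.
        shutil.rmtree(path, ignore_errors=True)


with isolated_profile() as profile_dir:
    # Pass this argument to Chrome when launching it for a single task.
    chrome_args = [f"--user-data-dir={profile_dir}"]
    print(chrome_args)
```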
5. Combine lab traces with CrUX field data for performance analysis
A single lab trace shows one user's experience. CrUX field data shows percentile distributions across real users. The performance_analyze_insight tool combines both when the URL is publicly accessible, giving the agent a fuller picture of actual user experience vs. lab conditions.
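As a rough illustration of why the combination matters, the sketch below compares a lab Largest Contentful Paint value with a field 75th-percentile value, using the published Web Vitals thresholds (good up to 2.5 s, poor above 4 s). The function names are hypothetical and the numbers in the usage line are illustrative, not measurements.

```python
def classify_lcp(ms: float) -> str:
    """Bucket an LCP value using the published Web Vitals thresholds."""
    if ms <= 2500:
        return "good"
    if ms <= 4000:
        return "needs improvement"
    return "poor"


def compare_lcp(lab_ms: float, field_p75_ms: float) -> str:
    """Summarize whether the lab run and real-user data tell the same story."""
    lab, field = classify_lcp(lab_ms), classify_lcp(field_p75_ms)
    if lab == field:
        return f"lab and field agree: {lab}"
    return f"lab says {lab}, but p75 of real users is {field}"


# Illustrative numbers only.
print(compare_lcp(lab_ms=1800, field_p75_ms=3200))
```

A disagreement like this usually means the lab device or network is faster than what real users have, which is a cue to rerun the trace with throttling.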
Frequently Asked Questions
Does the server need Chrome to be running before it starts?
No. By default the server launches its own managed Chrome instance via Puppeteer when a tool that requires a browser is first called. Chrome does not start when the server starts, only when it is actually needed.
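The launch-on-first-use behavior is a standard lazy-initialization pattern. A minimal sketch, assuming a hypothetical `launch` callable that starts the browser and returns a handle; this is not the server's actual code.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyBrowser(Generic[T]):
    """Defers an expensive launch until the first tool call needs it."""

    def __init__(self, launch: Callable[[], T]) -> None:
        self._launch = launch
        self._instance: Optional[T] = None

    def get(self) -> T:
        """Launch on first use, then reuse the same instance."""
        if self._instance is None:
            self._instance = self._launch()
        return self._instance
```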
How do tools identify page elements?
Tools that interact with elements (click, fill, hover) take a uid parameter. UIDs come from the page snapshot returned by take_snapshot. The agent calls take_snapshot, identifies the target element by uid, then passes that uid to the action tool.
What is the CrUX API and when does it send data externally?
The Chrome User Experience Report (CrUX) API provides real-user performance field data for public URLs. It is used by performance_analyze_insight alongside lab trace data. It can be disabled via a server flag to prevent any URL from being sent to Google's API.
Authors: Qiushi Wu & Orange