Alok Bishoyi

Reverse Engineering Antigravity's Browser Automation

November 19, 2025

Google launched Antigravity IDE on November 18, 2025. It's a fork of VS Code with AI agents built in. The agents can write code, edit files, run terminal commands, and control a browser. I got access to the public preview and tried it out.

What struck me was the browser integration. Cursor has something similar—agents that can interact with web pages—but Antigravity's implementation felt different. When you ask it to do something in a browser, a Chrome window opens, the agent navigates and clicks around, and everything gets recorded as a video artifact. I wanted to know how it worked.

So I treated it as a black box and worked backwards.

The Entry Point: The Agent's Tools

The investigation started with the agent itself. I asked it to look at its own system instructions, and it revealed this tool definition:

Tool: browser_subagent

Start a browser subagent to perform actions in the browser with the given task description. The subagent has access to tools for both interacting with web page content (clicking, typing, navigating, etc) and controlling the browser window itself (resizing, etc). Please make sure to define a clear condition to return on. After the subagent returns, you should read the DOM or capture a screenshot to see what it did.

Note: All browser interactions are automatically recorded and saved as WebP videos to the artifacts directory. This is the ONLY way you can record a browser session video/animation.

IMPORTANT: If the subagent returns that the open_browser_url tool failed, there is a browser issue that is out of your control. You MUST ask the user how to proceed and use the suggested_responses tool.

Parameters

This was the first clue. The tool isn't a direct command like `click()` or `type()`. It's a request to start a sub-agent. The main agent delegates the high-level goal ("Go to Google") to this sub-agent, and it handles the details.

So the question became: What is this sub-agent? And where does it live? Armed with the tool definition, I turned to the terminal to find the running processes backing this capability.

Chapter 1: The Black Box

My first step was standard reconnaissance. If there's a browser window open, there must be a process running it. I ran ps aux | grep Chrome and found the smoking gun immediately:

$ ps aux | grep Chrome
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/Users/alokbishoyi/.gemini/antigravity-browser-profile \
  --disable-fre --no-default-browser-check

Standard Chrome, but with remote debugging enabled on port 9222. This is the Chrome DevTools Protocol (CDP) interface. I verified it was listening:

$ curl http://127.0.0.1:9222/json/version
{"Browser":"Chrome/131.0.6778.0","Protocol-Version":"1.3",...}

But who was talking to it? I ran lsof -i :9222:

$ lsof -i :9222
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
node    38213 alokbishoyi   23u  IPv6 0x...      0t0  TCP *:9222 (LISTEN)

A Node.js process. I checked what it was running:

$ ps -p 38213 -o command=
node /Users/alokbishoyi/.npm/_npx/.../node_modules/@agentdeskai/browser-tools-mcp/...

An MCP (Model Context Protocol) server package named @agentdeskai/browser-tools-mcp. But the MCP server had to be getting instructions from somewhere. I checked what spawned it:

$ ps -ef | grep 38213
  15440     1 alokbishoyi  ... /Applications/Antigravity.app/.../language_server_macos_arm
  38213 15440 alokbishoyi  ... node .../browser-tools-mcp

PID 15440 was the parent. I scanned for all listening ports:

$ lsof -i -P | grep LISTEN | grep 15440
language_server_macos_arm  15440  alokbishoyi   12u  IPv6  ... TCP *:53410 (LISTEN)
language_server_macos_arm  15440  alokbishoyi   13u  IPv6  ... TCP *:53412 (LISTEN)
language_server_macos_arm  15440  alokbishoyi   14u  IPv6  ... TCP *:53413 (LISTEN)
language_server_macos_arm  15440  alokbishoyi   15u  IPv6  ... TCP *:53422 (LISTEN)

Four ports. I checked the process details:

$ ps -p 15440 -o command=
/Applications/Antigravity.app/.../language_server_macos_arm \
  --extension_server_port 53410 \
  --enable_lsp \
  --api_server_url http://jetski-server.corp.goog \
  --csrf_token [REDACTED]

Port 53410 was the extension server—the API endpoint for tool execution. Ports 53412-53413 were standard LSP channels for code intelligence. Port 53422 was an additional service channel. This was Antigravity's custom coordination server handling both code features and agent orchestration. The command-line flags revealed it connects to Google's internal infrastructure (jetski-server.corp.goog) and uses CSRF tokens for authentication.

Chapter 2: Cracking the Binary

The Language Server was a compiled binary, which usually means a dead end. But I decided to try a classic reverse-engineering trick: the strings command.

$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -i browser | head -20

The output was a goldmine. It contained specific Go handlers for every browser action:

third_party/jetski/cortex/handlers/browser_subagent_handler.go
third_party/jetski/cortex/handlers/browser_click_element_handler.go
third_party/jetski/cortex/handlers/browser_press_key_handler.go
third_party/jetski/cortex/handlers/browser_resize_window_handler.go
third_party/jetski/cortex/handlers/browser_scroll_down_handler.go
third_party/jetski/cortex/handlers/browser_scroll_handler.go
third_party/jetski/cortex/handlers/browser_scroll_up_handler.go
third_party/jetski/cortex/handlers/browser_select_option_handler.go
third_party/jetski/cortex/handlers/capture_browser_screenshot_handler.go
third_party/jetski/cortex/handlers/read_browser_page_handler.go

"Jetski". That was the internal codename. It was a collection of granular handlers. The browser_subagent_handler.go seemed to be the brain, while others handled specific motor functions like clicking elements, scrolling, capturing screenshots, and reading page content.

I also found references to strongly typed structures:

$ strings language_server_macos_arm | grep -i "Browser.*Tool"
BrowserInputToolConverter
BrowserScrollDownToolArgs
BrowserClickPixelToolArgs
BrowserNavigateToolArgs
BrowserCaptureScreenshotToolArgs
BrowserClickElementToolArgs
BrowserScrollUpToolArgs
BrowserScrollToolArgs
BrowserSelectOptionToolArgs
BrowserResizeWindowToolArgs
BrowserPressKeyToolArgs

This proved that the server maintains a strongly typed internal representation of the browser tools before serializing them for the LLM. Each tool had its own argument structure, suggesting a well-defined API boundary between the language server and the browser automation layer.
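
The Go struct definitions themselves aren't recoverable from a strings dump, but combining the names above with the parameter lists documented later, the argument types plausibly look like this (sketched in TypeScript; the type names come from the binary, the fields are inferred):

// Names from the binary; fields inferred from the tool parameter lists below.
interface BrowserClickElementToolArgs {
  element_index: number; // index assigned by the DOM tree from read_browser_page
  page_id?: string;      // optional: defaults to the active page
}

interface BrowserScrollToolArgs {
  element_index?: number;                       // omit to scroll the page itself
  direction?: "up" | "down" | "left" | "right";
  dx?: number;                                  // horizontal scroll distance in pixels
  dy?: number;                                  // vertical scroll distance in pixels
  page_id?: string;
}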

Deeper analysis revealed the presence of ToolConverter logic within google3/third_party/jetski/cortex/tools/. This implies a translation layer: when the "Jetski" sub-agent decides to click or type, it calls an internal function that passes through these ToolConverters, which validate the arguments (checking PrerequisiteArgumentNames) and likely handle partial parsing for streaming responses.

The ToolConverter Architecture

Extracting function names from the binary revealed the complete ToolConverter system. I searched for method names containing "ToolConverter" and found the full function signatures:

$ strings language_server_macos_arm | grep "cortex/tools/tools" | grep "ToolConverter"
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).GetToolDefinition
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).ToolCallToCortexStep
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).GetPayloadCase
google3/third_party/jetski/cortex/tools/tools.(*CaptureBrowserScreenshotToolConverter).GetToolDefinition
google3/third_party/jetski/cortex/tools/tools.(*CaptureBrowserScreenshotToolConverter).ToolCallToCortexStep
...

Each browser tool has its own dedicated converter class implementing three core methods:

  1. GetToolDefinition() returns the JSON schema that describes the tool to the LLM (parameters, types, descriptions).
  2. ToolCallToCortexStep() converts the LLM's tool call JSON into an internal Cortex step representation.
  3. GetPayloadCase() determines which protobuf message type to use for serialization.
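
Expressed in TypeScript, the contract each converter satisfies would look roughly like this; the three method names come straight from the symbol table, while the signatures and helper types are my assumptions:

// Sketch of the per-tool converter contract, reconstructed from symbol names.
interface ToolConverter {
  // JSON schema advertised to the LLM: name, description, parameters
  GetToolDefinition(): ToolDefinition;
  // Validate and convert the LLM's tool call into an internal Cortex step
  ToolCallToCortexStep(call: ToolCall): CortexStep;
  // Select which protobuf payload variant this tool serializes to
  GetPayloadCase(): string;
}

// Placeholder types: the real ones are Go protobufs inside the binary.
type ToolDefinition = { name: string; description: string; parameters: object };
type ToolCall = { name: string; args: Record<string, unknown> };
type CortexStep = Record<string, unknown>;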

To find all the ToolConverter class names, I searched for pointer type signatures:

$ strings language_server_macos_arm | grep -E "\*tools\..*ToolConverter"
*tools.BrowserInputToolConverter
*tools.BrowserGetDomToolConverter
*tools.BrowserScrollToolConverter
...

I found 19 distinct browser ToolConverters in the binary:

*tools.BrowserInputToolConverter
*tools.BrowserGetDomToolConverter
*tools.BrowserScrollToolConverter
*tools.OpenBrowserUrlToolConverter
*tools.ReadBrowserPageToolConverter
*tools.BrowserScrollUpToolConverter
*tools.BrowserPressKeyToolConverter
*tools.BrowserSubagentToolConverter
*tools.ListBrowserPagesToolConverter
*tools.BrowserMoveMouseToolConverter
*tools.ClickBrowserPixelToolConverter
*tools.BrowserScrollDownToolConverter
*tools.BrowserSelectOptionToolConverter
*tools.BrowserClickElementToolConverter
*tools.BrowserResizeWindowToolConverter
*tools.BrowserDragPixelToPixelToolConverter
*tools.CaptureBrowserScreenshotToolConverter
*tools.ExecuteBrowserJavaScriptToolConverter
*tools.CaptureBrowserConsoleLogsToolConverter

Each converter also has a corresponding StringConverter for converting internal steps back into text format that the LLM can read in conversation history. I found these by searching for "StringConverter" patterns:

$ strings language_server_macos_arm | grep -E "\*chatconverters\..*StringConverter" | grep -i browser
*chatconverters.BrowserInputStringConverter
*chatconverters.BrowserGetDomStringConverter
*chatconverters.BrowserScrollStringConverter
*chatconverters.OpenBrowserUrlStringConverter

These StringConverters handle the reverse transformation, converting internal Cortex steps into natural language text that gets included in the LLM's conversation context.

This dual-layer architecture (ToolConverters for the LLM-to-internal translation, StringConverters for the trip back into conversation history) enforces type safety at every boundary: the system validates tool calls before execution, handles partial parsing for streaming responses, and converts completed steps back into natural language the LLM can read in subsequent turns.
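
If that's right, a StringConverter reduces to rendering a completed step as a short history line. The starred strings below appear verbatim in the binary (they resurface in Chapter 3); the surrounding function is a guess:

// Hypothetical shape of a chatconverters StringConverter.
// The starred return values are verbatim strings from the binary.
function stepToHistoryString(step: { tool: string }): string {
  switch (step.tool) {
    case "click_browser_pixel":
      return "*Clicked on pixel in Jetski Browser*";
    case "capture_browser_screenshot":
      return "*Took screenshot in Jetski Browser*";
    case "read_browser_page":
      return "*Read browser page in Jetski Browser*";
    default:
      return `*Used ${step.tool} in Jetski Browser*`;
  }
}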

The Complete Browser Tool Arsenal

By extracting strings from the binary and cross-referencing with the MCP server code, I was able to reconstruct the complete set of browser tools available to the sub-agent:

Browser Navigation Tools:

  1. browser_navigate (or open_browser_url)
     - Description: "Open a URL in Jetski Browser to view the page contents of a URL in a rendered format. You can also use this tool to navigate to different URLs or reload the current page."
     - Parameters: url (STRING) - The URL to navigate to

  2. read_browser_page
     - Description: "Read browser page in Jetski Browser" / "Get the DOM tree of an open page in the Jetski Browser. Returns only interactive elements and text within the current viewport, each with an index for interaction. If an element is not included, it may be outside the viewport or getting filtered for other reasons - refer to the screenshot to confirm. Then try read_browser_page and browser_scroll tools."
     - Parameters: page_id (STRING, optional) - The page ID to read

Browser Interaction Tools:

  3. browser_click_element
     - Description: Click on an element in the browser by its index
     - Parameters:
       - element_index (INTEGER) - Index of the element from the DOM tree
       - page_id (STRING, optional) - The page ID

  4. browser_select_option
     - Description: Select an option in a dropdown/select element
     - Parameters:
       - element_index (INTEGER) - Index of the select element
       - option_value (STRING) - Value to select
       - page_id (STRING, optional)

  5. browser_press_key
     - Description: Press a keyboard key
     - Parameters:
       - key (STRING) - Key to press (e.g., "Enter", "Escape", "ArrowLeft")
       - page_id (STRING, optional)

Browser Scrolling Tools:

  6. browser_scroll
     - Description: "A tool used to scroll on an element or the page in the browser. For vertical scroll, dy is automatically set to the height of the element/page. For horizontal scroll, dx the width of the element/page. Will output the number of pixels scrolled, indicating 0 pixels if no scrolling occurred."
     - Parameters:
       - element_index (INTEGER, optional) - Index of element to scroll, or omit for page scroll
       - direction (STRING, optional) - "up", "down", "left", "right"
       - dx (INTEGER, optional) - Horizontal scroll distance
       - dy (INTEGER, optional) - Vertical scroll distance
       - page_id (STRING, optional)

  7. browser_scroll_up
     - Description: Scroll up on the page or element
     - Parameters:
       - element_index (INTEGER, optional)
       - page_id (STRING, optional)

  8. browser_scroll_down
     - Description: Scroll down on the page or element
     - Parameters:
       - element_index (INTEGER, optional)
       - page_id (STRING, optional)

Browser Window Management:

  9. browser_resize_window
     - Description: Resize the browser window
     - Parameters:
       - width (INTEGER) - New window width
       - height (INTEGER) - New window height

Browser Capture Tools:

  10. capture_browser_screenshot
      - Description: Capture a screenshot of the current browser page
      - Parameters:
        - page_id (STRING, optional) - The page ID to capture

  11. execute_browser_javascript
      - Description: Execute JavaScript code in the browser context
      - Parameters:
        - code (STRING) - JavaScript code to execute
        - page_id (STRING, optional)

Browser Page Management:

  12. list_browser_pages
      - Description: "List all open pages in Jetski Browser and their metadata (page_id, url, title, viewport size, etc.)"
      - Parameters: None
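
To make this concrete, here is what GetToolDefinition() for browser_click_element might emit to the LLM. The name, description, and parameters come from the list above; the exact schema framing is an assumption:

{
  "name": "browser_click_element",
  "description": "Click on an element in the browser by its index",
  "parameters": {
    "type": "OBJECT",
    "properties": {
      "element_index": { "type": "INTEGER", "description": "Index of the element from the DOM tree" },
      "page_id": { "type": "STRING", "description": "The page ID (optional)" }
    },
    "required": ["element_index"]
  }
}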

I also found references to the internal infrastructure:

$ strings language_server_macos_arm | grep -i jetski
jetski-server.corp.goog
jetski/cortex/handlers
jetski/cortex/tools
google3/third_party/jetski/prompt/template_provider/templates/system_prompts/

The binary confirmed the connection to Google's internal "jetski-server" infrastructure that I'd seen in the command-line flags. The presence of template_provider paths suggests that prompts are loaded from template files, explaining the fragmented nature of the prompt strings.

Chapter 3: The Soul (The Reconstructed Prompt)

If "Jetski" was the body, what was the soul? I wanted to find the system prompt—the text that tells the AI who it is.

I started by searching for common prompt patterns in the binary:

$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -i "you are" | head -3
You are an expert AI coding assistant and are pair programming with a USER to solve a coding task. When asked, you focus on outlining the USER's main goals and anticipating likely next steps they will take.

Found the main persona. But I needed the specific instructions for the browser agent. I searched for "Jetski Browser":

$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -C 2 "Jetski Browser"
*Listed Jetski Browser pages*
*Took screenshot in Jetski Browser*
*Clicked on pixel in Jetski Browser*
*Read browser page in Jetski Browser*
*Captured DOM tree in Jetski Browser*

These looked like internal log messages or "thoughts" the agent emits. Then I found the tool definitions themselves:

$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep "Open a URL in Jetski Browser"
Open a URL in Jetski Browser to view the page contents of a URL in a rendered format. You can also use this tool to navigate to different URLs or reload the current page.

The prompt wasn't a single contiguous block of text I could extract. Instead, it was fragmented—compiled as individual string literals scattered throughout the binary. The Language Server likely assembles these pieces dynamically at runtime to construct the full system prompt. This explains why a simple strings dump didn't reveal a neat "You are Jetski..." paragraph.

By extracting all relevant strings and cross-referencing with the template provider paths found in the binary (google3/third_party/jetski/prompt/template_provider/templates/system_prompts/), I was able to reconstruct a more complete picture of the system prompt:

Core Identity: "You are an expert AI coding assistant and are pair programming with a USER to solve a coding task. When asked, you focus on outlining the USER's main goals and anticipating likely next steps they will take." Browser Agent Context: You are operating within the "Jetski Browser" context. This is a specialized browser automation environment where you have access to browser-specific tools for interacting with web pages. Browser Capabilities: - "Open a URL in Jetski Browser to view the page contents of a URL in a rendered format. You can also use this tool to navigate to different URLs or reload the current page." - "Get the DOM tree of an open page in the Jetski Browser. Returns only interactive elements and text within the current viewport, each with an index for interaction. If an element is not included, it may be outside the viewport or getting filtered for other reasons - refer to the screenshot to confirm. Then try read_browser_page and browser_scroll tools." - "List all open pages in Jetski Browser and their metadata (page_id, url, title, viewport size, etc.)" Tool Usage Guidelines: - "Act as if the tool calls will be executed immediately after your message, and your next response will have access to their results." - "Formulate your tool calls using the xml and json format specified for each tool." - "The tool arguments should be in a valid json inside of the xml tags." - "The tool name should be the xml tag surrounding the tool call." - "You are REQUIRED to call a tool in your response." Error Handling & Recovery: - "You may have seen the following lint errors as feedback for a previous edit, but they still exist at this point. Please respond accordingly, erring toward explicitness." - "There was a problem parsing the tool call. Error Message: %v. Guidance: You are trying to correct your previous tool call error, you must focus on fixing the failed tool call with sequential tool calls and try again. Do not do parallel tool calls and if you are fixing multiple tool calls, do them one at a time. Do not apologize. Retries remaining: %d." Browser Interaction Patterns: - When elements are not visible, use browser_scroll tools to bring them into viewport - Always capture a screenshot after significant actions to verify state - Use read_browser_page to get the current DOM structure before interacting - Elements are indexed for interaction - use the index from the DOM tree response - If an element is not found, check if it's outside the viewport and scroll first Internal Thought Patterns (Log Messages): The agent emits internal thoughts that appear in logs: - "*Listed Jetski Browser pages*" - "*Took screenshot in Jetski Browser*" - "*Clicked on pixel in Jetski Browser*" - "*Read browser page in Jetski Browser*" - "*Captured DOM tree in Jetski Browser*" Task Completion: - Define clear conditions to return on - After the subagent returns, read the DOM or capture a screenshot to see what it did - All browser interactions are automatically recorded and saved as WebP videos to the artifacts directory

This confirmed that the browser_subagent spins up a dedicated sub-agent with a specific persona ("Jetski Browser") constructed from these embedded fragments. The prompt is assembled from template files at runtime, which explains why it appears fragmented in the binary—each component is stored separately and combined dynamically.

Template-Based Prompt System

The binary references suggest the prompt system uses templates stored in:

google3/third_party/jetski/prompt/template_provider/templates/system_prompts/
  - notify_user_tool.tmpl
  - conversation_logs.tmpl
  - ephemeral_message.tmpl
  - mode_descriptions.tmpl
  - persistent_context.tmpl
  - task_boundary_tool.tmpl
  - communication_style.tmpl
  - file_diffs_artifact.tmpl
  - knowledge_discovery.tmpl
  - walkthrough_artifact.tmpl

This template-based approach allows Google to update prompts without recompiling the binary, and enables dynamic prompt construction based on context, mode, and available tools.
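
I couldn't pull the template contents out of the binary, but the assembly step itself is mechanically simple. A minimal sketch of what a template provider like this might do, assuming simple {{placeholder}} substitution; only the directory and file names are real:

import { readFileSync } from "node:fs";
import { join } from "node:path";

// Directory and file names appear in the binary; everything else is assumed.
const TEMPLATE_DIR = "templates/system_prompts";

function buildSystemPrompt(fragments: string[], ctx: Record<string, string>): string {
  return fragments
    .map((name) => readFileSync(join(TEMPLATE_DIR, `${name}.tmpl`), "utf8"))
    // Substitute placeholders with runtime context (mode, available tools, etc.)
    .map((tmpl) => tmpl.replace(/\{\{(\w+)\}\}/g, (_, key) => ctx[key] ?? ""))
    .join("\n\n");
}

// e.g. buildSystemPrompt(["persistent_context", "mode_descriptions"], { mode: "browser" });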

Chapter 4: The Bridge

I had the Brain (Language Server) and the Eyes (Chrome). But how did they talk? The MCP server was the middleman, but it wasn't talking to Chrome directly.

I scrutinized the browser-tools-mcp code. Looking at the source in ~/.npm/_npx/.../browser-tools-mcp/, I saw it making HTTP requests to a discovery endpoint:

$ cat ~/.npm/_npx/.../browser-tools-mcp/src/index.ts | grep -A 5 "discover"
const discoverPort = async () => {
  for (let port = 3025; port <= 3035; port++) {
    const response = await fetch(`http://localhost:${port}/.identity`);
    if (response.ok) return port;
  }
}

It was trying ports 3025-3035, looking for a /.identity endpoint. But nothing showed up on those ports in my initial scan. Then I realized: the server might only be running when a browser session is active.

I checked the Chrome extensions directory:

$ ls -la ~/.gemini/antigravity-browser-profile/Default/Extensions/
eeijfnjmjelapkebgockoeaadonbchdd/

Found an extension with ID eeijfnjmjelapkebgockoeaadonbchdd. I looked at its manifest:

$ cat ~/.gemini/antigravity-browser-profile/.../manifest.json
{
  "name": "Antigravity Browser Connector",
  "background": { "service_worker": "service_worker_binary.js" },
  "permissions": ["debugger", "tabs"]
}

Digging into its service_worker_binary.js (minified, but readable enough), I found the missing link:

// De-minified for clarity
app.post('/navigate', async (req, res) => {
  const { url } = req.body;
  await chrome.debugger.sendCommand({ tabId: tabId }, 'Page.navigate', { url });
  res.json({ success: true });
});

app.get('/.identity', (req, res) => {
  res.json({ identity: 'mcp-browser-connector-24x7' });
});

The Extension runs a local HTTP server inside the browser. It receives high-level commands (like "navigate") via HTTP and translates them into low-level CDP WebSocket messages. The /.identity endpoint is how the MCP server discovers it.
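
This also means the bridge can be driven by hand. A minimal sketch, assuming the extension is listening on 3025 (the endpoints and the identity string come from the de-minified source; the payload shape is from the /navigate handler above):

// Node 18+, built-in fetch; 3025 is the bottom of the MCP server's scan range
const base = "http://localhost:3025";

// Discovery handshake: expects { identity: "mcp-browser-connector-24x7" }
console.log(await fetch(`${base}/.identity`).then((r) => r.json()));

// High-level command; the extension relays it as CDP Page.navigate
await fetch(`${base}/navigate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://www.google.com" }),
});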

Why use an extension instead of talking to CDP directly? CDP is a low-level protocol: it can tell you when the network goes idle, but not when a page is actually ready for interaction. Running code inside the browser via an extension provides several advantages: it can access the DOM directly, track complex page state, bypass CORS restrictions, and expose a simple high-level API ("navigate", "click") instead of requiring direct manipulation of the DevTools protocol.

The Synthesis: The 6-Layer Architecture

Putting it all together, here is the complete flow of an Antigravity browser action:

  1. The Trigger: You ask the agent to "Go to Google".
  2. The Coordinator: The Language Server (Port 53410) spins up the "Jetski" Sub-Agent.
  3. The Brain: The Sub-Agent plans the action using its reconstructed system prompt.
  4. The Tool: It calls navigate(), which goes to the MCP Server.
  5. The Bridge: The MCP Server sends an HTTP POST to the Extension's Server (Port 3025).
  6. The Execution: The Extension translates this to CDP commands for Chrome (Port 9222).

Parting Thoughts

Previous MCP integrations followed a simpler pattern: the IDE would spawn an MCP server as a child process, communicate via STDIO, and expose tools directly to the main AI agent. Tools were typically thin wrappers around existing APIs—file operations, terminal commands, or simple HTTP requests. The agent would call these tools directly, and the MCP server would execute them synchronously.
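
In that model, a tool call is a single JSON-RPC message over stdio; this example follows the public MCP specification rather than anything extracted from Antigravity:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "browser_navigate",
    "arguments": { "url": "https://www.google.com" }
  }
}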

Antigravity departs from this pattern in three ways. First, it uses a sub-agent pattern: instead of exposing browser tools directly to the main agent, it spawns a dedicated "Jetski" sub-agent with its own system prompt and specialized tool set. This sub-agent runs as a separate AI instance, allowing it to maintain browser-specific context and decision-making logic independently from the main IDE agent.

Second, the MCP server isn't spawned directly by the IDE—it's orchestrated by the language server, which acts as a coordination layer. The language server manages the sub-agent lifecycle, routes tool calls, and handles the translation between the sub-agent's tool invocations and the actual browser automation layer.

Third, instead of using a standard browser automation library like Playwright or Puppeteer directly, Antigravity inserts a Chrome extension as an intermediary. This extension runs an HTTP server inside the browser, providing a high-level API that abstracts away the complexity of Chrome DevTools Protocol while still allowing low-level CDP access when needed.

What's interesting here is that MCP servers aren't just tool providers anymore—they're agent coordinators. When you have something complex like browser automation, you don't want your main agent thinking about DOM elements and network timing. You want it focused on code. So Antigravity delegates: the main agent coordinates, the sub-agent handles browser logic, the language server routes, and the extension executes. Each layer does one thing well.