Google launched Antigravity IDE on November 18, 2025. It's a fork of VS Code with AI agents built in. The agents can write code, edit files, run terminal commands, and control a browser. I got access to the public preview and tried it out.
What struck me was the browser integration. Cursor has something similar—agents that can interact with web pages—but Antigravity's implementation felt different. When you ask it to do something in a browser, a Chrome window opens, the agent navigates and clicks around, and everything gets recorded as a video artifact. I wanted to know how it worked.
So I treated it as a black box and worked backwards.
The investigation started with the agent itself. I asked it to look at its own system instructions, and it revealed this fragment of a tool definition:
…open_browser_url tool failed, there is a browser issue that is out of your control. You MUST ask the user how to proceed and use the suggested_responses tool.

Parameters: …
This was the first clue. The tool isn't a direct command like `click()` or `type()`. It's a request to start a sub-agent. The main agent delegates the high-level goal ("Go to Google") to this sub-agent, and it handles the details.
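For a sense of what that delegation might look like on the wire, here is a hypothetical sketch; the `browser_subagent` name shows up later in the binary, but the parameter name is my guess, since the excerpt above cuts off before the parameter list:

```typescript
// Hypothetical shape of the delegation. "browser_subagent" matches symbols
// found later in the binary; the "task" parameter name is an assumption.
const toolCall = {
  name: "browser_subagent",
  arguments: {
    // A high-level goal, not a sequence of clicks:
    task: "Go to google.com and search for 'Antigravity'",
  },
};
```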
So the question became: What is this sub-agent? And where does it live? Armed with the tool definition, I turned to the terminal to find the running processes backing this capability.
My first step was standard reconnaissance. If there's a browser window open, there must be a process running it. I ran `ps aux | grep Chrome` and found the smoking gun immediately:
$ ps aux | grep Chrome
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/Users/alokbishoyi/.gemini/antigravity-browser-profile \
--disable-fre --no-default-browser-check
Standard Chrome, but with remote debugging enabled on port 9222. This is the Chrome DevTools Protocol (CDP) interface. I verified it was listening:
$ curl http://127.0.0.1:9222/json/version
{"Browser":"Chrome/131.0.6778.0","Protocol-Version":"1.3",...}
But who was talking to it? I ran `lsof -i :9222`:
$ lsof -i :9222
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node 38213 alokbishoyi 23u IPv6 0x... 0t0 TCP *:9222 (LISTEN)
A Node.js process. I checked what it was running:
$ ps -p 38213 -o command=
node /Users/alokbishoyi/.npm/_npx/.../node_modules/@agentdeskai/browser-tools-mcp/...
An MCP (Model Context Protocol) server package named @agentdeskai/browser-tools-mcp. But the MCP server had to be getting instructions from somewhere. I checked what spawned it:
$ ps -ef | grep 38213
alokbishoyi 15440     1 ... /Applications/Antigravity.app/.../language_server_macos_arm
alokbishoyi 38213 15440 ... node .../browser-tools-mcp
PID 15440 was the parent. I checked which ports it was listening on:
$ lsof -i -P | grep LISTEN | grep 15440
language_server_macos_arm 15440 alokbishoyi 12u IPv6 ... TCP *:53410 (LISTEN)
language_server_macos_arm 15440 alokbishoyi 13u IPv6 ... TCP *:53412 (LISTEN)
language_server_macos_arm 15440 alokbishoyi 14u IPv6 ... TCP *:53413 (LISTEN)
language_server_macos_arm 15440 alokbishoyi 15u IPv6 ... TCP *:53422 (LISTEN)
Four ports. I checked the process details:
$ ps -p 15440 -o command=
/Applications/Antigravity.app/.../language_server_macos_arm \
--extension_server_port 53410 \
--enable_lsp \
--api_server_url http://jetski-server.corp.goog \
--csrf_token [REDACTED]
Port 53410 was the extension server—the API endpoint for tool execution. Ports 53412-53413 were standard LSP channels for code intelligence. Port 53422 was an additional service channel. This was Antigravity's custom coordination server handling both code features and agent orchestration. The command-line flags revealed it connects to Google's internal infrastructure (jetski-server.corp.goog) and uses CSRF tokens for authentication.
The Language Server was a compiled binary, which usually means a dead end. But I decided to try a classic reverse-engineering trick: the `strings` command.
$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -i browser | head -20
The output was a goldmine. It contained source paths for dedicated Go handlers covering every browser action:
third_party/jetski/cortex/handlers/browser_subagent_handler.go
third_party/jetski/cortex/handlers/browser_click_element_handler.go
third_party/jetski/cortex/handlers/browser_press_key_handler.go
third_party/jetski/cortex/handlers/browser_resize_window_handler.go
third_party/jetski/cortex/handlers/browser_scroll_down_handler.go
third_party/jetski/cortex/handlers/browser_scroll_handler.go
third_party/jetski/cortex/handlers/browser_scroll_up_handler.go
third_party/jetski/cortex/handlers/browser_select_option_handler.go
third_party/jetski/cortex/handlers/capture_browser_screenshot_handler.go
third_party/jetski/cortex/handlers/read_browser_page_handler.go
"Jetski". That was the internal codename. It was a collection of granular handlers. The browser_subagent_handler.go seemed to be the brain, while others handled specific motor functions like clicking elements, scrolling, capturing screenshots, and reading page content.
I also found references to strongly typed structures:
$ strings language_server_macos_arm | grep -i "Browser.*Tool"
BrowserInputToolConverter
BrowserScrollDownToolArgs
BrowserClickPixelToolArgs
BrowserNavigateToolArgs
BrowserCaptureScreenshotToolArgs
BrowserClickElementToolArgs
BrowserScrollUpToolArgs
BrowserScrollToolArgs
BrowserSelectOptionToolArgs
BrowserResizeWindowToolArgs
BrowserPressKeyToolArgs
This showed that the server maintains a strongly typed internal representation of the browser tools before serializing them for the LLM. Each tool had its own argument structure, suggesting a well-defined API boundary between the language server and the browser automation layer.
Deeper analysis revealed the presence of ToolConverter logic within google3/third_party/jetski/cortex/tools/. This implies a translation layer: when the "Jetski" sub-agent decides to click or type, it calls an internal function that passes through these ToolConverters, which validate the arguments (checking PrerequisiteArgumentNames) and likely handle partial parsing for streaming responses.
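To make that concrete, here is a hypothetical sketch of one argument structure and its validation, translated to TypeScript (the real versions are Go structs inside google3, and the field names are guesses based only on the tool names):

```typescript
// Guessed shape of the args for browser_click_element; the field names are
// assumptions, not extracted from the binary.
interface BrowserClickElementToolArgs {
  elementId: string; // identifier obtained from a prior page read
  tabId?: number;    // which page to act on, if several are open
}

// Roughly what the PrerequisiteArgumentNames check implies: reject a tool
// call whose required arguments are missing before it ever reaches Chrome.
function validateClickArgs(raw: unknown): BrowserClickElementToolArgs {
  const args = raw as Partial<BrowserClickElementToolArgs>;
  if (typeof args.elementId !== "string") {
    throw new Error("browser_click_element requires elementId");
  }
  return args as BrowserClickElementToolArgs;
}
```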
Extracting function names from the binary revealed the complete ToolConverter system. I searched for method names containing "ToolConverter" and found the full function signatures:
$ strings language_server_macos_arm | grep "cortex/tools/tools" | grep "ToolConverter"
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).GetToolDefinition
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).ToolCallToCortexStep
google3/third_party/jetski/cortex/tools/tools.(*OpenBrowserUrlToolConverter).GetPayloadCase
google3/third_party/jetski/cortex/tools/tools.(*CaptureBrowserScreenshotToolConverter).GetToolDefinition
google3/third_party/jetski/cortex/tools/tools.(*CaptureBrowserScreenshotToolConverter).ToolCallToCortexStep
...
Each browser tool has its own dedicated converter class implementing three core methods: GetToolDefinition, ToolCallToCortexStep, and GetPayloadCase.
GetToolDefinition() returns the JSON schema that describes the tool to the LLM (parameters, types, descriptions). ToolCallToCortexStep() converts the LLM's tool call JSON into an internal Cortex step representation. GetPayloadCase() determines which protobuf message type to use for serialization.
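In TypeScript terms, the contract each converter implements would look roughly like this; the method names mirror the binary symbols, while the surrounding types are my reconstruction:

```typescript
// Hypothetical reconstruction of the converter contract (the real code is Go).
// Only the three method names come from the binary; the types are inferred.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON schema shown to the LLM
}

interface CortexStep {
  payloadCase: string; // which protobuf oneof field this step populates
  args: Record<string, unknown>;
}

interface ToolConverter {
  // Describes the tool to the LLM: parameters, types, descriptions.
  getToolDefinition(): ToolDefinition;
  // Validates the LLM's tool-call JSON and maps it to an internal step.
  toolCallToCortexStep(toolCallJson: string): CortexStep;
  // Picks the protobuf message type used to serialize the step.
  getPayloadCase(): string;
}
```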
To find all the ToolConverter class names, I searched for pointer type signatures:
$ strings language_server_macos_arm | grep -E "\*tools\..*ToolConverter"
*tools.BrowserInputToolConverter
*tools.BrowserGetDomToolConverter
*tools.BrowserScrollToolConverter
...
I found 19 distinct browser ToolConverters in the binary:
*tools.BrowserInputToolConverter
*tools.BrowserGetDomToolConverter
*tools.BrowserScrollToolConverter
*tools.OpenBrowserUrlToolConverter
*tools.ReadBrowserPageToolConverter
*tools.BrowserScrollUpToolConverter
*tools.BrowserPressKeyToolConverter
*tools.BrowserSubagentToolConverter
*tools.ListBrowserPagesToolConverter
*tools.BrowserMoveMouseToolConverter
*tools.ClickBrowserPixelToolConverter
*tools.BrowserScrollDownToolConverter
*tools.BrowserSelectOptionToolConverter
*tools.BrowserClickElementToolConverter
*tools.BrowserResizeWindowToolConverter
*tools.BrowserDragPixelToPixelToolConverter
*tools.CaptureBrowserScreenshotToolConverter
*tools.ExecuteBrowserJavaScriptToolConverter
*tools.CaptureBrowserConsoleLogsToolConverter
Each converter also has a corresponding StringConverter for converting internal steps back into a text format that the LLM can read in conversation history. I found these by searching for "StringConverter" patterns:
$ strings language_server_macos_arm | grep -E "\*chatconverters\..*StringConverter" | grep -i browser
*chatconverters.BrowserInputStringConverter
*chatconverters.BrowserGetDomStringConverter
*chatconverters.BrowserScrollStringConverter
*chatconverters.OpenBrowserUrlStringConverter
This dual-layer architecture, with ToolConverters for LLM→internal translation and StringConverters for internal→LLM text, ensures type safety at every boundary. The system validates tool calls before execution, handles partial parsing for streaming responses, and converts completed steps back into natural language that the LLM can read in subsequent turns.
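Under the same hypothetical types as before, the reverse layer is simple; the output format below is modeled on the `*...in Jetski Browser*` strings that turn up in the binary:

```typescript
// Hypothetical counterpart to the ToolConverters: renders a completed step
// as text for the LLM's conversation history. Output format modeled on the
// "*Clicked on pixel in Jetski Browser*" strings found in the binary.
interface StringConverter {
  stepToString(step: CortexStep): string;
}

const clickPixelStringConverter: StringConverter = {
  stepToString(step) {
    return `*Clicked on pixel in Jetski Browser* ${JSON.stringify(step.args)}`;
  },
};
```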
By extracting strings from the binary and cross-referencing with the MCP server code, I was able to reconstruct the complete set of browser tools available to the sub-agent, one for each of the 19 converters.
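The mapping from converter names to tool names looks mechanical (OpenBrowserUrlToolConverter lines up with the `open_browser_url` tool from the system instructions, and the handler file names use the same snake_case), so a quick transform recovers the likely tool list:

```typescript
// Derive probable tool names from the converter type names. The CamelCase to
// snake_case mapping is an inference, anchored by OpenBrowserUrlToolConverter
// lining up with open_browser_url in the system instructions.
const converters = [
  "OpenBrowserUrlToolConverter",
  "ReadBrowserPageToolConverter",
  "BrowserClickElementToolConverter",
  // ...the remaining 16 from the list above
];

const toolName = (c: string) =>
  c
    .replace(/ToolConverter$/, "")
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .toLowerCase();

console.log(converters.map(toolName));
// [ 'open_browser_url', 'read_browser_page', 'browser_click_element' ]
```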
I also found references to the internal infrastructure:
$ strings language_server_macos_arm | grep -i jetski
jetski-server.corp.goog
jetski/cortex/handlers
jetski/cortex/tools
google3/third_party/jetski/prompt/template_provider/templates/system_prompts/
The binary confirmed the connection to Google's internal "jetski-server" infrastructure that I'd seen in the command-line flags. The presence of template_provider paths suggests that prompts are loaded from template files, explaining the fragmented nature of the prompt strings.
If "Jetski" was the body, what was the soul? I wanted to find the system prompt—the text that tells the AI who it is.
I started by searching for common prompt patterns in the binary:
$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -i "you are" | head -3
You are an expert AI coding assistant and are pair programming with a USER to solve a coding task. When asked, you focus on outlining the USER's main goals and anticipating likely next steps they will take.
Found the main persona. But I needed the specific instructions for the browser agent. I searched for "Jetski Browser":
$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep -C 2 "Jetski Browser"
*Listed Jetski Browser pages*
*Took screenshot in Jetski Browser*
*Clicked on pixel in Jetski Browser*
*Read browser page in Jetski Browser*
*Captured DOM tree in Jetski Browser*
These looked like internal log messages or "thoughts" the agent emits. Then I found the tool definitions themselves:
$ strings /Applications/Antigravity.app/.../language_server_macos_arm | grep "Open a URL in Jetski Browser"
Open a URL in Jetski Browser to view the page contents of a URL in a rendered format. You can also use this tool to navigate to different URLs or reload the current page.
The prompt wasn't a single contiguous block of text I could extract. Instead, it was fragmented—compiled as individual string literals scattered throughout the binary. The Language Server likely assembles these pieces dynamically at runtime to construct the full system prompt. This explains why a simple strings dump didn't reveal a neat "You are Jetski..." paragraph.
By extracting all relevant strings and cross-referencing them with the template provider paths found in the binary (google3/third_party/jetski/prompt/template_provider/templates/system_prompts/), I was able to piece together a more complete picture of the system prompt.
This confirmed that the browser_subagent spins up a dedicated sub-agent with a specific persona ("Jetski Browser") constructed from these embedded fragments. The prompt is assembled from template files at runtime, which explains why it appears fragmented in the binary—each component is stored separately and combined dynamically.
The binary references suggest the prompt system uses templates stored in:
google3/third_party/jetski/prompt/template_provider/templates/system_prompts/
- notify_user_tool.tmpl
- conversation_logs.tmpl
- ephemeral_message.tmpl
- mode_descriptions.tmpl
- persistent_context.tmpl
- task_boundary_tool.tmpl
- communication_style.tmpl
- file_diffs_artifact.tmpl
- knowledge_discovery.tmpl
- walkthrough_artifact.tmpl
This template-based approach allows Google to update prompts without recompiling the binary, and enables dynamic prompt construction based on context, mode, and available tools.
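As an illustration of why the strings dump yields fragments rather than one blob, runtime assembly from those templates might look something like this; only the template file names are real, and the contents, variables, and ordering are invented:

```typescript
// Invented illustration of prompt assembly from .tmpl fragments. Only the
// file names are real; the contents and ordering are placeholders.
const templates: Record<string, string> = {
  "communication_style.tmpl": "Be concise and direct in your responses.",
  "mode_descriptions.tmpl": "You are currently operating in {{mode}} mode.",
  // ...one entry per template file listed above
};

function assembleSystemPrompt(
  order: string[],
  vars: Record<string, string>
): string {
  return order
    .map((name) =>
      (templates[name] ?? "").replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? "")
    )
    .join("\n\n");
}

// assembleSystemPrompt(["mode_descriptions.tmpl", "communication_style.tmpl"],
//                      { mode: "browser" })
```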
I had the Brain (Language Server) and the Eyes (Chrome). But how did they talk? The MCP server was the middleman, but it wasn't talking to Chrome directly.
I scrutinized the browser-tools-mcp code. Looking at the source in ~/.npm/_npx/.../browser-tools-mcp/, I saw it making HTTP requests to a discovery endpoint:
$ cat ~/.npm/_npx/.../browser-tools-mcp/src/index.ts | grep -A 5 "discover"
const discoverPort = async () => {
  for (let port = 3025; port <= 3035; port++) {
    const response = await fetch(`http://localhost:${port}/.identity`);
    if (response.ok) return port;
  }
}
It was trying ports 3025-3035, looking for a /.identity endpoint. But nothing showed up on those ports in my initial scan. Then I realized: the server might only be running when a browser session is active.
I checked the Chrome extensions directory:
$ ls -la ~/.gemini/antigravity-browser-profile/Default/Extensions/
eeijfnjmjelapkebgockoeaadonbchdd/
Found an extension with ID eeijfnjmjelapkebgockoeaadonbchdd. I looked at its manifest:
$ cat ~/.gemini/antigravity-browser-profile/.../manifest.json
{
  "name": "Antigravity Browser Connector",
  "background": { "service_worker": "service_worker_binary.js" },
  "permissions": ["debugger", "tabs"]
}
Digging into its service_worker_binary.js (minified, but readable enough), I found the missing link:
// De-minified for clarity
app.post('/navigate', async (req, res) => {
  const { url } = req.body;
  await chrome.debugger.sendCommand({ tabId: tabId }, 'Page.navigate', { url });
  res.json({ success: true });
});

app.get('/.identity', (req, res) => {
  res.json({ identity: 'mcp-browser-connector-24x7' });
});
The Extension runs a local HTTP server inside the browser. It receives high-level commands (like "navigate") via HTTP and translates them into low-level CDP WebSocket messages. The /.identity endpoint is how the MCP server discovers it.
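Putting the two halves together, the client side of the handshake plausibly looks like this; the /.identity and /navigate routes and the { url } body come from the extension code above, and the rest is an assumption:

```typescript
// Sketch of the MCP-server side of the handshake. Routes and payloads come
// from the de-minified extension code above; the error handling is assumed.
async function discoverConnectorPort(): Promise<number> {
  for (let port = 3025; port <= 3035; port++) {
    try {
      const res = await fetch(`http://localhost:${port}/.identity`);
      if (res.ok) return port; // { identity: 'mcp-browser-connector-24x7' }
    } catch {
      // nothing listening on this port; keep scanning
    }
  }
  throw new Error("browser connector not found; is a browser session active?");
}

async function navigate(url: string): Promise<void> {
  const port = await discoverConnectorPort();
  const res = await fetch(`http://localhost:${port}/navigate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) throw new Error(`navigate failed: ${res.status}`);
}
```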
Why use an extension instead of talking to CDP directly? CDP is a low-level protocol that reports network idle states, but doesn't indicate when a page is actually ready for interaction. Running code inside the browser via an extension provides several advantages: it can access the DOM directly, handle complex page state, bypass CORS restrictions, and expose a simpler high-level API ("navigate", "click") rather than requiring direct manipulation of the DevTools protocol.
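For instance, only code running inside the page can cheaply answer "is this actually interactive yet?", which CDP's lifecycle events only approximate. A hypothetical extension-side check:

```typescript
// Hypothetical in-browser readiness check; the kind of DOM-level question
// an extension can ask directly but raw CDP cannot express as one call.
function pageIsReady(): boolean {
  return (
    document.readyState === "complete" &&
    // A client-rendered app can be "loaded" while still mounting its UI:
    document.body.childElementCount > 0
  );
}
```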
Putting it all together, here is the complete flow of an Antigravity browser action:

1. The main agent receives a high-level goal and calls the browser_subagent tool.
2. The language server ("Jetski") spins up the browser sub-agent with its own system prompt and tool set.
3. The sub-agent decides on a concrete action and emits a tool call like `navigate()`, which goes to the MCP server.
4. The MCP server discovers the extension's HTTP server (ports 3025-3035, via /.identity) and forwards the command.
5. The extension translates the command into Chrome DevTools Protocol messages via chrome.debugger.
6. Chrome executes the action, and the result propagates back up the chain.

Previous MCP integrations followed a simpler pattern: the IDE would spawn an MCP server as a child process, communicate via STDIO, and expose tools directly to the main AI agent. Tools were typically thin wrappers around existing APIs: file operations, terminal commands, or simple HTTP requests. The agent would call these tools directly, and the MCP server would execute them synchronously.
Antigravity does it differently. First, it uses a sub-agent pattern: instead of exposing browser tools directly to the main agent, it spawns a dedicated "Jetski" sub-agent with its own system prompt and specialized tool set. This sub-agent runs as a separate AI instance, allowing it to maintain browser-specific context and decision-making logic independently from the main IDE agent.
Second, the MCP server isn't spawned directly by the IDE—it's orchestrated by the language server, which acts as a coordination layer. The language server manages the sub-agent lifecycle, routes tool calls, and handles the translation between the sub-agent's tool invocations and the actual browser automation layer.
Third, instead of using a standard browser automation library like Playwright or Puppeteer directly, Antigravity inserts a Chrome extension as an intermediary. This extension runs an HTTP server inside the browser, providing a high-level API that abstracts away the complexity of Chrome DevTools Protocol while still allowing low-level CDP access when needed.
What's interesting here is that MCP servers aren't just tool providers anymore—they're agent coordinators. When you have something complex like browser automation, you don't want your main agent thinking about DOM elements and network timing. You want it focused on code. So Antigravity delegates: the main agent coordinates, the sub-agent handles browser logic, the language server routes, and the extension executes. Each layer does one thing well.