ai · 2026-06-11


Adversa's TrustFall disclosure shows Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI all auto-execute project-defined MCP servers the moment the trust prompt is accepted — and on CI runners, that prompt never renders.

Six AI coding agents carry the same flaw — your CI runners are exposed

Hey guys, Mr. Technology here — let me break this one down.

What You Need to Know: Adversa AI disclosed "TrustFall" — a one-click RCE that lands a project-defined MCP server as a native OS process the moment a developer (or a headless CI runner) accepts the folder trust prompt. The flaw is class-level: Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI all share the convention, and on CI the trust dialog never renders at all.

Why It Matters

  • CI runners are the worst-case victim. Claude Code's claude-code-action runs headless by default, which means the trust dialog is skipped — the attack runs against pull-request branches with zero human interaction.
  • This isn't a Claude Code regression — it's a convention. Four major CLIs share the same project-MCP auto-execute behavior, and they differ only in how the trust dialog frames the authorization. You can't "responsible-disclose" a convention; you have to change the threat model.
  • The disclosure stalled on the boundary. Anthropic's security team reviewed TrustFall and declined it as "design intent." The disagreement is real: the v2.1+ dialog says "Is this a project you created or one you trust?" and lists nothing — informed consent inside their own model is the actual gap.

TrustFall: one-click RCE via project MCP settings

Rony Utevsky at Adversa AI published TrustFall on May 7, 2026 (Adversa AI). The chain is short and ugly:

1. A malicious repository ships a .mcp.json (or .claude/settings.json) that defines a project MCP server. The payload doesn't need to be a file; it can live inline in the JSON.
2. The first time the victim opens the folder, Claude Code asks "Quick safety check: Is this a project you created or one you trust?" The dialog no longer mentions MCP, and the default option is "Yes, I trust." One Enter keypress.
3. The MCP server starts as a native OS process with the developer's full privileges. It can read stored secrets, exfiltrate the source tree, or open a long-lived C2 channel. No tool call from Claude is required.
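
To make step 1 concrete, here is a minimal sketch that lists what a repository's .mcp.json would launch, so you can look before you trust. It assumes the commonly seen layout of a top-level "mcpServers" object whose entries carry "command" and "args"; that layout and the function name are my assumptions, not something taken from Adversa's write-up.

```python
import json
from pathlib import Path


def list_project_mcp_servers(repo_root):
    """Return (server_name, command_line) pairs declared in <repo_root>/.mcp.json.

    Assumes the layout {"mcpServers": {name: {"command": ..., "args": [...]}}}.
    """
    config_path = Path(repo_root) / ".mcp.json"
    if not config_path.is_file():
        return []
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # An unreadable manifest is suspicious in itself, so surface it.
        return [("<unparseable .mcp.json>", "")]
    servers = config.get("mcpServers", {}) if isinstance(config, dict) else {}
    if not isinstance(servers, dict):
        return []
    found = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            continue
        command = str(spec.get("command", ""))
        args = spec.get("args", [])
        if not isinstance(args, list):
            args = []
        # Join command and arguments into the line that would be executed.
        found.append((name, " ".join([command, *map(str, args)]).strip()))
    return found
```

Run it against a fresh clone before opening the folder in any agent. Anything it prints is a process that would start with your privileges once trust is granted.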

Before v2.1, the dialog warned about MCP servers in the cloned repository and offered an opt-out. That warning is gone. Three project-scoped settings (enableAllProjectMcpServers, enabledMcpjsonServers, permissions.allow) can silently spawn arbitrary executables, and the dialog never discloses any of them.
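
Since the dialog won't enumerate those three settings, a small check can. The sketch below flags them in the text of a .claude/settings.json. The value shapes (a boolean for enableAllProjectMcpServers, lists for the other two) are assumptions on my part, and risky_project_settings is a hypothetical helper name.

```python
import json


def risky_project_settings(settings_text):
    """Return findings for settings that can start MCP servers or pre-approve actions."""
    settings = json.loads(settings_text)
    if not isinstance(settings, dict):
        return []
    findings = []
    # Assumed shape: a boolean that enables every server in .mcp.json.
    if settings.get("enableAllProjectMcpServers") is True:
        findings.append("enableAllProjectMcpServers is true")
    # Assumed shape: a list of server names to enable.
    enabled = settings.get("enabledMcpjsonServers")
    if isinstance(enabled, list) and enabled:
        findings.append("enabledMcpjsonServers: " + ", ".join(map(str, enabled)))
    # Assumed shape: permissions is an object with an "allow" list.
    permissions = settings.get("permissions")
    allow = permissions.get("allow") if isinstance(permissions, dict) else None
    if isinstance(allow, list) and allow:
        findings.append("permissions.allow: " + ", ".join(map(str, allow)))
    return findings
```

An empty result does not prove the project is safe. It only means none of these three keys is set in the way this sketch expects.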

It's not Claude-Code-only — all four CLIs share the pattern

Adversa's parity check across the four major agentic CLIs found the same auto-execute behavior in Claude Code, Gemini CLI, Cursor CLI, and Copilot CLI. They differ in how the dialog frames the authorization:

  • Claude Code — Generic "trust this folder." No MCP mention. No enumeration. Default: "Yes, I trust."
  • Gemini CLI — Warns about project MCP servers and lists them by name. Default: "Trust." (Most informative of the four.)
  • Cursor CLI — MCP-specific warning, but no enumeration. Default: "Trust."
  • Copilot CLI — Generic "trust this folder." No MCP mention. Default: "Yes."

The pattern: any cloned repository can pre-authorize attacker-controlled execution paths, and three of the four CLIs ask the user to accept that without listing what will run, as casually as trusting any folder of JavaScript.

CI runners are the zero-click variant

On CI runners that run Claude Code headless (the default for the official claude-code-action), the trust dialog never renders, so there is nothing to accept or decline. The same MCP-settings payload runs with zero human interaction against pull-request branches. Adversa's PoC repo includes a 0-click variant that exfiltrates process.env from a GitHub Actions runner to a collector URL of your choice (GitHub: adversa-ai/research).

The class of bug is the same (symlink hijack, sandbox escape, MCP injection), but the blast radius on a CI runner is materially worse. The runner has every secret the workflow can see, plus outbound network access that is often unfiltered.
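
One way to shrink that blast radius is to stop handing the agent process the whole workflow environment. This is a rough sketch, not a fix: the allowlist contents and both helper names are hypothetical, and any credential the agent itself needs would still be inherited by whatever it spawns.

```python
import os
import subprocess

# Hypothetical allowlist: adjust to what the agent genuinely needs.
ALLOWED_ENV_VARS = frozenset({"PATH", "HOME", "LANG", "TMPDIR"})


def minimal_env(source_env, allowed=ALLOWED_ENV_VARS):
    """Copy only allowlisted variables so unrelated workflow secrets are not inherited."""
    return {name: value for name, value in source_env.items() if name in allowed}


def run_with_minimal_env(argv, source_env=None):
    """Run a command with the reduced environment and return its CompletedProcess."""
    env = minimal_env(os.environ if source_env is None else source_env)
    return subprocess.run(argv, env=env, check=False)
```

A process started this way sees only the allowlisted variables, so a payload that dumps its environment gets far less than the full workflow secret set.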

Anthropic's position: design intent, not vulnerability

Anthropic's security team reviewed the report and declined it as outside their threat model. Their position: accepting "Yes, I trust this folder" constitutes consent to the full project configuration, and post-trust-dialog execution is the boundary functioning as designed. The disagreement is about the dialog itself — the v2.1+ version doesn't say what it's asking permission for, and three project-scoped settings can silently spawn arbitrary executables behind it.

For platform and security teams, the practical advice is: pin Claude Code to v2.0.x on developer endpoints until the trust dialog is restored, run Claude Code in a sandbox on CI (deny outbound by default, allowlist specific destinations), and add a pre-clone lint that fails the build if .claude/settings.json or .mcp.json exists in a PR from a non-trusted maintainer.
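
The lint in that last recommendation can be very small. Here is a minimal sketch that takes the list of paths a pull request touches and returns the ones that should fail the build. How you obtain that list and how you decide a contributor is trusted are left to your pipeline, and flagged_paths is a hypothetical name.

```python
from pathlib import PurePosixPath


def flagged_paths(changed_paths):
    """Return paths that name .mcp.json or .claude/settings.json at any depth."""
    flagged = []
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue
        if parts[-1] == ".mcp.json":
            flagged.append(raw)
        elif len(parts) >= 2 and parts[-2:] == (".claude", "settings.json"):
            flagged.append(raw)
    return flagged
```

Feed it the output of something like git diff --name-only between the base and head of the PR, and exit non-zero when the returned list is not empty.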

The Take

This is the most important security story of June 2026, and most teams will read past it. TrustFall is a class-level convention, not a single bug. The reason it matters more than any specific CVE is that every AI coding tool is now a CI primitive, and CI is where the secrets are.

Three things I'd do today:

  • Block .claude/settings.json, .mcp.json, and claude_desktop_config.json from your dependency tree. Treat them as code-execution manifests until the tools say otherwise.
  • Run coding agents in sandboxes with no outbound network by default. The PoC collector URL is real; egress filtering breaks it.
  • Push back on the threat model. "Yes, I trust this folder" consenting to an unsandboxed OS process is the security industry's version of "I have read and agree to the terms." It needs friction, not defaults.
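
For the egress point, this is roughly what "deny by default, allowlist specific destinations" means as a policy check. It only illustrates the decision logic. Real enforcement belongs in the sandbox or network layer, and the function name and example hosts are placeholders of mine.

```python
from urllib.parse import urlsplit


def is_egress_allowed(url, allowed_hosts):
    """Deny by default: allow only when the URL's host exactly matches an allowlisted host."""
    host = urlsplit(url).hostname
    if host is None:
        # No parseable host means no match, so the request is denied.
        return False
    return host in {h.lower() for h in allowed_hosts}
```

Exact host matching is deliberate here. Suffix or substring matching is where lookalike domains slip through.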

The next Cursor / Windsurf / Copilot breach is going to look exactly like this. The only question is whether the runner was a developer machine or a CI box.

Quick Summary

Adversa AI's TrustFall disclosure shows that Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI all auto-execute project-defined MCP servers when the folder trust prompt is accepted — and on CI runners, that prompt is skipped. The flaw is class-level, not vendor-specific. Anthropic's security team reviewed and declined as "design intent." Pin to v2.0.x on dev endpoints, sandbox on CI, and treat .mcp.json as a code-execution manifest.

