Open Source · 2026-06-16

Cua Is the Open-Source Computer-Use Agent Infrastructure Every Other Framework Is Going to Copy in the Next 6 Months

Every computer-use agent in 2026 — Agent S3, OpenCUA, Skyvern, the closed ones from Anthropic and OpenAI — is going to run on top of the same OS-level virtualization layer in a year. That layer is Cua, MIT-licensed, 17.8K stars, and shipping a macOS VM manager (Lume) that is the only thing on Apple Silicon that boots macOS Sequoia with near-native performance. The interesting story is not the wrapper. It is the sandboxes.

Hey guys, Mr. Technology here.

Every "computer-use agent" demo you have seen in 2026 — Agent S3 at 72.6% on OSWorld, the Anthropic and OpenAI CUA launches, the OpenCUA framework, the Skyvern pitch — runs against a desktop. The desktop is the bottleneck and the eval surface. The desktop is the part nobody talks about, and it decides whether a CUA is a research artifact or a product.

The team that owns the desktop layer owns the next year of computer-use agents. That team is trycua / Cua, MIT-licensed, ~17.8K GitHub stars, YC-backed, shipping the only open-source stack in 2026 that gives you a real, isolated macOS/Linux/Windows/Android desktop that any agent can drive through one Python API. The fact that you have not heard of it tells you the agent ecosystem is still mostly benchmark theater, not infrastructure.

What Cua Actually Is

Cua is not an agent. Cua is the infrastructure an agent sits on top of. Three pieces, all in the same monorepo:

1. Lume: a CLI for macOS/Linux VMs on Apple Silicon using Apple's own Virtualization.Framework. Single binary, near-native performance, no UTM, no Vagrant, no GUI. The best macOS VM story on Apple Silicon in 2026, and macOS is the eval surface nobody can replicate without it.
2. cua-sandbox / cua-computer-server: the SDK that lets an agent screenshot, click, type, scroll, run shell, and gesture on whatever VM or container you point it at. Same API for Linux containers, Linux VMs, macOS, Windows, and Android.
3. cua-bench: OSWorld, ScreenSpot, Windows Arena, plus trajectory export for training your own CUA. The OpenCUA paper is the research counterpart; Cua is the runtime.

The fourth piece is cua-driver — a background computer-use driver for macOS/Windows that lets Claude Code, Cursor, Codex, OpenClaw, or any MCP client drive a real desktop app without stealing the cursor or focus. Install as MCP server, agent runs in background, you keep working.

The API Is The Real Story

Every CUA framework gets the integration surface wrong. They ship a model, a screen-capture loop, and a clicker. They do not ship the part where you actually run a real desktop, isolated, reproducibly, at scale, against a benchmark or a production user. Cua does:

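```python
import asyncio

from cua import Sandbox, Image


async def main():
    # Boot a throwaway macOS sandbox; it is torn down when the block exits.
    async with Sandbox.ephemeral(Image.macos()) as sb:
        screenshot = await sb.screenshot()            # capture the screen
        await sb.mouse.click(420, 312)                # click at (x, y)
        await sb.keyboard.type("hello from an agent")
        result = await sb.shell.run("sw_vers")        # run a shell command in the VM


asyncio.run(main())
```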

That is the whole integration. The OS is a parameter — Linux container, Linux VM, macOS, Windows, Android, or a BYOI .qcow2/.iso — same Sandbox.ephemeral(Image.…) call. Behind it, Cua does the unsexy work every CUA paper glosses over: QEMU with KVM, a noVNC console on :8006, a shared storage volume, a VNC-to-MCP bridge for the agent, and full teardown on context exit. The 17.8K stars are not for a clever demo. They are because the unsexy part has been done right, in the open, MIT-licensed, with Apple Silicon as a first-class target.
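To make the "screenshot, act, tear down" shape concrete without needing a VM, here is a minimal sketch of the observe-act loop an agent would run against a sandbox like this. Everything in it is an assumption for illustration: FakeSandbox, run_agent, and toy_policy are hypothetical names, the stand-in exposes click directly rather than Cua's sb.mouse.click, and nothing here is Cua's actual implementation. It only shows why the async context manager matters: teardown runs whether the loop finishes or raises.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSandbox:
    """Hypothetical in-memory stand-in for a desktop sandbox (not Cua's class)."""
    os_name: str
    actions: list = field(default_factory=list)
    torn_down: bool = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions, so the "VM" is always released.
        self.torn_down = True
        return False  # do not swallow exceptions

    async def screenshot(self) -> bytes:
        # A real sandbox would return image bytes; here we encode the step count.
        return f"{self.os_name}:frame{len(self.actions)}".encode()

    async def click(self, x: int, y: int) -> None:
        self.actions.append(("click", x, y))


async def run_agent(sb, policy, max_steps: int = 5) -> int:
    """Observe-act loop: take a screenshot, ask the policy, apply the action."""
    for step in range(max_steps):
        frame = await sb.screenshot()
        action = policy(frame)
        if action is None:  # the policy signals that the task is done
            return step
        await sb.click(*action)
    return max_steps


def toy_policy(frame: bytes):
    # Hypothetical policy: click twice at a fixed point, then stop.
    return None if frame.endswith(b"frame2") else (420, 312)


async def demo():
    sb = FakeSandbox("macos")
    async with sb:
        steps = await run_agent(sb, toy_policy)
    return sb, steps


sandbox, steps_taken = asyncio.run(demo())
print(steps_taken, sandbox.actions, sandbox.torn_down)
```

Swapping the stand-in for a real sandbox object should leave run_agent untouched, provided the real object offers the same two calls. That is the practical meaning of "the OS is a parameter".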

The Cloud tier (cua.ai) wraps the same API for hosted macOS sandboxes, the only hosted macOS CUA eval surface outside of closed Anthropic/OpenAI programs. Run cua sb create --os macos --size small and you have a real macOS Sequoia instance ready for an agent to drive in under 90 seconds.

What Cua Threatens

  • Closed CUAs from frontier labs. Anthropic's CUA and OpenAI's Operator are model + harness + a hidden eval surface. Cua is model-agnostic with an open eval surface. Closed agents win on integration for now; they lose the second the open ecosystem agrees on Cua as the substrate.
  • Browser-Use and Playwright as the only CUA abstraction. Both are great for browser work and do nothing for native macOS/Windows apps or Android. Cua covers browsers and native apps alike.
  • E2B / Firecracker for agent sandboxes. E2B is a Linux container with a Python REPL. It is not a desktop. CUAs that need a desktop cannot live there.
  • The "OSWorld leaderboard" framing entirely. OSWorld measures the Linux desktop. macOSWorld, MacArena (421 tasks, June 2026), and Windows Arena are the real surfaces. Cua is the only open-source framework that runs all of them on the same harness.
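As a rough illustration of what "the same harness" buys you, the sketch below runs one task list across several OS surfaces and reports a success rate per surface. It is a toy under stated assumptions: SURFACES, run_task, and evaluate are hypothetical names, the task names are made up, and the outcomes are canned rather than produced by cua-bench or any real agent.

```python
import asyncio
from collections import defaultdict

# Hypothetical surface names; a real run would map these to sandbox images.
SURFACES = ["linux", "macos", "windows"]


async def run_task(surface: str, task: str) -> bool:
    """Stand-in for one benchmark episode; returns True on success.

    A real harness would boot a sandbox for `surface`, let the agent act,
    and then check the final desktop state. Here the outcome is canned.
    """
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    canned_failures = {("windows", "rename-file")}
    return (surface, task) not in canned_failures


async def evaluate(tasks: list[str]) -> dict[str, float]:
    """Run every task on every surface and return per-surface success rates."""
    pairs = [(s, t) for s in SURFACES for t in tasks]
    outcomes = await asyncio.gather(*(run_task(s, t) for s, t in pairs))
    passed = defaultdict(int)
    for (surface, _), ok in zip(pairs, outcomes):
        passed[surface] += int(ok)
    return {s: passed[s] / len(tasks) for s in SURFACES}


rates = asyncio.run(evaluate(["open-settings", "rename-file"]))
print(rates)
```

The point of the design is that the task list and the scoring code never branch on the OS. Only the surface name changes, so per-surface numbers stay comparable.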

What Is Actually Wrong

Three things. Lume is Apple Silicon only — no Intel macOS, no x86, no Windows host. The Windows path is a real VM, not a sandbox — licensing-wise it is an evaluation ISO, not a production image. The Cloud tier is the obvious lock-in — the local path is the open path, the hosted path is the business. Read the FAQ.

My Take

The CUA conversation in 2026 is dominated by who hit what number on OSWorld. That is the wrong conversation. The right one is who owns the desktop substrate, because the model layer is commoditizing fast: Claude, GPT, Gemini, and Qwen 3 can all drive a screen at above-human reliability on narrow tasks. The desktop substrate is the moat. Cua is the only open-source project I have seen in 2026 that treats it as a first-class engineering problem. Lume is the best macOS VM story on Apple Silicon, cua-driver as an MCP server is the right coding-agent integration, and cua-bench is the right framework for evaluating your own CUA.

If you are building a computer-use agent in 2026, stop bolting together QEMU, VNC, Selenium, and a screenshot loop. Install Cua: run pip install cua, then lume run macos-sequoia-cua:latest, point your agent at the sandbox, and ship the product. The desktop should already work. Cua is the part that works.

Mr. Technology


*Repo: github.com/trycua/cua — MIT, ~17.8K stars, YC-backed. Monorepo: cua-driver (background MCP computer-use for macOS/Windows), cua-agent (agent SDK), cua-sandbox / cua-computer-server (sandbox SDK), cua-bench (OSWorld, ScreenSpot, Windows Arena + RL envs), Lume (macOS/Linux VM manager on Apple Silicon), Lumier (Docker-compatible Lume interface). Cloud: cua.ai. License: MIT, with optional cua-agent[omni] including ultralytics (AGPL-3.0) and Microsoft OmniParser (CC-BY-4.0).*
