Cua: The Open-Source Framework Giving AI Agents Full Computer Access

Cua is an MIT-licensed infrastructure framework backed by Y Combinator that lets AI agents control desktop applications across macOS, Linux, and Windows — no APIs required. Here's how it works, how it compares to Browser-Use and OpenCUA, and why it matters for business automation in 2026.

23 June 2026 · 8 min read

Last Updated: June 23, 2026

AI agents can browse the web. They can write code. But until recently, they couldn't do what every office worker does daily: open a desktop app, click through an interface, fill out forms, and complete real work across multiple applications. Cua changes that.

What Is Cua?

Cua is an open-source infrastructure framework that lets AI agents control full desktop environments across macOS, Linux, and Windows. Backed by Y Combinator (Spring 2025 batch) with $500K in seed funding from 468 Capital, Orange Collective, and Script Capital, Cua provides the sandboxes, SDKs, and benchmarks needed to build, train, and evaluate computer-use agents. With over 18,700 GitHub stars, it's one of the fastest-growing open-source AI infrastructure projects of 2026.

The key innovation is simple but powerful: agents interact with computer interfaces the same way humans do — by seeing screens, clicking buttons, typing text, and navigating applications. No APIs required. No integrations to build. If a human can use the software, a Cua-powered agent can too.

How Cua Works: Three Core Components

Cua's architecture is built around three pillars that work together to enable full computer control.

Cua Driver is the background computer-use layer. It lets agents click, type, scroll, and inspect accessibility trees without taking over the user's cursor or focus. This means agents can work alongside humans on the same machine — a critical feature for real-world deployment. Cua Driver includes both an MCP (Model Context Protocol) server and a CLI, so it integrates directly with Claude Code, Cursor, Codex, OpenClaw, and custom agent frameworks.
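
To make "inspecting an accessibility tree" concrete, here is a minimal Python sketch of how an agent might locate a button by role and label and turn it into a click point. The tree shape (nested dicts with role, label, frame, and children keys) and the helpers walk, find_element, and center are assumptions made for illustration; they are not Cua Driver's actual output format or API.

```python
from typing import Iterator, Optional

# Hypothetical accessibility tree: each node has a role, a label,
# a frame (x, y, width, height), and a list of child nodes.
TREE = {
    "role": "window", "label": "Invoice", "frame": (0, 0, 800, 600),
    "children": [
        {"role": "textfield", "label": "Customer",
         "frame": (40, 80, 300, 24), "children": []},
        {"role": "group", "label": "Actions",
         "frame": (40, 500, 720, 60), "children": [
            {"role": "button", "label": "Cancel",
             "frame": (500, 510, 100, 40), "children": []},
            {"role": "button", "label": "Submit",
             "frame": (620, 510, 100, 40), "children": []},
        ]},
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield every node in the tree, depth first."""
    yield node
    for child in node["children"]:
        yield from walk(child)


def find_element(tree: dict, role: str, label: str) -> Optional[dict]:
    """Return the first node matching role and label, or None."""
    matches = (n for n in walk(tree) if n["role"] == role and n["label"] == label)
    return next(matches, None)


def center(frame: tuple) -> tuple:
    """Convert an (x, y, width, height) frame into a click point."""
    x, y, w, h = frame
    return (x + w // 2, y + h // 2)


submit = find_element(TREE, "button", "Submit")
click_point = center(submit["frame"])
print(click_point)
```

Searching by role and label rather than by pixel position is what lets the same lookup keep working when a dialog is resized or rearranged.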

Cua Sandbox provides isolated, high-performance virtual environments where agents operate safely. These sandboxes run on Linux containers, Linux VMs, macOS VMs, Windows VMs, and even Android environments, all controlled through a single Python API. On Apple Silicon, sandboxes achieve up to 97% of native CPU speed, so agents run at near-native performance.

Cua Bench delivers benchmarks and reinforcement learning environments for evaluating computer-use agents. It supports OSWorld, ScreenSpot, Windows Arena, and custom task evaluation, with trajectory export for model training. This makes Cua not just a runtime but a full research and development platform.
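
As a rough picture of what "trajectory export" can mean in practice, the sketch below records each agent step and writes the run as JSON Lines, one step per line, which is a common input shape for training pipelines. The Step fields and the export_jsonl helper are hypothetical and are not Cua Bench's real schema.

```python
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Step:
    """One hypothetical trajectory step: what the agent saw and what it did."""
    index: int
    screenshot_path: str
    action: str
    args: dict


def export_jsonl(steps, fp) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for step in steps:
        fp.write(json.dumps(asdict(step)) + "\n")
        count += 1
    return count


steps = [
    Step(0, "shots/0000.png", "click", {"x": 100, "y": 200}),
    Step(1, "shots/0001.png", "type", {"text": "Hello from Cua!"}),
]

buffer = io.StringIO()  # stands in for an open file handle
written = export_jsonl(steps, buffer)

# Read the export back to confirm each line is a standalone JSON record.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(written, records[0]["action"])
```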

Why Cua Matters for Business Automation

The practical implications for growing businesses are significant. Traditional automation requires APIs, integrations, or brittle RPA scripts that break when interfaces change. Cua-powered agents adapt dynamically — they can see when a button moves, recognize new UI elements, and adjust their approach without reprogramming.
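
The difference between a brittle script and an adaptive agent can be shown with a toy model. In this sketch, two hypothetical layouts of the same screen are plain dictionaries mapping a label to a click point; a real agent would get this information from a screenshot or accessibility tree. All names here are illustrative assumptions.

```python
from typing import Optional

# Two hypothetical versions of one screen. "Save" moves between versions,
# and "Export" takes over its old position.
LAYOUT_V1 = {"Save": (700, 40), "Export": (600, 40)}
LAYOUT_V2 = {"Export": (700, 40), "Save": (120, 560)}

# What a coordinate-based script captured when it was recorded on v1.
RECORDED_SAVE_POINT = (700, 40)


def element_at(layout: dict, point: tuple) -> Optional[str]:
    """Return the label found at a point, or None if nothing is there."""
    return next((label for label, p in layout.items() if p == point), None)


def scripted_click(layout: dict) -> Optional[str]:
    """Replay the recorded coordinates without looking at the screen."""
    return element_at(layout, RECORDED_SAVE_POINT)


def adaptive_click(layout: dict, label: str) -> Optional[str]:
    """Find the target by label on the current screen, then click it."""
    point = layout[label]
    return element_at(layout, point)


scripted_v2 = scripted_click(LAYOUT_V2)          # lands on the wrong button
adaptive_v2 = adaptive_click(LAYOUT_V2, "Save")  # still finds Save
print(scripted_v2, adaptive_v2)
```

The scripted replay silently presses the wrong button after the redesign, while the label-based lookup re-resolves the target on every run.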

Real-world use cases include:

  • Legacy system automation — Agents can operate enterprise software that has no API, no export function, and no modern integration path
  • CAD and design tools — A CAD copilot can manipulate complex engineering software through visual interaction
  • Data collection across platforms — Agents navigate multiple disconnected systems to gather and consolidate data
  • Form-heavy workflows — Insurance claims, compliance filings, government tenders — any process that requires clicking through complex interfaces
  • Cross-application orchestration — Agents move data between apps that were never designed to talk to each other

For businesses stuck with "we'd automate this if the software had an API," Cua removes that constraint entirely.

Cua vs Browser-Use vs OpenCUA: How They Compare

The computer-use agent landscape has three major open-source players, each with different strengths.

Cua (trycua/cua) focuses on infrastructure: sandboxes, drivers, and evaluation. It's the most production-ready option for deploying agents that need to operate desktop applications in real environments. Its background computer-use driver, which works without stealing cursor focus, and its MCP integration make it well suited to deployment alongside human workers.

Browser-Use is the web-focused alternative. It gives AI agents autonomous browser control using Playwright under the hood. If your automation needs are purely web-based — navigating websites, filling web forms, scraping web data — Browser-Use is lighter weight and simpler to deploy. But it can't touch desktop applications.

OpenCUA (from xlang-ai) is the research-focused option. It includes AgentNet, a massive dataset of human computer-use demonstrations across Windows, macOS, and Ubuntu, plus AgentNetBench for evaluation. OpenCUA models (7B, 32B, and 72B parameter sizes) have achieved state-of-the-art performance on OSWorld benchmarks, sometimes outperforming proprietary models from OpenAI and Anthropic. It's ideal for teams training custom computer-use models.

The bottom line: Cua for production infrastructure, Browser-Use for web-only tasks, OpenCUA for model training and research.

Getting Started With Cua

Installing Cua takes minutes. The Cua Driver works on macOS and Windows, with Linux support still in pre-release:

macOS / Linux:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

Windows (PowerShell):

irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex

Wire it into Claude Code as an MCP server and your agent can drive the desktop in the background:

claude mcp add --transport stdio cua-driver -- cua-driver mcp
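
If you prefer to check the server definition into a repository, the same registration can probably be expressed as a project-level config file. The Python sketch below generates it. It assumes Claude Code's .mcp.json layout with a top-level mcpServers key, so verify the schema against the Claude Code documentation before relying on it. The command and arguments simply mirror the claude mcp add line above.

```python
import json

# Assumed schema: Claude Code's project-level .mcp.json, keyed by server name.
# "cua-driver" and the "mcp" argument come from the CLI command shown above.
config = {
    "mcpServers": {
        "cua-driver": {
            "command": "cua-driver",
            "args": ["mcp"],
        }
    }
}

mcp_json = json.dumps(config, indent=2)
print(mcp_json)  # save as .mcp.json in the project root
```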

For the Python SDK:

pip install cua

Then build agents that see screens, click buttons, and complete tasks autonomously:

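import asyncio
from cua import Sandbox, Image

async def main():
    # Ephemeral Linux sandbox, scoped to this block.
    async with Sandbox.ephemeral(Image.linux()) as sb:
        result = await sb.shell.run("echo hello")   # run a shell command
        screenshot = await sb.screenshot()           # capture the screen
        await sb.mouse.click(100, 200)               # click at x=100, y=200
        await sb.keyboard.type("Hello from Cua!")    # type into the focused app

asyncio.run(main())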

The same API works regardless of operating system — Linux, macOS, Windows, or Android. You can run sandboxes in the cloud via cua.ai or locally using QEMU.
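
One practical consequence of a single API is that task logic can be written once and tested without any real VM. The sketch below runs the same task against a stand-in sandbox for three operating systems. FakeSandbox and FakeInput are hypothetical test doubles that only borrow the method names from the snippet above (screenshot, mouse.click, keyboard.type); they are not part of Cua.

```python
import asyncio


class FakeInput:
    """Records calls instead of driving a real mouse or keyboard."""

    def __init__(self, log: list, device: str):
        self.log = log
        self.device = device

    async def click(self, x: int, y: int) -> None:
        self.log.append((self.device, "click", x, y))

    async def type(self, text: str) -> None:
        self.log.append((self.device, "type", text))


class FakeSandbox:
    """Hypothetical stand-in that mimics the sandbox method names."""

    def __init__(self, os_name: str):
        self.os_name = os_name
        self.log = []
        self.mouse = FakeInput(self.log, "mouse")
        self.keyboard = FakeInput(self.log, "keyboard")

    async def screenshot(self) -> bytes:
        self.log.append(("screen", "screenshot"))
        return b""


async def fill_greeting(sb) -> None:
    """Task logic written once, with no OS-specific branches."""
    await sb.screenshot()
    await sb.mouse.click(100, 200)
    await sb.keyboard.type("Hello from Cua!")


async def run_everywhere() -> dict:
    """Run the same task against one fake sandbox per operating system."""
    logs = {}
    for os_name in ("linux", "macos", "windows"):
        sb = FakeSandbox(os_name)
        await fill_greeting(sb)
        logs[os_name] = sb.log
    return logs


logs = asyncio.run(run_everywhere())
print(logs["linux"])
```

Swapping the fake for a real sandbox object should leave fill_greeting unchanged, provided the real object exposes the same methods.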

The Bigger Picture: Computer-Use Agents in 2026

Cua exists within a rapidly maturing ecosystem. Anthropic's Computer Use API (Claude), OpenAI's Computer-Using Agent (Operator), and Microsoft's Copilot Studio computer-use tool all offer proprietary approaches to the same problem. Google's Project Mariner is exploring similar territory.

What makes Cua different is openness. The MIT-licensed framework means businesses can deploy computer-use agents on their own infrastructure, with full control over data and execution. No vendor lock-in. No per-action pricing. No black box.

For Australian businesses evaluating AI automation, this matters. The ability to run agents on-premise (or BYOC — bring your own cloud) addresses data sovereignty concerns that block many enterprise deployments. SOC 2 readiness and on-prem availability make it viable for regulated industries.

The computer-use agent market is where browser automation was five years ago — early, powerful, but requiring expertise to deploy effectively. The teams that figure it out now will have a 12-18 month head start on competitors still waiting for APIs that may never come.

Should You Build With Cua?

Build with Cua if:

  • You need to automate desktop applications without APIs
  • You want agents that work alongside human operators
  • You need on-premise or BYOC deployment
  • You're building custom agent workflows with MCP-compatible tools

Wait if:

  • Your automation is purely web-based (Browser-Use is simpler)
  • You need production stability on Linux (still in pre-release)
  • You don't have Python development capability

Cua represents a fundamental shift in what's automatable. The question isn't whether computer-use agents will become standard — it's whether your business will be early or late to adopt them.


Want to explore what computer-use agents could do for your business? Flowtivity builds custom AI agent solutions for growing Australian companies. Get in touch for a prototype built around your specific workflow.
