Observe and interact with any UI, from the command line.

A desktop automation CLI designed for AI agents. Reads the OS accessibility tree, targets elements with CSS-like selectors, and exposes click, type, scroll, and screenshot as shell commands. macOS, Linux, and Windows. Native Rust, single binary.

macOS · Linux · Windows · MIT licensed

Install

# via cargo
cargo install agent-desktop

# verify
agent-desktop --version

Usage

Add the following to your agent's system prompt or instructions:

Prompt
You have access to agent-desktop, a CLI for desktop automation via accessibility APIs. Run agent-desktop --help to see available commands. Use it to observe, click, type, scroll, and screenshot any application on the user's desktop. Start by observing the screen to understand what's visible, then interact with elements using their IDs or CSS-like selectors.

Example

# List all running apps
agent-desktop observe

# Get the accessibility tree for a specific app
agent-desktop observe --app Safari

# Filter with CSS-like selectors
agent-desktop observe --app Safari --query 'button[name="OK"]'

# Interact
agent-desktop click --app Safari --query 'button[name="OK"]'
agent-desktop type --text "hello world"
agent-desktop screenshot --output /tmp/screen.png

How it works

  1. Run observe to query the accessibility tree and get structured element data back.
  2. Use element IDs or selectors to click, type, scroll, or send a keystroke.
  3. Re-observe after each action to get updated state.
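The three steps above can be sketched as a small shell function. This is a minimal sketch, not part of agent-desktop: the function name click_and_observe and the AGENT_DESKTOP override variable are hypothetical, and the sketch relies only on the observe and click invocations shown in the Example section.

```shell
#!/bin/sh
# Hypothetical helper: observe, act, then re-observe.
# AGENT_DESKTOP is an assumed override so a different binary (or a stub)
# can be substituted; it is not an option of agent-desktop itself.
AGENT_DESKTOP="${AGENT_DESKTOP:-agent-desktop}"

click_and_observe() {
    app="$1"
    query="$2"

    # 1. Observe: inspect the target before acting on it.
    "$AGENT_DESKTOP" observe --app "$app" --query "$query" || return 1

    # 2. Act: click the element selected by the same query.
    "$AGENT_DESKTOP" click --app "$app" --query "$query" || return 1

    # 3. Re-observe the whole app to pick up the updated state.
    "$AGENT_DESKTOP" observe --app "$app"
}
```

A call such as click_and_observe Safari 'button[name="OK"]' runs the three commands in order and stops early if any of them exits with a non-zero status.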

Commands

observe      List apps, read the accessibility tree, filter with selectors, show role distribution.
click        By element, selector, or absolute coordinates.
type         At cursor, or into a specific element via --query.
scroll       Direction, amount, target element.
key          Single key or combo, e.g. cmd+n.
focus        Bring an app to the foreground.
read         Element text or value; also the system clipboard.
wait         Block until a selector resolves.
interact     Invoke a native accessibility action directly.
screenshot   Write a PNG to disk.

Selectors

The --query flag accepts CSS-like selectors for targeting elements in the accessibility tree.

By role

agent-desktop observe --app Safari --query 'button'
agent-desktop observe --app Safari --query 'text_field'
agent-desktop observe --app Safari --query 'menu_item'

By attribute

Supported attributes: name, value, description, role.

[name="Submit"]   Exact match on name.
[name*="addr"]    Substring match (case-insensitive).
[name^="addr"]    Starts-with match (case-insensitive).
[value="foo"]     Exact match on value.
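To make the operators listed above concrete, here is a minimal shell model of their matching rules. It illustrates the documented semantics only and is not agent-desktop's implementation: attr_matches is a hypothetical helper, and treating = as case-sensitive is an assumption, since case-insensitivity is stated only for *= and ^=.

```shell
#!/bin/sh
# Hypothetical model of the attribute operators; not agent-desktop's code.
# Usage: attr_matches OP PATTERN VALUE   (exit status 0 on match)
attr_matches() {
    op="$1"
    # Lowercased copies, used by the case-insensitive operators.
    pat_lc=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    val_lc=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')

    case "$op" in
        # Exact match (assumed case-sensitive).
        '=')  [ "$3" = "$2" ] ;;
        # Substring match, case-insensitive.
        '*=') case "$val_lc" in *"$pat_lc"*) return 0 ;; *) return 1 ;; esac ;;
        # Starts-with match, case-insensitive.
        '^=') case "$val_lc" in "$pat_lc"*) return 0 ;; *) return 1 ;; esac ;;
        # Unknown operator.
        *)    return 2 ;;
    esac
}
```

Under this model, attr_matches '*=' addr 'Email Address' succeeds, while attr_matches '^=' addr 'Email Address' fails because the value does not begin with "addr".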

Combinators

toolbar > text_field   Direct child: text_field must be an immediate child of toolbar.
toolbar text_field     Descendant: text_field anywhere inside toolbar.
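The difference between the two combinators can be modeled with a short shell sketch. It assumes, purely for illustration, that an element is described by its slash-separated role path from the root (for example window/toolbar/group/text_field). The functions matches_child and matches_descendant are hypothetical and do not reflect how agent-desktop walks the tree.

```shell
#!/bin/sh
# Hypothetical model: an element is identified by its role path from the
# root, e.g. "window/toolbar/group/text_field". Not agent-desktop's code.

# "PARENT > CHILD": CHILD is the last segment and PARENT is the segment
# immediately before it.
# Usage: matches_child PARENT CHILD PATH
matches_child() {
    case "/$3" in
        */"$1"/"$2") return 0 ;;
        *)           return 1 ;;
    esac
}

# "ANCESTOR DESC": DESC is the last segment and ANCESTOR appears at any
# level above it (including directly above).
# Usage: matches_descendant ANCESTOR DESC PATH
matches_descendant() {
    case "/$3" in
        */"$1"/*/"$2" | */"$1"/"$2") return 0 ;;
        *)                            return 1 ;;
    esac
}
```

With the path window/toolbar/group/text_field, matches_descendant toolbar text_field succeeds but matches_child toolbar text_field fails, because a group sits between the toolbar and the text field.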

Nth matching

# Click the 2nd button (1-based index)
agent-desktop click --app Safari --query 'button:nth(2)'

Combined

# Text field whose name contains "Address", directly inside a toolbar
agent-desktop observe --app Safari --query 'toolbar > text_field[name*="Address"]'

# 3rd menu item whose name starts with "File"
agent-desktop click --app Finder --query 'menu_item[name^="File"]:nth(3)'
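When selectors are assembled by a script rather than typed by hand, a small helper can keep the quoting in one place. The sketch below is hypothetical (build_selector is not part of agent-desktop) and assumes the attribute value contains no double quotes, since the escaping rules for selectors are not described here.

```shell
#!/bin/sh
# Hypothetical helper that assembles a selector string from its parts.
# Assumes VALUE contains no double quotes (escaping rules are not documented).
# Usage: build_selector ROLE [ATTR OP VALUE] [NTH]
build_selector() {
    sel="$1"
    # Optional attribute filter, e.g. [name^="File"].
    if [ -n "${2:-}" ]; then
        sel="${sel}[${2}${3}\"${4}\"]"
    fi
    # Optional 1-based index, e.g. :nth(3).
    if [ -n "${5:-}" ]; then
        sel="${sel}:nth(${5})"
    fi
    printf '%s\n' "$sel"
}
```

For example, build_selector menu_item name '^=' File 3 prints menu_item[name^="File"]:nth(3), which can be passed to --query as in the commands above.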

Platforms

macOS     arm64 · x64   Accessibility API: AXUIElement
Linux     x64           Accessibility API: AT-SPI2 via D-Bus
Windows   x64           Accessibility API: UI Automation

License

MIT. Source on GitHub.