Observe and interact with any UI, from the command line.
A desktop automation CLI designed for AI agents. Reads the OS accessibility tree, targets elements with CSS-like selectors, and exposes click, type, scroll, and screenshot as shell commands. macOS, Linux, and Windows. Native Rust, single binary.
Install
# via cargo
cargo install agent-desktop
# verify
agent-desktop --version
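If the verify step fails with "command not found", the binary is usually missing from PATH. cargo install places binaries in ~/.cargo/bin by default. The sketch below wraps the check in a small function. The name check_install is hypothetical and not part of agent-desktop.

```shell
# Hypothetical helper: confirm a binary is reachable on PATH, then print its version.
# Defaults to agent-desktop; pass another name to check a different command.
check_install() {
    bin="${1:-agent-desktop}"
    if command -v "$bin" >/dev/null 2>&1; then
        # Found on PATH: ask the binary to report its version.
        "$bin" --version
    else
        echo "$bin not found; check that ~/.cargo/bin is on your PATH" >&2
        return 1
    fi
}
```

Calling check_install with no arguments runs the same agent-desktop --version check as above, with a hint when the binary cannot be found.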
Usage
Add the following to your agent's system prompt or instructions:
Prompt
You have access to agent-desktop, a CLI for desktop automation via accessibility APIs. Run agent-desktop --help to see available commands. Use it to observe, click, type, scroll, and screenshot any application on the user's desktop. Start by observing the screen to understand what's visible, then interact with elements using their IDs or CSS-like selectors.
Example
# List all running apps
agent-desktop observe
# Get the accessibility tree for a specific app
agent-desktop observe --app Safari
# Filter with CSS-like selectors
agent-desktop observe --app Safari --query 'button[name="OK"]'
# Interact
agent-desktop click --app Safari --query 'button[name="OK"]'
agent-desktop type --text "hello world"
agent-desktop screenshot --output /tmp/screen.png
How it works
- Run observe to query the accessibility tree and get structured element data back.
- Use element IDs or selectors to click, type, scroll, or send a keystroke.
- Re-observe after each action to get updated state.
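The observe, act, re-observe cycle above can be sketched as a small shell function. This is a minimal illustration, not the tool's prescribed workflow. The function name click_and_reobserve and the AGENT_DESKTOP override variable are hypothetical. It also assumes that a failed command exits with a non-zero status, which the text above does not confirm.

```shell
# AGENT_DESKTOP is a hypothetical override (handy for dry runs);
# it defaults to the real binary.
AGENT_DESKTOP="${AGENT_DESKTOP:-agent-desktop}"

# Observe the target, click it, then observe the app again for fresh state.
# Assumption: each command exits non-zero on failure.
click_and_reobserve() {
    app="$1"
    selector="$2"
    # 1. Observe: inspect the elements the selector currently matches.
    "$AGENT_DESKTOP" observe --app "$app" --query "$selector" || return 1
    # 2. Act: click the matching element.
    "$AGENT_DESKTOP" click --app "$app" --query "$selector" || return 1
    # 3. Re-observe: read the updated accessibility tree.
    "$AGENT_DESKTOP" observe --app "$app"
}
```

Setting AGENT_DESKTOP=echo before calling the function prints the three commands instead of running them, which is a cheap way to review what an agent would do.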
Commands
observe: List apps, read the accessibility tree, filter with selectors, show role distribution.
click: By element, selector, or absolute coordinates.
type: At cursor, or into a specific element via --query.
scroll: Direction, amount, target element.
key: Single key or combo, e.g. cmd+n.
focus: Bring an app to the foreground.
read: Element text or value; also the system clipboard.
wait: Block until a selector resolves.
interact: Invoke a native accessibility action directly.
screenshot: Write a PNG to disk.
Selectors
The --query flag accepts CSS-like selectors for targeting elements in the accessibility tree.
By role
agent-desktop observe --app Safari --query 'button'
agent-desktop observe --app Safari --query 'text_field'
agent-desktop observe --app Safari --query 'menu_item'
By attribute
Supported attributes: name, value, description, role.
[name="Submit"]: Exact match on name.
[name*="addr"]: Substring match (case-insensitive).
[name^="addr"]: Starts-with match (case-insensitive).
[value="foo"]: Match by value attribute.
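When selectors are assembled from variables, shell quoting gets fiddly. The sketch below composes a role plus name selector using the three operators listed above. The helper name_selector is hypothetical. It does not escape double quotes inside the value, because the escaping rules are not documented here.

```shell
# Hypothetical helper: build role[name<op>"value"] from three arguments.
# Accepts only the operators listed above: =, *=, ^=.
# Assumption: the value contains no double quotes (escaping is not covered).
name_selector() {
    role="$1"
    op="$2"
    value="$3"
    case "$op" in
        '='|'*='|'^=') ;;
        *) echo "unsupported operator: $op" >&2; return 1 ;;
    esac
    printf '%s[name%s"%s"]\n' "$role" "$op" "$value"
}
```

For example, name_selector text_field '*=' addr produces text_field[name*="addr"], which can be passed to --query inside a double-quoted command substitution.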
Combinators
toolbar > text_field: Direct child. text_field must be an immediate child of toolbar.
toolbar text_field: Descendant. text_field can be anywhere inside toolbar.
Nth matching
# Click the 2nd button (1-based index)
agent-desktop click --app Safari --query 'button:nth(2)'
Combined
# Text field whose name contains "Address", directly inside a toolbar
agent-desktop observe --app Safari --query 'toolbar > text_field[name*="Address"]'
# 3rd menu item whose name starts with "File"
agent-desktop click --app Finder --query 'menu_item[name^="File"]:nth(3)'
Platforms
macOS
arm64 · x64
Accessibility API: AXUIElement
Linux
x64
Accessibility API: AT-SPI2 via D-Bus
Windows
x64
Accessibility API: UI Automation
License
MIT. Source on GitHub.