Mobile Automation with agent-device
For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.
Quick start
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device click @e3
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device fill @e5 "test"
agent-device close
If not installed, run:
npx -y agent-device
Core workflow
- Open app or deep link: open [app|url] (open handles target selection + boot/activation in the normal flow)
- Snapshot: snapshot to get refs from the accessibility tree
- Interact using refs (click @ref, fill @ref "text")
- Re-snapshot after navigation/UI changes
- Close session when done
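The workflow above can be sketched as a small shell function. This is a minimal illustration, not part of agent-device itself: the `ad` wrapper, the `AGENT_DEVICE_BIN` override, and the `explore_settings` name are hypothetical, and `@e3` stands in for whatever ref the snapshot actually returns.

```shell
# Hypothetical wrapper: AGENT_DEVICE_BIN is an assumed override for testing,
# not an agent-device feature. It defaults to the real CLI.
ad() {
  "${AGENT_DEVICE_BIN:-agent-device}" "$@"
}

# One pass of the core workflow against the iOS Settings app.
# @e3 is a placeholder; real refs come from the snapshot output.
explore_settings() {
  ad open Settings --platform ios || return 1
  ad snapshot -i || return 1   # get refs from the accessibility tree
  ad click @e3 || return 1     # interact using a ref
  ad snapshot -i || return 1   # re-snapshot after the UI changed
  ad close                     # close the session when done
}
```

Refs are only meaningful for the snapshot that produced them, which is why the sketch takes a second snapshot after the click instead of reusing earlier refs.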
Commands
Navigation
agent-device boot  # Ensure target is booted/ready without opening app
agent-device boot --platform ios  # Boot iOS simulator
agent-device boot --platform android  # Boot Android emulator/device target
agent-device open [app|url]  # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch  # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity  # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android  # Android deep link
agent-device open "https://example.com" --platform ios  # iOS simulator deep link
agent-device close [app]  # Close app or just end session
agent-device reinstall <app> <path>  # Uninstall + install app in one command
agent-device session list  # List active sessions
boot requires either an active session or an explicit selector (--platform, --device, --udid, or --serial).
boot is a fallback, not a regular step: use it when starting a new session only if open cannot find/connect to an available target.
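A minimal sketch of that fallback order, assuming that a failed open exits with a non-zero status (the source does not confirm exit codes). The `ad` wrapper, `AGENT_DEVICE_BIN`, and `open_with_boot_fallback` are hypothetical names used only for illustration.

```shell
# Hypothetical wrapper: AGENT_DEVICE_BIN is an assumed override, not an agent-device feature.
ad() {
  "${AGENT_DEVICE_BIN:-agent-device}" "$@"
}

# Try open first. Only if it fails, boot the target explicitly and retry once.
# Assumption: a failed open returns a non-zero exit status.
# Usage: open_with_boot_fallback <platform> <app|url>
open_with_boot_fallback() {
  platform="$1"
  shift
  if ad open "$@" --platform "$platform"; then
    return 0
  fi
  ad boot --platform "$platform" || return 1
  ad open "$@" --platform "$platform"
}
```

For example, `open_with_boot_fallback ios Settings` would run boot only when the first open fails, which keeps boot out of the normal path.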
Snapshot (page analysis)
agent-device snapshot  # Full XCTest accessibility tree snapshot
agent-device snapshot -i  # Interactive elements only (recommended)
agent-device snapshot -c  # Compact output
agent-device snapshot -d 3  # Limit depth
agent-device snapshot -s "Camera"  # Scope to label/identifier
agent-device snapshot --raw  # Raw node output
agent-device snapshot --backend xctest  # default: XCTest snapshot (fast, complete, no permissions)
agent-device snapshot --backend ax  # macOS Accessibility tree (fast, needs permissions, less fidelity, optional)
XCTest is the default: fast and complete and does not require permissions. Use it in most cases and only fall back to AX when something breaks.
Find (semantic)
agent-device find "Sign In" click
agent-device find text "Sign In" click
agent-device find label "Email" fill "user@example.com"
agent-device find value "Search" type "query"
agent-device find role button click
agent-device find id "com.example:id/login" click
agent-device find "Settings" wait 10000
agent-device find "Settings" exists
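The wait and click forms above can be combined so that a click is attempted only once the element has appeared. This is a sketch under two assumptions: that `find ... wait` exits non-zero on timeout, and that the numeric argument is a timeout in milliseconds. The `ad` wrapper, `AGENT_DEVICE_BIN`, and `find_then_click` are hypothetical names.

```shell
# Hypothetical wrapper: AGENT_DEVICE_BIN is an assumed override, not an agent-device feature.
ad() {
  "${AGENT_DEVICE_BIN:-agent-device}" "$@"
}

# Wait for an element matching the query, then click it.
# Assumption: "find <query> wait <n>" exits non-zero when the element never appears.
# Usage: find_then_click <query> [timeout]
find_then_click() {
  query="$1"
  timeout="${2:-10000}"
  ad find "$query" wait "$timeout" || return 1
  ad find "$query" click
}
```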
Settings helpers (simulators)
agent-device settings wifi on
agent-device settings wifi off
agent-device settings airplane on
agent-device settings airplane off
agent-device settings location on
agent-device settings location off
Note: on iOS, the wifi and airplane settings only toggle status bar indicators, not the actual network state. airplane off clears status bar overrides.
App state
agent-device appstate
agent-device apps --metadata --platform ios
agent-device apps --metadata --platform android
Interactions (use @refs from snapshot)
agent-device click @e1
agent-device focus @e2
agent-device fill @e2 "text"  # Clear then type (Android: verifies value and retries once on mismatch)
agent-device type "text"  # Type into focused field without clearing
agent-device press 300 500  # Tap by coordinates
agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device swipe 540 1500 540 500 120
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
agent-device long-press 300 500 800  # Long press (where supported)
agent-device scroll down 0.5
agent-device pinch 2.0  # Zoom in 2x (iOS simulator)
agent-device pinch 0.5 200 400  # Zoom out at coordinates (iOS simulator)
agent-device back
agent-device home
agent-device app-switcher
agent-device wait 1000
agent-device wait text "Settings"
agent-device is visible 'id="settings_anchor"'  # selector assertions for deterministic checks
agent-device is text 'id="header_title"' "Settings"
agent-device alert get
Get information
agent-device get text @e1
agent-device get attrs @e1
agent-device screenshot out.png
Deterministic replay and updating
agent-device open App --relaunch  # Fresh app process restart in the current session
agent-device open App --save-script  # Save session script (.ad) on close
agent-device replay ./session.ad  # Run deterministic replay from .ad script
agent-device replay -u ./session.ad  # Update selector drift and rewrite .ad script in place
replay reads .ad recordings.
--relaunch controls launch semantics; --save-script controls recording. Combine only when both are needed.
Trace logs (AX/XCTest)
agent-device trace start  # Start trace capture
agent-device trace start ./trace.log  # Start trace capture to path
agent-device trace stop  # Stop trace capture
agent-device trace stop ./trace.log  # Stop and move trace log
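One way to use these commands is to wrap a single step in a trace so that capture always stops, even when the step fails. The sketch below is illustrative only: the `ad` wrapper, `AGENT_DEVICE_BIN`, and `with_trace` are hypothetical names, and only the `trace start` and `trace stop <path>` forms listed above are used.

```shell
# Hypothetical wrapper: AGENT_DEVICE_BIN is an assumed override, not an agent-device feature.
ad() {
  "${AGENT_DEVICE_BIN:-agent-device}" "$@"
}

# Run a command between trace start and trace stop, preserving its exit status.
# Usage: with_trace <trace-path> <command> [args...]
with_trace() {
  trace_path="$1"
  shift
  ad trace start || return 1
  "$@"
  status=$?
  ad trace stop "$trace_path"   # stop and move the trace log to the given path
  return "$status"
}
```

For example, `with_trace ./trace.log ad snapshot -i` would capture a trace around a single snapshot call.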
Devices and apps
agent-device devices
agent-device apps --platform ios
agent-device apps --platform android  # default: launchable only
agent-device apps --platform android --all
agent-device apps --platform android --user-installed
Best practices
- press supports gesture series controls: --count, --interval-ms, --hold-ms, --jitter-px.
- swipe supports coordinate + timing controls and repeat patterns: swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern.
- swipe timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid long-press side effects.
- Pinch (pinch <scale> [x y]) is currently supported on iOS simulators only.
- Snapshot refs are the core mechanism for interactive agent flows.
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
- Prefer snapshot -i to reduce output size.
- On iOS, xctest is the default and does not require Accessibility permission.
- If XCTest returns 0 nodes (foreground app changed), agent-device falls back to AX when available.
- open <app|url> can be used within an existing session to switch apps or open deep links.
- open <app> updates session app bundle context; URL opens do not set an app bundle id.
- Use open <app> --relaunch during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
- If AX returns the Simulator window or empty tree, restart Simulator or use --backend xctest.
- Use --session <name> for parallel sessions; avoid device contention.
- Use --activity <component> on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
- iOS deep-link opens are simulator-only in v1.
- Use fill when you want clear-then-type semantics.
- Use type when you want to append/enter text without clearing.
- On Android, prefer fill for important fields; it verifies entered text and retries once when IME reorders characters.
- If using deterministic replay scripts, use replay -u during maintenance runs to update selector drift in replay scripts. Use plain replay in CI.
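The split between CI and maintenance replays can be sketched as a single helper that picks the replay form from an environment variable. The `REPLAY_MODE` variable, the `ad` wrapper, `AGENT_DEVICE_BIN`, and `run_replay` are all hypothetical; only `replay` and `replay -u` come from the commands documented above.

```shell
# Hypothetical wrapper: AGENT_DEVICE_BIN is an assumed override, not an agent-device feature.
ad() {
  "${AGENT_DEVICE_BIN:-agent-device}" "$@"
}

# Replay a .ad script. REPLAY_MODE is an assumed convention for this sketch:
#   maintenance -> replay -u (updates selector drift, rewrites the script in place)
#   anything else (default "ci") -> plain replay, which leaves the script untouched
# Usage: run_replay <script.ad>
run_replay() {
  script="$1"
  if [ "${REPLAY_MODE:-ci}" = "maintenance" ]; then
    ad replay -u "$script"
  else
    ad replay "$script"
  fi
}
```

Defaulting to plain replay means a CI job never rewrites a script unless someone opts in explicitly, for example with `REPLAY_MODE=maintenance run_replay ./session.ad`.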