PhoneAgent
Use this workflow to drive iOS or Android UI through PhoneAgent's JSON-RPC bridge.
All shell commands below assume you are in the repo root:
cd "$(git rev-parse --show-toplevel)"
Start the RPC bridge
- •Choose a platform bridge (both listen on
127.0.0.1:45678by default).
# iOS (XCTest-hosted bridge) ./.agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh # Android (adb bridge; emulator or physical device) ./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh
Notes:
- •
start_rpc_bridge_local.shis interactive and will show a numbered list of iOS devices/simulators. Enter the number for the destination you want. - •
start_rpc_bridge_local.shstarts a localhost-only forwarder. - •On Xcode "Connect via network", it uses the CoreDevice tunnel automatically (no extra deps).
- •For USB fallback forwarding, install
pymobiledevice3into a local venv:python3 -m venv .venv && ./.venv/bin/python -m pip install -U pip && ./.venv/bin/python -m pip install pymobiledevice3 - •
start_android_rpc_bridge_local.shusesadb; if multiple devices are connected it prompts for the serial.
- •Keep the bridge process running.
- •Wait for
PHONEAGENT_RPC_READY ...in logs before sending RPC calls. - •Confirm socket readiness before first RPC:
./.agents/skills/phoneagent/scripts/rpc.py get-tree >/dev/null && echo rpc-ready
Resolve host and port
- •Always use
127.0.0.1:45678as the RPC endpoint (orrpc.py --port <port>if customized).
Notes:
- •Both bridges are localhost-only.
- •iOS physical-device flow uses a localhost forwarder.
- •If you need to forward manually, first get a device UDID via
xcrun devicectl list devices, then run:python3 ./.agents/skills/phoneagent/scripts/forward_rpc_localhost.py --udid <UDID>(binds127.0.0.1:45678)
Send RPC calls
Use the helper CLI:
# iOS bundle identifier
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences
# Android package name
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
./.agents/skills/phoneagent/scripts/rpc.py get-tree | head
# Use coordinates copied from the tree (XCUI frame string).
./.agents/skills/phoneagent/scripts/rpc.py enter-text \
--coordinate '{{33.0, 861.0}, {364.0, 38.0}}' \
--text 'Display'
./.agents/skills/phoneagent/scripts/rpc.py tap-element \
--coordinate '{{37.7, 969.7}, {199.7, 29.0}}'
Core operating loop
- •Call
get_tree. - •Identify the best target element in the tree (label/identifier) and copy its frame coordinate string.
- •Prefer coordinate-based actions (
tap_element/enter_text). - •Use the returned
treefrom the action response to verify the UI changed as expected. - •Repeat until complete.
- •When the task is complete, always capture a screenshot for the user:
- •Prefer
get_contextand writeresult.screenshot_base64to a PNG (or use./.agents/skills/phoneagent/scripts/rpc.py get-screen-image, which writes PNG files to/tmp/phoneagent-artifacts). - •Include the PNG path in your final message so the user can open it.
- •Prefer
Use swipe to reveal off-screen content, then use the returned tree (or call get_tree if needed).
Use one request at a time per server. Do not fire concurrent batches.
Split long keyboard input into chunks; do not send giant enter_text payloads in one call.
RPC method reference
All RPC requests are newline-delimited JSON objects with this shape:
{"id":1,"method":"<method>","params":{...}}
All success responses look like:
{"id":1,"result":{...}}
get_tree
- •Does: Returns the accessibility tree of the currently focused app.
- •Params: none.
- •Returns:
{"tree": "<string>"}
Example:
{"id":1,"method":"get_tree","params":{}}
get_screen_image
- •Does: Captures the current screen as a base64-encoded PNG plus image dimensions (when available).
- •Params: none.
- •Returns:
{"screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":2,"method":"get_screen_image","params":{}}
get_context
- •Does: Convenience method that returns both the current accessibility tree and the current screen image.
- •Params: none.
- •Returns:
{"tree":"<string>","screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":3,"method":"get_context","params":{}}
open_app
- •Does: Brings the specified app to the foreground (and makes it the focused app for subsequent calls).
- •Params:
bundle_identifier(string, required).- •iOS: pass bundle identifier (example
com.apple.Preferences). - •Android: pass package name (example
com.android.settings).
- •iOS: pass bundle identifier (example
- •Returns:
{"bundle_identifier":"<string>", "tree":"<string>"}(Android also includespackage_name).
Example:
{"id":4,"method":"open_app","params":{"bundle_identifier":"com.apple.Preferences"}}
tap
- •Does: Taps an absolute point in the current app.
- •Params:
x(number, required),y(number, required). Coordinates are in absolute screen points as reported by the tree. - •Returns:
{"tree":"<string>"}
Example:
{"id":5,"method":"tap","params":{"x":120,"y":300}}
tap_element
- •Does: Taps the center of an element using its XCUI frame string from the accessibility tree.
- •Params:
- •
coordinate(string, required). Must look like{{x, y}, {w, h}}(copied from the tree). - •
count(integer, optional; default 1). Use 2 for double-tap. - •
longPress(boolean, optional; default false). When true, performs a long-press gesture. - •Returns:
{"coordinate":"<string>", "count":<number>, "longPress":<bool>, "tree":"<string>"}
Example:
{"id":6,"method":"tap_element","params":{"coordinate":"{{20.0, 165.0}, {390.0, 90.0}}","count":1,"longPress":false}}
enter_text
- •Does: Taps the center of the target element (to focus it), waits briefly for the keyboard, then types the provided text followed by a newline (Return).
- •Params:
- •
coordinate(string, required). Must look like{{x, y}, {w, h}}(copied from the tree). - •
text(string, required). - •Returns:
{"coordinate":"<string>", "tree":"<string>"}
Example:
{"id":7,"method":"enter_text","params":{"coordinate":"{{33.0, 861.0}, {364.0, 38.0}}","text":"hello"}}
scroll
- •Does: Scrolls by dragging from a starting point by the provided deltas.
- •Params:
x(number, required),y(number, required),distanceX(number, required),distanceY(number, required). - •Returns:
{"tree":"<string>"}
Example:
{"id":8,"method":"scroll","params":{"x":215,"y":760,"distanceX":0,"distanceY":-460}}
swipe
- •Does: Swipes in a direction starting from a given point (implemented as a bounded drag gesture).
- •Params:
x(number, required),y(number, required),direction(string, required; one ofup,down,left,right). - •Returns:
{"tree":"<string>"}
Example:
{"id":9,"method":"swipe","params":{"x":215,"y":760,"direction":"up"}}
stop
- •Does: Stops the RPC server test (ends the
xcodebuild testsession). - •Params: none.
- •Returns:
{}
Example:
{"id":10,"method":"stop","params":{}}
iOS app bundle IDs
- •Settings:
com.apple.Preferences - •Camera:
com.apple.camera - •Photos:
com.apple.mobileslideshow - •Messages:
com.apple.MobileSMS - •Home Screen:
com.apple.springboard
Android package names
- •Settings:
com.android.settings - •Camera (AOSP):
com.android.camera2 - •Photos (Google):
com.google.android.apps.photos - •Messages (Google):
com.google.android.apps.messaging - •Home Screen: launcher package varies by emulator/device
Recovery playbook
- •If RPC hangs after
open_app, restart the test-hosted server and retry with a known-good bundle id. - •If taps fail due stale UI, call
get_treeagain and recalculate target. - •If iOS bridge becomes unresponsive, stop/restart
xcodebuild testand resume from latest verified app state. - •If Android bridge becomes unresponsive, restart
adb(adb kill-server && adb start-server), relaunch the bridge, and retry.
End session
- •Send
stoponly when the task is complete. - •If
stopis not sent, terminate thexcodebuildsession manually.