Android Device Control
CLI for controlling Android devices via ADB. Requires a connected device with USB debugging enabled.
Prerequisites
Before using this skill:
- Android device with USB debugging enabled
- ADB installed (`adb version` should work)
- Device connected via USB, or an emulator running
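A quick preflight can catch the setup problems above before any `android-use` call. This is a sketch of a hypothetical helper, not part of the CLI: it runs `adb devices` directly and counts entries in the `device` state. `adb_preflight` and the `ADB_BIN` override variable are assumptions made for illustration.

```bash
# Hypothetical preflight helper (not part of android-use).
# ADB_BIN is an assumed override for the adb binary; defaults to "adb" on PATH.
adb_preflight() {
  local adb="${ADB_BIN:-adb}" list ready unauthorized
  if ! command -v "$adb" >/dev/null 2>&1; then
    echo "adb not found: install Android SDK Platform-Tools" >&2
    return 1
  fi
  list="$("$adb" devices)" || return 1
  # Line 1 is the "List of devices attached" header; column 2 is the state.
  ready=$(printf '%s\n' "$list" | awk 'NR > 1 && $2 == "device" { n++ } END { print n + 0 }')
  unauthorized=$(printf '%s\n' "$list" | awk 'NR > 1 && $2 == "unauthorized" { n++ } END { print n + 0 }')
  if [ "$unauthorized" -gt 0 ]; then
    echo "unauthorized device: accept the 'Allow USB debugging?' prompt" >&2
  fi
  if [ "$ready" -eq 0 ]; then
    echo "no ready device: check the USB cable and USB debugging, or start an emulator" >&2
    return 1
  fi
  echo "$ready device(s) ready"
}
```

Typical use: `adb_preflight && android-use check-device`.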
Agent Installation (Autonomous Setup)
When this skill is not installed, agents should:
- Run the installer (recommended):

```bash
curl -fsSL https://raw.githubusercontent.com/iurysza/android-use/main/install.sh | bash
```

- Installed structure:
  - `~/.config/opencode/skill/android-use/SKILL.md`
  - `~/.config/opencode/skill/android-use/scripts/android-use`
  - `~/.config/opencode/skill/android-use/references/`
  - `~/.config/opencode/skill/android-use/assets/`
  - `~/.config/opencode/skill/android-use/repo/`
- Verify the installation:

```bash
~/.config/opencode/skill/android-use/scripts/android-use check-device
```
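The commands in this document call `android-use` by bare name. If the installer did not already put it on your `PATH` (an assumption; check with `command -v android-use`), this snippet adds the installed `scripts/` directory for the current shell session:

```bash
# scripts/ directory created by the installer (from the installed structure).
SKILL_BIN="$HOME/.config/opencode/skill/android-use/scripts"

# Prepend it to PATH only if missing, so re-running the snippet is harmless.
case ":$PATH:" in
  *":$SKILL_BIN:"*) ;;
  *) PATH="$SKILL_BIN:$PATH"; export PATH ;;
esac
```

Adding the same lines to a shell profile such as `~/.bashrc` makes the change persistent.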
Quick Start
```bash
android-use check-device              # List connected devices
android-use get-screen                # Get UI hierarchy (compact JSON)
android-use tap 540 960               # Tap at coordinates
android-use type-text "Hello"         # Type text
android-use key HOME                  # Press key
android-use screenshot ./screen.png   # Capture screen
```
Core Agent Workflow
STEP 1: Check device
```bash
android-use check-device
```
- Lists all connected devices
- Note the serial number if multiple devices are connected
- If the user says "phone", use the `[PHYSICAL]` device
- If the user says "emulator", use the `[EMULATOR]` device
STEP 2: Get screen state
```bash
android-use get-screen
```
- Returns compact JSON with pre-calculated tap coordinates
- Search the `text`, `contentDesc`, and `resourceId` fields for the target element
- Use the `center` field for coordinates (e.g., `"center": [540, 289]`)
- Cache location: `/tmp/.ai-artifacts/skills/android-use/screen.json`
STEP 3: Execute action
- Tap: `android-use tap <center_x> <center_y>`
- Type: `android-use tap <field_x> <field_y>`, then `android-use type-text "text"`
- Swipe: `android-use swipe <x1> <y1> <x2> <y2> [duration_ms]`
- Key: `android-use key <KEY_NAME>` (HOME, BACK, ENTER, etc.)
STEP 4: Verify and repeat
- Run `get-screen` again to verify the state change
- Handle any dialogs that appeared
- Repeat until the goal is achieved
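Steps 3 and 4 can be folded into an act-then-verify helper. This is a sketch, not part of the CLI: `tap_and_expect` and the `AU` override variable are hypothetical, and it assumes `get-screen` prints the compact JSON to stdout. After a miss it re-reads the screen once after a short pause rather than tapping again, so a slow transition does not cause a duplicate action. `sleep 0.3` relies on fractional-second support, which GNU and BSD `sleep` provide.

```bash
# AU can point at another binary or wrapper; defaults to the CLI itself.
AU="${AU:-android-use}"

# Tap (x, y) once, then confirm that $3 appears in a fresh screen dump.
tap_and_expect() {
  local x="$1" y="$2" expected="$3" attempt screen
  "$AU" tap "$x" "$y" >/dev/null || return 1
  for attempt in 1 2; do
    screen="$("$AU" get-screen)" || return 1
    if printf '%s\n' "$screen" | grep -qF -- "$expected"; then
      return 0
    fi
    # One short wait (300 ms) before the single re-check.
    if [ "$attempt" -eq 1 ]; then sleep 0.3; fi
  done
  echo "'$expected' not on screen after tapping $x,$y; check for dialogs" >&2
  return 1
}
```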
Commands
Device & Screen
| Command | Args | Description |
|---|---|---|
| `check-device` | `[serial]` | List/verify connected devices |
| `wake` | `[serial]` | Wake device + dismiss lock |
| `get-screen` | `[serial]` | Dump UI accessibility tree (compact JSON) |
| `screenshot` | `[path] [serial]` | Capture screen image |
Input Actions
| Command | Args | Description |
|---|---|---|
| `tap` | `<x> <y> [serial]` | Tap at coordinates |
| `type-text` | `<text> [serial]` | Type text string |
| `swipe` | `<x1> <y1> <x2> <y2> [ms] [serial]` | Swipe gesture |
| `key` | `<keycode\|name> [serial]` | Press key |
App Management
| Command | Args | Description |
|---|---|---|
| `launch-app` | `<package> [serial]` | Launch app by package name |
| `install-apk` | `<path> [serial]` | Install APK file |
Global Options
| Option | Description |
|---|---|
| `-s, --serial <id>` | Target specific device |
| `--json` | Output as JSON |
| `--verbose` | Verbose logging |
| `--timeout <ms>` | Timeout in milliseconds (default: 15000) |
| `--adb-path <path>` | Path to ADB binary |
| `--full` | Full XML output (for `get-screen`) |
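To keep every command on one device for a whole session, a thin wrapper can inject `-s` on each call. `au`, `DEVICE_SERIAL`, and the `AU` override are hypothetical names used only for this sketch.

```bash
AU="${AU:-android-use}"

# Run android-use against DEVICE_SERIAL; refuse to run without one so a
# command never silently lands on a different device.
au() {
  if [ -z "${DEVICE_SERIAL:-}" ]; then
    echo "DEVICE_SERIAL is not set; pick one from 'android-use check-device'" >&2
    return 2
  fi
  "$AU" -s "$DEVICE_SERIAL" "$@"
}
```

Usage: set `DEVICE_SERIAL=emulator-5554`, then call `au get-screen` and `au tap 540 960`.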
Common App Package Names
Use these package names with launch-app:
| App Name | Package Name |
|---|---|
| Chrome | com.android.chrome |
| Settings | com.android.settings |
| Phone / Dialer | com.android.dialer |
| Messages / SMS | com.google.android.apps.messaging |
| Camera | com.android.camera |
| Photos | com.google.android.apps.photos |
| Gmail | com.google.android.gm |
| Maps | com.google.android.apps.maps |
| YouTube | com.google.android.youtube |
| Play Store | com.android.vending |
| Calendar | com.google.android.calendar |
| Clock | com.google.android.deskclock |
| Calculator | com.google.android.calculator |
| Contacts | com.android.contacts |
| Files | com.google.android.documentsui |
| WhatsApp | com.whatsapp |
| Instagram | com.instagram.android |
| Facebook | com.facebook.katana |
| Twitter / X | com.twitter.android |
| Spotify | com.spotify.music |
| Netflix | com.netflix.mediaclient |
| Telegram | org.telegram.messenger |
| Discord | com.discord |
| Slack | com.Slack |
| Zoom | us.zoom.videomeetings |
| Teams | com.microsoft.teams |
| Outlook | com.microsoft.office.outlook |
| Drive | com.google.android.apps.docs |
| Keep / Notes | com.google.android.keep |
| Reddit | com.reddit.frontpage |
| Bluesky | xyz.blueskyweb.app |
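Package names can differ by device maker (some phones ship a vendor camera or dialer app instead of the AOSP package listed above), so it can help to confirm a package exists before `launch-app`. This sketch calls `adb` directly; `is_installed` and the `ADB_BIN` override are hypothetical.

```bash
# `adb shell pm list packages FILTER` prints "package:<name>" for every
# installed package whose name contains FILTER, so match the whole line.
is_installed() {
  local pkg="$1" serial="${2:-}" adb="${ADB_BIN:-adb}"
  set -- shell pm list packages "$pkg"
  if [ -n "$serial" ]; then
    set -- -s "$serial" "$@"
  fi
  # tr strips the carriage returns some adb versions add to shell output.
  "$adb" "$@" | tr -d '\r' | grep -qxF "package:$pkg"
}
```

Usage: `is_installed com.whatsapp && android-use launch-app com.whatsapp`.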
Reading Screen Data
Compact JSON (default):
`get-screen` outputs pre-calculated tap coordinates and filtered elements (99% smaller than XML):

```json
{
  "elements": [
    {
      "text": "Settings",
      "resourceId": "com.android.settings:id/title",
      "contentDesc": "",
      "clickable": true,
      "scrollable": false,
      "focused": false,
      "bounds": [42, 234, 1038, 345],
      "center": [540, 289]
    }
  ],
  "clickable": [...],
  "scrollable": [...],
  "withText": [...],
  "withContentDesc": [...]
}
```
Key attributes:
- `text` - Visible text
- `contentDesc` - Accessibility description (icons)
- `resourceId` - Element identifier
- `clickable` / `scrollable` - Interaction states
- `center` - Pre-calculated tap coordinates `[x, y]`
- `bounds` - Original bounds `[left, top, right, bottom]`
Use pre-calculated center:
- Already calculated: `android-use tap 540 289` (no manual math needed)
Full XML (when needed):
- Use `get-screen --full` for raw XML
- Saves to `/tmp/.ai-artifacts/skills/android-use/screen.xml`
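Raw UI Automator XML carries only a `bounds` attribute, not a `center`. Assuming `--full` returns the standard `uiautomator dump` format, where bounds look like `[42,234][1038,345]`, the tap point is the integer midpoint. `center_from_bounds` is a hypothetical helper.

```bash
# Compute the tap point (integer midpoint) from a uiautomator bounds string.
center_from_bounds() {
  local nums
  # Replace every non-digit with a space: "[42,234][1038,345]" -> " 42 234  1038 345 "
  nums=$(printf '%s' "$1" | tr -c '0-9' ' ')
  # Intentionally unquoted: split into left, top, right, bottom.
  set -- $nums
  if [ "$#" -ne 4 ]; then
    echo "unexpected bounds: expected [l,t][r,b]" >&2
    return 1
  fi
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}
```

`center_from_bounds '[42,234][1038,345]'` prints `540 289`, the same `center` as in the compact JSON example.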
Cache location (memory-backed on systems where `/tmp` is a tmpfs):
- `/tmp/.ai-artifacts/skills/android-use/screen.json` (compact)
- `/tmp/.ai-artifacts/skills/android-use/screen.xml` (full)
Key Names
HOME, BACK, MENU, POWER, ENTER, TAB, DEL, ESCAPE, VOLUME_UP, VOLUME_DOWN, DPAD_UP, DPAD_DOWN, DPAD_LEFT, DPAD_RIGHT
Multi-Device Support
When multiple devices are connected:
- Run `check-device` to see all devices with their types
- User says "phone" or "physical" -> use the `[PHYSICAL]` device
- User says "emulator" -> use the `[EMULATOR]` device
- Pass `-s <serial>` to ALL subsequent commands

```bash
android-use check-device
# Multiple devices connected (2):
#   [PHYSICAL] 1A051FDF6007PA - Pixel 6
#   [EMULATOR] emulator-5554 - sdk_gphone64_arm64
android-use -s 1A051FDF6007PA get-screen
android-use -s 1A051FDF6007PA tap 540 960
```
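Picking the serial can be scripted from the human-readable `check-device` output. This sketch relies on the `[TYPE] SERIAL - model` line format from the sample output, which is an assumption and may change between versions; `serial_of_type` is a hypothetical helper.

```bash
# Print the first serial tagged [PHYSICAL] or [EMULATOR]; reads check-device output on stdin.
serial_of_type() {
  awk -v tag="[$1]" '$1 == tag { print $2; exit }'
}
```

Usage: `serial=$(android-use check-device | serial_of_type PHYSICAL)`, then `android-use -s "$serial" get-screen`. It prints nothing when no device of that type is connected, so check for an empty result before using it.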
Common Patterns
Tap a button
```bash
android-use get-screen    # Get JSON with pre-calculated centers
# Search the JSON for the button and find its center, e.g. [540, 289]
android-use tap 540 289   # Tap at the center
```
Enter text in field
```bash
android-use tap 540 184               # Focus the field
android-use type-text "search term"
android-use key ENTER                 # Submit
```
Scroll to find content
```bash
android-use get-screen                # Check if the content is visible
android-use swipe 540 1500 540 500    # Swipe up (scrolls the content down)
android-use get-screen                # Check again
```
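The dump/swipe/dump loop can be bounded so an agent does not scroll forever. A sketch under these assumptions: `get-screen` prints the compact JSON to stdout, the swipe coordinates suit the device, and an unchanged dump after a swipe means the end of the content was reached (dynamic text such as a clock can defeat that check). `scroll_until_visible` and `AU` are hypothetical.

```bash
AU="${AU:-android-use}"

# Swipe up until $1 appears in the screen dump, at most $2 swipes (default 8).
scroll_until_visible() {
  local target="$1" max="${2:-8}" n=0 screen prev=""
  while :; do
    screen="$("$AU" get-screen)" || return 1
    if printf '%s\n' "$screen" | grep -qF -- "$target"; then
      return 0
    fi
    # An identical dump after a swipe means nothing moved: end of content.
    if [ "$screen" = "$prev" ]; then
      echo "end of content reached; '$target' not found" >&2
      return 1
    fi
    if [ "$n" -ge "$max" ]; then
      echo "'$target' not found after $max swipes" >&2
      return 1
    fi
    prev="$screen"
    "$AU" swipe 540 1500 540 500 >/dev/null || return 1
    n=$((n + 1))
  done
}
```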
Handle dialogs
```bash
# Look for "OK", "Allow", or "Accept" buttons in the get-screen JSON
android-use tap <center_x> <center_y>   # center of the button
# Or dismiss with Back
android-use key BACK
```
Open app and navigate
```bash
android-use launch-app com.android.chrome
android-use get-screen                # Find the URL bar
android-use tap 540 184               # Tap it
android-use type-text "example.com"
android-use key ENTER
```
Agent Examples
See the `references/` directory for:
- `AGENTS_GETTING_STARTED.md` - Setup and command guide
- `AGENT_WORKFLOWS.md` - End-to-end agent workflows
- `01-taking-a-screenshot.md` - Capture and verify screenshots
- `02-opening-an-app.md` - App launch and navigation
- `03-tapping-a-button.md` - Tap flow from screen JSON
- `04-filling-a-form.md` - Multi-step form input pattern
- `05-scrolling-to-find-content.md` - Scroll/search interaction loop
- `06-handling-dialogs.md` - Popup detection and dismissal
JSON Output
Use `--json` for structured output:

```bash
android-use --json check-device
```

```json
{
  "success": true,
  "exitCode": 0,
  "data": { "devices": [...], "count": 1 },
  "message": "Found 1 device(s)",
  "trace": { ... }
}
```
Error Handling
- No device: check the USB connection, verify USB debugging is enabled, and accept the "Allow USB debugging?" prompt
- Element not found: get a fresh screen dump, then try scrolling
- Action didn't work: verify the coordinates, check for popups/dialogs, and get a fresh screen dump; add a minimal delay (300 ms max) only when retrying
- Device offline: reconnect USB, then run `adb kill-server && adb start-server`
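The device-offline fix can be wrapped with a bounded wait so a script does not hang. A sketch that calls `adb` directly; `recover_adb`, the `ADB_BIN` override, and the 10-attempt default are assumptions. `adb get-state` expects a single device, so with several attached, target one with `-s <serial>`.

```bash
# Restart the adb server, then poll `adb get-state` once per second until
# it reports "device" or the attempts run out.
recover_adb() {
  local adb="${ADB_BIN:-adb}" tries="${1:-10}" i=0
  "$adb" kill-server >/dev/null 2>&1
  "$adb" start-server >/dev/null 2>&1 || return 1
  while [ "$i" -lt "$tries" ]; do
    if [ "$("$adb" get-state 2>/dev/null)" = "device" ]; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "device still offline after ${tries}s: reconnect USB and re-accept debugging" >&2
  return 1
}
```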
Agent Best Practices
- Always `get-screen` first - understand the current UI state
- No artificial delays needed - commands execute synchronously, so the UI is ready for the next command immediately
- Check your work - run `get-screen` after each action to verify
- Use screenshots - when the JSON doesn't capture enough info
- Be consistent - use the same serial for all commands in a session
- Compact JSON by default - 99% smaller, pre-calculated tap coords, cached to `/tmp/.ai-artifacts/skills/android-use/`
- Handle dialogs - popups often block interactions
- Use center coordinates - from the JSON output, no manual calculation needed
Compact JSON Filter Logic
The compact JSON includes elements with ANY of:
- Non-empty `text` (visible labels)
- Non-empty `contentDesc` (accessibility descriptions)
- `clickable = true` (interactive elements)
- `scrollable = true` (scrollable containers)

This filters ~336 raw nodes down to ~55 useful elements (about 6x fewer).
Repository
- GitHub: https://github.com/iurysza/android-use
- Issues: https://github.com/iurysza/android-use/issues
- References: see the `references/` folder