Windows UI Automation Control

Uses UiAutomationGRPC.Server in a "See → Think → Act" loop to drive LLM-based UI automation efficiently and control Windows applications.

SKILL.md
---
name: Windows UI Automation Control
description: Control Windows applications using UiAutomationGRPC.Server with the "See → Think → Act" loop for efficient LLM-driven UI automation.
---

Windows UI Automation Control

This skill lets you control Windows desktop applications through a gRPC-based automation server. It uses the approach labeled "Approach 2: App Structure (LLM-Friendly)": the server returns the application's UI tree as JSON, which supports an efficient "See → Think → Act" loop.

Prerequisites

  • UiAutomationGRPC.Server running (default: localhost:50051)
  • grpcurl installed (for direct gRPC calls)

Security Configuration

The server supports two security modes. Configure grpcurl to match the mode the server is running in.

Insecure Mode (Development/Testing)

When the server is running with Security.Enabled: false (default for development):

bash
# Use the -plaintext flag for unencrypted (non-TLS) connections
grpcurl -plaintext -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

Secure Mode (Production)

When the server is running with Security.Enabled: true:

bash
# Use TLS (omit -plaintext) and verify the server certificate against server.crt
grpcurl -cacert server.crt -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

# Or skip certificate verification (not recommended for production)
grpcurl -insecure -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

Token Authentication

When the server has Security.TokenAuthEnabled: true, include the authorization header:

bash
# Add -H for Authorization header with Bearer token
grpcurl -plaintext \
  -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  -d '{"app_name": "calc"}' \
  localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

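Combined Secure + Token Auth (this example skips certificate verification with -insecure; replace it with -cacert server.crt to verify the server):

bash
grpcurl -insecure \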
  -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  -d '{"app_name": "calc"}' \
  localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

Note: If you receive Unauthenticated errors, verify:

  1. The token matches one in the server's Security.ValidTokens array
  2. The header format is exactly Authorization: Bearer <token>
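The modes above differ only in which flags are passed to grpcurl. As a minimal sketch, the helper below assembles the argument list from a few settings. `build_grpcurl_args` and its parameter names are hypothetical (they are not part of the server or of grpcurl); the flags, the default address, and the service name are the ones shown in this section.

```python
from typing import List, Optional


def build_grpcurl_args(
    method: str,
    payload_json: str,
    address: str = "localhost:50051",
    tls: bool = False,
    cacert: Optional[str] = None,
    skip_verify: bool = False,
    token: Optional[str] = None,
) -> List[str]:
    """Assemble a grpcurl argument list (hypothetical helper, not a server API)."""
    args = ["grpcurl"]
    if not tls:
        # Insecure mode: Security.Enabled is false on the server.
        args.append("-plaintext")
    elif cacert:
        # Secure mode: verify the server certificate against a given file.
        args += ["-cacert", cacert]
    elif skip_verify:
        # Secure mode without verification (not recommended for production).
        args.append("-insecure")
    if token:
        # Token auth: Security.TokenAuthEnabled is true on the server.
        args += ["-H", f"Authorization: Bearer {token}"]
    args += ["-d", payload_json, address, f"UiAutomation.UiAutomationService/{method}"]
    return args
```

The resulting list can be passed to `subprocess.run(args, capture_output=True, text=True)`, which avoids shell quoting problems with the JSON payload.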

The Loop: See → Think → Act

code
┌─────────────────────────────────────────────────────────┐
│  1. SEE    → GetAppStructure (get full UI as JSON)      │
│  2. THINK  → Analyze JSON, find target element UniqId   │
│  3. ACT    → PerformActionWithStructure (action + new UI)│
│  4. REPEAT → Response includes updated UI, continue     │
└─────────────────────────────────────────────────────────┘
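The loop in the diagram can be sketched as a small driver function. This is an illustration under stated assumptions, not the server's own client: `run_loop`, `see`, `think`, and `act` are hypothetical names, where `see` would wrap GetAppStructure, `act` would wrap PerformActionWithStructure, and `think` is whatever logic (for example an LLM call) picks the next UniqId and action code.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Structure = Dict[str, Any]
# A decision is (UniqId, action code), or None when the task is finished.
Decision = Optional[Tuple[str, int]]


def run_loop(
    see: Callable[[], Structure],
    think: Callable[[Structure], Decision],
    act: Callable[[str, int], Structure],
    max_steps: int = 20,
) -> Tuple[Structure, int]:
    """Run See -> Think -> Act until think() returns None or max_steps is hit."""
    structure = see()  # 1. SEE: fetch the full UI tree once
    steps = 0
    while steps < max_steps:
        decision = think(structure)  # 2. THINK: choose a target and an action
        if decision is None:
            break
        uniq_id, action = decision
        # 3. ACT: the response already carries the updated tree,
        # so no separate SEE call is needed before the next iteration.
        structure = act(uniq_id, action)
        steps += 1
    return structure, steps
```

The `max_steps` cap is a design choice for safety: it stops a misbehaving `think` from clicking indefinitely.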

Commands

Get App Structure (SEE)

Retrieve the complete UI tree of an application.

bash
grpcurl -plaintext -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

Parameters:

| Parameter | Description |
|---|---|
| app_name | Process name (e.g., "calc", "notepad") |
| process_id | Alternative: use PID instead |
| use_process_id | Set true to use PID lookup |

Returns: json_structure containing the UI tree.
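To work with the tree, the reply printed by the gRPC client has to be decoded first. The sketch below assumes that `json_structure` is a string field holding serialized JSON, and that the client may print the field name either as `json_structure` or in lowerCamelCase as `jsonStructure`; neither detail is confirmed by this document, so check a real reply before relying on it. `extract_ui_tree` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def extract_ui_tree(reply_text: str) -> Dict[str, Any]:
    """Decode a GetAppStructure reply into the UI tree (assumed reply shape)."""
    reply = json.loads(reply_text)
    # Accept both spellings of the field name (assumption, see lead-in).
    raw = reply.get("json_structure", reply.get("jsonStructure"))
    if raw is None:
        raise KeyError("reply contains no json_structure field")
    # If the field is a string, it holds serialized JSON and needs a second decode.
    return json.loads(raw) if isinstance(raw, str) else raw
```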

Understanding AppNode

json
{
  "UniqId": "42,12345",           // ← Use this for actions
  "Name": "Five",                 // Display name
  "UiAutomationId": "num5Button", // Stable identifier
  "ControlType": "ControlType.Button",
  "BoundingRectangle": "x,y,w,h",
  "IsClickable": true,
  "IsVisible": true,
  "Children": [ ... ]
}
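The THINK step usually comes down to walking this tree and picking the UniqId of the node that matches a name or automation ID. A minimal sketch, assuming the tree has already been parsed into Python dictionaries with the field names shown above; `iter_nodes` and `find_uniq_id` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, Optional

Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.get("Children") or []:
        yield from iter_nodes(child)


def find_uniq_id(
    root: Node,
    name: Optional[str] = None,
    automation_id: Optional[str] = None,
    clickable_only: bool = True,
) -> Optional[str]:
    """Return the UniqId of the first node matching the given criteria."""
    if name is None and automation_id is None:
        raise ValueError("give a name, an automation_id, or both")
    for node in iter_nodes(root):
        if name is not None and node.get("Name") != name:
            continue
        if automation_id is not None and node.get("UiAutomationId") != automation_id:
            continue
        # Skip elements that cannot be interacted with right now.
        if clickable_only and not (node.get("IsClickable") and node.get("IsVisible")):
            continue
        return node.get("UniqId")
    return None
```

Matching on `UiAutomationId` is generally the safer choice when one is present, since the display name can change with the system language.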

Perform Action with Structure (ACT)

This is the key method for the loop: it performs an action and returns the updated UI structure in the same response.

bash
grpcurl -plaintext -d '{"runtime_id": "YOUR_UNIQ_ID", "action": 9}' localhost:50051 UiAutomation.UiAutomationService/PerformActionWithStructure

Parameters:

| Parameter | Description |
|---|---|
| runtime_id | The UniqId from the JSON |
| action | Action code (see table below) |
| arguments | Optional string array |

Action Reference

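| Action | Code | Use Case | Arguments |
|---|---|---|---|
| INVOKE | 0 | Default action (buttons) | - |
| TOGGLE | 1 | Checkboxes, switches | - |
| SET_VALUE | 4 | Type text | ["text"] |
| SET_FOCUS | 5 | Focus element | - |
| MoveTo | 8 | Move mouse to element | - |
| LeftClick | 9 | Click (recommended) | - |
| RightClick | 10 | Right-click | - |
| DoubleClick | 17 | Double-click | - |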
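Numeric action codes are easy to mistype, so it can help to name them once and build the request body from those names. The sketch below mirrors the table above; the `Action` enum, its member spellings, and `action_payload` are hypothetical conveniences, and only the codes and the `runtime_id`, `action`, and `arguments` fields come from this document.

```python
import json
from enum import IntEnum
from typing import Optional, Sequence


class Action(IntEnum):
    """Action codes from the table above (member names are illustrative)."""
    INVOKE = 0
    TOGGLE = 1
    SET_VALUE = 4
    SET_FOCUS = 5
    MOVE_TO = 8
    LEFT_CLICK = 9
    RIGHT_CLICK = 10
    DOUBLE_CLICK = 17


def action_payload(
    uniq_id: str, action: int, arguments: Optional[Sequence[str]] = None
) -> str:
    """Build the JSON body for PerformActionWithStructure."""
    code = Action(action)  # raises ValueError for a code not in the table
    if code == Action.SET_VALUE and not arguments:
        raise ValueError("SET_VALUE needs the text to type, e.g. ['hello']")
    body = {"runtime_id": uniq_id, "action": int(code)}
    if arguments:
        body["arguments"] = list(arguments)
    return json.dumps(body)
```

For example, `action_payload("42,12345", Action.SET_VALUE, ["hello"])` produces a body that can be passed to the client's `-d` option.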

Example: Calculator 9 × 9

bash
# 1. Open calculator (if not running)
grpcurl -plaintext -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/OpenApp

# 2. Get structure → Find "Nine" button UniqId
grpcurl -plaintext -d '{"app_name": "calc"}' localhost:50051 UiAutomation.UiAutomationService/GetAppStructure

# 3. Click "9" → Returns updated structure
grpcurl -plaintext -d '{"runtime_id": "42,xxx", "action": 9}' localhost:50051 UiAutomation.UiAutomationService/PerformActionWithStructure

# 4. Click "×" → Find multiply button, click it
grpcurl -plaintext -d '{"runtime_id": "42,yyy", "action": 9}' localhost:50051 UiAutomation.UiAutomationService/PerformActionWithStructure

# 5. Click "9" again
grpcurl -plaintext -d '{"runtime_id": "42,xxx", "action": 9}' localhost:50051 UiAutomation.UiAutomationService/PerformActionWithStructure

# 6. Click "=" → Get result from display element
grpcurl -plaintext -d '{"runtime_id": "42,zzz", "action": 9}' localhost:50051 UiAutomation.UiAutomationService/PerformActionWithStructure
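Step 6 leaves the result in the structure returned by the last call. One way to read it is to search that structure for the display element, as sketched below. The automation ID `CalculatorResults` and the `Display is ` name prefix are assumptions about the Windows Calculator UI and are not confirmed by this document; inspect the actual tree and adjust both defaults. `read_display` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def read_display(
    root: Dict[str, Any],
    automation_id: str = "CalculatorResults",  # assumed ID, verify in your tree
    prefix: str = "Display is ",               # assumed name prefix
) -> Optional[str]:
    """Return the text of the display element, or None if it is not found."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.get("UiAutomationId") == automation_id:
            name = node.get("Name") or ""
            # Strip the prefix when present, otherwise return the name as-is.
            return name[len(prefix):] if name.startswith(prefix) else name
        stack.extend(node.get("Children") or [])
    return None
```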

Other Useful Methods

| Method | Description |
|---|---|
| OpenApp | Launch application by path/name |
| CloseApp | Close by process name |
| CloseAppByProcessId | Close by PID |
| SendKeys | Send keyboard input |
| TakeScreenshot | Capture window/element |

Troubleshooting

| Issue | Solution |
|---|---|
| App not found | Ensure app is running; use OpenApp first |
| Element not found | UniqIds are runtime-specific; re-fetch structure |
| Access denied | Run server with Admin privileges |
| Server unreachable | Verify UiAutomationGRPC.Server is running on port 50051 |
| SSL connection error | Server is running without TLS; add the -plaintext flag to grpcurl |
| Certificate error | Use -insecure flag or provide valid -cacert |
| Unauthenticated | Token auth enabled; add -H "Authorization: Bearer TOKEN" |
| Invalid token | Verify token is in server's Security.ValidTokens list |