Osaurus macOS Use
Automate macOS through accessibility APIs. This plugin gives you direct control over any application's UI — click buttons, type text, fill forms, navigate menus, browse the web in Safari, and more.
Core Workflow
Every interaction follows the Open-Observe-Act pattern:
- Open the app: open_application to launch/activate; returns a pid.
- Observe the UI: ALWAYS call get_ui_elements with the pid before doing anything else. This confirms the app is ready and gives you element IDs. Never skip this step. Never send keyboard or mouse actions before observing.
- Act: use element IDs to click_element, type_text, set_value, press_key, etc.
- Re-observe when the UI changes: after navigation, dialogs, tab switches, or form submissions. Do NOT re-observe after actions that don't change the visible UI (typing, toggling, pressing shortcuts).
This decoupled approach keeps token usage low. A typical 5-step interaction costs ~3K tokens vs ~150K if you re-observed after every action.
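The pattern can be sketched as a small driver. This is a minimal sketch, not the plugin's API: call_tool is a hypothetical dispatcher, the "elements" key on the get_ui_elements result is an assumption, and fake_call_tool stands in for the real plugin so the example runs on its own.

```python
from typing import Any, Callable

# Hypothetical dispatcher signature: call_tool(name, **args) -> result dict.
ToolCall = Callable[..., dict]


def open_observe_act(call_tool: ToolCall, app: str, label: str) -> dict:
    """Open an app, observe once, then click the element with a matching label."""
    pid = call_tool("open_application", identifier=app)["pid"]  # Open
    observed = call_tool("get_ui_elements", pid=pid)            # Observe
    # Assumption: elements come back as a list under an "elements" key.
    for element in observed["elements"]:
        if element.get("label") == label:
            return call_tool("click_element", id=element["id"])  # Act
    raise LookupError(f"no element labeled {label!r}")


# Stand-in for the real plugin; it records call order and returns canned data.
calls: list = []


def fake_call_tool(name: str, **args: Any) -> dict:
    calls.append(name)
    canned = {
        "open_application": {"pid": 1234, "name": "Notes"},
        "get_ui_elements": {"elements": [{"id": 5, "role": "button", "label": "New Note"}]},
        "click_element": {"success": True},
    }
    return canned[name]


result = open_observe_act(fake_call_tool, "Notes", "New Note")
print(calls)  # ['open_application', 'get_ui_elements', 'click_element']
```

The single observation sits between opening and acting, so no input is sent before the element IDs are known.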
When to Re-Observe
Call get_ui_elements again when:
- You clicked a button/link that opens a new view, dialog, page, or menu
- You switched tabs or windows
- You submitted a form and the UI refreshed
- An element ID returns "Element not found"
Do NOT re-observe when:
- You just typed text into a field (the field still has focus)
- You just pressed a keyboard shortcut (e.g., Cmd+S to save)
- You clicked a toggle, checkbox, or other control that doesn't change the surrounding UI
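These rules can be expressed as a small decision helper. This is only a sketch: the action labels ("type_text", "press_shortcut", "toggle", "submit_form") are names invented here for illustration, not values the plugin defines.

```python
from typing import Optional

# Action labels invented for this sketch; they are not plugin values.
UI_STABLE_ACTIONS = {"type_text", "press_shortcut", "toggle"}


def should_reobserve(action: str, error: Optional[str] = None) -> bool:
    """Return True when get_ui_elements should be called again."""
    if error is not None and "Element not found" in error:
        return True  # cached IDs are stale
    # Anything not known to leave the UI unchanged is treated as UI-changing.
    return action not in UI_STABLE_ACTIONS


print(should_reobserve("type_text"))    # False
print(should_reobserve("submit_form"))  # True
```

Treating unknown actions as UI-changing is a deliberately cautious default; it costs an extra observation rather than risking a stale ID.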
Tool Selection Guide
Clicking
- click_element: Default choice. Uses accessibility actions (most reliable). Supports button: "right" for context menus and doubleClick: true for double-clicks.
- click: Fallback for coordinate-based clicks. Only use when elements aren't accessible (canvas apps, image regions, screenshot-guided interaction).
Entering Text
- set_value: Best for form fields. Directly sets the element's value. Instant, reliable, replaces existing content.
- type_text: Simulates keystroke-by-keystroke typing. Use when set_value doesn't work (e.g., search fields that need live filtering, password fields, or fields that trigger on-type events). Pass id to auto-focus the element first.
Prefer set_value over type_text when filling forms. Fall back to type_text if set_value returns an error.
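The fallback can be sketched as follows. call_tool is a hypothetical dispatcher, and the failure shape ({"success": False, "error": ...}) is assumed to match what click_element is documented to return; fake_call_tool is a stand-in so the example runs without the plugin.

```python
from typing import Any, Callable


def enter_text(call_tool: Callable[..., dict], element_id: int, text: str) -> str:
    """Try set_value first; fall back to type_text. Returns the tool that was used."""
    result = call_tool("set_value", id=element_id, value=text)
    # Assumption: set_value reports failure as {"success": False, "error": ...}.
    if result.get("success"):
        return "set_value"
    call_tool("type_text", text=text, id=element_id)
    return "type_text"


# Stand-in for the plugin: element 16 pretends to reject set_value.
def fake_call_tool(name: str, **args: Any) -> dict:
    if name == "set_value" and args["id"] == 16:
        return {"success": False, "error": "Failed to set element value"}
    return {"success": True}


print(enter_text(fake_call_tool, 8, "Hello, world!"))   # set_value
print(enter_text(fake_call_tool, 16, "Hello, world!"))  # type_text
```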
Screenshots
- take_screenshot: Use when the accessibility tree is insufficient to understand the visual layout (e.g., verifying styling, reading images, canvas apps, or when elements don't have labels).
- get_ui_elements: Preferred for most interactions. Lighter, faster, and returns structured data.
Use take_screenshot with pid to capture a specific app window. Default settings (JPEG, 0.7 quality, 0.5 scale) are optimized for token efficiency.
Keyboard
- press_key: For keyboard shortcuts, navigation keys, and special keys. Always prefer keyboard shortcuts over UI clicking when available (faster, more reliable).
Scrolling
- scroll: Pass x/y to scroll a specific area. Without coordinates, scrolls at the current mouse position. Use amount to control scroll distance (default: 3 pixels).
Token Efficiency Tips
- Use interactiveOnly: true (default) when calling get_ui_elements. Only set to false when you need to read static text labels.
- Keep maxElements low. Default is 100. For simple UIs (dialogs, settings panes), use 30-50. For complex UIs (web pages), use 100-150.
- Use the roles filter to narrow results. For example, roles: ["button"] when looking for a specific button, or roles: ["textField", "textArea"] when looking for input fields.
- Avoid unnecessary screenshots. Screenshots consume vision tokens. Use get_ui_elements first; only screenshot if you need visual context.
- Batch actions between observations. After the initial observe, perform multiple actions (click, type, press key) before re-observing, but always do that initial observe after open_application.
- Use keyboard shortcuts instead of navigating menus. press_key("s", modifiers: ["command"]) is cheaper than finding and clicking File > Save.
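As an illustration of the first three tips, the helper below builds a get_ui_elements argument set. The parameter names (pid, maxElements, interactiveOnly, roles) come from this guide; the helper itself, its "simple"/"complex" labels, and the specific values 40 and 120 are choices made for this sketch.

```python
from typing import Optional, Sequence

# Values picked from inside the suggested ranges (30-50 and 100-150).
MAX_ELEMENTS = {"simple": 40, "complex": 120}


def observe_args(pid: int, complexity: str = "simple",
                 roles: Optional[Sequence[str]] = None,
                 read_static_text: bool = False) -> dict:
    """Build a get_ui_elements argument dict that keeps the response small."""
    args = {
        "pid": pid,
        "maxElements": MAX_ELEMENTS[complexity],
        "interactiveOnly": not read_static_text,
    }
    if roles:
        args["roles"] = list(roles)
    return args


print(observe_args(1234, roles=["button"]))
# {'pid': 1234, 'maxElements': 40, 'interactiveOnly': True, 'roles': ['button']}
```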
Common Recipes
Open an App and Inspect It
Always observe after opening — never skip this step:
1. open_application(identifier: "Notes")
→ { pid: 1234, name: "Notes" }
2. get_ui_elements(pid: 1234)
→ Returns elements with IDs — app is confirmed ready
Click a Button
After observing (step 2 above), find the element and click it:
click_element(id: 5)
→ { success: true }
Fill a Text Field
Use roles to filter for input fields, then set the value:
1. get_ui_elements(pid: 1234, roles: ["textField", "searchField"])
→ Find text field with ID = 8
2. set_value(id: 8, value: "Hello, world!")
→ { success: true }
If set_value fails, fall back to type_text:
type_text(text: "Hello, world!", id: 8)
→ { success: true }
Navigate a Menu
Use keyboard shortcuts when possible. Otherwise:
1. click_element(id: <menu_bar_item_id>)
→ Opens menu
2. get_ui_elements(pid: 1234, roles: ["menuItem"])
→ Find the menu item
3. click_element(id: <menu_item_id>)
Right-Click for Context Menu
1. click_element(id: 5, button: "right")
→ Opens context menu
2. get_ui_elements(pid: 1234, roles: ["menuItem"])
→ Find context menu items
3. click_element(id: <menu_item_id>)
Handle a Dialog
After an action triggers a dialog:
1. get_ui_elements(pid: 1234)
→ Dialog elements appear (buttons like "OK", "Cancel", "Save")
2. click_element(id: <ok_button_id>)
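Finding the dialog button by its label might look like the sketch below. call_tool is a hypothetical dispatcher, the "elements" result key is an assumption, and fake_call_tool is a stand-in that returns a two-button dialog.

```python
from typing import Any, Callable


def click_dialog_button(call_tool: Callable[..., dict], pid: int, label: str) -> dict:
    """Re-observe after a dialog appears, then click the button with this label."""
    observed = call_tool("get_ui_elements", pid=pid, roles=["button"])
    # Assumption: elements come back as a list under an "elements" key.
    for element in observed["elements"]:
        if element.get("label") == label:
            return call_tool("click_element", id=element["id"])
    return {"success": False, "error": f"no button labeled {label!r}"}


# Stand-in for the plugin, returning a dialog with two buttons.
clicked: list = []


def fake_call_tool(name: str, **args: Any) -> dict:
    if name == "get_ui_elements":
        return {"elements": [{"id": 3, "role": "button", "label": "Cancel"},
                             {"id": 4, "role": "button", "label": "OK"}]}
    clicked.append(args["id"])
    return {"success": True}


print(click_dialog_button(fake_call_tool, 1234, "OK"))  # {'success': True}
print(clicked)                                          # [4]
```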
Switch Between Apps
1. open_application(identifier: "Safari")
→ { pid: 5678 }
2. get_ui_elements(pid: 5678)
→ Safari's UI elements — now safe to interact
You can also use press_key("tab", modifiers: ["command"]) to switch, but always follow up with get_ui_elements before sending any input to the newly focused app.
Safari Web Browsing
Safari's web content is fully accessible through the accessibility tree. Links, buttons, headings, text fields, and other interactive elements all appear in get_ui_elements.
Navigate to a URL
1. open_application(identifier: "Safari")
→ { pid: 5678 }
2. get_ui_elements(pid: 5678)
→ Confirm Safari is loaded and ready
3. press_key("l", modifiers: ["command"])
→ Focuses the address bar
4. type_text(text: "https://example.com")
5. press_key("return")
→ Page loads
6. get_ui_elements(pid: 5678)
→ Web page elements (links, buttons, inputs)
Click a Link on a Web Page
Once Safari is open and observed:
1. click_element(id: 12)
→ Navigates to sign-in page (ID 12 was "Sign In" link from observation)
2. get_ui_elements(pid: 5678)
→ New page elements (re-observe because the page changed)
Fill a Web Form
After navigating to a page with a form:
1. get_ui_elements(pid: 5678, roles: ["textField"])
→ Find email field ID = 15, password field ID = 16
2. set_value(id: 15, value: "user@example.com")
3. set_value(id: 16, value: "password123")
4. click_element(id: <submit_button_id>)
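A form fill that matches fields by label could be sketched like this. call_tool is a hypothetical dispatcher, the "elements" result key and the field labels are assumptions, and fake_call_tool stands in for the plugin. Note that the sketch observes text fields and buttons in a single call, because a second get_ui_elements call would reset the ID cache.

```python
from typing import Any, Callable, Dict


def fill_form(call_tool: Callable[..., dict], pid: int,
              values: Dict[str, str], submit_label: str) -> dict:
    """Fill labeled text fields from one observation, submit, then re-observe."""
    observed = call_tool("get_ui_elements", pid=pid, roles=["textField", "button"])
    # Assumption: elements come back as a list under an "elements" key.
    ids = {e.get("label"): e["id"] for e in observed["elements"]}
    for label, value in values.items():
        call_tool("set_value", id=ids[label], value=value)
    call_tool("click_element", id=ids[submit_label])
    return call_tool("get_ui_elements", pid=pid)  # the page changed


# Stand-in for the plugin that logs every call.
log: list = []


def fake_call_tool(name: str, **args: Any) -> dict:
    log.append((name, args.get("id")))
    if name == "get_ui_elements":
        return {"elements": [{"id": 15, "role": "textField", "label": "Email"},
                             {"id": 17, "role": "button", "label": "Sign In"}]}
    return {"success": True}


fill_form(fake_call_tool, 5678, {"Email": "user@example.com"}, "Sign In")
print(log)
# [('get_ui_elements', None), ('set_value', 15), ('click_element', 17), ('get_ui_elements', None)]
```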
Search the Web
Assumes Safari is already open and observed (you have the pid):
1. press_key("l", modifiers: ["command"])
2. type_text(text: "weather in San Francisco")
3. press_key("return")
4. get_ui_elements(pid: 5678)
→ Search results page elements
Tab Management
- New tab: press_key("t", modifiers: ["command"])
- Close tab: press_key("w", modifiers: ["command"])
- Next tab: press_key("}", modifiers: ["command", "shift"])
- Previous tab: press_key("{", modifiers: ["command", "shift"])
- Reopen closed tab: press_key("t", modifiers: ["command", "shift"])
Reading Page Content
Use get_ui_elements with interactiveOnly: false to read static text on a page. If the page layout matters, use take_screenshot to visually inspect it.
Scrolling a Web Page
scroll(direction: "down", amount: 5, x: 700, y: 400)
Pass the center of the Safari content area as x/y to ensure scrolling happens in the right place.
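One way to get those coordinates is to take the center of the window bounds. The sketch below assumes a window dict with x, y, w, h fields in the shape get_active_window is described as returning, and it treats the window center as a rough stand-in for the content-area center (it ignores the toolbar). scroll_args is a helper invented for this example.

```python
def scroll_args(window: dict, direction: str = "down", amount: int = 5) -> dict:
    """Aim a scroll at the center of a window described by x, y, w, h."""
    return {
        "direction": direction,
        "amount": amount,
        "x": window["x"] + window["w"] // 2,
        "y": window["y"] + window["h"] // 2,
    }


# Example bounds; the numbers are made up for illustration.
window = {"pid": 5678, "x": 100, "y": 50, "w": 1200, "h": 700}
print(scroll_args(window))
# {'direction': 'down', 'amount': 5, 'x': 700, 'y': 400}
```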
macOS Keyboard Shortcuts
Use these with press_key to avoid navigating menus:
System
| Action | Key | Modifiers |
|---|---|---|
| Switch app | tab | ["command"] |
| Spotlight search | space | ["command"] |
| Force quit | escape | ["command", "option"] |
| Lock screen | q | ["command", "control"] |
| Screenshot (full screen) | 3 | ["command", "shift"] |
| Screenshot (selection) | 4 | ["command", "shift"] |
File Operations
| Action | Key | Modifiers |
|---|---|---|
| Save | s | ["command"] |
| Save As | s | ["command", "shift"] |
| Open | o | ["command"] |
| New | n | ["command"] |
| Close window | w | ["command"] |
| Quit app | q | ["command"] |
| Print | p | ["command"] |
Editing
| Action | Key | Modifiers |
|---|---|---|
| Copy | c | ["command"] |
| Cut | x | ["command"] |
| Paste | v | ["command"] |
| Undo | z | ["command"] |
| Redo | z | ["command", "shift"] |
| Select all | a | ["command"] |
| Find | f | ["command"] |
| Find next | g | ["command"] |
Safari
| Action | Key | Modifiers |
|---|---|---|
| Focus address bar | l | ["command"] |
| New tab | t | ["command"] |
| Close tab | w | ["command"] |
| Reload | r | ["command"] |
| Back | [ | ["command"] |
| Forward | ] | ["command"] |
| Downloads | l | ["command", "option"] |
| Bookmarks | b | ["command", "option"] |
| Reader mode | r | ["command", "shift"] |
Navigation
| Action | Key | Modifiers |
|---|---|---|
| Next field | tab | |
| Previous field | tab | ["shift"] |
| Confirm/submit | return | |
| Cancel/dismiss | escape | |
| Page up | pageup | |
| Page down | pagedown | |
| Top of page | home | |
| Bottom of page | end | |
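If shortcuts are used often, a lookup table keeps them in one place. The sketch below copies a few rows from the tables above; the shortcut names and the "key" parameter name are assumptions made for illustration, since this guide only shows the key as the first positional argument of press_key.

```python
# A few rows from the tables above, keyed by names invented for this sketch.
SHORTCUTS = {
    "save": ("s", ["command"]),
    "redo": ("z", ["command", "shift"]),
    "focus_address_bar": ("l", ["command"]),
    "previous_field": ("tab", ["shift"]),
    "confirm": ("return", []),
}


def press_key_args(name: str) -> dict:
    """Return press_key arguments for a named shortcut."""
    key, modifiers = SHORTCUTS[name]
    args = {"key": key}  # assumption: the key parameter is named "key"
    if modifiers:
        args["modifiers"] = list(modifiers)
    return args


print(press_key_args("redo"))     # {'key': 'z', 'modifiers': ['command', 'shift']}
print(press_key_args("confirm"))  # {'key': 'return'}
```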
Tool Reference
open_application
- Accepts app name ("Safari"), bundle ID ("com.apple.Safari"), or file path.
- If already running, activates the app. Otherwise launches it.
- Returns pid, bundleId, and name.
get_ui_elements
- Returns interactive elements with assigned IDs. Each element has: id, role, label, value, x, y, w, h, actions.
- IDs are valid until the next get_ui_elements call (which resets the cache).
- Use the roles filter for targeted queries: ["button"], ["textField", "textArea"], ["link"], ["menuItem"], etc.
- Common roles: button, link, textField, textArea, checkBox, radioButton, popUpButton, comboBox, slider, menuItem, tab, searchField.
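For client code, the element fields listed above can be modeled as a small record. This is a sketch only: the field types are assumptions, and center() assumes that x and y give the top-left corner of the element's frame, which this guide does not state.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UIElement:
    """One entry from get_ui_elements; field types are assumptions."""
    id: int
    role: str
    label: str = ""
    value: str = ""
    x: float = 0.0
    y: float = 0.0
    w: float = 0.0
    h: float = 0.0
    actions: List[str] = field(default_factory=list)

    def center(self) -> Tuple[float, float]:
        # Assumes (x, y) is the top-left corner of the element's frame.
        return (self.x + self.w / 2, self.y + self.h / 2)


raw = {"id": 5, "role": "button", "label": "OK", "x": 100, "y": 200, "w": 80, "h": 24}
element = UIElement(**raw)
print(element.center())  # (140.0, 212.0)
```

A center point like this is what a coordinate-based click would target when the accessibility action is unavailable.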
click_element
- Left-click by default. Pass button: "right" for right-click. Pass doubleClick: true for double-click.
- Uses AXPress action first (most reliable), falls back to coordinate click.
- Returns { success: true } or { success: false, error: "..." }.
click
- Clicks at raw screen coordinates. Only use when elements aren't accessible.
- Supports button (left/right/center) and doubleClick.
type_text
- Types keystroke-by-keystroke into the focused element.
- Pass id to auto-focus an element before typing.
- Use for search fields, password fields, or fields that need on-type events.
set_value
- Directly sets an element's value via accessibility API.
- Preferred over type_text for form fields: instant and replaces existing content.
- Returns error if the element isn't editable.
press_key
- Key names: return, escape, tab, delete, space, up, down, left, right, f1-f12, home, end, pageup, pagedown, or single characters (a, 1, ",", etc.).
- Modifier names: command, shift, option, control.
scroll
- Directions: up, down, left, right.
- amount controls scroll distance in pixels (default: 3). Use higher values (5-10) for faster scrolling.
- Pass x/y to position the mouse before scrolling (important for scrolling specific areas).
drag
- Drags from (startX, startY) to (endX, endY).
- Useful for sliders, window resizing, drag-and-drop, and drawing.
take_screenshot
- Defaults: JPEG format, 0.7 quality, 0.5 scale.
- Pass pid to capture a specific app's window.
- Pass savePath to save to disk (avoids base64 token costs).
- Returns MCP ImageContent format for vision model consumption.
get_active_window
- Returns: pid, appname, title, x, y, w, h.
- Useful when you don't know which app is in front.
list_displays
- Returns all connected displays with index, position, and dimensions.
- Only needed for multi-monitor setups.
Troubleshooting
"Element not found"
The element cache was reset or the element is no longer on screen. Call get_ui_elements again to refresh.
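A refresh-and-retry wrapper is one way to handle this. In the sketch below, call_tool is a hypothetical dispatcher, the "elements" result key is an assumption, and the error is assumed to arrive as text in the "error" field. fake_call_tool simulates a stale ID, and its extra "clicked" key exists only to make the outcome visible.

```python
from typing import Any, Callable


def click_with_refresh(call_tool: Callable[..., dict], pid: int,
                       element_id: int, label: str) -> dict:
    """Click by ID; on "Element not found", re-observe and retry once by label."""
    result = call_tool("click_element", id=element_id)
    if result.get("success") or "Element not found" not in result.get("error", ""):
        return result
    observed = call_tool("get_ui_elements", pid=pid)
    # Assumption: elements come back as a list under an "elements" key.
    for element in observed["elements"]:
        if element.get("label") == label:
            return call_tool("click_element", id=element["id"])
    return result


# Stand-in for the plugin: ID 5 is stale, and the button now has ID 9.
def fake_call_tool(name: str, **args: Any) -> dict:
    if name == "get_ui_elements":
        return {"elements": [{"id": 9, "role": "button", "label": "Save"}]}
    if args["id"] == 5:
        return {"success": False, "error": "Element not found"}
    return {"success": True, "clicked": args["id"]}


print(click_with_refresh(fake_call_tool, 1234, 5, "Save"))
# {'success': True, 'clicked': 9}
```

Retrying only once keeps a genuinely missing element from turning into a loop of repeated observations.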
"Failed to set element value"
The element may not be editable via accessibility. Fall back to type_text with the element id.
No elements returned
- Verify the pid is correct (use get_active_window to check).
- Some apps have poor accessibility support. Try take_screenshot and use coordinate-based click instead.
- For web content in Safari, ensure the page has fully loaded before querying elements.
Stale element positions
Elements may move after window resize or scroll. Call get_ui_elements again if coordinate-based fallback clicks miss.
Accessibility permission denied
The host application needs Accessibility permission in System Settings > Privacy & Security > Accessibility.
Limitations
- Canvas-based apps (Figma, games): No element tree. Use take_screenshot + click with coordinates.
- Poorly accessible apps: Some apps don't expose their UI through accessibility APIs. Use screenshot-guided coordinate clicks as fallback.
- Complex web apps: Very dynamic SPAs may have elements that appear/disappear rapidly. Re-observe frequently and use a lower maxElements.
- Element modification: Cannot reorder, resize, or restyle UI elements. This plugin observes and interacts with the existing UI.