ZcwDev/macos-control-skills

macos-control

A lightweight macOS desktop automation toolkit — control the real mouse, keyboard, and screen from the shell. It is packaged as a Cursor Agent Skill (SKILL.md), but the scripts are plain Bash and work standalone in any terminal.

No MCP server, no Node, no Python. Just system tools (screencapture, osascript, sips) plus cliclick for precise mouse/keyboard events.

⚠️ Safety warning. These scripts move your real cursor, press real keys, and capture your screen. Run only code you understand. Anything driving the GUI can click the wrong thing if a window isn't focused — review each action. Avoid destructive operations and never let it type secrets.

What it does

  • 📸 Screenshots (downscaled JPEG, cheap to read) and high-detail region "zoom"
  • 🖱️ Mouse: move, single / double / right click at logical coordinates
  • ⌨️ Keyboard: type text and press shortcuts (cmd c, return, arrows, …)
  • 🪟 App control: activate apps, read frontmost app / window / mouse / screen size

It is designed to be driven by an AI agent in an observe → act → verify loop, but every script is usable by hand.

Requirements

  • macOS (Apple Silicon or Intel)
  • cliclick: brew install cliclick
  • Grant the app that runs the scripts (e.g. Cursor, Terminal, iTerm) these permissions in System Settings → Privacy & Security:
    • Screen Recording (for screenshots)
    • Accessibility (for clicks / keystrokes)
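A quick preflight check can catch a missing dependency before any script touches the GUI. The sketch below is not part of the repository: `missing_tools` is a hypothetical helper name, and it only checks that the commands named in this README are on `PATH`. It cannot tell whether Screen Recording or Accessibility permission has been granted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with macos-control):
# print the name of each listed command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in this README: screencapture, osascript and sips ship with macOS,
# cliclick is installed separately (brew install cliclick).
missing="$(missing_tools screencapture osascript sips cliclick)"
if [ -n "$missing" ]; then
  printf 'Missing tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')" >&2
fi
```

If the check prints nothing, all four commands were found.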

Install

As a Cursor skill

git clone https://github.com/ZcwDev/macos-control.git
mkdir -p ~/.cursor/skills                            # make sure the skills directory exists
cp -R macos-control ~/.cursor/skills/macos-control   # SKILL.md is auto-discovered

Standalone

git clone https://github.com/ZcwDev/macos-control.git
cd macos-control && chmod +x scripts/*.sh

Scripts

Script                              Purpose
context.sh                          Frontmost app, window title, mouse point, logical screen size
screenshot.sh [path]                Downscaled JPEG of the screen; prints path + logical_size
zoom.sh X Y [W H]                   High-detail capture around a point (cursor shown) to verify a target
move.sh X Y                         Move mouse, no click
click.sh X Y [single|double|right]  Click at logical points
type.sh "text"                      Type literal ASCII (use the clipboard for non-ASCII)
key.sh [cmd ctrl alt shift] KEY     Keys / shortcuts (return, cmd c, cmd shift 4, …)
app.sh "Name"                       Activate / focus an app

Coordinates

Coordinates are logical points (what cliclick uses). screenshot.sh prints logical_size=WxH. Convert a target by fraction of the image, not raw pixels (the image you view is downscaled to a variable size):

click_x = (target_x_in_image / image_width)  * logical_width
click_y = (target_y_in_image / image_height) * logical_height
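The conversion above can be written as a small shell function. This is a minimal sketch rather than code from the repository: `to_logical` is a hypothetical name, and it assumes you already know the pixel size of the image you are looking at (on macOS, `sips -g pixelWidth -g pixelHeight FILE` reports it).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with macos-control).
# Usage: to_logical IMG_X IMG_Y IMG_W IMG_H LOGICAL_W LOGICAL_H
# Prints "X Y" in logical points, rounded to the nearest integer.
# Returns non-zero if the image width or height is not positive.
to_logical() {
  awk -v x="$1" -v y="$2" -v iw="$3" -v ih="$4" -v lw="$5" -v lh="$6" 'BEGIN {
    if (iw <= 0 || ih <= 0) exit 1
    # Scale by the fraction of the image, then round half up.
    printf "%d %d\n", int(x / iw * lw + 0.5), int(y / ih * lh + 0.5)
  }'
}

# A target at (640, 360) in a 1280x720 image, on a 1440x900 logical screen:
to_logical 640 360 1280 720 1440 900   # prints "720 450"
```

Working in fractions keeps the result correct whatever size the screenshot was downscaled to.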

For small/adjacent targets, refine with the move.sh → zoom.sh → click.sh loop.

Example

scripts/app.sh Safari
scripts/screenshot.sh                 # read it, estimate a target's fraction
scripts/click.sh 735 480
scripts/type.sh "hello"
scripts/key.sh return
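When scripting the example rather than reading the screenshot by hand, the logical screen size has to be pulled out of what screenshot.sh prints. The sketch below assumes only that the output contains a `logical_size=WxH` token somewhere, as described under Coordinates. `parse_logical_size` is a hypothetical helper, and the rest of the output format is not assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with macos-control).
# Reads text on stdin and prints "W H" from the first logical_size=WxH token.
# Prints nothing if no such token is present.
parse_logical_size() {
  sed -n 's/.*logical_size=\([0-9][0-9]*\)x\([0-9][0-9]*\).*/\1 \2/p' | head -n 1
}

# Possible use on a Mac (left commented out because it moves the real cursor):
#   out="$(scripts/screenshot.sh)"
#   set -- $(printf '%s\n' "$out" | parse_logical_size)   # $1 = width, $2 = height
#   scripts/click.sh "$(( $1 / 2 ))" "$(( $2 / 2 ))"      # click the screen centre
```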

Notes & limitations

  • Works best with native macOS apps. Electron-based apps (some chat and editor apps) often expose little or no accessibility data, so locate targets visually with screenshots + the zoom loop.
  • Typing goes through the active input source. Switch to a plain Latin/ABC input source before typing ASCII, or use the clipboard for other text.
  • On multi-monitor setups the scripts capture and control the main display.
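The clipboard route for non-ASCII text can be sketched with macOS's `pbcopy` and the key.sh shortcut syntax shown in the Scripts table. This is an illustration under assumptions, not a script from the repository: `paste_text` is a hypothetical name, and `PBCOPY` and `KEY_SH` are hypothetical override variables added so the function can be dry-run with stand-ins. It overwrites whatever is currently on the clipboard.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with macos-control).
# Usage: paste_text TEXT
# Copies TEXT to the clipboard, then presses cmd v in the frontmost app.
# PBCOPY and KEY_SH are override hooks for this sketch; by default it uses
# the system pbcopy and scripts/key.sh relative to the current directory.
paste_text() {
  printf '%s' "$1" | "${PBCOPY:-pbcopy}" || return 1
  "${KEY_SH:-scripts/key.sh}" cmd v
}

# Possible use on a Mac (commented out because it pastes into the real focused app):
#   scripts/app.sh Safari
#   paste_text "你好, héllo"
```

As with typing, make sure the intended window is focused first, and do not paste secrets this way.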

Credits & license

  • This project: MIT (see LICENSE).
  • Depends at runtime on cliclick by Carsten Blüm (BSD-3-Clause), installed separately — not bundled here.
