Add local Whisper backend (mlx_whisper / openai-whisper) #18

Open
glen-carlson wants to merge 1 commit into bradautomates:main from glen-carlson:add-local-whisper-backend

Summary

Adds a third Whisper backend that shells to a local binary instead of POSTing audio to Groq or OpenAI. Useful for:

  • Offline use: /watch works on a plane or behind a corporate proxy.
  • Privacy: audio never leaves the machine.
  • The 25 MB cap: local has no upload limit, so long-form videos with no captions stop being a problem.
  • Apple Silicon speed: mlx_whisper runs whisper-large-v3-turbo faster than the cloud round-trip on an M-series machine, with no per-call cost.

The implementation reuses the existing audio-extraction step (mono 16 kHz mp3) and shells to a binary that takes the openai-whisper CLI shape:

<bin> <audio> --output-format json --output-dir <dir> [--model <id>]

mlx_whisper and openai-whisper both expose this exact CLI, so one code path handles both. The resulting <stem>.json matches the Whisper API verbose_json schema, so _segments_from_response works unchanged.
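
A minimal sketch of what such a runner could look like, not the PR's actual `_post_local()`. The function names below (`build_local_command`, `segments_from_json`, `transcribe_local`) are hypothetical, and the flag spelling is copied from the CLI shape quoted above; it is worth confirming against the installed binary before relying on it.

```python
import json
import subprocess
from pathlib import Path


def build_local_command(bin_path, audio_path, out_dir, model=None):
    """Assemble argv in the CLI shape quoted in this PR (flags are taken from the description)."""
    cmd = [
        str(bin_path), str(audio_path),
        "--output-format", "json",
        "--output-dir", str(out_dir),
    ]
    if model:
        # --model is optional; omit it to let the binary pick its default.
        cmd += ["--model", str(model)]
    return cmd


def segments_from_json(json_path):
    """Reduce a verbose_json-style file to a list of {start, end, text} dicts."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [
        {"start": float(s["start"]), "end": float(s["end"]), "text": s["text"].strip()}
        for s in data.get("segments", [])
    ]


def transcribe_local(bin_path, audio_path, out_dir, model=None):
    """Run the binary, then read <stem>.json from out_dir (assumed output location)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the binary exits non-zero.
    subprocess.run(build_local_command(bin_path, audio_path, out_dir, model), check=True)
    return segments_from_json(out_dir / (Path(audio_path).stem + ".json"))
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, such as files under ~/Movies.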

How to use

/watch ~/Movies/clip.mov                  # auto-detected if no cloud key set
/watch <url> --whisper local              # force the local backend
WATCH_WHISPER_BACKEND=local /watch <url>  # global override

Optional knobs:

  • WATCH_LOCAL_WHISPER_BIN=/path/to/binary — point at any compatible binary.
  • WATCH_LOCAL_WHISPER_MODEL=large-v3 — override the model id.
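
As an illustration of how the binary knob might be honoured, here is a hedged sketch of a probe built on `shutil.which`, which accepts both a bare name on PATH and a path with a directory component. The function name `resolve_local_bin_sketch` and the default candidate names are assumptions for illustration, not the PR's `_resolve_local_bin()`.

```python
import shutil

# Assumed default executable names; the PR's real probe list may differ.
DEFAULT_CANDIDATES = ("mlx_whisper", "whisper")


def resolve_local_bin_sketch(env):
    """Return the path of a usable local Whisper binary, or None.

    If WATCH_LOCAL_WHISPER_BIN is set, only that value is probed, so a typo
    in the override surfaces as "not found" instead of a silent fallback.
    """
    override = env.get("WATCH_LOCAL_WHISPER_BIN")
    candidates = (override,) if override else DEFAULT_CANDIDATES
    for candidate in candidates:
        found = shutil.which(candidate)  # handles PATH names and explicit paths
        if found:
            return found
    return None
```

Taking `env` as a plain mapping (pass `os.environ` in real use) keeps the probe easy to exercise without touching the process environment.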

Resolution order

When --whisper isn't set, the priority is:

  1. WATCH_WHISPER_BACKEND env / .env (groq | openai | local)
  2. GROQ_API_KEY
  3. OPENAI_API_KEY
  4. local binary (mlx_whisper, then openai-whisper)

So existing users see zero behaviour change — cloud keys still win when present. Local only kicks in when nothing else is configured, or when explicitly requested.
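
The priority list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `resolve_backend()`: the name `resolve_backend_sketch`, the `env` and `local_bin` parameters, and the choice to return `(None, None)` for an unknown or unsatisfiable request are all hypothetical. The `(backend, credential_or_path)` return shape follows the examples in the test plan.

```python
import os

# Cloud backends in priority order, mapped to the env var holding their key.
CLOUD_KEYS = {"groq": "GROQ_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_backend_sketch(requested=None, env=None, local_bin=None):
    """Pick (backend, key_or_binary_path), or (None, None) if nothing usable.

    requested: value of --whisper, if given.
    env:       mapping of environment variables (defaults to os.environ).
    local_bin: path of an already-probed local binary, or None.
    """
    env = os.environ if env is None else env
    # An explicit flag beats the WATCH_WHISPER_BACKEND override.
    choice = requested or env.get("WATCH_WHISPER_BACKEND")
    if choice:
        if choice in CLOUD_KEYS:
            key = env.get(CLOUD_KEYS[choice])
            return (choice, key) if key else (None, None)
        if choice == "local":
            return ("local", local_bin) if local_bin else (None, None)
        return (None, None)  # unknown backend name
    # No explicit choice: cloud keys win, then the local binary.
    for backend, var in CLOUD_KEYS.items():
        if env.get(var):
            return (backend, env[var])
    return ("local", local_bin) if local_bin else (None, None)
```

Keeping the binary probe outside this function means the ordering logic can be checked with plain dictionaries, which is the kind of unit test the test plan offers to add.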

Changes

  • scripts/whisper.py: new resolve_backend() helper, _post_local() runner, _resolve_local_bin() probe. load_api_key() retained unchanged for back-compat with anyone importing it.
  • scripts/watch.py: --whisper accepts local; uses resolve_backend(); prints a targeted hint when --whisper local fires without a binary on PATH.
  • scripts/setup.py: _have_local_whisper() probe; --check / --json / installer report the local backend; setup is "ready" when either a cloud key or a local binary is present. local_whisper_bin added to the --json snapshot.
  • hooks/scripts/check-setup.sh: SessionStart hook is silent when local Whisper is available, even with no cloud key.
  • README.md / SKILL.md / CHANGELOG.md: documentation.

Test plan

  • python3 scripts/whisper.py /tmp/test.mp3 --backend local end-to-end on Apple Silicon with mlx-community/whisper-large-v3-turbo — JSON parses, segments normalise to {start, end, text}.
  • resolve_backend(None) returns ("local", "/opt/homebrew/bin/mlx_whisper") when no cloud key is set.
  • resolve_backend("groq") returns (None, None) when GROQ_API_KEY is unset (existing behaviour preserved).
  • setup.py --check exits 0 with no output when local binary is present and no cloud key is set.
  • hooks/scripts/check-setup.sh exits 0 with no output in the same situation (after first run).
  • CI: I haven't run the existing test suite — there isn't one in the repo. Happy to add unit tests for resolve_backend if you'd like.

Notes for review

  • I kept load_api_key() exported and unchanged so external callers don't break. New code uses resolve_backend().
  • The WATCH_LOCAL_WHISPER_BIN env knob accepts either a name on PATH or an absolute path, mirroring how WATCH_LOCAL_WHISPER_MODEL works.
  • I followed the existing "pure stdlib, no SDK" convention — no new pip deps, no new imports beyond what was already there.

Adds a third Whisper backend that shells to a local binary instead of POSTing
audio to Groq or OpenAI. Useful for offline transcription, privacy (audio
never leaves the machine), and dodging the 25 MB cloud upload cap on long
videos.

- whisper.py: new resolve_backend() picks groq | openai | local based on
  --whisper, WATCH_WHISPER_BACKEND env, then cloud key, then local binary.
  _post_local() runs `<bin> <audio> --output-format json --output-dir <dir>`,
  parses the resulting JSON, and reuses _segments_from_response. mlx_whisper
  and openai-whisper share the same CLI shape, so one call site handles both.
  load_api_key() retained unchanged for back-compat with anyone importing it.
- watch.py: --whisper accepts `local`; uses resolve_backend(); prints a
  targeted hint when --whisper local fires without a binary on PATH.
- setup.py: _have_local_whisper() probe; --check / --json / installer report
  the local backend; setup is "ready" when either a cloud key or a local
  binary is present.
- check-setup.sh: SessionStart hook is silent when local Whisper is
  available, even with no cloud key.
- README / SKILL.md / CHANGELOG: document the new backend, env knobs
  (WATCH_WHISPER_BACKEND, WATCH_LOCAL_WHISPER_BIN, WATCH_LOCAL_WHISPER_MODEL),
  and the relaxed 25 MB note for the local path.

Smoke-tested end-to-end on Apple Silicon with mlx-community/whisper-large-v3-turbo;
output JSON parses to the same {start, end, text} shape the cloud path returns.