Add local Whisper backend (mlx_whisper / openai-whisper) by glen-carlson · Pull Request #18 · bradautomates/claude-video

glen-carlson · 2026-05-09T11:23:07Z

Summary

Adds a third Whisper backend that shells out to a local binary instead of POSTing audio to Groq or OpenAI. Useful for:

Offline use — /watch works on a plane / behind a corporate proxy.
Privacy — audio never leaves the machine.
The 25 MB cap — local has no upload limit, so long-form videos with no captions stop being a problem.
Apple Silicon speed — mlx_whisper runs whisper-large-v3-turbo faster than the cloud round-trip on an M-series machine, with no per-call cost.

The implementation reuses the existing audio-extraction step (mono 16 kHz mp3) and shells out to a binary that accepts the openai-whisper CLI shape:

<bin> <audio> --output-format json --output-dir <dir> [--model <id>]

mlx_whisper and openai-whisper both expose this exact CLI, so one code path handles both. The resulting <stem>.json matches the Whisper API verbose_json schema, so _segments_from_response works unchanged.
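The single code path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_post_local()`: the names `build_local_command`, `read_segments`, and `transcribe_local` are hypothetical, and `read_segments` only stands in for `_segments_from_response`, whose exact behaviour (for example whether it strips whitespace) is assumed here.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_local_command(bin_path, audio_path, out_dir, model=None):
    """Build the argv in the CLI shape quoted above; --model only when set."""
    cmd = [
        str(bin_path),
        str(audio_path),
        "--output-format", "json",
        "--output-dir", str(out_dir),
    ]
    if model:
        cmd += ["--model", model]
    return cmd


def read_segments(json_path):
    """Reduce a verbose_json-style file to a list of {start, end, text} dicts."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [
        {
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            "text": seg["text"].strip(),  # stripping is an assumption
        }
        for seg in data.get("segments", [])
    ]


def transcribe_local(bin_path, audio_path, model=None):
    """Run the binary in a scratch directory and parse the resulting <stem>.json."""
    with tempfile.TemporaryDirectory() as out_dir:
        cmd = build_local_command(bin_path, audio_path, out_dir, model)
        # check=True raises CalledProcessError if the binary exits non-zero.
        subprocess.run(cmd, check=True, capture_output=True)
        return read_segments(Path(out_dir) / f"{Path(audio_path).stem}.json")
```

Keeping the argv construction separate from the subprocess call makes the command shape testable without a Whisper binary installed.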

How to use

/watch ~/Movies/clip.mov                  # auto-detected if no cloud key set
/watch <url> --whisper local              # force the local backend
WATCH_WHISPER_BACKEND=local /watch <url>  # global override

Optional knobs:

WATCH_LOCAL_WHISPER_BIN=/path/to/binary — point at any compatible binary.
WATCH_LOCAL_WHISPER_MODEL=large-v3 — override the model id.

Resolution order

When --whisper isn't set, the priority is:

1. WATCH_WHISPER_BACKEND env / .env (groq | openai | local)
2. GROQ_API_KEY
3. OPENAI_API_KEY
4. local binary (mlx_whisper → openai-whisper)

So existing users see zero behaviour change — cloud keys still win when present. Local only kicks in when nothing else is configured, or when explicitly requested.
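The priority order above could be expressed along these lines. This is a sketch under stated assumptions rather than the PR's `resolve_backend()`: the `env` and `which` parameters are added here for testability, `.env` loading and `WATCH_LOCAL_WHISPER_BIN` are left out, `whisper` is assumed to be the executable name installed by openai-whisper, and returning `(None, None)` for an explicitly requested but unavailable backend is inferred from the test plan.

```python
import os
import shutil

CLOUD_KEYS = {"groq": "GROQ_API_KEY", "openai": "OPENAI_API_KEY"}
LOCAL_CANDIDATES = ("mlx_whisper", "whisper")  # assumed probe order


def resolve_backend(requested=None, env=None, which=shutil.which):
    """Return (backend, key_or_binary_path), or (None, None) if unavailable."""
    env = os.environ if env is None else env

    def credential(name):
        # For cloud backends the credential is the API key; for local it is
        # the path of the first candidate binary found on PATH.
        if name == "local":
            for candidate in LOCAL_CANDIDATES:
                path = which(candidate)
                if path:
                    return path
            return None
        return env.get(CLOUD_KEYS[name]) or None

    # An explicit choice (--whisper, then WATCH_WHISPER_BACKEND) wins outright.
    explicit = requested or env.get("WATCH_WHISPER_BACKEND")
    if explicit:
        if explicit not in (*CLOUD_KEYS, "local"):
            return (None, None)
        value = credential(explicit)
        return (explicit, value) if value else (None, None)

    # Otherwise fall through: cloud keys first, local binary last.
    for name in ("groq", "openai", "local"):
        value = credential(name)
        if value:
            return (name, value)
    return (None, None)
```

Because cloud keys are checked before the local probe, a machine with both a key and a local binary keeps using the cloud backend unless `local` is requested explicitly.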

Changes

scripts/whisper.py — new resolve_backend() helper, _post_local() runner, _resolve_local_bin() probe. load_api_key() retained unchanged for back-compat with anyone importing it.
scripts/watch.py — --whisper accepts local; uses resolve_backend; targeted hint when --whisper local fires without a binary on PATH.
scripts/setup.py — _have_local_whisper() probe; --check / --json / installer report the local backend; setup is "ready" when either a cloud key or a local binary is present. local_whisper_bin added to the --json snapshot.
hooks/scripts/check-setup.sh — SessionStart hook is silent when local Whisper is available, even with no cloud key.
README.md / SKILL.md / CHANGELOG.md — documentation.
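The relaxed readiness rule in setup.py can be illustrated with a small sketch. Only the `local_whisper_bin` key comes from this PR; the function name `setup_snapshot` and the `has_cloud_key` and `ready` keys are hypothetical, and any other prerequisites the real setup checks are ignored here.

```python
import json


def setup_snapshot(groq_key, openai_key, local_whisper_bin):
    """Build a --json style snapshot: ready if a cloud key OR a local binary exists."""
    has_cloud_key = bool(groq_key or openai_key)
    return {
        "has_cloud_key": has_cloud_key,          # hypothetical field name
        "local_whisper_bin": local_whisper_bin,  # path string, or None if not found
        "ready": has_cloud_key or local_whisper_bin is not None,  # hypothetical field name
    }


if __name__ == "__main__":
    # No cloud keys, but a local binary is present: still counts as ready.
    print(json.dumps(setup_snapshot(None, None, "/opt/homebrew/bin/mlx_whisper")))
```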

Test plan

python3 scripts/whisper.py /tmp/test.mp3 --backend local end-to-end on Apple Silicon with mlx-community/whisper-large-v3-turbo — JSON parses, segments normalise to {start, end, text}.
resolve_backend(None) returns ("local", "/opt/homebrew/bin/mlx_whisper") when no cloud key is set.
resolve_backend("groq") returns (None, None) when GROQ_API_KEY is unset (existing behaviour preserved).
setup.py --check exits 0 with no output when local binary is present and no cloud key is set.
hooks/scripts/check-setup.sh exits 0 with no output in the same situation (after first run).
CI: not run. The repo has no existing test suite. Happy to add unit tests for resolve_backend if you'd like.

Notes for review

I kept load_api_key() exported and unchanged so external callers don't break. New code uses resolve_backend().
The WATCH_LOCAL_WHISPER_BIN env knob accepts either a name on PATH or an absolute path, mirroring how WATCH_LOCAL_WHISPER_MODEL works.
I followed the existing "pure stdlib, no SDK" convention — no new pip deps, no new imports beyond what was already there.
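As an illustration of the "name on PATH or absolute path" behaviour noted above, a stdlib-only probe might look like the following. It is a sketch, not the PR's `_resolve_local_bin()`: the `env` parameter, the `whisper` fallback name, and the choice not to fall back to the defaults when the override is set but missing are all assumptions.

```python
import os
import shutil


def resolve_local_bin(env=None):
    """Return the path of a usable Whisper binary, or None if nothing is found."""
    env = os.environ if env is None else env
    override = env.get("WATCH_LOCAL_WHISPER_BIN")
    # With an override, probe only that value; otherwise try the defaults in order.
    candidates = (override,) if override else ("mlx_whisper", "whisper")
    for candidate in candidates:
        # shutil.which searches PATH for bare names and checks a value that
        # contains a directory component directly, so both forms work.
        found = shutil.which(os.path.expanduser(candidate))
        if found:
            return found
    return None
```

`shutil.which` also verifies that the file is executable, so a path to a non-executable file resolves to `None` rather than failing later at launch.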

Adds a third Whisper backend that shells to a local binary instead of POSTing audio to Groq or OpenAI. Useful for offline transcription, privacy (audio never leaves the machine), and dodging the 25 MB cloud upload cap on long videos.

- whisper.py: new resolve_backend() picks groq | openai | local based on --whisper, WATCH_WHISPER_BACKEND env, then cloud key, then local binary. _post_local() runs `<bin> <audio> --output-format json --output-dir <dir>`, parses the resulting JSON, and reuses _segments_from_response. mlx_whisper and openai-whisper share the same CLI shape, so one call site handles both. load_api_key() retained unchanged for back-compat with anyone importing it.
- watch.py: --whisper accepts `local`; uses resolve_backend(); prints a targeted hint when --whisper local fires without a binary on PATH.
- setup.py: _have_local_whisper() probe; --check / --json / installer report the local backend; setup is "ready" when either a cloud key or a local binary is present.
- check-setup.sh: SessionStart hook is silent when local Whisper is available, even with no cloud key.
- README / SKILL.md / CHANGELOG: document the new backend, env knobs (WATCH_WHISPER_BACKEND, WATCH_LOCAL_WHISPER_BIN, WATCH_LOCAL_WHISPER_MODEL), and the relaxed 25 MB note for the local path.

Smoke-tested end-to-end on Apple Silicon with mlx-community/whisper-large-v3-turbo; output JSON parses to the same {start, end, text} shape the cloud path returns.

glen-carlson wants to merge 1 commit into bradautomates:main from glen-carlson:add-local-whisper-backend
