Add local Whisper backend (mlx_whisper / openai-whisper) #18
Open
glen-carlson wants to merge 1 commit into
Adds a third Whisper backend that shells to a local binary instead of POSTing
audio to Groq or OpenAI. Useful for offline transcription, privacy (audio
never leaves the machine), and dodging the 25 MB cloud upload cap on long
videos.
- whisper.py: new resolve_backend() picks groq | openai | local based on
--whisper, WATCH_WHISPER_BACKEND env, then cloud key, then local binary.
_post_local() runs `<bin> <audio> --output-format json --output-dir <dir>`,
parses the resulting JSON, and reuses _segments_from_response. mlx_whisper
and openai-whisper share the same CLI shape, so one call site handles both.
load_api_key() retained unchanged for back-compat with anyone importing it.
- watch.py: --whisper accepts `local`; uses resolve_backend(); prints a
targeted hint when --whisper local fires without a binary on PATH.
- setup.py: _have_local_whisper() probe; --check / --json / installer report
the local backend; setup is "ready" when either a cloud key or a local
binary is present.
- check-setup.sh: SessionStart hook is silent when local Whisper is
available, even with no cloud key.
- README / SKILL.md / CHANGELOG: document the new backend, env knobs
(WATCH_WHISPER_BACKEND, WATCH_LOCAL_WHISPER_BIN, WATCH_LOCAL_WHISPER_MODEL),
and the relaxed 25 MB note for the local path.
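The `_post_local()` step described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names are hypothetical, the flag spellings are copied from the command line quoted above and should be checked against the installed binary's `--help`, and passing the model id through `--model` is an assumption.

```python
import json
import subprocess
from pathlib import Path


def build_local_command(binary, audio, out_dir, model=None):
    """Build the argv for one local Whisper run.

    Flag spellings follow the command quoted in the commit message.
    """
    cmd = [str(binary), str(audio),
           "--output-format", "json",
           "--output-dir", str(out_dir)]
    if model:
        # Assumption: the model id is forwarded with --model.
        cmd += ["--model", str(model)]
    return cmd


def read_transcript_json(audio, out_dir):
    """Load the <stem>.json file the binary is expected to write into out_dir."""
    path = Path(out_dir) / f"{Path(audio).stem}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def transcribe_locally(binary, audio, out_dir, model=None):
    """Run the binary, fail loudly on a non-zero exit, and return the parsed JSON."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_local_command(binary, audio, out_dir, model), check=True)
    return read_transcript_json(audio, out_dir)
```

Keeping the argv construction separate from the subprocess call makes the command shape checkable without a Whisper binary installed.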
Smoke-tested end-to-end on Apple Silicon with mlx-community/whisper-large-v3-turbo;
output JSON parses to the same {start, end, text} shape the cloud path returns.
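To make the `{start, end, text}` shape concrete, here is a small sketch of the kind of normalisation involved. It is a hypothetical stand-in and not the repository's `_segments_from_response`: it assumes the payload carries a top-level `segments` list whose entries have `start`, `end` and `text` fields, and dropping empty segments is a choice made only for this example.

```python
def normalise_segments(payload):
    """Reduce a transcription payload to a list of {start, end, text} dicts."""
    segments = []
    for seg in payload.get("segments", []):
        text = str(seg.get("text", "")).strip()
        if not text:
            # Example-only choice: skip segments with no text.
            continue
        segments.append({
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            "text": text,
        })
    return segments


sample = {
    "text": " Hello there.",
    "segments": [
        {"id": 0, "start": 0, "end": 1.5, "text": " Hello there."},
        {"id": 1, "start": 1.5, "end": 2.0, "text": "  "},
    ],
}
print(normalise_segments(sample))
```

Extra fields such as `id` are ignored, so the same function can be fed a cloud response or a locally written JSON file as long as both carry a `segments` list.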
Summary
Adds a third Whisper backend that shells out to a local binary instead of POSTing audio to Groq or OpenAI. Useful for:
- `/watch` works on a plane / behind a corporate proxy.
- `mlx_whisper` runs `whisper-large-v3-turbo` faster than the cloud round-trip on an M-series machine, with no per-call cost.
The implementation reuses the existing audio-extraction step (mono 16 kHz mp3) and shells out to a binary that takes the openai-whisper CLI shape (`<bin> <audio> --output-format json --output-dir <dir>`).
`mlx_whisper` and `openai-whisper` both expose this exact CLI, so one code path handles both. The resulting `<stem>.json` matches the Whisper API `verbose_json` schema, so `_segments_from_response` works unchanged.
How to use
Optional knobs:
- `WATCH_LOCAL_WHISPER_BIN=/path/to/binary`: point at any compatible binary.
- `WATCH_LOCAL_WHISPER_MODEL=large-v3`: override the model id.
Resolution order
When `--whisper` isn't set, the priority is:
1. `WATCH_WHISPER_BACKEND` env / `.env` (groq | openai | local)
2. `GROQ_API_KEY`
3. `OPENAI_API_KEY`
4. local binary (`mlx_whisper` → `openai-whisper`)
So existing users see zero behaviour change: cloud keys still win when present. Local only kicks in when nothing else is configured, or when explicitly requested.
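The resolution order above can be sketched as a small pure function. This is an illustration under stated assumptions, not the PR's `resolve_backend()`: the name `resolve_backend_sketch`, the injectable `env` and `which` parameters, and the `(backend, key_or_path)` return value for cloud backends are hypothetical (only the `("local", path)` and `(None, None)` results are described in this PR), and `.env` parsing and the `WATCH_LOCAL_WHISPER_BIN` override are left out.

```python
import os
import shutil

# Probe order for local binaries, as listed in the PR text.
LOCAL_CANDIDATES = ("mlx_whisper", "openai-whisper")


def resolve_backend_sketch(explicit=None, env=None, which=shutil.which):
    """Return (backend, key_or_path), or (None, None) if nothing is usable."""
    env = os.environ if env is None else env
    keys = {"groq": env.get("GROQ_API_KEY"), "openai": env.get("OPENAI_API_KEY")}

    def local():
        for name in LOCAL_CANDIDATES:
            path = which(name)
            if path:
                return ("local", path)
        return (None, None)

    def pick(name):
        if name == "local":
            return local()
        key = keys.get(name)
        return (name, key) if key else (None, None)

    # An explicit --whisper value wins, then WATCH_WHISPER_BACKEND.
    requested = explicit or env.get("WATCH_WHISPER_BACKEND")
    if requested:
        return pick(requested)
    # Otherwise cloud keys take priority, Groq before OpenAI.
    for name in ("groq", "openai"):
        if keys[name]:
            return (name, keys[name])
    # Last resort: a local binary on PATH.
    return local()
```

Passing `env` and `which` as arguments keeps the priority logic testable without touching the real environment or installing a binary.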
Changes
- `scripts/whisper.py`: new `resolve_backend()` helper, `_post_local()` runner, `_resolve_local_bin()` probe. `load_api_key()` retained unchanged for back-compat with anyone importing it.
- `scripts/watch.py`: `--whisper` accepts `local`; uses `resolve_backend`; targeted hint when `--whisper local` fires without a binary on PATH.
- `scripts/setup.py`: `_have_local_whisper()` probe; `--check` / `--json` / installer report the local backend; setup is "ready" when either a cloud key or a local binary is present. `local_whisper_bin` added to the `--json` snapshot.
- `hooks/scripts/check-setup.sh`: SessionStart hook is silent when local Whisper is available, even with no cloud key.
- `README.md` / `SKILL.md` / `CHANGELOG.md`: documentation.
Test plan
- `python3 scripts/whisper.py /tmp/test.mp3 --backend local` end-to-end on Apple Silicon with `mlx-community/whisper-large-v3-turbo`: JSON parses, segments normalise to `{start, end, text}`.
- `resolve_backend(None)` returns `("local", "/opt/homebrew/bin/mlx_whisper")` when no cloud key is set.
- `resolve_backend("groq")` returns `(None, None)` when `GROQ_API_KEY` is unset (existing behaviour preserved).
- `setup.py --check` exits 0 with no output when local binary is present and no cloud key is set.
- `hooks/scripts/check-setup.sh` exits 0 with no output in the same situation (after first run).
- … `resolve_backend` if you'd like.
Notes for review
- `load_api_key()` exported and unchanged so external callers don't break. New code uses `resolve_backend()`.
- `WATCH_LOCAL_WHISPER_BIN` env knob accepts either a name on PATH or an absolute path, mirroring how `WATCH_LOCAL_WHISPER_MODEL` works.
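One way to get the "name on PATH or absolute path" behaviour is to lean on `shutil.which`, which searches PATH for bare names and checks the file directly when given a path with a directory component. The sketch below is an assumption about how such a probe might look, not the PR's `_resolve_local_bin()`; the function name is hypothetical, and not falling back to the default candidates when the override is set but missing is a choice made only for this example.

```python
import os
import shutil


def resolve_local_bin_sketch(env=None, candidates=("mlx_whisper", "openai-whisper")):
    """Return the path of a usable local Whisper binary, or None."""
    env = os.environ if env is None else env
    override = env.get("WATCH_LOCAL_WHISPER_BIN")
    # Example-only choice: an explicit override replaces the default probe list.
    names = (override,) if override else candidates
    for name in names:
        # shutil.which handles both bare names (PATH lookup) and explicit paths.
        path = shutil.which(name)
        if path:
            return path
    return None
```

Note that `shutil.which` consults the process's real PATH here, so the `env` argument only controls where the override is read from.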