Add Deepgram (nova-3) as a third transcription backend #13
Open
apicurius wants to merge 1 commit into
Whisper API uploads are capped at 25 MB, which limits long-video coverage even
with mono 16 kHz audio. Deepgram's /v1/listen has no comparable 25 MB cap
and returns utterances directly, which map cleanly onto the existing
{start, end, text} segment shape used by both the VTT parser and the Whisper
verbose_json adapter.
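A minimal sketch of that mapping, using only the response fields this PR names: `results.utterances[]` with `start`, `end`, and `transcript`, plus `results.channels[0].alternatives[0].transcript` as the fallback. The name `deepgram_segments` is hypothetical; it stands in for the PR's `_segments_from_deepgram_response` and is not that code. Ending the single fallback segment at `metadata.duration` is likewise an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any


def deepgram_segments(resp: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert a Deepgram /v1/listen JSON body into {start, end, text} segments.

    Hypothetical helper: a sketch, not the plugin's actual adapter.
    """
    results = resp.get("results") or {}

    # Preferred path: one segment per utterance, skipping empty transcripts.
    segments = []
    for utt in results.get("utterances") or []:
        text = (utt.get("transcript") or "").strip()
        if text:
            segments.append(
                {"start": float(utt["start"]), "end": float(utt["end"]), "text": text}
            )
    if segments:
        return segments

    # Fallback: no usable utterances, so emit the whole alternative transcript
    # as one segment. Ending it at metadata.duration is an assumption.
    channels = results.get("channels") or []
    alternatives = (channels[0].get("alternatives") or []) if channels else []
    text = (alternatives[0].get("transcript") or "").strip() if alternatives else ""
    if not text:
        return []
    duration = float((resp.get("metadata") or {}).get("duration", 0.0))
    return [{"start": 0.0, "end": duration, "text": text}]
```

Because the output has the same shape as the VTT and `verbose_json` paths, downstream code such as `transcribe.filter_range` would not need to know which backend produced it.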
The Deepgram client follows the same retry/backoff envelope as the existing
Whisper client (4 attempts total, 2 of them on 429), but differs on three
wire-level points: Token (not Bearer) auth, raw audio body (not multipart),
and a results.utterances[] response shape with fallback to the full
alternative transcript when utterances are absent.
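The three wire-level differences can be sketched with the stdlib alone. This sketch assumes nova-3 is selected through the `model` query parameter and adds `utterances=true`, since, as far as Deepgram documents it, `results.utterances[]` is only returned when that flag is set; the other three flags are the ones listed in this PR. The names `build_deepgram_request` and `post_deepgram` are hypothetical, `content_type` must match whatever audio format the plugin actually extracts, and the retry/backoff envelope is omitted.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"

# smart_format/punctuate/detect_language come from the PR; model and
# utterances are assumptions of this sketch.
DEEPGRAM_PARAMS = {
    "model": "nova-3",
    "smart_format": "true",
    "punctuate": "true",
    "detect_language": "true",
    "utterances": "true",
}


def build_deepgram_request(
    audio: bytes, api_key: str, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the raw audio bytes."""
    url = DEEPGRAM_URL + "?" + urllib.parse.urlencode(DEEPGRAM_PARAMS)
    return urllib.request.Request(
        url,
        data=audio,  # raw body, not multipart/form-data as with Whisper
        headers={
            "Authorization": f"Token {api_key}",  # Token, not Bearer
            "Content-Type": content_type,
        },
        method="POST",
    )


def post_deepgram(
    audio: bytes, api_key: str, content_type: str, timeout: float = 600.0
) -> dict:
    """Send the request once and decode the JSON response (no retries here)."""
    req = build_deepgram_request(audio, api_key, content_type)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

In the plugin this call would sit inside the same 4-attempt retry/backoff envelope as the Whisper client; keeping request construction separate makes the headers and query string easy to check without network access.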
Backend selection extends the existing chain: Groq -> OpenAI -> Deepgram,
overridable via --whisper {groq,openai,deepgram}. setup.py now scaffolds
DEEPGRAM_API_KEY alongside the other placeholders and accepts it as
satisfying the preflight key check. Stderr messages and docs refer to
"transcription" rather than "Whisper" where the broader concept applies.
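A sketch of the selection order and the preflight check, assuming the Groq and OpenAI keys live in `GROQ_API_KEY` and `OPENAI_API_KEY` (only `DEEPGRAM_API_KEY` is named in this PR) and that a blank placeholder such as the scaffolded `DEEPGRAM_API_KEY=` counts as unset. `pick_backend` and `has_transcription_key` are hypothetical names, and raising when `--whisper` names a backend whose key is missing is a design assumption, not documented plugin behavior.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

# Preference order from the PR: Groq -> OpenAI -> Deepgram.
# The Groq/OpenAI variable names are assumptions; DEEPGRAM_API_KEY is from the PR.
BACKENDS = (
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("deepgram", "DEEPGRAM_API_KEY"),
)


def pick_backend(
    override: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the backend to use, or None if no key is configured."""
    # Treat blank placeholders (e.g. "DEEPGRAM_API_KEY=") as unset.
    has_key = {name: bool(env.get(var, "").strip()) for name, var in BACKENDS}

    if override is not None:  # e.g. the value passed to --whisper
        if override not in has_key:
            raise ValueError(f"unknown backend: {override!r}")
        if not has_key[override]:
            raise RuntimeError(f"--whisper {override} requested but its API key is not set")
        return override

    # No override: first configured backend in preference order wins.
    for name, _ in BACKENDS:
        if has_key[name]:
            return name
    return None


def has_transcription_key(env: Mapping[str, str] = os.environ) -> bool:
    """Preflight check: any one of the three keys is enough."""
    return pick_backend(env=env) is not None
```

With only `DEEPGRAM_API_KEY` set, `has_transcription_key()` is true and `pick_backend()` returns `"deepgram"`, which lines up with the `setup.py --check` and `setup.py --json` items in the test plan.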
Bumps plugin.json to 0.2.0 (additive feature, no breaking changes to the
default backend behavior).
Summary
Adds Deepgram as a third option alongside the existing Groq and OpenAI Whisper backends. The main motivation is upload size: Whisper APIs cap uploads at 25 MB, which limits long-video coverage even with mono 16 kHz audio, while Deepgram's `/v1/listen` has no comparable 25 MB cap.

Why nova-3 / utterances
`results.utterances[]` already gives us pre-segmented chunks with `start`, `end`, and `transcript`, which map cleanly onto the existing `{start, end, text}` segment shape used by both the VTT parser (`transcribe.parse_vtt`) and the Whisper `verbose_json` adapter (`_segments_from_response`). No downstream changes are needed in `transcribe.filter_range` or `format_transcript`.

`smart_format=true&punctuate=true&detect_language=true` keeps behavior in parity with Whisper's defaults (automatic language detection, punctuated output).

Wire-level differences from Whisper
The Deepgram client mirrors `_post_whisper`'s retry/backoff envelope (4 attempts, 2 of them on 429), but differs on three points:

- Auth: `Token <key>`, not `Bearer <key>`.
- Body: raw audio, not multipart.
- Response: `results.utterances[]` (or `results.channels[0].alternatives[0].transcript` as fallback).

Pure stdlib, with no `deepgram-sdk` dependency, consistent with the existing Groq/OpenAI implementation.

Backend selection
Preference order when multiple keys are set: Groq → OpenAI → Deepgram. Override with `--whisper {groq,openai,deepgram}`.

`setup.py` scaffolds `DEEPGRAM_API_KEY=` alongside the other placeholders and accepts it as satisfying the preflight key check.

Stderr messages and docs refer to "transcription" / "speech-to-text" rather than "Whisper" where the broader concept applies, but the `--whisper` CLI flag name is preserved for back-compat.

Test plan
- `python3 -c "import ast; ast.parse(open('scripts/whisper.py').read())"`: syntax clean.
- …`api.deepgram.com/v1/listen`, got back 26 segments aligned to the speaker's actual delivery.
- `_segments_from_deepgram_response` falls back to the alternative transcript when `utterances` is absent (manual response stub).
- `setup.py --check` returns 0 with only `DEEPGRAM_API_KEY` set.
- `setup.py --json` reports `whisper_backend: "deepgram"`.

Notes
- Bumps `plugin.json` to `0.2.0` (additive feature, no breaking changes to default backend behavior).
- …`frames.py`/`download.py` paths.