Add Deepgram (nova-3) as a third transcription backend by apicurius · Pull Request #13 · bradautomates/claude-video

apicurius · 2026-05-04T13:44:55Z

Summary

Adds Deepgram as a third option alongside the existing Groq + OpenAI Whisper backends. This is useful primarily because Deepgram's /v1/listen has no per-request size limit, whereas the Whisper APIs cap uploads at 25 MB, which limits long-video coverage even with mono 16 kHz audio.

Why nova-3 / utterances

results.utterances[] already gives us pre-segmented chunks with start, end, and transcript, mapping cleanly onto the existing {start, end, text} segment shape used by both the VTT parser (transcribe.parse_vtt) and the Whisper verbose_json adapter (_segments_from_response). No downstream changes needed in transcribe.filter_range or format_transcript.
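For reviewers who want to see the mapping concretely, here is a minimal sketch of how utterances could be turned into {start, end, text} segments. The function name segments_from_utterances and the sample payload are hypothetical and are not taken from the PR's code; only the start, end, and transcript field names come from the description above.

```python
def segments_from_utterances(response):
    """Map Deepgram utterances onto the {start, end, text} segment shape.

    Hypothetical helper for illustration; not the PR's actual adapter.
    """
    utterances = (response.get("results") or {}).get("utterances") or []
    segments = []
    for utt in utterances:
        text = (utt.get("transcript") or "").strip()
        if not text:
            continue  # skip utterances that carry no text
        segments.append({
            "start": float(utt["start"]),
            "end": float(utt["end"]),
            "text": text,
        })
    return segments


# Hand-written sample payload (assumption: only the fields named above matter).
sample = {"results": {"utterances": [
    {"start": 0.0, "end": 1.5, "transcript": "Hello there."},
    {"start": 1.5, "end": 3.0, "transcript": " Second sentence. "},
]}}
segments = segments_from_utterances(sample)
```

Because the output has the same keys as the existing segment shape, range filtering and formatting should be able to consume it unchanged.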

smart_format=true&punctuate=true&detect_language=true keeps behavior parity with Whisper's defaults (auto language, punctuated output).
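A rough sketch of how the query string might be assembled with the standard library follows. The helper name build_listen_url is hypothetical, the https scheme is assumed, and the utterances=true parameter is an assumption on my part (the description relies on results.utterances[] but does not list that flag explicitly).

```python
from urllib.parse import urlencode

# Endpoint host and path as named in the test plan; https is assumed.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3"):
    """Build the /v1/listen URL with the options described above.

    Hypothetical helper; the PR may assemble the URL differently.
    """
    params = {
        "model": model,
        "smart_format": "true",
        "punctuate": "true",
        "detect_language": "true",
        # Assumption: requested so the response includes results.utterances[].
        "utterances": "true",
    }
    return DEEPGRAM_LISTEN_URL + "?" + urlencode(params)


listen_url = build_listen_url()
```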

Wire-level differences from Whisper

The Deepgram client mirrors _post_whisper's retry/backoff envelope (4 attempts, 2 of them on 429), but differs on three points:

Auth header is Token <key>, not Bearer <key>.
Body is the raw audio bytes — no multipart form.
Response shape is results.utterances[] (or results.channels[0].alternatives[0].transcript as fallback).
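The response handling in the third point could look roughly like the sketch below. The name parse_deepgram_response is hypothetical (it is not the PR's _segments_from_deepgram_response), and using 0.0 and metadata.duration as the bounds of the single fallback segment is an assumption, since the description does not say which timestamps the fallback uses.

```python
def parse_deepgram_response(response):
    """Prefer results.utterances[]; fall back to the full alternative transcript.

    Illustrative sketch only; fallback timestamps are an assumption.
    """
    results = response.get("results") or {}
    segments = [
        {
            "start": float(utt["start"]),
            "end": float(utt["end"]),
            "text": utt["transcript"].strip(),
        }
        for utt in results.get("utterances") or []
        if (utt.get("transcript") or "").strip()
    ]
    if segments:
        return segments

    # Fallback path: results.channels[0].alternatives[0].transcript
    try:
        text = results["channels"][0]["alternatives"][0]["transcript"].strip()
    except (KeyError, IndexError, TypeError, AttributeError):
        return []
    if not text:
        return []
    # Assumption: cover the whole clip with one segment.
    duration = float((response.get("metadata") or {}).get("duration") or 0.0)
    return [{"start": 0.0, "end": duration, "text": text}]


# Manual response stub with no utterances, similar in spirit to the test plan.
stub = {
    "metadata": {"duration": 12.5},
    "results": {"channels": [{"alternatives": [{"transcript": "full text here"}]}]},
}
fallback = parse_deepgram_response(stub)
```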

Pure stdlib — no deepgram-sdk dependency, consistent with the existing Groq/OpenAI implementation.
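To illustrate the first two wire-level points with the standard library alone, here is a minimal sketch that builds (but does not send) the request. The helper name build_deepgram_request and the default audio/wav content type are assumptions; the PR does not state which audio container is uploaded.

```python
import urllib.request


def build_deepgram_request(url, api_key, audio, content_type="audio/wav"):
    """Build a POST request with Token auth and a raw audio body.

    Hypothetical helper; content_type default is an assumption.
    """
    return urllib.request.Request(
        url,
        data=audio,  # raw bytes, no multipart form encoding
        method="POST",
        headers={
            "Authorization": "Token " + api_key,  # Token, not Bearer
            "Content-Type": content_type,
        },
    )


request = build_deepgram_request(
    "https://api.deepgram.com/v1/listen?model=nova-3",
    api_key="dg_example_key",  # placeholder, not a real key
    audio=b"\x00\x01\x02",
)
```

Passing the result to urllib.request.urlopen would perform the upload; retry and backoff handling is left out of this sketch.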

Backend selection

Preference order when multiple keys are set: Groq → OpenAI → Deepgram. Override with --whisper {groq,openai,deepgram}. setup.py scaffolds DEEPGRAM_API_KEY= alongside the other placeholders and accepts it as satisfying the preflight key check.
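The preference order can be sketched as follows. This is an illustration under assumptions: select_backend is a hypothetical name, GROQ_API_KEY and OPENAI_API_KEY are assumed variable names (only DEEPGRAM_API_KEY appears in this description), and raising when an overridden backend has no key is a guess at reasonable behavior rather than what the PR does.

```python
# Ordered by preference: Groq, then OpenAI, then Deepgram.
# Assumption: the first two environment variable names are guesses.
PREFERENCE = (
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("deepgram", "DEEPGRAM_API_KEY"),
)


def select_backend(env, override=None):
    """Return the backend name to use, or None if no key is set.

    `env` is any mapping of environment variables (e.g. os.environ).
    `override` models the value passed to --whisper.
    """
    key_vars = dict(PREFERENCE)
    if override is not None:
        if override not in key_vars:
            raise ValueError("unknown backend: " + override)
        if not env.get(key_vars[override]):
            raise RuntimeError(key_vars[override] + " is not set")
        return override
    for name, var in PREFERENCE:
        if env.get(var):
            return name
    return None


chosen = select_backend({"OPENAI_API_KEY": "x", "DEEPGRAM_API_KEY": "y"})
forced = select_backend({"OPENAI_API_KEY": "x", "DEEPGRAM_API_KEY": "y"}, override="deepgram")
```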

Stderr messages and docs refer to "transcription" / "speech-to-text" rather than "Whisper" where the broader concept applies — but the --whisper CLI flag name is preserved for back-compat.

Test plan

python3 -c "import ast; ast.parse(open('scripts/whisper.py').read())" — syntax clean
Smoke-tested against an X.com video without captions: extracted audio, uploaded it to api.deepgram.com/v1/listen, and got back 26 segments aligned to the speaker's actual delivery.
Verified _segments_from_deepgram_response falls back to the alternative transcript when utterances is absent (manual response stub).
setup.py --check returns 0 with only DEEPGRAM_API_KEY set.
setup.py --json reports whisper_backend: "deepgram".

Notes

Bumped plugin.json to 0.2.0 (additive feature, no breaking changes to default backend behavior).
No new dependencies. No changes to the frames.py / download.py paths.

Whisper API uploads cap at 25 MB, which constrains long-video coverage even with mono 16 kHz audio. Deepgram's /v1/listen has no per-request size limit and exposes utterances directly, mapping cleanly onto the existing {start, end, text} segment shape used by both the VTT parser and the Whisper verbose_json adapter. The Deepgram client follows the same retry/backoff envelope as the existing Whisper client (4 attempts total, 2 of them on 429), but differs on three wire-level points: Token (not Bearer) auth, raw audio body (not multipart), and a results.utterances[] response shape with fallback to the full alternative transcript when utterances are absent. Backend selection extends the existing chain: Groq -> OpenAI -> Deepgram, overridable via --whisper {groq,openai,deepgram}. setup.py now scaffolds DEEPGRAM_API_KEY alongside the other placeholders and accepts it as satisfying the preflight key check. Stderr messages and docs refer to "transcription" rather than "Whisper" where the broader concept applies. Bumps plugin.json to 0.2.0 (additive feature, no breaking changes to the default backend behavior).

deebee37 mentioned this pull request Jun 17, 2026

Add macOS/Linux launcher script deebee37/claude-video#12

Closed

apicurius wants to merge 1 commit into bradautomates:main from apicurius:add-deepgram-backend
