Add Deepgram backend: transcribe audio over Whisper's 25 MB limit (45 min+ videos) by drlee91 · Pull Request #35 · bradautomates/claude-video

drlee91 · 2026-06-09T23:01:11Z

Problem

Groq and OpenAI Whisper both reject uploads over 25 MB with HTTP 413. At the 64 kbps mono extraction rate that's roughly 45 minutes of audio — so any longer caption-less video currently comes back frames-only, with no transcript possible at all:

[watch] audio: 27625 kB — uploading to groq Whisper…
[watch] whisper fallback failed: Whisper request failed: HTTP Error 413: Payload Too Large
       — {"error":{"message":"Request Entity Too Large",...,"code":"request_too_large"}}

This bites in practice for long tutorials, podcasts, talks, and VODs. YouTube caption pulls can also fail with HTTP 429 rate limits, which makes the audio fallback the only path for those videos.
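
As a back-of-envelope check on the size/duration relationship above, here is a minimal sketch. It assumes decimal megabytes, a constant bitrate, and zero container overhead, none of which the PR states; the function names are hypothetical. Under those assumptions the nominal ceiling is a little over 50 minutes, so the "roughly 45 minutes" figure reads as a conservative rule of thumb.

```python
# Rough arithmetic only. Assumptions (not from the PR): decimal megabytes,
# constant bitrate, and no container overhead.

def max_minutes(limit_bytes: int, bitrate_kbps: int) -> float:
    """Minutes of constant-bitrate audio that fit under an upload cap."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second / 60


def size_mb(minutes: float, bitrate_kbps: int) -> float:
    """Nominal size in decimal megabytes of constant-bitrate audio."""
    return minutes * 60 * bitrate_kbps * 1000 / 8 / 1_000_000


print(round(max_minutes(25 * 1000 * 1000, 64), 1))  # 52.1
print(round(size_mb(59, 64), 1))                    # 28.3
```

The second figure is in the same range as the 27 MB reported for the 59-minute video under Verification.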

Approach

Add Deepgram's pre-recorded API (nova-2) as a third backend. Its file limit is ~2 GB, so the 25 MB cliff disappears. Routing keeps current behavior unchanged for existing users:

Audio ≤ 24 MB: exactly as before — Groq preferred, OpenAI fallback. Deepgram additionally catches the case where Whisper errors out.
Audio > 24 MB: routed straight to Deepgram instead of failing with 413. Without a Deepgram key, Whisper is still attempted so the user sees the clear 413 error.
Standalone: if DEEPGRAM_API_KEY is the only configured key, Deepgram serves as the primary backend.
Override: --whisper deepgram forces it.
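
The routing rules above can be sketched as a single ordering function. This is an illustration, not the PR's actual code: `backend_order`, its parameters, and the binary-megabyte reading of the 24 MB threshold are all assumptions. The PR also does not say whether a large file falls back to Whisper when Deepgram itself fails, so the sketch does not attempt that.

```python
from typing import List, Optional

# Assumption: the 24 MB routing threshold is counted in binary megabytes.
DEEPGRAM_ROUTE_BYTES = 24 * 1024 * 1024


def backend_order(
    audio_bytes: int,
    has_groq: bool,
    has_openai: bool,
    has_deepgram: bool,
    override: Optional[str] = None,
) -> List[str]:
    """Return the backends to try, in order (hypothetical helper)."""
    # --whisper deepgram forces Deepgram regardless of size or other keys.
    if override == "deepgram":
        return ["deepgram"]

    # Whisper providers keep their existing preference: Groq, then OpenAI.
    whisper = [
        name
        for name, available in (("groq", has_groq), ("openai", has_openai))
        if available
    ]

    # Large audio goes straight to Deepgram when a key is configured.
    if has_deepgram and audio_bytes > DEEPGRAM_ROUTE_BYTES:
        return ["deepgram"]

    # Otherwise Whisper goes first and Deepgram catches Whisper errors.
    # Without a Deepgram key, large audio still hits Whisper and surfaces
    # the 413. With only a Deepgram key, this reduces to ["deepgram"].
    return whisper + (["deepgram"] if has_deepgram else [])
```

For example, `backend_order(10 * 1024 * 1024, True, True, True)` yields `["groq", "openai", "deepgram"]`, while a 27 MB file with the same keys yields `["deepgram"]`.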

Design notes

Pure stdlib, matching whisper.py — urllib only, no SDK dependency. Same {start, end, text} segment shape (from Deepgram utterances, with paragraph/transcript fallbacks), same SystemExit error contract, same key resolution (env → ~/.config/watch/.env → ./.env).
detect_language=true keeps it language-agnostic like the Whisper path.
The module is named deepgram_backend (not deepgram) so it can never shadow or be shadowed by the real Deepgram SDK.
setup.py accepts DEEPGRAM_API_KEY as a valid transcription key, scaffolds a commented placeholder, and includes it in the install hints. --check/--json semantics are unchanged otherwise.
Considered chunked uploads to stay within Whisper's 25 MB limit instead, but that needs segment-boundary handling, per-chunk timestamp offsetting, and N sequential uploads. A second provider that follows the existing backend contract is simpler and also gives users a provider choice. Happy to adjust if you'd prefer the chunking route.
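
To make the segment-shape note concrete, here is a minimal sketch of normalizing a Deepgram response into {start, end, text} segments with the utterances, then paragraphs, then transcript fallback order described above. `to_segments` is a hypothetical name, not the function in deepgram_backend. The field names (`results.utterances`, `alternatives[0].paragraphs.paragraphs`, `metadata.duration`) follow Deepgram's pre-recorded response format as commonly documented and should be checked against the current API reference.

```python
from typing import Any, Dict, List

Segment = Dict[str, Any]


def _seg(start: float, end: float, text: str) -> Segment:
    return {"start": float(start), "end": float(end), "text": text.strip()}


def to_segments(response: Dict[str, Any]) -> List[Segment]:
    """Normalize a Deepgram JSON response to {start, end, text} segments."""
    results = response.get("results") or {}

    # 1. Preferred source: utterances, which carry their own timestamps.
    utterances = results.get("utterances") or []
    segments = [
        _seg(u["start"], u["end"], u["transcript"])
        for u in utterances
        if (u.get("transcript") or "").strip()
    ]
    if segments:
        return segments

    channels = results.get("channels") or [{}]
    alternatives = channels[0].get("alternatives") or [{}]
    alt = alternatives[0]

    # 2. Fallback: paragraphs, joining each paragraph's sentences.
    paragraphs = (alt.get("paragraphs") or {}).get("paragraphs") or []
    for p in paragraphs:
        text = " ".join(s.get("text", "") for s in p.get("sentences", []))
        if text.strip():
            segments.append(_seg(p["start"], p["end"], text))
    if segments:
        return segments

    # 3. Last resort: one segment spanning the whole transcript.
    transcript = (alt.get("transcript") or "").strip()
    if transcript:
        duration = (response.get("metadata") or {}).get("duration", 0.0)
        return [_seg(0.0, duration, transcript)]
    return []
```

Returning an empty list when nothing usable is present leaves the error policy to the caller, which in the PR's contract would be a SystemExit.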

Verification

59-min YouTube video (caption pull 429'd, audio 27 MB): routed to Deepgram, returned 628 timestamped segments, full report, exit 0. Previously: frames-only.
Small file regression: 40 s local clip without flags → still transcribed via Groq (unchanged default).
Forced: --whisper deepgram with a Groq key present → Deepgram used.
Standalone: only DEEPGRAM_API_KEY configured → Deepgram auto-selected as primary; setup.py --json reports ready.
No keys at all: updated guidance message lists all three options; --check exits 3 as before.

Docs (README + SKILL.md) updated accordingly.

…limit

Groq and OpenAI Whisper both reject uploads larger than 25 MB (HTTP 413). At the 64 kbps mono extraction rate that is roughly 45 minutes of audio, so any longer caption-less video currently comes back frames-only with no way to get a transcript.

This adds Deepgram's pre-recorded API (nova-2) as a third backend:

- Audio <= 24 MB: unchanged - Whisper (Groq preferred, OpenAI fallback). Deepgram now also catches the case where Whisper errors out.
- Audio > 24 MB: routed straight to Deepgram, which accepts files up to ~2 GB, instead of failing with 413.
- Standalone: when DEEPGRAM_API_KEY is the only key configured, Deepgram serves as the primary backend for all sizes.
- --whisper deepgram forces it explicitly.

Implementation follows the existing whisper.py conventions: pure stdlib (urllib, no SDK), same {start, end, text} segment shape, same SystemExit error contract, keys via env or ~/.config/watch/.env. The module is named deepgram_backend (not deepgram) so it can never shadow the real Deepgram SDK if one is installed.

setup.py accepts DEEPGRAM_API_KEY as a valid transcription key, scaffolds a placeholder for it, and mentions it in the install hints. SKILL.md and README document the new backend.

Tested end-to-end on Windows with a 59-minute YouTube video (27 MB audio, native captions unavailable due to YouTube 429): routed to Deepgram and returned 628 timestamped segments. Also verified: small file still prefers Groq; --whisper deepgram forces Deepgram with a Groq key present; Deepgram-only config auto-selects Deepgram; no key at all produces the updated guidance message.
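
The key lookup order (environment first, then ~/.config/watch/.env, then ./.env as listed in the PR description) might look like the sketch below. It is an assumption-laden illustration: `read_dotenv` and `resolve_key` are hypothetical names, and the simple KEY=value parsing is a guess at how the .env files are read, not a copy of whisper.py.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[name.strip()] = value.strip().strip("'\"")
    return values


def resolve_key(
    name: str,
    env: Optional[Mapping[str, str]] = None,
    dotenv_paths: Optional[Iterable[Path]] = None,
) -> Optional[str]:
    """Return the first non-empty value for `name`, or None if unset."""
    env = os.environ if env is None else env
    if dotenv_paths is None:
        dotenv_paths = (
            Path.home() / ".config" / "watch" / ".env",
            Path(".env"),
        )
    # 1. A value in the process environment takes priority.
    if env.get(name):
        return env[name]
    # 2. Then each .env file, in the order given.
    for path in dotenv_paths:
        value = read_dotenv(Path(path)).get(name)
        if value:
            return value
    return None
```

Because commented lines are skipped, a scaffolded placeholder such as `# DEEPGRAM_API_KEY=` would not count as a configured key in this sketch.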

drlee91 mentioned this pull request Jun 11, 2026

Support for longer videos and additional flag --json #25

Closed

drlee91 wants to merge 1 commit into bradautomates:main from drlee91:feat/deepgram-transcription
