feat: add AssemblyAI as a third transcription backend (v0.2.0) #17

Open
cristianorj22 wants to merge 1 commit into bradautomates:main from cristianorj22:feat/assemblyai-backend
@cristianorj22

Summary

  • Adds AssemblyAI as a third transcription backend alongside Groq and OpenAI in scripts/whisper.py. Async client (upload → create job → poll up to 30 min), stdlib-only (no SDK dep) per the project's convention.
  • Auto-priority becomes cost-ascending: Groq ($0.04/h) → AssemblyAI ($0.27/h) → OpenAI (~$0.36/h). Single-key users see no behavior change. Override with --whisper {groq|assemblyai|openai}.
  • Strengths of AssemblyAI vs the existing options: stronger PT-BR transcription, auto language detection, and a much higher upload size ceiling than Groq/OpenAI's 25 MB limit (so long videos don't need chunking).
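
The cost-ascending auto-priority described above could be sketched roughly as follows. This is an illustration, not the code in this PR: `pick_backend` and `BACKEND_PRIORITY` are hypothetical names, and the `GROQ_API_KEY` / `OPENAI_API_KEY` variable names are assumptions (only `ASSEMBLYAI_API_KEY` is named in this description).

```python
# Cost-ascending order taken from the PR description.
# GROQ_API_KEY and OPENAI_API_KEY are assumed variable names.
BACKEND_PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("assemblyai", "ASSEMBLYAI_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def pick_backend(env, override=None):
    """Return (backend, api_key) for the cheapest usable backend, or None.

    `env` is any mapping of environment variables (e.g. os.environ).
    `override` mirrors --whisper {groq|assemblyai|openai}: when set, only
    that backend is considered.
    """
    for name, var in BACKEND_PRIORITY:
        if override is not None and name != override:
            continue
        key = env.get(var, "").strip()
        if key:
            return name, key
    return None
```

With only a Groq key set, `pick_backend(os.environ)` would keep returning the Groq backend, which matches the "single-key users see no behavior change" claim.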

Files touched

  • scripts/whisper.py: _post_assemblyai() (upload → create → poll); _segments_from_assemblyai() chunks per-word output into ~5s segments to match Whisper's segment shape, so downstream filter_range / format_transcript need no changes. Priority chain in load_api_key() extended.
  • scripts/setup.py: env template adds ASSEMBLYAI_API_KEY, _have_api_key() includes the new key in the same priority order, installer prompts and --check error message updated.
  • scripts/watch.py: --whisper choices grew to groq|assemblyai|openai with cost-ascending help text.
  • SKILL.md: usage section, transcription-fallback section, failure-mode hint, privacy notes, and bundled-scripts list now cover all three backends.
  • CHANGELOG.md: [0.2.0] entry.
  • .claude-plugin/plugin.json: version 0.1.2 → 0.2.0, description rewritten.
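
For reviewers, here is a minimal sketch of the per-word to ~5s segment grouping mentioned for scripts/whisper.py. It is not the PR's `_segments_from_assemblyai()`; `segments_from_words` and `_close` are hypothetical names. It assumes each word carries `text`, `start`, and `end` with times in milliseconds, and that downstream code expects Whisper-style segments with `start`, `end` (seconds), and `text`.

```python
def _close(words):
    """Collapse a run of words into one Whisper-style segment (seconds)."""
    return {
        "start": words[0]["start"] / 1000.0,
        "end": words[-1]["end"] / 1000.0,
        "text": " ".join(w["text"] for w in words),
    }


def segments_from_words(words, max_seconds=5.0):
    """Group word timings (milliseconds) into segments of about max_seconds."""
    segments = []
    current = []
    for word in words:
        # Close the open segment if adding this word would exceed the window.
        if current and (word["end"] - current[0]["start"]) / 1000.0 > max_seconds:
            segments.append(_close(current))
            current = []
        current.append(word)
    if current:
        segments.append(_close(current))
    return segments
```

A single word longer than the window still becomes its own segment, so no audio is dropped.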

Test plan

  • python -m py_compile scripts/whisper.py scripts/setup.py scripts/watch.py — clean
  • python scripts/setup.py --json returns status: ready with new priority chain (single Groq key configured → whisper_backend: groq, no behavior regression)
  • End-to-end against a video without captions using --whisper assemblyai (needs an AssemblyAI API key)
  • bash scripts/build-skill.sh — release CI on tag push will run this; verify the bundle still respects the 200-file cap and single-SKILL.md invariants with the changes

Notes

  • AssemblyAI's auth header is the bare key (no Bearer prefix), and upload is raw bytes (not multipart). Kept the AssemblyAI client path separate from the existing Whisper multipart path to keep each provider's quirks isolated.
  • Polling: 3s interval, 30 min hard timeout. AssemblyAI typically finishes in a fraction of real time (e.g., 30 min of audio in 2-3 min), so the 30 min cap covers ~3 h videos with margin.
  • --no-whisper keeps working as a kill switch for the entire transcription path.
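
The upload → create → poll flow in these notes can be sketched with the standard library alone. Treat this as an illustration under assumptions rather than the code in scripts/whisper.py: `http_json` and `transcribe` are hypothetical names, and the endpoint paths and response fields (`upload_url`, `id`, `status`, `error`) follow AssemblyAI's v2 REST API as I understand it, so check them against the current API reference.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"
POLL_INTERVAL_S = 3
POLL_TIMEOUT_S = 30 * 60  # roughly 600 status checks at a 3 s interval


def http_json(method, url, api_key, body=None, content_type=None):
    """Send one request and decode the JSON reply (stdlib only)."""
    headers = {"authorization": api_key}  # bare key, no "Bearer " prefix
    if content_type:
        headers["content-type"] = content_type
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(audio_bytes, api_key, request=http_json,
               sleep=time.sleep, clock=time.monotonic):
    """Upload audio, create a job, and poll until it completes or times out."""
    # 1. Upload raw bytes (not multipart).
    upload = request("POST", f"{API_BASE}/upload", api_key,
                     body=audio_bytes, content_type="application/octet-stream")
    # 2. Create the transcription job with automatic language detection.
    payload = json.dumps({"audio_url": upload["upload_url"],
                          "language_detection": True}).encode("utf-8")
    job = request("POST", f"{API_BASE}/transcript", api_key,
                  body=payload, content_type="application/json")
    # 3. Poll until the job finishes, fails, or the deadline passes.
    deadline = clock() + POLL_TIMEOUT_S
    while True:
        result = request("GET", f"{API_BASE}/transcript/{job['id']}", api_key)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        if clock() >= deadline:
            raise TimeoutError("AssemblyAI job did not finish within 30 minutes")
        sleep(POLL_INTERVAL_S)
```

The `request`, `sleep`, and `clock` parameters exist only so the loop can be exercised without network access or real waiting.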

Third transcription option alongside Groq and OpenAI. Async client
(upload audio → create job → poll up to 30 min) wired into whisper.py
using stdlib only — no SDK dependency.

Auto-priority is cost-ascending: Groq (~$0.04/h) → AssemblyAI
(~$0.27/h) → OpenAI (~$0.36/h). Users with a single key see no
behavior change; users with multiple keys now get the cheapest
available. Override with --whisper {groq|assemblyai|openai}.

AssemblyAI strengths: stronger PT-BR transcription, auto language
detection, much higher upload size ceiling than Groq/OpenAI's 25 MB.

setup.py env template, --check error message, installer prompts, and
SKILL.md (usage, fallback section, privacy notes, bundled-scripts list)
updated to cover the new backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cristianorj22 force-pushed the feat/assemblyai-backend branch from fe3bba8 to c7eb776 on May 9, 2026 20:36