feat: add --sub-lang to fetch non-English captions (skip paid Whisper)#30

Open
taeseok-kim-pm wants to merge 1 commit into
bradautomates:mainfrom
taeseok-kim-pm:feat/sub-lang-captions


Problem

scripts/download.py hardcodes the subtitle languages requested from yt-dlp:

"--sub-langs", "en,en-US,en-GB,en-orig",

For non-English videos (Korean, Japanese, etc.) this finds no captions, so /watch falls straight back to the paid Whisper API — even when the video has perfectly good free native captions. A user without a Groq/OpenAI key just gets "captions missing" and a frames-only report.

Fix

Make the caption languages configurable, defaulting to the current English list (fully backward compatible):

  • download() / download_url() — new sub_langs param (default DEFAULT_SUB_LANGS = "en,en-US,en-GB,en-orig")
  • _pick_subtitle() — prefers the requested languages in priority order (instead of always preferring .en)
  • watch.py — new --sub-lang flag, e.g. --sub-lang ko or --sub-lang ja,en
  • docs: README (options list + cost table) and SKILL.md flags
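
As a rough illustration of the priority-order selection described for _pick_subtitle(), here is a minimal sketch. It is not the code from this PR: the function name `pick_subtitle_sketch`, its signature, and the fallback to the first available file are assumptions made for the example. It relies only on yt-dlp's usual `<name>.<lang>.<ext>` naming for subtitle files.

```python
from pathlib import Path
from typing import Optional, Sequence

DEFAULT_SUB_LANGS = "en,en-US,en-GB,en-orig"


def pick_subtitle_sketch(
    files: Sequence[Path], sub_langs: str = DEFAULT_SUB_LANGS
) -> Optional[Path]:
    """Return the subtitle file whose language comes first in sub_langs.

    Hypothetical helper for illustration; the real _pick_subtitle() may differ.
    """
    # Split "ja,en" into ["ja", "en"], ignoring stray whitespace and empty items.
    wanted = [lang.strip() for lang in sub_langs.split(",") if lang.strip()]
    for lang in wanted:
        for f in files:
            # Subtitle files are named "<name>.<lang>.<ext>", so the language
            # tag is the second-to-last dot-separated component.
            parts = f.name.split(".")
            if len(parts) >= 3 and parts[-2] == lang:
                return f
    # Assumption: if no requested language matched, fall back to any subtitle.
    return files[0] if files else None
```

With `--sub-lang ja,en`, a video that has only English and Korean caption files would resolve to the English file, because `ja` is tried first and finds nothing.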

Backward compatibility

Default behavior is unchanged — omitting --sub-lang requests the same English variants as before.
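
One way the backward-compatible default could be wired up is sketched below. The helper names `build_parser` and `subtitle_args` are hypothetical and the real watch.py and download.py may be structured differently. Only `--sub-lang`, `--sub-langs`, and `DEFAULT_SUB_LANGS` come from the description above.

```python
import argparse
from typing import List

DEFAULT_SUB_LANGS = "en,en-US,en-GB,en-orig"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the new flag and the positional URL."""
    parser = argparse.ArgumentParser(prog="watch.py")
    parser.add_argument("url")
    parser.add_argument(
        "--sub-lang",
        default=DEFAULT_SUB_LANGS,  # omitting the flag keeps the old English list
        help="comma-separated caption languages in priority order, e.g. 'ko' or 'ja,en'",
    )
    return parser


def subtitle_args(sub_langs: str = DEFAULT_SUB_LANGS) -> List[str]:
    """Return the yt-dlp arguments that select caption languages."""
    return ["--sub-langs", sub_langs]


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --sub-lang as args.sub_lang.
    print(subtitle_args(args.sub_lang))
```

Because the flag's default is the same string that was previously hardcoded, a call without `--sub-lang` produces the same yt-dlp arguments as before.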

Verification

On a Korean video that previously reported "captions missing":

python3 scripts/watch.py "<korean-video-url>" --sub-lang ko --no-whisper --start 0 --end 25

→ now returns Source: captions with the actual Korean transcript, and the free native-caption path is used instead of Whisper (no API key needed).

download.py hardcoded `--sub-langs en,en-US,en-GB,en-orig`, so non-English
videos (e.g. Korean) found no captions and fell straight back to the paid
Whisper API even when free native captions existed.

- download(): new `sub_langs` param (default unchanged: English variants)
- _pick_subtitle(): prefer the requested languages in priority order
- watch.py: new `--sub-lang` flag (e.g. `--sub-lang ko` or `ja,en`)
- docs: README options + cost table, SKILL.md flags

Backward compatible: default behavior is identical. Verified on a Korean
video — `--sub-lang ko` now yields native captions instead of 'captions missing'.