feat: add --sub-lang to fetch non-English captions (skip paid Whisper) #30
Open
taeseok-kim-pm wants to merge 1 commit into
download.py hardcoded `--sub-langs en,en-US,en-GB,en-orig`, so non-English videos (e.g. Korean) found no captions and fell straight back to the paid Whisper API even when free native captions existed.

- download(): new `sub_langs` param (default unchanged: English variants)
- _pick_subtitle(): prefer the requested languages in priority order
- watch.py: new `--sub-lang` flag (e.g. `--sub-lang ko` or `ja,en`)
- docs: README options + cost table, SKILL.md flags

Backward compatible: default behavior is identical. Verified on a Korean video: `--sub-lang ko` now yields native captions instead of 'captions missing'.
Problem

`scripts/download.py` hardcodes the subtitle languages requested from yt-dlp: `--sub-langs en,en-US,en-GB,en-orig`. For non-English videos (Korean, Japanese, etc.) this finds no captions, so `/watch` falls straight back to the paid Whisper API, even when the video has perfectly good free native captions. A user without a Groq/OpenAI key just gets "captions missing" and a frames-only report.

Fix
Make the caption languages configurable, defaulting to the current English list (fully backward compatible):

- `download()` / `download_url()`: new `sub_langs` param (default `DEFAULT_SUB_LANGS = "en,en-US,en-GB,en-orig"`)
- `_pick_subtitle()`: prefers the requested languages in priority order (instead of always preferring `.en`)
- `watch.py`: new `--sub-lang` flag, e.g. `--sub-lang ko` or `--sub-lang ja,en`

Backward compatibility

Default behavior is unchanged: omitting `--sub-lang` requests the same English variants as before.

Verification

On a Korean video that previously reported "captions missing", passing `--sub-lang ko` now returns `Source: captions` with the actual Korean transcript, and the free native-caption path is used instead of Whisper (no API key needed).
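
For reviewers, the priority-order selection described under Fix can be illustrated with a minimal sketch. This is not the code in this PR: `parse_sub_langs` and `pick_subtitle` are hypothetical names, the sketch assumes subtitle files are named like `video.ko.vtt` (language tag as the second-to-last dot-separated component), and the fallback to the first available file when nothing matches is an assumption rather than confirmed behavior of `_pick_subtitle()`.

```python
from pathlib import Path
from typing import List, Optional

# Default taken from the PR description.
DEFAULT_SUB_LANGS = "en,en-US,en-GB,en-orig"


def parse_sub_langs(value: str) -> List[str]:
    """Split a comma-separated language list, dropping blanks and duplicates."""
    langs: List[str] = []
    for lang in (part.strip() for part in value.split(",")):
        if lang and lang not in langs:
            langs.append(lang)
    return langs


def pick_subtitle(paths: List[Path], sub_langs: str = DEFAULT_SUB_LANGS) -> Optional[Path]:
    """Return the subtitle file whose language ranks highest in sub_langs."""
    wanted = parse_sub_langs(sub_langs)

    def rank(path: Path) -> int:
        # Assumed naming: "<name>.<lang>.<ext>", e.g. "video.ko.vtt".
        parts = path.name.split(".")
        lang = parts[-2] if len(parts) >= 3 else ""
        # Languages that were not requested sort after every requested one.
        return wanted.index(lang) if lang in wanted else len(wanted)

    # min() keeps the first path among equal ranks; an empty list yields None.
    return min(paths, key=rank, default=None)
```

With files for `en`, `ko`, and `ja` present, `pick_subtitle(paths, "ja,en")` would choose the Japanese file, while omitting the argument keeps the English preference.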