Transform YouTube videos into searchable text β single-video Whisper transcription or batch playlist / channel auto-subs.
- CUDA-accelerated transcription with OpenAI Whisper
- Multi-language support with auto-detection
- Batch playlist / channel mode: download YouTube auto-subs (no GPU needed) for an entire playlist or channel; channel URLs auto-group output under one folder
- Optional timestamped segments: emit
_segments.jsonsidecar with Whisperstart/end/text - Optional visual evidence (V1): extract midpoint frames per segment for local files
- Multiple input sources: YouTube, Google Drive, local files
- Performance optimized:
- Automatic memory cleanup (no memory leaks)
- Auto temp file cleanup
# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
.\venv\Scripts\activate # Windows
# Install dependencies
pip install -e .
# Configure environment (optional β defaults work out of the box)
cp .env.example .env
# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"- Python 3.12+
- FFmpeg - Required for audio processing
- CUDA 12.8 (Optional) - For GPU acceleration
Windows:
# Download from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract to C:\ffmpeg and add to PATH
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)macOS:
brew install ffmpegLinux:
sudo apt install ffmpeg # Ubuntu/DebianThe CLI exposes two subcommands: transcribe (single video / Drive / local file via Whisper) and playlist (batch auto-subs from a YouTube playlist or channel β no GPU).
# Single YouTube video (DEFAULT)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"
# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es
# Local file
yt-transcriber transcribe --url "path/to/video.mp4"
# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments
# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence| Option | Short | Description |
|---|---|---|
--url |
-u |
YouTube URL, Google Drive URL, or local file path |
--language |
-l |
Language code (en, es) β auto-detect if omitted |
--segments / --no-segments |
Enable/disable _segments.json sidecar (CLI override; env fallback when omitted) |
|
--visual-evidence / --no-visual-evidence |
Enable/disable frame extraction per segment (local files only in V1; visual implies segments) | |
--ffmpeg-location |
Custom FFmpeg path |
Batch-downloads YouTube auto-generated subtitles (via yt-dlp, no Whisper / no GPU) for every video of a playlist or channel. Channel URLs (/@handle/videos) are auto-detected and every video gets nested under a single folder named after the channel.
# All videos of a playlist
yt-transcriber playlist --url "https://www.youtube.com/playlist?list=PLxxxx"
# Last 24 videos of a channel (auto-grouped under output/<channel>/)
yt-transcriber playlist --url "https://www.youtube.com/@SomeChannel/videos" --limit 24
# Choose a different auto-sub language
yt-transcriber playlist --url "URL" --language en| Option | Short | Description |
|---|---|---|
--url |
-u |
YouTube playlist or channel URL (required) |
--limit |
-n |
Process only the last N videos (default: all) |
--language |
-l |
Auto-sub language code (default: es) |
Each invocation writes to its own folder under OUTPUT_BASE_DIR (default output/):
output/<stem>/
βββ <stem>.txt # Plain transcript (always)
βββ <stem>_segments.json # Optional, when --segments / TRANSCRIPT_SEGMENTS_ENABLED
βββ <stem>_frame_<n>.jpg ... # Optional, when --visual-evidence on a local file
<stem> = {normalized_title}_vid_{video_id}_job_{timestamp}. Reruns produce a new folder; no overwrites.
Channel grouping (playlist mode): when the URL is a channel (yt-dlp returns a channel field), every video is nested under a slug of the channel name:
output/<channel_slug>/
βββ <stem_video_1>/<stem_video_1>.txt
βββ <stem_video_2>/<stem_video_2>.txt
βββ ...
Regular playlists (no channel field) keep the flat output/<stem>/ layout.
Create .env from template:
# Whisper Model
WHISPER_MODEL_NAME=base # tiny, base, small, medium, large
WHISPER_DEVICE=cuda # cuda or cpu
# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_BASE_DIR=output/
# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false
# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0
# Logging
LOG_LEVEL=INFO| Model | Speed | Accuracy | VRAM | Use Case |
|---|---|---|---|---|
tiny |
Fast | Low | ~1GB | Quick drafts |
base |
Good | Medium | ~1GB | Default - Balanced |
small |
Medium | Good | ~2GB | Better quality |
medium |
Slow | High | ~5GB | High accuracy |
large |
Slowest | Best | ~10GB | Best quality |
from yt_transcriber.cli import run_transcribe_command, run_playlist_command
# Single video / Drive / local file
transcript_path = run_transcribe_command(url="path/to/video.mp4")
# Playlist or channel (returns {"successful", "failed", "files"})
stats = run_playlist_command(
url="https://www.youtube.com/@SomeChannel/videos",
limit=24,
language="es",
)The application includes automatic optimizations for production use:
Memory Management:
- Whisper model auto-loads/unloads per video (prevents memory leaks)
- Automatic garbage collection after each transcription
Resource Cleanup:
- Temp files auto-deleted after processing (even on errors)
- No manual cleanup needed
FFmpeg:
- Timeout protection (5min) prevents hung processes
FFmpeg not found:
# Use direct path
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"CUDA not available:
# Check installation
python -c "import torch; print(torch.cuda.is_available())"
# Fall back to CPU in .env
WHISPER_DEVICE=cpuOut of memory:
# Use smaller model in .env
WHISPER_MODEL_NAME=tinyMIT License - see LICENSE
- OpenAI Whisper - AI transcription (via
faster-whisper/ CTranslate2) - yt-dlp - Video downloading and auto-sub extraction
Made with care by Sergio Marquez