YouTube Video Transcriber & Summarizer

Transform YouTube videos into searchable text — single-video Whisper transcription or batch playlist / channel auto-subs.

Features

CUDA-accelerated transcription with OpenAI Whisper
Multi-language support with auto-detection
Batch playlist / channel mode: download YouTube auto-subs (no GPU needed) for an entire playlist or channel; channel URLs auto-group output under one folder
Optional timestamped segments: emit _segments.json sidecar with Whisper start/end/text
Optional visual evidence (V1): extract midpoint frames per segment for local files
Multiple input sources: YouTube, Google Drive, local files
Performance optimized:
- Automatic memory cleanup (no memory leaks)
- Auto temp file cleanup

Quick Start

# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
.\venv\Scripts\activate   # Windows

# Install dependencies
pip install -e .

# Configure environment (optional — defaults work out of the box)
cp .env.example .env

# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

Prerequisites

Python 3.12+
FFmpeg - Required for audio processing
CUDA 12.8 (Optional) - For GPU acceleration

FFmpeg Installation

Windows:

# Download from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract to C:\ffmpeg and add to PATH
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)

macOS:

brew install ffmpeg

Linux:

sudo apt install ffmpeg  # Ubuntu/Debian

Usage

The CLI exposes two subcommands: transcribe (single video / Drive / local file via Whisper) and playlist (batch auto-subs from a YouTube playlist or channel — no GPU).

`transcribe`

# Single YouTube video (DEFAULT)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es

# Local file
yt-transcriber transcribe --url "path/to/video.mp4"

# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments

# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence

Option	Short	Description
`--url`	`-u`	YouTube URL, Google Drive URL, or local file path
`--language`	`-l`	Language code (`en`, `es`) — auto-detect if omitted
`--segments` / `--no-segments`		Enable/disable `_segments.json` sidecar (CLI override; env fallback when omitted)
`--visual-evidence` / `--no-visual-evidence`		Enable/disable frame extraction per segment (local files only in V1; visual implies segments)
`--ffmpeg-location`		Custom FFmpeg path

`playlist`

Batch-downloads YouTube auto-generated subtitles (via yt-dlp, no Whisper / no GPU) for every video of a playlist or channel. Channel URLs (/@handle/videos) are auto-detected and every video gets nested under a single folder named after the channel.

# All videos of a playlist
yt-transcriber playlist --url "https://www.youtube.com/playlist?list=PLxxxx"

# Last 24 videos of a channel (auto-grouped under output/<channel>/)
yt-transcriber playlist --url "https://www.youtube.com/@SomeChannel/videos" --limit 24

# Choose a different auto-sub language
yt-transcriber playlist --url "URL" --language en

Option	Short	Description
`--url`	`-u`	YouTube playlist or channel URL (required)
`--limit`	`-n`	Process only the last N videos (default: all)
`--language`	`-l`	Auto-sub language code (default: `es`)

Output

Each invocation writes to its own folder under OUTPUT_BASE_DIR (default output/):

output/<stem>/
├── <stem>.txt                  # Plain transcript (always)
├── <stem>_segments.json        # Optional, when --segments / TRANSCRIPT_SEGMENTS_ENABLED
└── <stem>_frame_<n>.jpg ...    # Optional, when --visual-evidence on a local file

<stem> = {normalized_title}_vid_{video_id}_job_{timestamp}. Reruns produce a new folder; no overwrites.

Channel grouping (playlist mode): when the URL is a channel (yt-dlp returns a channel field), every video is nested under a slug of the channel name:

output/<channel_slug>/
├── <stem_video_1>/<stem_video_1>.txt
├── <stem_video_2>/<stem_video_2>.txt
└── ...

Regular playlists (no channel field) keep the flat output/<stem>/ layout.

Configuration

Create .env from template:

# Whisper Model
WHISPER_MODEL_NAME=base    # tiny, base, small, medium, large
WHISPER_DEVICE=cuda        # cuda or cpu

# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_BASE_DIR=output/

# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false

# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0

# Logging
LOG_LEVEL=INFO

Model Selection

Model	Speed	Accuracy	VRAM	Use Case
`tiny`	Fast	Low	~1GB	Quick drafts
`base`	Good	Medium	~1GB	Default - Balanced
`small`	Medium	Good	~2GB	Better quality
`medium`	Slow	High	~5GB	High accuracy
`large`	Slowest	Best	~10GB	Best quality

Programmatic Usage

from yt_transcriber.cli import run_transcribe_command, run_playlist_command

# Single video / Drive / local file
transcript_path = run_transcribe_command(url="path/to/video.mp4")

# Playlist or channel (returns {"successful", "failed", "files"})
stats = run_playlist_command(
    url="https://www.youtube.com/@SomeChannel/videos",
    limit=24,
    language="es",
)

Performance & Resource Management

The application includes automatic optimizations for production use:

Memory Management:

Whisper model auto-loads/unloads per video (prevents memory leaks)
Automatic garbage collection after each transcription

Resource Cleanup:

Temp files auto-deleted after processing (even on errors)
No manual cleanup needed

FFmpeg:

Timeout protection (5min) prevents hung processes

Troubleshooting

FFmpeg not found:

# Use direct path
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"

CUDA not available:

# Check installation
python -c "import torch; print(torch.cuda.is_available())"

# Fall back to CPU in .env
WHISPER_DEVICE=cpu

Out of memory:

# Use smaller model in .env
WHISPER_MODEL_NAME=tiny

License

MIT License - see LICENSE

Acknowledgments

OpenAI Whisper - AI transcription (via faster-whisper / CTranslate2)
yt-dlp - Video downloading and auto-sub extraction

Made with care by Sergio Marquez

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
core		core
docs/superpowers		docs/superpowers
plans		plans
tests		tests
yt_transcriber		yt_transcriber
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
README.md		README.md
launch_tui.bat		launch_tui.bat
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

YouTube Video Transcriber & Summarizer

Features

Quick Start

Prerequisites

FFmpeg Installation

Usage

`transcribe`

`playlist`

Output

Configuration

Model Selection

Programmatic Usage

Performance & Resource Management

Troubleshooting

License

Acknowledgments

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

YouTube Video Transcriber & Summarizer

Features

Quick Start

Prerequisites

FFmpeg Installation

Usage

transcribe

playlist

Output

Configuration

Model Selection

Programmatic Usage

Performance & Resource Management

Troubleshooting

License

Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages

`transcribe`

`playlist`