Skip to content

sergiomarquezdev/yt-transcriber

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

51 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

YouTube Video Transcriber & Summarizer

Python 3.12+ License PyTorch

Transform YouTube videos into searchable text β€” single-video Whisper transcription or batch playlist / channel auto-subs.

Features

  • CUDA-accelerated transcription with OpenAI Whisper
  • Multi-language support with auto-detection
  • Batch playlist / channel mode: download YouTube auto-subs (no GPU needed) for an entire playlist or channel; channel URLs auto-group output under one folder
  • Optional timestamped segments: emit _segments.json sidecar with Whisper start/end/text
  • Optional visual evidence (V1): extract midpoint frames per segment for local files
  • Multiple input sources: YouTube, Google Drive, local files
  • Performance optimized:
    • Automatic memory cleanup (no memory leaks)
    • Auto temp file cleanup

Quick Start

# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
.\venv\Scripts\activate   # Windows

# Install dependencies
pip install -e .

# Configure environment (optional β€” defaults work out of the box)
cp .env.example .env

# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

Prerequisites

  • Python 3.12+
  • FFmpeg - Required for audio processing
  • CUDA 12.8 (Optional) - For GPU acceleration

FFmpeg Installation

Windows:

# Download from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract to C:\ffmpeg and add to PATH
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)

macOS:

brew install ffmpeg

Linux:

sudo apt install ffmpeg  # Ubuntu/Debian

Usage

The CLI exposes two subcommands: transcribe (single video / Drive / local file via Whisper) and playlist (batch auto-subs from a YouTube playlist or channel β€” no GPU).

transcribe

# Single YouTube video (DEFAULT)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es

# Local file
yt-transcriber transcribe --url "path/to/video.mp4"

# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments

# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence
Option Short Description
--url -u YouTube URL, Google Drive URL, or local file path
--language -l Language code (en, es) β€” auto-detect if omitted
--segments / --no-segments Enable/disable _segments.json sidecar (CLI override; env fallback when omitted)
--visual-evidence / --no-visual-evidence Enable/disable frame extraction per segment (local files only in V1; visual implies segments)
--ffmpeg-location Custom FFmpeg path

playlist

Batch-downloads YouTube auto-generated subtitles (via yt-dlp, no Whisper / no GPU) for every video of a playlist or channel. Channel URLs (/@handle/videos) are auto-detected and every video gets nested under a single folder named after the channel.

# All videos of a playlist
yt-transcriber playlist --url "https://www.youtube.com/playlist?list=PLxxxx"

# Last 24 videos of a channel (auto-grouped under output/<channel>/)
yt-transcriber playlist --url "https://www.youtube.com/@SomeChannel/videos" --limit 24

# Choose a different auto-sub language
yt-transcriber playlist --url "URL" --language en
Option Short Description
--url -u YouTube playlist or channel URL (required)
--limit -n Process only the last N videos (default: all)
--language -l Auto-sub language code (default: es)

Output

Each invocation writes to its own folder under OUTPUT_BASE_DIR (default output/):

output/<stem>/
β”œβ”€β”€ <stem>.txt                  # Plain transcript (always)
β”œβ”€β”€ <stem>_segments.json        # Optional, when --segments / TRANSCRIPT_SEGMENTS_ENABLED
└── <stem>_frame_<n>.jpg ...    # Optional, when --visual-evidence on a local file

<stem> = {normalized_title}_vid_{video_id}_job_{timestamp}. Reruns produce a new folder; no overwrites.

Channel grouping (playlist mode): when the URL is a channel (yt-dlp returns a channel field), every video is nested under a slug of the channel name:

output/<channel_slug>/
β”œβ”€β”€ <stem_video_1>/<stem_video_1>.txt
β”œβ”€β”€ <stem_video_2>/<stem_video_2>.txt
└── ...

Regular playlists (no channel field) keep the flat output/<stem>/ layout.

Configuration

Create .env from template:

# Whisper Model
WHISPER_MODEL_NAME=base    # tiny, base, small, medium, large
WHISPER_DEVICE=cuda        # cuda or cpu

# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_BASE_DIR=output/

# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false

# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0

# Logging
LOG_LEVEL=INFO

Model Selection

Model Speed Accuracy VRAM Use Case
tiny Fast Low ~1GB Quick drafts
base Good Medium ~1GB Default - Balanced
small Medium Good ~2GB Better quality
medium Slow High ~5GB High accuracy
large Slowest Best ~10GB Best quality

Programmatic Usage

from yt_transcriber.cli import run_transcribe_command, run_playlist_command

# Single video / Drive / local file
transcript_path = run_transcribe_command(url="path/to/video.mp4")

# Playlist or channel (returns {"successful", "failed", "files"})
stats = run_playlist_command(
    url="https://www.youtube.com/@SomeChannel/videos",
    limit=24,
    language="es",
)

Performance & Resource Management

The application includes automatic optimizations for production use:

Memory Management:

  • Whisper model auto-loads/unloads per video (prevents memory leaks)
  • Automatic garbage collection after each transcription

Resource Cleanup:

  • Temp files auto-deleted after processing (even on errors)
  • No manual cleanup needed

FFmpeg:

  • Timeout protection (5min) prevents hung processes

Troubleshooting

FFmpeg not found:

# Use direct path
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"

CUDA not available:

# Check installation
python -c "import torch; print(torch.cuda.is_available())"

# Fall back to CPU in .env
WHISPER_DEVICE=cpu

Out of memory:

# Use smaller model in .env
WHISPER_MODEL_NAME=tiny

License

MIT License - see LICENSE

Acknowledgments

  • OpenAI Whisper - AI transcription (via faster-whisper / CTranslate2)
  • yt-dlp - Video downloading and auto-sub extraction

Made with care by Sergio Marquez

About

πŸ› οΈ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

Topics

Resources

Stars

Watchers

Forks

Contributors