
SyntX34/SubtitleTool


🎬 SubtitleGenerator

A high-performance C++ subtitle generator built on whisper.cpp, with automatic GPU acceleration and CPU fallback, multi-format audio/video input via FFmpeg, and built-in translation to English.



✨ Features

  • πŸš€ GPU-accelerated, CPU-safe β€” uses CUDA on Windows/Linux or Metal on macOS automatically when available, and falls back to a SIMD-optimized CPU path with no extra setup if it isn't.
  • 🎯 Accurate timing β€” segment-level timestamps with confidence scores per segment.
  • 🌍 96-language transcription + translation β€” transcribe in the spoken language, or translate directly into English (see Language & Translation).
  • πŸ“ Multiple output formats β€” SRT, WebVTT, ASS (Advanced SubStation Alpha), JSON, and plain TXT.
  • 🎞️ Multi-format input via FFmpeg β€” WAV natively, plus MP3/MP4/MKV/FLAC/OGG/AAC/MOV/WEBM/and more when built with FFmpeg.
  • βš™οΈ Configurable β€” adjustable chunking, segment merging/splitting, confidence filtering, filler-word removal.
  • πŸ–₯️ Live hardware + backend report β€” prints CPU, core count, RAM, and the active compute backend (CUDA / Metal / CPU) on every run.
  • ⏹️ Graceful Ctrl+C handling β€” stop at any time; subtitles produced so far are still saved.
  • πŸ“Š Performance stats β€” real-time factor, average confidence, average segment length.
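To make the timing and output-format features concrete, here is a minimal sketch of how a segment could be rendered as an SRT cue. The `Segment` struct and both function names are hypothetical, chosen for illustration; they are not taken from SubtitleTool's sources. The only format facts relied on are that SRT writes timestamps as HH:MM:SS,mmm and WebVTT as HH:MM:SS.mmm.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format a millisecond offset as HH:MM:SS<sep>mmm.
// SRT uses ',' before the milliseconds; WebVTT uses '.'.
std::string format_timestamp(std::int64_t ms, char sep) {
    if (ms < 0) ms = 0;  // clamp: subtitle times cannot be negative
    const long long h = static_cast<long long>(ms / 3600000);
    const long long m = static_cast<long long>((ms / 60000) % 60);
    const long long s = static_cast<long long>((ms / 1000) % 60);
    const long long frac = static_cast<long long>(ms % 1000);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld%c%03lld",
                  h, m, s, sep, frac);
    return buf;
}

// Hypothetical segment type for illustration only.
struct Segment {
    std::int64_t start_ms;
    std::int64_t end_ms;
    std::string text;
};

// Render one SRT cue: 1-based index, time range, text, then a blank line.
std::string to_srt_cue(int index, const Segment& seg) {
    return std::to_string(index) + "\n" +
           format_timestamp(seg.start_ms, ',') + " --> " +
           format_timestamp(seg.end_ms, ',') + "\n" +
           seg.text + "\n\n";
}
```

Passing '.' as the separator gives the timestamp form WebVTT expects, so one formatter can serve both outputs.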

📦 Requirements

| | Windows | Linux | macOS |
| --- | --- | --- | --- |
| Compiler | MSVC 2019+ | GCC 7+ / Clang 6+ | Clang (Xcode CLT) |
| CMake | 3.15+ | 3.15+ | 3.15+ |
| GPU backend | CUDA Toolkit (optional) | CUDA Toolkit (optional) | Metal (built into the OS) |
| FFmpeg (optional, for non-WAV input) | via vcpkg | apt install libavformat-dev ... | brew install ffmpeg |

GPU support is detected at build time. If the CUDA Toolkit (or Metal on macOS) is available when you run CMake, the binary is built with GPU support; otherwise the build is CPU-only. To add CUDA acceleration later, install the CUDA Toolkit from nvidia.com, re-run CMake, and rebuild.
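Build-time detection usually surfaces in the code as preprocessor definitions. The sketch below shows one way a binary could report which backend it was compiled with. The macro names `GGML_USE_CUDA` and `GGML_USE_METAL` are an assumption borrowed from ggml's naming; SubtitleTool's CMake setup may define different ones, and `compiled_backend` is a hypothetical helper.

```cpp
#include <string>

// Report the backend selected at compile time.
// Assumption: the build defines GGML_USE_CUDA or GGML_USE_METAL when the
// corresponding GPU backend is compiled in; with neither defined, the
// binary is treated as CPU-only.
std::string compiled_backend() {
#if defined(GGML_USE_CUDA)
    return "CUDA";
#elif defined(GGML_USE_METAL)
    return "Metal";
#else
    return "CPU";
#endif
}
```

A function like this only reflects what was compiled in; whether a usable GPU is actually present at run time is a separate check.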


🚀 Quick Start

1. Clone with submodules

git clone --recursive https://github.com/SyntX34/subtitle-generator.git
cd subtitle-generator

Already cloned without --recursive?

git submodule update --init --recursive

2. Download a model

# Linux / macOS
./scripts/download_model.sh base.en

# Windows (PowerShell)
.\scripts\download_model.ps1 -Model base.en
| Model | Size | Notes |
| --- | --- | --- |
| tiny.en | ~75 MB | Fastest, least accurate |
| base.en | ~145 MB | Good default for English |
| small.en | ~465 MB | Better accuracy, still fast |
| medium.en | ~1.5 GB | High accuracy, slower |
| large-v3 | ~3 GB | Best accuracy, multilingual, needs a GPU for real-time use |

Drop the .en suffix (e.g. base, small) for the multilingual variants if you need non-English transcription or translation.
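The naming rule above can be checked in code before a run. This is a small sketch with hypothetical helper names, assuming the `models/ggml-<name>.bin` path pattern used in this README's examples and treating any `.en` model as unsuitable for non-English audio or for translation.

```cpp
#include <string>

// True when the model name ends in ".en" (English-only variant).
bool is_english_only(const std::string& name) {
    const std::string suffix = ".en";
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Build the model path following the pattern models/ggml-<name>.bin.
std::string model_path(const std::string& name) {
    return "models/ggml-" + name + ".bin";
}

// Assumed policy for illustration: multilingual models handle everything;
// English-only models are accepted only for English (or auto-detected)
// audio without translation.
bool model_supports(const std::string& name, const std::string& lang, bool translate) {
    if (!is_english_only(name)) return true;
    return !translate && (lang == "en" || lang == "auto");
}
```

For example, `model_supports("base.en", "es", true)` is false, which signals that a multilingual model such as `base` should be downloaded instead.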

3. Build

cmake -B build -S .
cmake --build build --config Release

CMake auto-detects your platform's GPU backend. To force a CPU-only build (e.g. for a smaller, dependency-free binary):

cmake -B build -S . -DFORCE_CPU=ON

4. Run

./build/SubtitleTool audio.wav -m models/ggml-base.en.bin -o subtitles.srt

🛠️ Usage

SubtitleTool <audio_or_video_file> [OPTIONS]

MODEL
  -m, --model <path>      Whisper model (default: models/ggml-base.en.bin)

OUTPUT
  -o, --output <path>     Output file (default: <input>.<format>)
  -f, --format <fmt>      srt | vtt | ass | json | txt  (default: srt)
      --all-formats       Save all formats at once

LANGUAGE & TRANSLATION
  -l, --lang <code>       Source language spoken in the audio (default: auto)
      --translate         Translate the transcription into English
      --list-languages    Print every supported language code and exit

PROCESSING
  -t, --threads <n>       Worker threads (default: 4)
      --chunk <secs>      Audio chunk size in seconds (default: 30)
      --min-conf <0-1>    Drop segments below this confidence
      --no-filler         Remove filler words (um, uh, er, ...)

MISC
  -s, --stream            Print subtitles live as they're generated
  -v, --verbose           Verbose whisper output
  -h, --help              Show the full help message
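As an illustration of what `--min-conf` and `--no-filler` imply, here is a minimal post-processing sketch. The `Segment` struct, the function names, and the three-word filler list are assumptions for the example; the tool's actual data types and filler list are not shown in this README.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical segment type for illustration only.
struct Segment {
    std::string text;
    float confidence;  // expected range 0..1
};

// Keep only ASCII letters, lower-cased, so "Um," compares equal to "um".
std::string normalize(const std::string& word) {
    std::string out;
    for (unsigned char c : word) {
        if (std::isalpha(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Drop whitespace-separated words whose normalized form is a filler.
std::string remove_fillers(const std::string& text) {
    static const std::unordered_set<std::string> fillers = {"um", "uh", "er"};
    std::istringstream in(text);
    std::string word, out;
    while (in >> word) {
        if (fillers.count(normalize(word)) != 0) continue;
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}

// Apply the confidence threshold, then optional filler removal.
// Segments left empty after filler removal are dropped.
std::vector<Segment> filter_segments(const std::vector<Segment>& segs,
                                     float min_conf, bool no_filler) {
    std::vector<Segment> kept;
    for (const Segment& s : segs) {
        if (s.confidence < min_conf) continue;
        Segment copy = s;
        if (no_filler) copy.text = remove_fillers(copy.text);
        if (!copy.text.empty()) kept.push_back(copy);
    }
    return kept;
}
```

Filtering by confidence first avoids spending work on text that will be discarded anyway.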

🌍 Language & Translation

SubtitleTool separates what language is spoken from what language you want the output in:

Flag Meaning
-l <code> The language spoken in the audio. Use auto to let whisper detect it, or set it explicitly if you already know it β€” this is faster and more accurate than auto-detection.
--translate Translate the transcription into English. whisper.cpp can only translate into English β€” it can't translate directly between two non-English languages.

Example (Spanish video, English subtitles):

SubtitleTool pelicula.mp4 -l es --translate -o subtitles_en.srt

Example (Japanese video, Japanese subtitles, no translation):

SubtitleTool anime.mkv -l ja -o subtitles_ja.srt

Example (unknown language, transcribed in its original language):

SubtitleTool interview.wav -l auto -o subtitles.srt

To get subtitles in a non-English language other than the one spoken (e.g. Spanish → French), run SubtitleTool with --translate to produce English subtitles, then pass that text through a separate text-translation tool. whisper.cpp's translation path is English-only by design.
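The rule can be summarized in a few lines. The helpers below are hypothetical and only restate the behaviour described in this section: with `--translate` the output is English, otherwise it stays in the source language, and any other target needs a second tool.

```cpp
#include <string>

// Language of the generated subtitles.
// With --translate the output is always English; otherwise it is the
// source language (which may be "auto", resolved when the audio is read).
std::string output_language(const std::string& source_lang, bool translate) {
    return translate ? "en" : source_lang;
}

// True when reaching `target_lang` needs an external text-translation step,
// i.e. the target is neither English nor the spoken language.
// A source of "auto" is treated conservatively: any non-English target
// is reported as needing the extra step.
bool needs_external_translation(const std::string& source_lang,
                                const std::string& target_lang) {
    return target_lang != "en" && target_lang != source_lang;
}
```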

See every supported language code:

SubtitleTool --list-languages

📂 Project Layout

subtitle-generator/
├── src/
│   ├── main.cpp                 # CLI entry point, hardware/banner reporting
│   ├── SubtitleGenerator.{hpp,cpp}
│   ├── AudioDecoder.{hpp,cpp}    # WAV (built-in) + FFmpeg (optional)
│   ├── TimestampFormatter.{hpp,cpp}
│   └── dl_windows.cpp            # Windows dynamic-library helpers
├── scripts/
│   ├── download_model.sh
│   └── download_model.ps1
├── third_party/
│   └── whisper.cpp/              # git submodule
├── .github/workflows/build.yml   # Windows + Linux + macOS CI/release
└── CMakeLists.txt

🧩 Building Without FFmpeg

FFmpeg is optional. Without it, only .wav input is supported, but the build has zero external dependencies:

cmake -B build -S . -DUSE_FFMPEG=OFF
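In a build without FFmpeg, it helps to reject unsupported input early. A minimal sketch, assuming the decision is made from the first bytes of the file: a WAV file begins with the ASCII tag "RIFF", a 4-byte chunk size, and then "WAVE". Both function names are hypothetical and not taken from AudioDecoder.

```cpp
#include <cstddef>
#include <cstring>

// Check the 12-byte RIFF/WAVE signature at the start of a buffer.
bool looks_like_wav(const unsigned char* data, std::size_t len) {
    return len >= 12 &&
           std::memcmp(data, "RIFF", 4) == 0 &&
           std::memcmp(data + 8, "WAVE", 4) == 0;
}

// With FFmpeg compiled in, any input is handed to FFmpeg to try;
// without it, only WAV input can be decoded.
bool can_decode(const unsigned char* data, std::size_t len, bool has_ffmpeg) {
    return has_ffmpeg || looks_like_wav(data, len);
}
```

Checking the signature rather than the file extension catches files that are merely named .wav.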

📈 Project Activity

Detailed download counts per release and platform are available on the Releases page.


🀝 Contributing

Issues and pull requests are welcome. Please run the existing build matrix locally (or check the GitHub Actions results on your PR) before requesting review.


📜 License

MIT (see LICENSE).


👤 Author

SyntX
