A high-performance C++ subtitle generator built on whisper.cpp, with automatic GPU acceleration and CPU fallback, multi-format audio/video input via FFmpeg, and built-in translation to English.
- GPU-accelerated, CPU-safe: uses CUDA on Windows/Linux or Metal on macOS automatically when available, and falls back to a SIMD-optimized CPU path with no extra setup if it isn't.
- Accurate timing: segment-level timestamps with confidence scores per segment.
- 96-language transcription + translation: transcribe in the spoken language, or translate directly into English (see Language & Translation).
- Multiple output formats: SRT, WebVTT, ASS (Advanced SubStation Alpha), JSON, and plain TXT.
- Multi-format input via FFmpeg: WAV natively, plus MP3/MP4/MKV/FLAC/OGG/AAC/MOV/WEBM and more when built with FFmpeg.
- Configurable: adjustable chunking, segment merging/splitting, confidence filtering, filler-word removal.
- Live hardware + backend report: prints CPU, core count, RAM, and the active compute backend (CUDA / Metal / CPU) on every run.
- Graceful Ctrl+C handling: stop at any time; subtitles produced so far are still saved.
- Performance stats: real-time factor, average confidence, average segment length.
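
This README does not define the real-time factor it reports. A common convention, assumed here, is processing time divided by audio duration, so values below 1.0 mean faster than real time. The helper name `real_time_factor` is hypothetical and not taken from the project source; this is a minimal sketch of the arithmetic only.

```cpp
#include <stdexcept>

// Hypothetical helper, not part of SubtitleTool's source.
// Assumed definition: real-time factor = processing time / audio duration.
// A result below 1.0 means the audio was processed faster than it plays.
double real_time_factor(double processing_seconds, double audio_seconds) {
    if (audio_seconds <= 0.0) {
        throw std::invalid_argument("audio_seconds must be positive");
    }
    return processing_seconds / audio_seconds;
}
```

For example, transcribing a 120-second clip in 30 seconds gives a factor of 0.25 under this definition.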
| | Windows | Linux | macOS |
|---|---|---|---|
| Compiler | MSVC 2019+ | GCC 7+ / Clang 6+ | Clang (Xcode CLT) |
| CMake | 3.15+ | 3.15+ | 3.15+ |
| GPU backend | CUDA Toolkit (optional) | CUDA Toolkit (optional) | Metal (built into the OS) |
| FFmpeg (optional, for non-WAV input) | via vcpkg | `apt install libavformat-dev ...` | `brew install ffmpeg` |
GPU acceleration is detected at build time. If the CUDA toolkit (or Metal on macOS) is present when you run CMake, the binary is built with GPU support; otherwise it falls back to CPU. To enable GPU acceleration, install the CUDA toolkit from nvidia.com and rebuild.

    git clone --recursive https://github.com/SyntX34/subtitle-generator.git
    cd subtitle-generator

Already cloned without --recursive?

    git submodule update --init --recursive

Download a model:

    # Linux / macOS
    ./scripts/download_model.sh base.en

    # Windows (PowerShell)
    .\scripts\download_model.ps1 -Model base.en

| Model | Size | Notes |
|---|---|---|
| `tiny.en` | ~75 MB | Fastest, least accurate |
| `base.en` | ~145 MB | Good default for English |
| `small.en` | ~465 MB | Better accuracy, still fast |
| `medium.en` | ~1.5 GB | High accuracy, slower |
| `large-v3` | ~3 GB | Best accuracy, multilingual, needs a GPU for real-time use |

Drop the .en suffix (e.g. base, small) for the multilingual variants if you need non-English transcription or translation.

    cmake -B build -S .
    cmake --build build --config Release

CMake auto-detects your platform's GPU backend. To force a CPU-only build (e.g. for a smaller, dependency-free binary):

    cmake -B build -S . -DFORCE_CPU=ON

Basic usage:

    ./build/SubtitleTool audio.wav -m models/ggml-base.en.bin -o subtitles.srt

All options:

    SubtitleTool <audio_or_video_file> [OPTIONS]

    MODEL
      -m, --model <path>     Whisper model (default: models/ggml-base.en.bin)

    OUTPUT
      -o, --output <path>    Output file (default: <input>.<format>)
      -f, --format <fmt>     srt | vtt | ass | json | txt (default: srt)
          --all-formats      Save all formats at once

    LANGUAGE & TRANSLATION
      -l, --lang <code>      Source language spoken in the audio (default: auto)
          --translate        Translate the transcription into English
          --list-languages   Print every supported language code and exit

    PROCESSING
      -t, --threads <n>      Worker threads (default: 4)
          --chunk <secs>     Audio chunk size in seconds (default: 30)
          --min-conf <0-1>   Drop segments below this confidence
          --no-filler        Remove filler words (um, uh, er, ...)

    MISC
      -s, --stream           Print subtitles live as they're generated
      -v, --verbose          Verbose whisper output
      -h, --help             Show the full help message

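
To make the effect of `--min-conf` concrete, here is a minimal sketch of confidence filtering. It assumes each segment carries a confidence in the range 0 to 1; the `Segment` struct and `filter_by_confidence` function are hypothetical names for illustration and may not match the project's actual types.

```cpp
#include <string>
#include <vector>

// Hypothetical segment record; SubtitleTool's real type may differ.
struct Segment {
    double start_s;      // start time in seconds
    double end_s;        // end time in seconds
    double confidence;   // assumed range: 0.0 .. 1.0
    std::string text;
};

// Return only the segments whose confidence is at least min_conf,
// i.e. drop every segment below the threshold.
std::vector<Segment> filter_by_confidence(const std::vector<Segment>& segments,
                                          double min_conf) {
    std::vector<Segment> kept;
    kept.reserve(segments.size());
    for (const Segment& s : segments) {
        if (s.confidence >= min_conf) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

In this sketch a segment exactly at the threshold is kept, which matches the wording "drop segments below this confidence".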
SubtitleTool separates what language is spoken from what language you want the output in:
| Flag | Meaning |
|---|---|
| `-l <code>` | The language spoken in the audio. Use `auto` to let whisper detect it, or set it explicitly if you already know it, which is faster and more accurate than auto-detection. |
| `--translate` | Translate the transcription into English. whisper.cpp can only translate into English; it can't translate directly between two non-English languages. |
Example (Spanish video, English subtitles):

    SubtitleTool pelicula.mp4 -l es --translate -o subtitles_en.srt

Example (Japanese video, Japanese subtitles, no translation):

    SubtitleTool anime.mkv -l ja -o subtitles_ja.srt

Example (unknown language, transcribed in its original language):

    SubtitleTool interview.wav -l auto -o subtitles.srt

To translate from a language that isn't English into a third language (e.g. Spanish → French), run SubtitleTool once to get an English transcript or translation, then run that text through a text translation tool. whisper.cpp's translation path is English-only by design.

See every supported language code:

    SubtitleTool --list-languages

Project layout:

    subtitle-generator/
    ├── src/
    │   ├── main.cpp                      # CLI entry point, hardware/banner reporting
    │   ├── SubtitleGenerator.{hpp,cpp}
    │   ├── AudioDecoder.{hpp,cpp}        # WAV (built-in) + FFmpeg (optional)
    │   ├── TimestampFormatter.{hpp,cpp}
    │   └── dl_windows.cpp                # Windows dynamic-library helpers
    ├── scripts/
    │   ├── download_model.sh
    │   └── download_model.ps1
    ├── third_party/
    │   └── whisper.cpp/                  # git submodule
    ├── .github/workflows/build.yml       # Windows + Linux + macOS CI/release
    └── CMakeLists.txt

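
The contents of `TimestampFormatter` are not shown in this README. As a rough sketch of what timestamp formatting involves for two of the supported outputs: SRT writes `HH:MM:SS,mmm` with a comma before the milliseconds, while WebVTT writes `HH:MM:SS.mmm` with a period. The function names below are hypothetical and not taken from the project.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a millisecond offset as HH:MM:SS<sep>mmm.
// Negative inputs are clamped to zero.
std::string format_timestamp(std::int64_t total_ms, char ms_separator) {
    if (total_ms < 0) total_ms = 0;
    const long long ms = static_cast<long long>(total_ms % 1000);
    const long long s  = static_cast<long long>((total_ms / 1000) % 60);
    const long long m  = static_cast<long long>((total_ms / 60000) % 60);
    const long long h  = static_cast<long long>(total_ms / 3600000);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld%c%03lld",
                  h, m, s, ms_separator, ms);
    return std::string(buf);
}

// SRT separates milliseconds with a comma.
std::string format_srt(std::int64_t total_ms) { return format_timestamp(total_ms, ','); }

// WebVTT separates milliseconds with a period.
std::string format_vtt(std::int64_t total_ms) { return format_timestamp(total_ms, '.'); }
```

With these definitions, `format_srt(3723004)` yields `01:02:03,004` and `format_vtt(61500)` yields `00:01:01.500`.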
FFmpeg is optional. Without it, only .wav input is supported, but the build has zero external dependencies:

    cmake -B build -S . -DUSE_FFMPEG=OFF

Detailed download counts per release and platform are available on the Releases page.
Issues and pull requests are welcome. Please run the existing build matrix locally (or check the GitHub Actions results on your PR) before requesting review.
MIT. See LICENSE.
SyntX
- GitHub: github.com/SyntX34
- Steam: steamcommunity.com/id/SyntX34
- Discord: `nh_syntx`
- Discord server: discord.novazombie.com
- Instagram: instagram.com/dfa.nh_syntx