🎬 SubtitleGenerator

A high-performance C++ subtitle generator built on whisper.cpp, with automatic GPU acceleration and CPU fallback, multi-format audio/video input via FFmpeg, and built-in translation to English.

✨ Features

🚀 GPU-accelerated, CPU-safe — uses CUDA on Windows/Linux or Metal on macOS automatically when available, and falls back to a SIMD-optimized CPU path with no extra setup if it isn't.
🎯 Accurate timing — segment-level timestamps with confidence scores per segment.
🌍 96-language transcription + translation — transcribe in the spoken language, or translate directly into English (see Language & Translation).
📝 Multiple output formats — SRT, WebVTT, ASS (Advanced SubStation Alpha), JSON, and plain TXT.
🎞️ Multi-format input via FFmpeg — WAV natively, plus MP3, MP4, MKV, FLAC, OGG, AAC, MOV, WEBM, and more when built with FFmpeg.
⚙️ Configurable — adjustable chunking, segment merging/splitting, confidence filtering, filler-word removal.
🖥️ Live hardware + backend report — prints CPU, core count, RAM, and the active compute backend (CUDA / Metal / CPU) on every run.
⏹️ Graceful Ctrl+C handling — stop at any time; subtitles produced so far are still saved.
📊 Performance stats — real-time factor, average confidence, average segment length.
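
As a rough illustration of the performance stats listed above, the sketch below computes a real-time factor, average confidence, and average segment length from a list of segments. It assumes the real-time factor is defined as processing time divided by audio duration (some tools report the inverse), and `TimedSegment`, `RunStats`, and `compute_stats` are hypothetical names, not the project's actual types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical segment record; the real SubtitleGenerator types may differ.
struct TimedSegment {
    double start_s;     // segment start, in seconds
    double end_s;       // segment end, in seconds
    double confidence;  // 0.0 .. 1.0
};

struct RunStats {
    double real_time_factor;      // processing time / audio duration (assumed definition)
    double avg_confidence;        // mean of per-segment confidence
    double avg_segment_length_s;  // mean of (end - start), in seconds
};

// Computes summary statistics for one run. Averages stay at 0 when no
// segments were produced.
RunStats compute_stats(const std::vector<TimedSegment>& segments,
                       double audio_duration_s,
                       double processing_time_s) {
    assert(audio_duration_s > 0.0);
    RunStats stats{processing_time_s / audio_duration_s, 0.0, 0.0};
    if (segments.empty()) return stats;

    double conf_sum = 0.0;
    double len_sum = 0.0;
    for (const TimedSegment& seg : segments) {
        conf_sum += seg.confidence;
        len_sum += seg.end_s - seg.start_s;
    }
    const double n = static_cast<double>(segments.size());
    stats.avg_confidence = conf_sum / n;
    stats.avg_segment_length_s = len_sum / n;
    return stats;
}
```

Under this definition, a value below 1.0 means the audio was processed faster than it plays back.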

📦 Requirements

| | Windows | Linux | macOS |
|---|---|---|---|
| Compiler | MSVC 2019+ | GCC 7+ / Clang 6+ | Clang (Xcode CLT) |
| CMake | 3.15+ | 3.15+ | 3.15+ |
| GPU backend | CUDA Toolkit (optional) | CUDA Toolkit (optional) | Metal (built into the OS) |
| FFmpeg (optional, for non-WAV input) | via vcpkg | `apt install libavformat-dev ...` | `brew install ffmpeg` |

GPU support is decided at build time. If CMake finds the CUDA Toolkit (Windows/Linux) or Metal (macOS) when you configure the project, the binary is built with GPU support; otherwise it is built for CPU only. To add CUDA acceleration later, install the CUDA Toolkit from nvidia.com and rebuild.
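
Because the backend is fixed when the binary is compiled, a program can report it with preprocessor checks. The sketch below is one way that might look. It assumes the build defines `GGML_USE_CUDA` or `GGML_USE_METAL` (names used by recent ggml/whisper.cpp builds; the project may use its own definitions instead), and `compiled_backend` is a hypothetical helper.

```cpp
#include <string>

// Returns the compute backend selected when this binary was compiled.
// GGML_USE_CUDA / GGML_USE_METAL are assumed macro names; check the
// whisper.cpp submodule revision and CMakeLists.txt for the real ones.
std::string compiled_backend() {
#if defined(GGML_USE_CUDA)
    return "CUDA";
#elif defined(GGML_USE_METAL)
    return "Metal";
#else
    return "CPU";  // no GPU backend was compiled in
#endif
}
```

Compiled without either macro, as in a `-DFORCE_CPU=ON` style build, the function returns `"CPU"`.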

🚀 Quick Start

1. Clone with submodules

git clone --recursive https://github.com/SyntX34/subtitle-generator.git
cd subtitle-generator

Already cloned without --recursive?

git submodule update --init --recursive

2. Download a model

# Linux / macOS
./scripts/download_model.sh base.en

# Windows (PowerShell)
.\scripts\download_model.ps1 -Model base.en

| Model | Size | Notes |
|---|---|---|
| `tiny.en` | ~75 MB | Fastest, least accurate |
| `base.en` | ~145 MB | Good default for English |
| `small.en` | ~465 MB | Better accuracy, still fast |
| `medium.en` | ~1.5 GB | High accuracy, slower |
| `large-v3` | ~3 GB | Best accuracy, multilingual, needs a GPU for real-time use |

Drop the .en suffix (e.g. base, small) for the multilingual variants if you need non-English transcription or translation.

3. Build

cmake -B build -S .
cmake --build build --config Release

CMake auto-detects your platform's GPU backend. To force a CPU-only build (e.g. for a smaller, dependency-free binary):

cmake -B build -S . -DFORCE_CPU=ON

4. Run

./build/SubtitleTool audio.wav -m models/ggml-base.en.bin -o subtitles.srt

🛠️ Usage

SubtitleTool <audio_or_video_file> [OPTIONS]

MODEL
  -m, --model <path>      Whisper model (default: models/ggml-base.en.bin)

OUTPUT
  -o, --output <path>     Output file (default: <input>.<format>)
  -f, --format <fmt>      srt | vtt | ass | json | txt  (default: srt)
      --all-formats       Save all formats at once

LANGUAGE & TRANSLATION
  -l, --lang <code>       Source language spoken in the audio (default: auto)
      --translate         Translate the transcription into English
      --list-languages    Print every supported language code and exit

PROCESSING
  -t, --threads <n>       Worker threads (default: 4)
      --chunk <secs>      Audio chunk size in seconds (default: 30)
      --min-conf <0-1>    Drop segments below this confidence
      --no-filler         Remove filler words (um, uh, er, ...)

MISC
  -s, --stream            Print subtitles live as they're generated
  -v, --verbose           Verbose whisper output
  -h, --help              Show the full help message
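
To make the `--min-conf` and `--no-filler` options more concrete, here is a minimal sketch of that kind of post-processing. It is not the tool's actual implementation: `TextSegment`, `strip_fillers`, and `postprocess` are hypothetical names, and the filler list is limited to the three words named in the help text.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical segment record: recognized text plus a 0..1 confidence.
struct TextSegment {
    std::string text;
    double confidence;
};

// Keeps only the letters of a word, lower-cased, so "Um," compares as "um".
std::string normalize_word(const std::string& word) {
    std::string out;
    for (unsigned char c : word) {
        if (std::isalpha(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Removes whole-word fillers and rejoins the remaining words with spaces.
std::string strip_fillers(const std::string& text) {
    static const std::set<std::string> fillers = {"um", "uh", "er"};
    std::istringstream in(text);
    std::string word;
    std::string out;
    while (in >> word) {
        if (fillers.count(normalize_word(word)) > 0) continue;
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}

// Drops segments below min_conf, then optionally strips fillers from the rest.
std::vector<TextSegment> postprocess(std::vector<TextSegment> segments,
                                     double min_conf, bool no_filler) {
    segments.erase(
        std::remove_if(segments.begin(), segments.end(),
                       [min_conf](const TextSegment& s) {
                           return s.confidence < min_conf;
                       }),
        segments.end());
    if (no_filler) {
        for (TextSegment& s : segments) s.text = strip_fillers(s.text);
    }
    return segments;
}
```

Matching whole words rather than substrings keeps words such as "her" or "summer" intact.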

🌍 Language & Translation

SubtitleTool separates what language is spoken from what language you want the output in:

| Flag | Meaning |
|---|---|
| `-l <code>` | The language spoken in the audio. Use `auto` to let whisper detect it, or set it explicitly if you already know it, which is faster and more accurate than auto-detection. |
| `--translate` | Translate the transcription into English. whisper.cpp can only translate into English; it cannot translate directly between two non-English languages. |

Example — Spanish video, English subtitles:

SubtitleTool pelicula.mp4 -l es --translate -o subtitles_en.srt

Example — Japanese video, Japanese subtitles (no translation):

SubtitleTool anime.mkv -l ja -o subtitles_ja.srt

Example — unknown language, transcribed in its original language:

SubtitleTool interview.wav -l auto -o subtitles.srt

To get subtitles in a non-English language other than the one spoken (e.g. Spanish → French), run SubtitleTool with --translate to produce English text, then pass that text through a separate text translation tool. whisper.cpp's translation path is English-only by design.

See every supported language code:

SubtitleTool --list-languages

📂 Project Layout

subtitle-generator/
├── src/
│   ├── main.cpp                 # CLI entry point, hardware/banner reporting
│   ├── SubtitleGenerator.{hpp,cpp}
│   ├── AudioDecoder.{hpp,cpp}    # WAV (built-in) + FFmpeg (optional)
│   ├── TimestampFormatter.{hpp,cpp}
│   └── dl_windows.cpp            # Windows dynamic-library helpers
├── scripts/
│   ├── download_model.sh
│   └── download_model.ps1
├── third_party/
│   └── whisper.cpp/              # git submodule
├── .github/workflows/build.yml   # Windows + Linux + macOS CI/release
└── CMakeLists.txt
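
For orientation, the kind of work a timestamp formatter does for SRT and WebVTT output can be sketched as below. This is an illustrative guess, not the contents of TimestampFormatter.cpp: `format_timestamp`, `to_srt`, and `to_vtt` are hypothetical names. SRT writes `HH:MM:SS,mmm` and WebVTT writes `HH:MM:SS.mmm`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a millisecond offset as HH:MM:SS<sep>mmm.
// Negative inputs are clamped to zero.
std::string format_timestamp(std::int64_t ms, char sep) {
    if (ms < 0) ms = 0;
    const long long hours   = ms / 3600000;
    const long long minutes = (ms / 60000) % 60;
    const long long seconds = (ms / 1000) % 60;
    const long long millis  = ms % 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld%c%03lld",
                  hours, minutes, seconds, sep, millis);
    return std::string(buf);
}

// SRT separates seconds and milliseconds with a comma.
std::string to_srt(std::int64_t ms) { return format_timestamp(ms, ','); }

// WebVTT separates seconds and milliseconds with a period.
std::string to_vtt(std::int64_t ms) { return format_timestamp(ms, '.'); }
```

whisper.cpp reports segment times in units of 10 ms, so a caller would typically multiply by 10 before formatting.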

🧩 Building Without FFmpeg

FFmpeg is optional. Without it, only .wav input is supported, but the build has zero external dependencies:

cmake -B build -S . -DUSE_FFMPEG=OFF
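
Without FFmpeg, a built-in WAV path still has to turn raw PCM into the mono floating-point samples that whisper.cpp consumes. The sketch below shows that conversion for interleaved 16-bit PCM. It is an assumption about the general approach, not the code in AudioDecoder.cpp: `pcm16_to_mono_float` is a hypothetical name, and WAV header parsing and resampling to 16 kHz are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved 16-bit PCM to mono float samples in [-1, 1)
// by scaling each sample and averaging the channels of every frame.
std::vector<float> pcm16_to_mono_float(const std::vector<std::int16_t>& pcm,
                                       int channels) {
    std::vector<float> mono;
    if (channels <= 0) return mono;

    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t frames = pcm.size() / ch;  // a trailing partial frame is ignored
    mono.reserve(frames);

    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < ch; ++c) {
            sum += static_cast<float>(pcm[f * ch + c]) / 32768.0f;
        }
        mono.push_back(sum / static_cast<float>(ch));
    }
    return mono;
}
```

For a stereo file, each output sample is the mean of the left and right channels.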

📈 Project Activity

📊 Detailed download counts per release and platform are available on the Releases page.

🤝 Contributing

Issues and pull requests are welcome. Please run the existing build matrix locally (or check the GitHub Actions results on your PR) before requesting review.

📜 License

MIT — see LICENSE.

👤 Author

SyntX

GitHub: github.com/SyntX34
Steam: steamcommunity.com/id/SyntX34
Discord: nh_syntx
Discord server: discord.novazombie.com
Instagram: instagram.com/dfa.nh_syntx
