Super Captions

A privacy-preserving web app that auto-captions videos:

The user picks a video locally.
Audio is extracted in the browser with ffmpeg.wasm — the video itself never leaves the device.
The audio is sent to 榛果繽紛樂's OpenAI-compatible Whisper Gateway (e.g. whisper-gateway:5000).
The user previews the result side-by-side: video on the left, per-segment captions on the right, fully synchronised with the timeline.
Captions can be edited per-segment, with optional speaker diarisation and per-speaker styling (background, text border, text fill, font, size, etc.).
On export, captions are burned back into the video via Canvas + MediaRecorder, again entirely in-browser.

Built with Next.js 15 (App Router), TypeScript, Tailwind, shadcn/ui, zustand, and ffmpeg.wasm.

Local development

npm install
npm run dev

Set WHISPER_GATEWAY_URL to a reachable gateway. From the host, the gateway is reachable on port 5148; see .env.example.
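As a minimal sketch of how server-side code might read this setting: the helper name `resolveGatewayUrl`, the normalisation rules, and the `http://` scheme in the usage comment are assumptions for illustration, not taken from the repository. Only the variable name WHISPER_GATEWAY_URL and the two addresses come from this README.

```typescript
// Hypothetical helper: validates WHISPER_GATEWAY_URL and strips trailing slashes
// so that callers can safely append a path such as "/v1/audio/transcriptions".
function resolveGatewayUrl(env: Record<string, string | undefined>): string {
  const raw = env.WHISPER_GATEWAY_URL?.trim();
  if (!raw) {
    throw new Error("WHISPER_GATEWAY_URL is not set; see .env.example");
  }
  const parsed = new URL(raw); // throws a TypeError on a malformed URL
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported gateway protocol: ${parsed.protocol}`);
  }
  return raw.replace(/\/+$/, "");
}

// Possible values (the scheme is an assumption):
//   from the host:          http://localhost:5148
//   inside the Docker net:  http://whisper-gateway:5000
```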

Docker

The image is built on a pinned oven/bun:1.3.13-alpine base. Bump the version deliberately and never use :latest. The container joins an existing infra-net network and resolves the gateway at whisper-gateway:5000:

docker compose up -d --build

The Dockerfile uses Bun for install, build, and runtime (bun run server.js on Next.js standalone output). The lockfile is bun.lock; package-lock.json is kept for local npm workflows.

Architecture notes

/api/transcribe is a thin server-side proxy that forwards the multipart audio upload to ${WHISPER_GATEWAY_URL}/v1/audio/transcriptions. It exists so the browser never needs to talk to the gateway directly (and so CORS doesn't bite).
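The proxy described above could look roughly like the following sketch. It uses only Web-standard Request, Response, FormData, and fetch. The function name `proxyTranscription`, the injectable `fetchImpl` parameter, and the `file` form field (the field name used by the OpenAI transcription API) are assumptions; the repository's actual route may differ.

```typescript
// Forwards an incoming multipart upload to the gateway and relays the reply.
async function proxyTranscription(
  req: Request,
  gatewayUrl: string,
  fetchImpl: typeof fetch = fetch, // injectable so the handler can be tested
): Promise<Response> {
  const form = await req.formData();

  // Reject requests that carry no audio part (field name "file" is assumed).
  if (!(form.get("file") instanceof Blob)) {
    return new Response(JSON.stringify({ error: "missing audio file" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }

  // Pass the form through unchanged; fetch sets the multipart boundary itself.
  const upstream = await fetchImpl(`${gatewayUrl}/v1/audio/transcriptions`, {
    method: "POST",
    body: form,
  });

  // Relay status and body so the browser sees the gateway's answer.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "content-type": upstream.headers.get("content-type") ?? "application/json",
    },
  });
}

// In app/api/transcribe/route.ts this might be exposed as:
//   export async function POST(req: Request) {
//     return proxyTranscription(req, process.env.WHISPER_GATEWAY_URL ?? "");
//   }
```

Because the browser only ever calls a same-origin route, no CORS configuration is needed on the gateway in this arrangement.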
lib/ffmpeg-client.ts loads @ffmpeg/core from a CDN and converts the user's video to a 16 kHz mono MP3 entirely on the client. The Next config sets the COOP/COEP headers required for SharedArrayBuffer.
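To make the two moving parts concrete, here is a hedged sketch of what the ffmpeg arguments and the isolation headers might look like. The function names, file names, and exact flags are assumptions; the header values shown are the standard pair that enables cross-origin isolation, but the repository may use a different COEP value.

```typescript
// Assumed ffmpeg arguments for "16 kHz mono MP3". With @ffmpeg/ffmpeg 0.12 an
// array like this is the kind of value passed to ffmpeg.exec().
function buildExtractAudioArgs(input: string, output: string): string[] {
  return [
    "-i", input,    // source video, previously written to ffmpeg's in-memory FS
    "-vn",          // drop the video stream
    "-ac", "1",     // downmix to mono
    "-ar", "16000", // resample to 16 kHz
    output,         // a ".mp3" extension selects the MP3 output format
  ];
}

// Header entries in the shape Next.js expects from `headers()` in next.config.js.
// SharedArrayBuffer is only available on cross-origin isolated pages.
function crossOriginIsolationHeaders() {
  return [
    {
      source: "/(.*)",
      headers: [
        { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
        { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
      ],
    },
  ];
}
```

With `require-corp`, resources fetched from a CDN must themselves be served with suitable CORS or CORP headers, which is worth checking when loading @ffmpeg/core remotely.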
lib/caption-render.ts is the single source of truth for how a caption is drawn — both the live preview overlay and the export pipeline call into it, so what you see is what you get.
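A shared renderer of this kind might be shaped as below. Every type and function name here (`Segment`, `CaptionStyle`, `activeSegment`, `drawCaption`) and the layout choices (bottom-centred text, padding ratio) are hypothetical, chosen to illustrate the idea of one draw function serving both preview and export.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds, exclusive
  text: string;
}

interface CaptionStyle {
  background: string;
  fill: string;
  border: string;
  borderWidthPx: number;
  fontFamily: string;
  sizePx: number;
}

// Returns the segment covering time t, or undefined between segments.
function activeSegment(segments: Segment[], t: number): Segment | undefined {
  return segments.find((s) => t >= s.start && t < s.end);
}

// Draws one caption, bottom-centred, on any 2D context. The preview overlay
// and the export canvas would both call this with their own context and size.
function drawCaption(
  ctx: CanvasRenderingContext2D,
  text: string,
  style: CaptionStyle,
  width: number,
  height: number,
): void {
  ctx.save();
  ctx.font = `${style.sizePx}px ${style.fontFamily}`;
  ctx.textAlign = "center";
  ctx.textBaseline = "bottom";

  const x = width / 2;
  const y = height - style.sizePx; // leave one line of margin at the bottom
  const pad = style.sizePx * 0.3;
  const textWidth = ctx.measureText(text).width;

  // Background box behind the text.
  ctx.fillStyle = style.background;
  ctx.fillRect(x - textWidth / 2 - pad, y - style.sizePx - pad, textWidth + 2 * pad, style.sizePx + 2 * pad);

  // Text border first, then the fill on top of it.
  if (style.borderWidthPx > 0) {
    ctx.lineWidth = style.borderWidthPx;
    ctx.strokeStyle = style.border;
    ctx.strokeText(text, x, y);
  }
  ctx.fillStyle = style.fill;
  ctx.fillText(text, x, y);
  ctx.restore();
}
```

Keeping the function free of DOM lookups and global state is what lets the same code run against the live overlay and the export canvas.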
lib/export-video.ts renders frames from the <video> element to an offscreen canvas at 30 fps, captures the canvas via captureStream(), and merges the original audio track. The result is a video/webm file you can download.
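The export pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the function names, the `drawFrame` callback, the list of candidate MIME types, and the use of Web Audio to obtain the original audio track are all choices made for this example. It also assumes the "offscreen" canvas is a detached HTMLCanvasElement, since captureStream() is called on a canvas element.

```typescript
// Picks the first WebM variant the browser reports as supported.
function pickWebmMimeType(isSupported: (type: string) => boolean): string | undefined {
  const candidates = [
    "video/webm;codecs=vp9,opus",
    "video/webm;codecs=vp8,opus",
    "video/webm",
  ];
  return candidates.find(isSupported);
}

// Browser-only: plays the video, lets the caller paint each frame onto the
// canvas, and records canvas video plus the video's audio into one WebM blob.
// Should be started from a user gesture so playback and audio are allowed.
async function exportWithCaptions(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  drawFrame: () => void, // expected to draw the video frame and its caption
): Promise<Blob> {
  const canvasStream = canvas.captureStream(30); // 30 fps video track

  // Route the element's audio into a MediaStream via Web Audio.
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaElementSource(video);
  const sink = audioCtx.createMediaStreamDestination();
  source.connect(sink);

  const mixed = new MediaStream([
    ...canvasStream.getVideoTracks(),
    ...sink.stream.getAudioTracks(),
  ]);

  const mimeType = pickWebmMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(mixed, mimeType ? { mimeType } : undefined);

  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  const done = new Promise<Blob>((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: "video/webm" }));
  });
  video.onended = () => recorder.stop();

  recorder.start();
  await video.play();

  // Repaint until playback finishes; captureStream samples the canvas.
  const tick = () => {
    if (video.ended) return;
    drawFrame();
    requestAnimationFrame(tick);
  };
  tick();

  return done;
}
```

One consequence of recording in real time this way is that an export takes about as long as the video itself.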

