A privacy-preserving web app that auto-captions videos:
- The user picks a video locally.
- Audio is extracted in the browser with ffmpeg.wasm; the video itself never leaves the device.
- The audio is sent to 榛果繽紛樂's OpenAI-compatible Whisper Gateway (e.g. whisper-gateway:5000).
- The user previews the result side by side: video on the left, per-segment captions on the right, fully synchronised with the timeline.
- Captions can be edited per-segment, with optional speaker diarisation and per-speaker styling (background, text border, text fill, font, size, etc.).
- On export, captions are burned back into the video via Canvas + MediaRecorder, again entirely in-browser.
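
Keeping captions synchronised with the timeline comes down to finding which segment covers the video's current time. The following is a minimal sketch, assuming segments carry `start` and `end` times in seconds, are sorted by `start`, and do not overlap. The `Segment` type and `findActiveSegment` name are hypothetical and not taken from this repo.

```typescript
// Hypothetical segment shape; field names are assumptions, not this repo's types.
interface Segment {
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
  text: string;
}

// Binary search for the segment covering `time`.
// Assumes `segments` is sorted by `start` and segments do not overlap.
// Returns the segment index, or -1 when `time` falls in a gap or outside all segments.
function findActiveSegment(segments: Segment[], time: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) {
      hi = mid - 1; // look in the earlier half
    } else if (time >= seg.end) {
      lo = mid + 1; // look in the later half
    } else {
      return mid;   // start <= time < end
    }
  }
  return -1;
}
```

In a preview like the one described, a lookup of this kind would typically run on the `<video>` element's `timeupdate` event (or in a `requestAnimationFrame` loop) with `video.currentTime`, and the returned index would drive both the overlay and the highlighted row in the caption list.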
Built with Next.js 15 (App Router), TypeScript, Tailwind, shadcn/ui, zustand, and ffmpeg.wasm.
npm install
npm run dev

Set WHISPER_GATEWAY_URL to a reachable gateway. From the host, the gateway is on port 5148; see .env.example.
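
Because the transcription endpoint is derived from `WHISPER_GATEWAY_URL`, a small helper can validate the variable once and build the upstream URL. This is only a sketch under assumptions: `resolveTranscriptionUrl` is a hypothetical name, the environment is passed in as a plain object so the function stays easy to test, and the `http://` scheme and `localhost` host in the examples are illustrative guesses (only the port numbers and the `whisper-gateway` service name come from this README).

```typescript
// Hypothetical helper, not part of this repo.
// Reads WHISPER_GATEWAY_URL from an env-like object and appends the
// OpenAI-compatible transcription path.
function resolveTranscriptionUrl(env: { [key: string]: string | undefined }): string {
  const base = (env["WHISPER_GATEWAY_URL"] || "").trim();
  if (base === "") {
    throw new Error("WHISPER_GATEWAY_URL is not set; see .env.example");
  }
  // Strip trailing slashes so the join never produces "//v1/...".
  return base.replace(/\/+$/, "") + "/v1/audio/transcriptions";
}

// Inside the container the gateway resolves by service name (scheme assumed):
const inContainer = resolveTranscriptionUrl({
  WHISPER_GATEWAY_URL: "http://whisper-gateway:5000",
});

// From the host (host name and scheme assumed for illustration):
const fromHost = resolveTranscriptionUrl({
  WHISPER_GATEWAY_URL: "http://localhost:5148/",
});
```

In server-side code the same function could be called with `process.env`, so a missing variable fails early with a clear message rather than as an opaque fetch error.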
The image is built on a pinned oven/bun:1.3.13-alpine base; bump it deliberately and never use :latest. The container joins an existing infra-net network and resolves the gateway at whisper-gateway:5000:

docker compose up -d --build

The Dockerfile uses Bun for install, build, and runtime (bun run server.js on Next.js standalone output). The lockfile is bun.lock; package-lock.json is kept for local npm workflows.
- /api/transcribe is a thin server-side proxy that forwards the multipart audio upload to ${WHISPER_GATEWAY_URL}/v1/audio/transcriptions. It exists so the browser never needs to talk to the gateway directly (and so CORS doesn't bite).
- lib/ffmpeg-client.ts loads @ffmpeg/core from a CDN and converts the user's video to a 16 kHz mono MP3 entirely on the client. The Next config sets the COOP/COEP headers required for SharedArrayBuffer.
- lib/caption-render.ts is the single source of truth for how a caption is drawn: both the live preview overlay and the export pipeline call into it, so what you see is what you get.
- lib/export-video.ts renders frames from the <video> element to an offscreen canvas at 30 fps, captures the canvas via captureStream(), and merges the original audio track. The result is a video/webm file you can download.
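
The 16 kHz mono MP3 conversion mentioned above maps onto a short list of standard ffmpeg flags. The sketch below only builds the argument array. `buildAudioArgs` and the file names are hypothetical, and the exact flags used in lib/ffmpeg-client.ts may differ.

```typescript
// Hypothetical helper: ffmpeg arguments for extracting speech-friendly audio.
// The flags are standard ffmpeg options; the function itself is not from this repo.
function buildAudioArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName, // input file, previously written to ffmpeg's virtual file system
    "-vn",           // drop the video stream
    "-ac", "1",      // downmix to a single (mono) channel
    "-ar", "16000",  // resample to 16 kHz
    outputName,      // an .mp3 extension lets ffmpeg infer the MP3 output format
  ];
}

// Example file names are placeholders.
const audioArgs = buildAudioArgs("input.mp4", "audio.mp3");
```

With the @ffmpeg/ffmpeg 0.12.x API, an array like this would be passed to `ffmpeg.exec(audioArgs)` after the source file has been stored with `ffmpeg.writeFile(...)`, and the result read back with `ffmpeg.readFile("audio.mp3")`. Which version this project pins is not stated here, so treat those calls as an assumption.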