Self-hosted REST API for removing ad segments from audio and video files using Fireworks or Mistral transcription, OpenAI Responses, FFmpeg, Hono, Bun, and Postgres.
- POST /process accepts an uploaded audio or video file.
- The API computes a SHA-256 hash of the uploaded file.
- If that hash already exists in Postgres, the API reuses the saved ad timestamps and skips transcription plus LLM extraction.
- Otherwise, the configured transcription provider transcribes the file.
- OpenAI extracts ad segments from the transcript.
- The API matches those phrases back onto transcript timestamps and stores the full transcription plus the exact ad timestamps in Postgres.
- FFmpeg removes the matching ranges and returns the edited file as a download.
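The hash-based reuse described in the steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the Map stands in for the Postgres table, and the names TimeRange, sha256Hex, getAdRanges, and the detect callback are hypothetical. The cache key includes the API key identifier on the assumption that cache entries are scoped per key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one ad range, in seconds.
type TimeRange = { start: number; end: number };

// In-memory stand-in for the Postgres lookup (assumption for this sketch).
const adRangeCache = new Map<string, TimeRange[]>();

// Hex-encoded SHA-256 digest of the uploaded bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Returns saved ranges on a cache hit. On a miss, runs the expensive
// detection step (transcription plus LLM extraction) once and stores the result.
function getAdRanges(
  apiKeyId: string,
  file: Uint8Array,
  detect: (file: Uint8Array) => TimeRange[],
): { ranges: TimeRange[]; cached: boolean } {
  const cacheKey = `${apiKeyId}:${sha256Hex(file)}`;
  const hit = adRangeCache.get(cacheKey);
  if (hit !== undefined) return { ranges: hit, cached: true };
  const ranges = detect(file);
  adRangeCache.set(cacheKey, ranges);
  return { ranges, cached: false };
}
```

Uploading the same bytes twice with the same key calls detect only once; a different key or a file that differs by a single byte produces a different cache key and a fresh detection run.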
The app supports two transcription providers. Fireworks uses whisper-v3-turbo. Mistral uses voxtral-mini-latest, which the Mistral docs describe as the transcription-only service on the audio transcription endpoint and currently map to voxtral-mini-2602 for transcription.
- Bun 1.3+
- FFmpeg
- Postgres
- OPENAI_API_KEY
- TRANSCRIPTION_PROVIDER
- FIREWORKS_API_KEY when TRANSCRIPTION_PROVIDER=fireworks
- MISTRAL_API_KEY when TRANSCRIPTION_PROVIDER=mistral
Copy the example env file:
cp .env.example .env
Set the required values in .env:
- OPENAI_API_KEY
- TRANSCRIPTION_PROVIDER
- FIREWORKS_API_KEY or MISTRAL_API_KEY
- DATABASE_URL
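A startup check for these values might look like the sketch below. The function name missingEnvVars and the Env type are hypothetical; only the variable names and the provider-dependent key requirement come from this README.

```typescript
// A plain record so the check can run against process.env or a test object.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or empty.
function missingEnvVars(env: Env): string[] {
  const required = ["OPENAI_API_KEY", "TRANSCRIPTION_PROVIDER", "DATABASE_URL"];
  // Only the key for the selected transcription provider is required.
  if (env.TRANSCRIPTION_PROVIDER === "fireworks") required.push("FIREWORKS_API_KEY");
  if (env.TRANSCRIPTION_PROVIDER === "mistral") required.push("MISTRAL_API_KEY");
  return required.filter((varName) => !env[varName]);
}
```

Calling missingEnvVars(process.env) at boot and exiting when the result is non-empty would surface a misconfigured .env before the first request arrives.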
Install dependencies:
bun install
Run the API in watch mode:
bun run dev
Run the API without hot reload:
bun run start
The API listens on http://localhost:7070 by default.
Create an API key before calling the API:
bun run create-api-key "local client"
If the API is running in Docker, you can also create a key from inside the container:
docker compose exec api bun run create-api-key "local client"
The command prints a key in this format:
abcd1234_secret
The abcd1234 part is a random public identifier. The full key is only shown once and is stored hashed in the same Postgres database as the rest of the app.
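One way to produce and verify keys of this shape is sketched below. The README does not say which hash function or secret length the app uses, so SHA-256 and a 24-byte secret are assumptions here, and createApiKey, verifyApiKey, and publicId are hypothetical names.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Assumption: SHA-256 is used for storage; the README only says "hashed".
function hashKey(fullKey: string): string {
  return createHash("sha256").update(fullKey).digest("hex");
}

// Builds "<publicId>_<secret>". Only the hash would be persisted;
// the full key is returned once so it can be shown to the user.
function createApiKey(): { id: string; fullKey: string; hash: string } {
  const id = randomBytes(4).toString("hex"); // 8 hex characters, like abcd1234
  const secret = randomBytes(24).toString("hex");
  const fullKey = `${id}_${secret}`;
  return { id, fullKey, hash: hashKey(fullKey) };
}

// Extracts the public identifier, which can be used to look up the stored row.
function publicId(fullKey: string): string {
  const separator = fullKey.indexOf("_");
  return separator === -1 ? fullKey : fullKey.slice(0, separator);
}

// Compares digests in constant time to avoid leaking match position.
function verifyApiKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashKey(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because only the hash is stored, a lost key cannot be recovered and has to be replaced with a new one.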
The compose file only starts the Bun/Hono API. It does not provision Postgres.
Your .env must contain a full external DATABASE_URL, and the container will use that exact connection string.
Start the API container:
docker compose up -d --build
If your external database is running on your host machine, localhost inside the container will not point to the host. In that case, use a host-reachable address in DATABASE_URL such as host.docker.internal where appropriate for your setup.
GET /health returns a health summary for the service and its important dependencies:
curl http://localhost:7070/health
The response includes:
- overall service status
- Postgres connectivity
- FFmpeg availability
- whether OPENAI_API_KEY is configured
- which transcription provider is active and whether its API key is configured
- rate limit configuration status
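The sketch below shows one way such a summary could be folded into a 200 or 503 response. The field names and the status strings are hypothetical, and which checks count as "important" is an assumption: here the rate limit status is treated as informational only.

```typescript
// Hypothetical check results; the field names are illustrative only.
type HealthChecks = {
  postgres: boolean;
  ffmpeg: boolean;
  openaiKeyConfigured: boolean;
  transcriptionProvider: "fireworks" | "mistral";
  transcriptionKeyConfigured: boolean;
  rateLimitEnabled: boolean;
};

type HealthResponse = {
  httpStatus: 200 | 503;
  body: HealthChecks & { status: "ok" | "degraded" };
};

function summarizeHealth(checks: HealthChecks): HealthResponse {
  // Assumption: these four checks decide overall health.
  const healthy =
    checks.postgres &&
    checks.ffmpeg &&
    checks.openaiKeyConfigured &&
    checks.transcriptionKeyConfigured;
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "degraded", ...checks },
  };
}
```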
Send multipart form data with an audio or video file under the file field:
curl -H "Authorization: Bearer YOUR_API_KEY" -F "file=@audio.mp3" -OJ http://localhost:7070/process
Video files work too:
curl -H "Authorization: Bearer YOUR_API_KEY" -F "file=@video.mp4" -OJ http://localhost:7070/process
The API extracts the audio track for transcription and ad detection, then trims the detected ad ranges from the original audio or video file. The response is the edited file as an attachment, with [trimmed] inserted before the file extension.
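Trimming ad ranges is equivalent to keeping everything between them. The sketch below shows that inversion step under stated assumptions: the TimeRange type and keepRanges function are hypothetical, times are in seconds, and how the app actually hands ranges to FFmpeg is not described here.

```typescript
// Hypothetical time range in seconds.
type TimeRange = { start: number; end: number };

// Converts ranges to cut into the complementary ranges to keep.
// Overlapping or out-of-bounds cuts are clamped and merged implicitly.
function keepRanges(cuts: TimeRange[], duration: number): TimeRange[] {
  const sorted = cuts
    .map((c) => ({ start: Math.max(0, c.start), end: Math.min(duration, c.end) }))
    .filter((c) => c.end > c.start)
    .sort((a, b) => a.start - b.start);

  const keep: TimeRange[] = [];
  let cursor = 0; // end of the last region already accounted for
  for (const cut of sorted) {
    if (cut.start > cursor) keep.push({ start: cursor, end: cut.start });
    cursor = Math.max(cursor, cut.end);
  }
  if (cursor < duration) keep.push({ start: cursor, end: duration });
  return keep;
}
```

For a 60-second file with ads at 10 to 20 and 15 to 30 seconds, the kept ranges are 0 to 10 and 30 to 60.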
GET /history returns processing history as JSON:
curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:7070/history
To delete one history row, send a DELETE request with the row ID in the path:
curl -H "Authorization: Bearer YOUR_API_KEY" -X DELETE http://localhost:7070/history/1
API keys are stored in the same Postgres database as history, in an api_keys table.
You can manage API keys either from your host machine or from inside the running Docker container. In both cases the commands use the same Postgres database configured by DATABASE_URL.
Create a key from the host:
bun run create-api-key "my client"
Create a key from the container:
docker compose exec api bun run create-api-key "my client"
List keys from the host:
bun run list-api-keys
List keys from the container:
docker compose exec api bun run list-api-keys
Revoke a key by its public identifier from the host:
bun run revoke-api-key abcd1234
Revoke a key by its public identifier from the container:
docker compose exec api bun run revoke-api-key abcd1234
Rotate a key from the host:
bun run rotate-api-key abcd1234
Rotate a key from the container:
docker compose exec api bun run rotate-api-key abcd1234
See .env.example for the full list. The main variables are:
- OPENAI_API_KEY
- TRANSCRIPTION_PROVIDER
- FIREWORKS_API_KEY
- MISTRAL_API_KEY
- OPENAI_MODEL
- REASONING_EFFORT
- DATABASE_URL
- MAX_REQUEST_BODY_SIZE_MB
- RATE_LIMIT_ENABLED
- RATE_LIMIT_WINDOW_SECONDS
- RATE_LIMIT_MAX_REQUESTS
- FFMPEG_TIMEOUT_MS
- FASTER_FFMPEG_ENABLED
- PORT
The API uses a simple per-key in-memory rate limiter.
The limiter runs after API key authentication and counts requests separately for each API key. If a key goes over the limit, the API returns 429 Too Many Requests.
Configure it in .env:
RATE_LIMIT_ENABLED=true
RATE_LIMIT_WINDOW_SECONDS=60
RATE_LIMIT_MAX_REQUESTS=1
This example means each API key can make up to 1 request per 60-second window.
The limiter is stored in memory inside the running app process. If you restart the app, the counters reset. If you run multiple app containers, each container keeps its own counters.
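A per-key in-memory limiter of this kind can be sketched as below. The README does not say whether the window is fixed or sliding, so a fixed window is assumed here, and the class name FixedWindowLimiter is hypothetical.

```typescript
// Minimal fixed-window limiter keyed by API key identifier.
// State lives in this process only, so it resets on restart and is
// not shared between containers.
class FixedWindowLimiter {
  private readonly windowMs: number;
  private readonly maxRequests: number;
  private readonly counters = new Map<string, { windowStart: number; count: number }>();

  constructor(windowSeconds: number, maxRequests: number) {
    this.windowMs = windowSeconds * 1000;
    this.maxRequests = maxRequests;
  }

  // Returns true if the request is allowed; false maps to 429 Too Many Requests.
  allow(keyId: string, now: number = Date.now()): boolean {
    const entry = this.counters.get(keyId);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.counters.set(keyId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.maxRequests) return false;
    entry.count += 1;
    return true;
  }
}
```

With the configuration above (a 60-second window and a maximum of 1 request), a second request from the same key inside the window is rejected while a request from a different key still passes.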
- The app automatically recreates the history table if its schema does not match the expected columns.
- The app also automatically recreates the api_keys table if its schema does not match the expected columns.
- Cached entries are keyed by a SHA-256 file hash.
- History and cache entries are isolated per API key.
- Audio and video uploads are both supported on /process. Video uploads are transcribed from their audio track and then trimmed as full video files.
- /health is public and returns 200 when the service is healthy or 503 when an important dependency check fails.
- TRANSCRIPTION_PROVIDER=fireworks uses whisper-v3-turbo and supports timestamp-based trimming.
- TRANSCRIPTION_PROVIDER=mistral uses voxtral-mini-latest on POST /v1/audio/transcriptions and requests timestamp_granularities=["word"] so the app can match transcript phrases back to audio timestamps.
- FASTER_FFMPEG_ENABLED=true uses a faster FFmpeg stream-copy path, which is less precise at cut boundaries than the precise trim mode. Setting the variable to false uses the precise mode, which relies on filter-based re-encoding.
- No database migration scripts are used.
- GET /history is intentionally excluded from request access logs.
- The app does not log full Fireworks/OpenAI payloads such as complete transcripts, which keeps huge transcripts from cluttering the logs.
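Matching an extracted ad phrase back onto word-level timestamps can be sketched as follows. This is an illustration under assumptions, not the app's matcher: the Word and TimeRange shapes and the findPhrase function are hypothetical, matching is exact after normalization, and the normalization is ASCII-only for simplicity.

```typescript
// Hypothetical word-level transcript entry, times in seconds.
type Word = { text: string; start: number; end: number };
type TimeRange = { start: number; end: number };

// Lowercases and strips everything except ASCII letters and digits.
function normalizeToken(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Finds the first run of consecutive words that spells out the phrase and
// returns the time span from the first word's start to the last word's end.
function findPhrase(words: Word[], phrase: string): TimeRange | null {
  const target = phrase.split(/\s+/).map(normalizeToken).filter((t) => t.length > 0);
  if (target.length === 0) return null;

  // Drop punctuation-only entries so they cannot break alignment.
  const tokens = words
    .map((w) => ({ word: w, token: normalizeToken(w.text) }))
    .filter((t) => t.token.length > 0);

  for (let i = 0; i + target.length <= tokens.length; i++) {
    if (target.every((t, j) => tokens[i + j]?.token === t)) {
      const first = tokens[i];
      const last = tokens[i + target.length - 1];
      if (first && last) return { start: first.word.start, end: last.word.end };
    }
  }
  return null;
}
```

A production matcher would likely need fuzzier comparison, since an LLM may paraphrase or slightly alter the transcript text it quotes.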
This project is licensed under the MIT License. See LICENSE.