AssemblyAI Transcription Studio

Full-featured web application for audio/video transcription powered by AssemblyAI Universal-2 model. Supports 99 languages, speaker diarization, PII redaction, AI analysis via Google Gemini, and exports to 36+ formats.

Features

Transcription

AssemblyAI Universal-2 model — 99 languages with automatic detection
Speaker diarization (up to 10 speakers, multichannel support)
PII redaction (50+ entity types) with audio and text methods
Code-switching for multilingual audio
Custom vocabulary and formatting options

Input Sources

File upload (drag-and-drop) — MP3, WAV, MP4, MKV, AVI, MOV, and more
YouTube and 1000+ sites via yt-dlp
Direct audio URL
Automatic video-to-audio conversion via FFmpeg

AI Analysis (Google Gemini)

Auto-generated titles and summaries
Topic extraction and sentiment analysis
Key moments and speaker identification
Language-aware responses

Export Formats (36+)

Text: Standard, verbatim, bilingual, literary, paragraphs
Documents: DOCX (standard, verbatim, bilingual, literary, interview)
Tables: DOCX and PDF with timestamps and speaker labels
PDF: Full Unicode/Cyrillic support via DejaVu Sans
Subtitles: SRT and VTT with translation support
Data: CSV (sentence-level and word-level with confidence scores)
ZIP: Bundle all formats in a single download

Translation

80+ target languages
Formal and informal styles
Bilingual export (original + translation side by side)

Tech Stack

Layer	Technology
Backend	Python 3.11+, FastAPI, Uvicorn
APIs	AssemblyAI REST API (direct HTTP), Google Gemini, yt-dlp
Documents	python-docx, fpdf2 (PDF with Unicode)
Media	FFmpeg, yt-dlp
Frontend	Vanilla JavaScript, HTML5, CSS3 (no build tools)
Deployment	Docker, docker-compose

Quick Start

With Docker (recommended)

# Clone the repo
git clone https://github.com/Trust-1-eng/transcription-studio.git
cd transcription-studio

# Set API keys
echo "ASSEMBLYAI_API_KEY=your_key_here" > .env
echo "GEMINI_API_KEY=your_key_here" >> .env

# Run
docker compose up --build

Open http://localhost:8000

Manual Setup

# Install system dependencies (macOS)
brew install ffmpeg
pip install yt-dlp

# Install Python packages
pip install -r requirements.txt

# Set environment variables
export ASSEMBLYAI_API_KEY=your_key_here
export GEMINI_API_KEY=your_key_here  # optional

# Run
python main.py

Architecture

app/
├── config.py              # API keys, paths, constants
├── dependencies.py        # Shared state: caches, locks
├── assemblyai_client.py   # AssemblyAI REST API client (no SDK)
├── gemini_service.py      # Gemini API with caching and retries
├── utils.py               # Font download, audio helpers, formatting
├── routes/
│   ├── core.py            # Upload, transcribe, poll, gemini endpoints
│   ├── export.py          # 40+ export endpoints + ZIP bundle
│   └── alignment.py       # Subtitle alignment for edited transcripts
└── exporters/
    ├── text.py            # Plain text format generators
    ├── document.py        # DOCX generators (6 styles)
    ├── table.py           # Table exports (DOCX + PDF)
    ├── pdf.py             # PDF with Unicode font support
    └── subtitles.py       # SRT/VTT generators

Design decisions:

Direct HTTP to AssemblyAI API (no SDK) for full parameter control
In-memory caching with TTL (5 min transcripts, 10 min Gemini)
Thread-safe state management with locks
Automatic temp file cleanup (2-hour retention)
Font auto-download on first run

Environment Variables

Variable	Required	Description
`ASSEMBLYAI_API_KEY`	Yes	AssemblyAI API key
`GEMINI_API_KEY`	No	Google Gemini API key (for AI analysis)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
app		app
docs		docs
features/alignment		features/alignment
static		static
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
GEMINI.md		GEMINI.md
LICENSE		LICENSE
MANIFEST.md		MANIFEST.md
README.md		README.md
docker-compose.yml		docker-compose.yml
main.py		main.py
package.json		package.json
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AssemblyAI Transcription Studio

Features

Tech Stack

Quick Start

With Docker (recommended)

Manual Setup

Architecture

Environment Variables

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AssemblyAI Transcription Studio

Features

Tech Stack

Quick Start

With Docker (recommended)

Manual Setup

Architecture

Environment Variables

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages