Who Spoke When

title	Who Spoke When
emoji	🎙️
colorFrom	blue
colorTo	indigo
sdk	docker
app_file	app/main.py
pinned	false

Speaker diarization service and web app: upload audio and get back "who spoke when" segments.

The project now runs with a hybrid pipeline:

Preferred: pyannote/speaker-diarization-3.1 (best quality)
Fallback: VAD + ECAPA-TDNN embeddings + agglomerative clustering
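
To illustrate the last stage of the fallback, here is a minimal, dependency-free sketch of average-linkage agglomerative clustering over cosine distance. The function names, the 0.5 threshold, and the toy 2-D vectors are assumptions made for this example; the project's real logic lives in models/clusterer.py and may use a different linkage, distance, or stopping rule.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_embeddings(embeddings, threshold=0.5):
    """Hypothetical helper: average-linkage agglomerative clustering.

    Starts with one cluster per embedding and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold`. Returns one integer label per input embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Toy embeddings: two windows pointing along each axis.
toy = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = cluster_embeddings(toy)
```

In the real pipeline the inputs would be ECAPA-TDNN embeddings of speech windows, and a known speaker count could replace the distance threshold as the stopping rule.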

What You Get

FastAPI backend (/diarize, /diarize/url, /health)
Web UI (/) for file upload and timeline view
CLI demo (demo.py)
Automatic fallback if pyannote models are unavailable

Project Structure

app/
  main.py         FastAPI app and endpoints
  pipeline.py     Hybrid diarization pipeline
models/
  embedder.py     ECAPA-TDNN embedding extractor
  clusterer.py    Speaker clustering logic
utils/
  audio.py        Audio and export helpers
static/
  index.html      Web UI
Dockerfile
requirements.txt
README.md

Quick Start (Local)

1. Create and activate a virtual environment

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Linux/macOS:

python -m venv .venv
source .venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. (Recommended) Set Hugging Face token

pyannote models are gated. Create a token at https://huggingface.co/settings/tokens; if the models cannot be loaded, the service uses the fallback pipeline.

Windows PowerShell:

$env:HF_TOKEN="your_token_here"

Linux/macOS:

export HF_TOKEN="your_token_here"

4. Run API server

uvicorn app.main:app --host 0.0.0.0 --port 8000

Open:

UI: http://localhost:8000
API docs: http://localhost:8000/docs

Web UI Notes

The UI defaults to the same-origin API (/diarize), so it works on Hugging Face Spaces.
If you set a custom endpoint manually, make sure it allows CORS and is reachable from the browser.

Hugging Face Spaces Deployment

Requirements

Space created (Docker SDK)
Space secret HF_TOKEN configured
Terms accepted for:
- https://huggingface.co/pyannote/voice-activity-detection
- https://huggingface.co/pyannote/speaker-diarization-3.1

Push code

Push the main branch to your Space's git remote (named huggingface here):

git push huggingface main

If the push fails with an unauthorized error:

Use a token with the Write role (not Read)
Confirm the token owner has access to the target namespace

API

`GET /health`

Returns service health and device.

`POST /diarize`

Upload an audio file.

Form fields:

file: audio file
num_speakers (optional): force a known number of speakers

Example:

curl -X POST http://localhost:8000/diarize \
  -F "file=@meeting.mp3" \
  -F "num_speakers=2"
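
The same upload can be sketched in Python with only the standard library. The form field names `file` and `num_speakers` come from the section above; the helper names and the assumption that the endpoint answers with JSON are mine and should be checked against app/main.py.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {file_type}\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def diarize_file(path, num_speakers=None, base_url="http://localhost:8000"):
    """POST an audio file to /diarize (hypothetical client helper).

    Assumes the response body is JSON; adjust if the service differs.
    """
    with open(path, "rb") as fh:
        audio = fh.read()
    fields = {} if num_speakers is None else {"num_speakers": num_speakers}
    body, content_type = build_multipart(
        fields, "file", os.path.basename(path), audio)
    request = urllib.request.Request(
        f"{base_url}/diarize", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running locally, `diarize_file("meeting.mp3", num_speakers=2)` would mirror the curl call above.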

`POST /diarize/url`

Diarize audio from a remote URL.

Example:

curl -X POST "http://localhost:8000/diarize/url?audio_url=https://example.com/sample.wav"
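
The curl example places the remote URL directly in the query string, which only works while that URL contains no spaces, `&`, or `?` of its own. A small sketch of building the request URL safely, assuming `audio_url` is read as a query parameter as the example suggests (the helper name is hypothetical):

```python
from urllib.parse import urlencode


def diarize_url_endpoint(audio_url, base_url="http://localhost:8000"):
    """Build the /diarize/url request URL with audio_url percent-encoded."""
    return f"{base_url}/diarize/url?{urlencode({'audio_url': audio_url})}"


# A remote URL with a space and its own query string survives intact.
print(diarize_url_endpoint("https://example.com/my file.wav?sig=a&b=1"))
```

The resulting string can be passed to `urllib.request.Request(url, method="POST")` or to curl in place of the hand-written URL.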

CLI Usage

python demo.py --audio meeting.wav
python demo.py --audio meeting.wav --speakers 2
python demo.py --audio meeting.wav --output result.json --rttm result.rttm --srt result.srt
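
For readers unfamiliar with the two export formats, the sketch below shows how diarization segments map onto RTTM lines and SRT cues. It assumes segments are `(start, end, speaker)` tuples in seconds and uses hypothetical helper names; the project's own exporters are in utils/audio.py and may format things differently.

```python
def to_rttm(segments, file_id):
    """Render segments as RTTM: one SPEAKER line per turn.

    Fields: type, file id, channel, onset, duration, then <NA>
    placeholders around the speaker name.
    """
    lines = []
    for start, end, speaker in segments:
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)


def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as SRT cues whose text is the speaker label.

    Using only the label as cue text is an assumption of this sketch.
    """
    blocks = []
    for index, (start, end, speaker) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}\n"
        )
    return "\n".join(blocks)


segments = [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01")]
rttm_text = to_rttm(segments, "meeting")
srt_text = to_srt(segments)
```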

Configuration (Environment Variables)

Variable	Default	Description
`HF_TOKEN`	unset	Hugging Face token for gated pyannote models
`CACHE_DIR`	temp model cache path	Model download/cache directory
`USE_PYANNOTE_DIARIZATION`	`true`	Enable full pyannote diarization first
`PYANNOTE_DIARIZATION_MODEL`	`pyannote/speaker-diarization-3.1`	pyannote diarization model id
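
As a rough illustration of how these variables could be read, the sketch below loads them into a small settings object. The variable names and the two explicit defaults come from the table; the `Settings` class, the `model_cache` directory name, and the set of accepted "true" spellings are assumptions, not the project's actual code.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    cache_dir: str
    use_pyannote: bool
    diarization_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from an environment-like mapping."""
    truthy = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"
    return Settings(
        # An empty HF_TOKEN is treated the same as an unset one.
        hf_token=env.get("HF_TOKEN") or None,
        # "model_cache" is a hypothetical name for the temp cache folder.
        cache_dir=env.get(
            "CACHE_DIR", os.path.join(tempfile.gettempdir(), "model_cache")),
        use_pyannote=env.get(
            "USE_PYANNOTE_DIARIZATION", "true").strip().lower() in truthy,
        diarization_model=env.get(
            "PYANNOTE_DIARIZATION_MODEL", "pyannote/speaker-diarization-3.1"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"USE_PYANNOTE_DIARIZATION": "false"})`.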

How the Pipeline Works

Load and normalize audio
Try full pyannote diarization (best quality)
If unavailable/fails, fallback to:
- VAD (pyannote VAD or energy VAD)
- Sliding windows
- ECAPA embeddings
- Agglomerative clustering
Merge adjacent same-speaker segments
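
The control flow above can be sketched as follows: try the preferred diarizer, fall back on failure, then merge neighbouring segments from the same speaker. The function names, the `(start, end, speaker)` tuple shape, and the `max_gap` parameter are assumptions for illustration; app/pipeline.py is the authoritative implementation.

```python
def merge_adjacent(segments, max_gap=0.0):
    """Merge consecutive segments that share a speaker.

    segments: (start, end, speaker) tuples sorted by start time.
    max_gap: largest pause, in seconds, that a merge may bridge.
    """
    merged = []
    for start, end, speaker in segments:
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


def diarize(audio, preferred, fallback):
    """Run `preferred`; if it raises, run `fallback`; then merge.

    Both callables take the audio and return (start, end, speaker) tuples.
    """
    try:
        segments = preferred(audio)
    except Exception:
        # Any failure (missing token, download error, ...) triggers the fallback.
        segments = fallback(audio)
    return merge_adjacent(sorted(segments))


def unavailable(_audio):
    raise RuntimeError("pyannote model could not be loaded")


def toy_fallback(_audio):
    return [(0.0, 1.0, "A"), (1.0, 2.0, "A"), (2.0, 3.0, "B")]


result = diarize(None, unavailable, toy_fallback)
```

Here `result` holds two segments: speaker A from 0.0 to 2.0 and speaker B from 2.0 to 3.0.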

Troubleshooting

1) UI shows `Error: Failed to fetch`

The API endpoint is probably wrong. In the deployed UI, use the same-origin /diarize endpoint.

2) Logs show pyannote download/auth warnings

You need:

a valid HF_TOKEN
accepted model terms on both pyannote model pages

3) Poor speaker separation

Provide num_speakers when the speaker count is known
Use clean audio (minimal background noise)
Prefer the pyannote path (set the token and accept the model terms)

4) `500` during embedding load

This is usually a model download, cache, or authentication problem. Confirm that HF_TOKEN is set, that the cache path is writable, and that the host has internet connectivity.
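
One of these checks is easy to script. The sketch below tests whether a directory accepts new files, which is what a model cache needs; the helper name is hypothetical and the check is independent of how the project itself resolves CACHE_DIR.

```python
import os
import tempfile


def cache_dir_is_writable(path):
    """Return True if a file can be created inside `path`.

    Creates `path` if it does not exist yet. The probe file is removed
    automatically when the context manager exits.
    """
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

For example, `cache_dir_is_writable(os.environ.get("CACHE_DIR", tempfile.gettempdir()))` reports whether the configured cache location can be used.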

Limitations

Overlapping speech may still be handled imperfectly in fallback mode
Quality depends on audio clarity, language mix, and noise
Very short utterances are harder to classify reliably

License

Add your preferred license file (LICENSE) if this project is public.
