I wanted a way to grade class participation fairly. Professors do it from memory, which biases toward whoever talks loudest. If you enroll each student against a 30–45 s voice clip, then feed in a lecture recording, you can in theory get a per-student speaking-time breakdown instead of a vibes-based grade.
This repo gets you most of the way there. It wraps the Azure Speaker Recognition text-independent API: upload enrollment audio, upload a verification clip, get back which enrolled voices are present and at what confidence. The piece that's not done yet is time-aligned segmentation (splitting the lecture into short chunks so you can sum per-speaker minutes). Easy to add on top of what's here.
No voice samples shipped. Biometric data shouldn't live in a public repo. Drop your own enrollment clips into audio/enrollment/, verification clips into audio/verification/ (both gitignored), and you can run it.

    audio/enrollment/*.m4a    audio/verification/*.wav
              │                           │
              ▼                           ▼
    convert (pydub → 16 kHz mono wav)  convert
              │                           │
              ▼                           │
    enroll ── POST profile ──┐            │
    enroll ── upload wav ────┤            │
                           Azure          │
                             │            │
                             ▼            ▼
                       profiles.csv ──► verify
                                          │
                                          ▼
                              verification_results.csv
                             (name, score, audio_length)

- `enroll` creates one Azure profile per person, uploads their clip, and saves `(name, profile_id, status, audio_length_s, speech_length_s)` to `data/profiles.csv`.
- `verify` takes a folder of clips, queries each enrolled profile for a match, and keeps only `Accept` results with a score above Azure's threshold.
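
To make the filtering step concrete, here is a minimal sketch of keeping only the accepted matches for one clip. It is not the repo's actual `verify.py`. The function name `accepted_matches` is made up for illustration, and the response fields `recognitionResult` and `score` are my assumption about what Azure's text-independent verify call returns, so check them against the API reference before relying on this.

```python
def accepted_matches(audio_file: str, responses: dict[str, dict]) -> list[dict]:
    """Keep only the profiles Azure accepted for this clip.

    `responses` maps speaker name -> the parsed JSON from one verify call.
    The keys "recognitionResult" and "score" are assumed field names.
    """
    rows = []
    for name, resp in responses.items():
        if resp.get("recognitionResult") == "Accept":
            rows.append({"audio_file": audio_file, "name": name, "score": resp["score"]})
    # Highest-confidence match first.
    return sorted(rows, key=lambda row: row["score"], reverse=True)


if __name__ == "__main__":
    # Made-up responses, one per enrolled profile.
    fake = {
        "alice": {"recognitionResult": "Accept", "score": 0.87},
        "bob": {"recognitionResult": "Reject", "score": 0.31},
    }
    print(accepted_matches("lecture_01.wav", fake))
```

The filter trusts Azure's own Accept/Reject verdict rather than applying a second, local score cutoff.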
Requires Python 3.10+ and ffmpeg on your PATH (pydub depends on it).

    git clone https://github.com/tomvlt1/speaker-verification.git
    cd speaker-verification
    python3 -m venv venv
    source venv/bin/activate
    pip install -e .
    cp .env.example .env   # then fill in your Azure key + region

Get an Azure Speech key: portal.azure.com → create a Speech resource → Keys and Endpoint. The free tier is fine for testing.
Drop enrollment audio (one file per speaker) into audio/enrollment/, verification audio into audio/verification/. Filename stem becomes the speaker name (trailing _p is stripped).
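
The naming rule can be sketched in a couple of lines. This is an illustration of the rule as described, not a copy of the repo's code, and `speaker_name` is a hypothetical helper name.

```python
from pathlib import Path


def speaker_name(path: str) -> str:
    """Derive the speaker name from a clip path: filename stem minus a trailing `_p`."""
    stem = Path(path).stem           # "audio/enrollment/alice_p.m4a" -> "alice_p"
    return stem.removesuffix("_p")   # only strips `_p` when it is at the very end


if __name__ == "__main__":
    print(speaker_name("audio/enrollment/alice_p.m4a"))  # alice
    print(speaker_name("audio/enrollment/bob.m4a"))      # bob
```

`str.removesuffix` needs Python 3.9+, which is within the 3.10+ requirement.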

    # 1. normalize m4a/mp3/... → 16 kHz mono wav, then write an enrollment csv
    python -m speaker_verification convert \
        --audio-dir audio/enrollment \
        --build-csv enrollment_data.csv

    # 2. enroll each speaker against Azure
    python -m speaker_verification enroll

    # 3. verify a folder of new recordings against the enrolled profiles
    python -m speaker_verification convert --audio-dir audio/verification
    python -m speaker_verification verify

Results land in data/verification_results.csv.

    # nuke all enrolled profiles on the Azure subscription (with a confirmation prompt)
    python -m speaker_verification reset
    python -m speaker_verification reset --yes   # skip prompt

data/profiles.csv:
| name | profile_id | status | audio_length_s | speech_length_s |
|---|---|---|---|---|
| alice | 5e2a… | Enrolled | 37.9 | 34.9 |
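
The `status` and `speech_length_s` columns make it easy to spot enrollments that need a longer clip. Below is a rough sketch, assuming the csv has a header row with exactly the column names shown in the table. `incomplete_profiles` is a hypothetical helper, the 20 s default mirrors the minimum this README quotes for reliable scores, and the sample rows use placeholder IDs (the `carol` row is invented).

```python
import csv
import io

# Stand-in for data/profiles.csv; IDs are placeholders, carol is made up.
PROFILES_SAMPLE = """name,profile_id,status,audio_length_s,speech_length_s
alice,id-a,Enrolled,37.9,34.9
carol,id-c,Enrolling,21.0,12.5
"""


def incomplete_profiles(csv_file, min_speech_s: float = 20.0) -> list[str]:
    """Return names whose enrollment is unfinished or has too little speech."""
    flagged = []
    for row in csv.DictReader(csv_file):
        too_short = float(row["speech_length_s"]) < min_speech_s
        if row["status"] != "Enrolled" or too_short:
            flagged.append(row["name"])
    return flagged


if __name__ == "__main__":
    print(incomplete_profiles(io.StringIO(PROFILES_SAMPLE)))  # ['carol']
```

With the real file, you'd pass `open("data/profiles.csv", newline="")` instead of the `StringIO` sample.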
data/verification_results.csv:
| audio_file | audio_length_s | name | profile_id | score |
|---|---|---|---|---|
| lecture_01.wav | 823.4 | alice | 5e2a… | 0.87 |
| lecture_01.wav | 823.4 | bob | 88e7… | 0.77 |
Multiple accept rows per file are possible. Azure verifies independently against each profile, so in a lecture with four enrolled students you can get four matches on the same clip.
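
Since one clip can produce several rows, it's often handier to group the results by file. Here is a small sketch, assuming a header row with the column names from the table above. `speakers_by_file` is a hypothetical helper and the profile IDs in the sample are placeholders.

```python
import csv
import io
from collections import defaultdict

# Stand-in for data/verification_results.csv; profile IDs are placeholders.
RESULTS_SAMPLE = """audio_file,audio_length_s,name,profile_id,score
lecture_01.wav,823.4,alice,id-a,0.87
lecture_01.wav,823.4,bob,id-b,0.77
"""


def speakers_by_file(csv_file) -> dict[str, list[tuple[str, float]]]:
    """Map each audio file to its accepted (name, score) pairs, best score first."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["audio_file"]].append((row["name"], float(row["score"])))
    for matches in grouped.values():
        matches.sort(key=lambda match: match[1], reverse=True)
    return dict(grouped)


if __name__ == "__main__":
    print(speakers_by_file(io.StringIO(RESULTS_SAMPLE)))
```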
- Azure subscription required. This is a thin client; the actual model is Microsoft's. Free tier is rate-limited (20 transactions per second, 10,000 per month at time of writing).
- Text-independent verification needs ≥20 s of speech per enrollment for reliable scores. Azure returns `Enrolling` rather than `Enrolled` if you're below that.
- No per-speaker speaking time yet. `verify` tells you which enrolled speakers appear in a clip but not for how long. To get that, you'd need to split the input into short segments and run `verify` on each (easy to add, happy to accept PRs).
- Binary Accept/Reject. Azure returns a confidence score 0–1, but there's no timestamp alignment; it's a global judgment on the whole clip.
- Privacy. Enrollment audio contains biometric data. Don't commit it; the `.gitignore` here already protects `audio/**/*`.
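
For anyone who wants to take a crack at the missing speaking-time piece, here is a rough sketch of the bookkeeping. Nothing in it exists in the repo yet: `segment_bounds` and `speaking_time_s` are hypothetical names, the 10 s chunk length is an arbitrary placeholder, and the per-segment speaker lists stand in for whatever `verify` would return on each chunk.

```python
from collections import defaultdict


def segment_bounds(total_ms: int, chunk_ms: int = 10_000) -> list[tuple[int, int]]:
    """Split [0, total_ms) into consecutive (start_ms, end_ms) chunks; the last may be shorter."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms)) for start in range(0, total_ms, chunk_ms)]


def speaking_time_s(segments) -> dict[str, float]:
    """Sum segment durations per speaker.

    `segments` is an iterable of ((start_ms, end_ms), accepted_names) pairs.
    A segment with two accepted speakers is credited in full to both.
    """
    totals = defaultdict(float)
    for (start, end), names in segments:
        for name in names:
            totals[name] += (end - start) / 1000
    return dict(totals)


if __name__ == "__main__":
    bounds = segment_bounds(25_000)           # a 25 s clip in 10 s chunks
    accepted = [["alice"], ["alice", "bob"], []]  # made-up per-chunk verify output
    print(speaking_time_s(zip(bounds, accepted)))
```

With pydub, `audio[start_ms:end_ms]` slices an `AudioSegment` by milliseconds, so each pair from `segment_bounds` maps directly onto a chunk to export and verify. The trade-off to watch is chunk length: shorter chunks give finer time resolution, but very short ones may not contain enough speech for a trustworthy score.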

    src/speaker_verification/
        __main__.py       # argparse dispatcher
        api.py            # single HTTP helper for Azure calls
        config.py         # env vars (.env loader) + project paths
        convert.py        # ffmpeg → 16 kHz mono wav + enrollment csv builder
        enroll.py         # create profiles + upload enrollment clips
        verify.py         # verify unknown audio against all profiles
        reset.py          # delete every profile on this subscription
    audio/
        enrollment/       # your clips go here (gitignored)
        verification/     # your clips go here (gitignored)
    data/                 # generated csvs land here (gitignored)