Spec: speaker diarization on meeting Others track by sebsto · Pull Request #64 · sebsto/wispr

sebsto · 2026-05-29T13:34:31Z

Summary

Adds a design spec to extend the v1.10.0 meeting transcription feature with real speaker diarization on the system-audio ("Others") track only, so the live transcript shows distinct labels (Speaker 1, Speaker 2, …) for remote participants instead of a single "Others" badge.

Engine: FluidAudio's SortformerDiarizer, already a dependency at 0.13.4. It is streaming, runs on the Apple Neural Engine (ANE), has a ~11% diarization error rate (DER), and supports up to 4 speaker slots.
Scope: only the system-audio track. Mic stays "You" — preserves the privacy-friendly source-based split for your own audio.
Speaker count: auto-detect, no UI input.
Timing: real-time (matches existing live-transcript UX).

The spec follows the existing .kiro/specs/<feature>/design.md convention used by meeting-transcription, hands-free-dictation, etc. No code changes in this PR — design only.

Test plan

Reviewer confirms the proposed MeetingSpeaker enum reshape (.you / .others(speakerIndex: Int?)) is acceptable
Reviewer agrees Sortformer's 4-speaker cap is fine for v1 (graceful "Speaker 4+" degradation)
Reviewer agrees with running diarization in real-time (vs. post-process on stop)
Verify FluidAudio 0.13.4 indeed exposes SortformerDiarizer (spot-checked in repo before authoring)
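For reviewers weighing the enum reshape, here is a minimal Swift sketch of how `.you` / `.others(speakerIndex: Int?)` could render. It assumes storage stays 0-based, the display layer adds 1, and a missing index falls back to the plain "Others" badge. The `displayLabel` property is a hypothetical name; the spec only fixes the two cases.

```swift
/// Who spoke a transcript chunk. `speakerIndex` is stored 0-based, as the
/// diarizer reports it; `nil` means no speaker has been assigned (yet).
enum MeetingSpeaker: Hashable {
    case you
    case others(speakerIndex: Int?)

    /// Hypothetical display helper: storage stays 0-based, the UI adds 1.
    var displayLabel: String {
        switch self {
        case .you:
            return "You"
        case .others(speakerIndex: nil):
            // No index assigned (e.g. during diarizer warmup): plain badge.
            return "Others"
        case .others(speakerIndex: let index?):
            return "Speaker \(index + 1)"
        }
    }
}
```

Keeping the +1 in one computed property means logs and stored transcripts always carry the raw diarizer index, and only the UI sees 1-based labels.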

Extends the v1.10.0 meeting feature so the system-audio track is split into Speaker 1..N via FluidAudio's SortformerDiarizer. Mic stays "You"; diarization is real-time, auto-detects speakers, no UI input.

Copilot

Pull request overview

Documentation-only PR adding a design spec for extending the v1.10.0 meeting transcription feature with real speaker diarization on the system-audio ("Others") track using FluidAudio's SortformerDiarizer. No code changes are introduced.

Changes:

Adds a new design document under .kiro/specs/meeting-speaker-diarization/design.md following the existing per-feature spec convention.
Describes data-model reshape (MeetingSpeaker enum with associated value), a new MeetingDiarizer actor, wire-in points in MeetingStateManager, UI updates, settings toggle, and test plan.
Lists representative files to edit and a manual verification procedure.

amazon-q-developer

Summary

This design spec for speaker diarization on meeting transcripts is well-structured and follows the existing .kiro/specs/ convention. The approach of using FluidAudio's SortformerDiarizer for system-audio-only diarization is sound. However, three critical issues must be addressed before implementation begins:

Critical Issues (Must Fix)

Index-to-label mapping inconsistency (line 36): The spec shows both 0-based internal indices and 1-based display labels ("Speaker 1" for index 0) without clear documentation of the transform, risking debugging confusion.
Diarization cold-start race condition (lines 67-69): Transcription can complete before diarization has processed the corresponding audio, so early chunks show a nil speakerIndex. The spec needs either explicit handling or documentation of this grace period.
Hardcoded sample rate assumption (line 73): The 16000 Hz sample rate is hardcoded but may not match the actual MeetingAudioEngine output rate, which would misalign the timeline.
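The cold-start and timeline issues above can be made concrete with a small lookup: assign a transcript chunk to whichever speaker's diarized segments overlap its time range the most, and return `nil` when nothing covers it yet. This is a sketch of one possible approach, not the spec's design. `DiarizedSegment` and `dominantSpeaker(start:end:in:)` are hypothetical, and the segment shape is an assumption, not FluidAudio's actual output type.

```swift
import Foundation

/// Hypothetical diarizer output: one speaker slot active over [start, end), in seconds.
struct DiarizedSegment {
    let start: TimeInterval
    let end: TimeInterval
    let speakerIndex: Int  // 0-based speaker slot
}

/// Returns the 0-based speaker whose segments overlap [start, end) the most,
/// or nil if no segment overlaps (e.g. the diarizer has not caught up yet).
func dominantSpeaker(start: TimeInterval, end: TimeInterval,
                     in segments: [DiarizedSegment]) -> Int? {
    var overlapBySpeaker: [Int: TimeInterval] = [:]
    for segment in segments {
        let overlap = min(end, segment.end) - max(start, segment.start)
        if overlap > 0 {
            overlapBySpeaker[segment.speakerIndex, default: 0] += overlap
        }
    }
    // Largest total overlap wins; on a tie the lower index wins, for determinism.
    return overlapBySpeaker.max { a, b in
        a.value != b.value ? a.value < b.value : a.key > b.key
    }?.key
}
```

A `nil` result maps naturally to `.others(speakerIndex: nil)`. The lookup is only correct if chunk times and segment times share one clock, which is why the sample-rate source matters.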

Additional Concern

Auto-enabling toggle behavior (line 95): Automatically flipping meetingDiarizationEnabled from false→true after the model downloads would surprise users.

Once these issues are resolved, the spec provides a solid foundation for implementation. The test strategy, verification plan, and out-of-scope boundaries are all well-defined.

- Make 0-based vs 1-based index convention explicit (storage 0-based, display layer adds +1).
- Document the legitimate first-chunks `nil` window from Sortformer's warmup; renderer handles `.others(nil)` as plain "Others".
- Move sample rate to a single MeetingAudioEngine constant and have the engine attach `startTime` directly to each yielded chunk so the consumer no longer divides sample counts.
- Default `meetingDiarizationEnabled` to false with explicit opt-in; drop the surprising auto-enable on model download.
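The sample-rate item above can be sketched as a producer that owns the one rate constant and stamps each chunk with its start time, so consumers read `startTime` rather than dividing sample counts. `AudioChunk`, `ChunkTimestamper`, and the 16 kHz value are illustrative assumptions, not the actual `MeetingAudioEngine` API; in the real engine the constant would live on `MeetingAudioEngine` itself.

```swift
import Foundation

/// Hypothetical chunk type: samples plus the stream time at which they start.
struct AudioChunk {
    let samples: [Float]
    let startTime: TimeInterval
}

/// Hypothetical producer-side helper. The sample rate is declared once here;
/// consumers read `startTime` instead of recomputing it from sample counts.
struct ChunkTimestamper {
    static let sampleRate: Double = 16_000  // assumed value, single source of truth
    private var samplesEmitted = 0

    init() {}

    /// Stamps a chunk with the time of its first sample, then advances the offset.
    mutating func stamp(_ samples: [Float]) -> AudioChunk {
        let chunk = AudioChunk(samples: samples,
                               startTime: Double(samplesEmitted) / Self.sampleRate)
        samplesEmitted += samples.count
        return chunk
    }
}
```

Because only the producer knows the rate, changing it (or resampling upstream) cannot silently desynchronize the transcript and diarization timelines.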

Add spec for speaker diarization on meeting "Others" track

96c0a69

Copilot AI review requested due to automatic review settings May 29, 2026 13:34

Copilot started reviewing on behalf of sebsto May 29, 2026 13:34

Copilot AI reviewed May 29, 2026

amazon-q-developer Bot reviewed May 29, 2026

sebsto wants to merge 2 commits into main from spec/meeting-speaker-diarization
