Spec: speaker diarization on meeting Others track #64

Open
sebsto wants to merge 2 commits into main from spec/meeting-speaker-diarization

Conversation

@sebsto (Owner) commented May 29, 2026

Summary

Adds a design spec to extend the v1.10.0 meeting transcription feature with real speaker diarization on the system-audio ("Others") track only, so the live transcript shows distinct labels (Speaker 1, Speaker 2, …) for remote participants instead of a single "Others" badge.

  • Engine: FluidAudio's SortformerDiarizer (already a dependency at 0.13.4 — streaming, runs on ANE, ~11% DER, up to 4 speaker slots).
  • Scope: only the system-audio track. Mic stays "You" — preserves the privacy-friendly source-based split for your own audio.
  • Speaker count: auto-detect, no UI input.
  • Timing: real-time (matches existing live-transcript UX).

The spec follows the existing .kiro/specs/<feature>/design.md convention used by meeting-transcription, hands-free-dictation, etc. No code changes in this PR — design only.

Test plan

  • Reviewer confirms the proposed MeetingSpeaker enum reshape (.you / .others(speakerIndex: Int?)) is acceptable
  • Reviewer agrees Sortformer's 4-speaker cap is fine for v1 (graceful "Speaker 4+" degradation)
  • Reviewer agrees with running diarization in real-time (vs. post-process on stop)
  • Verify FluidAudio 0.13.4 indeed exposes SortformerDiarizer (spot-checked in repo before authoring)
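
As a rough illustration of the enum reshape under review, here is a minimal Swift sketch. Only the case names `.you` and `.others(speakerIndex: Int?)` come from the description above; the `Hashable` conformance, the `displayLabel` property, and the convention that a stored index of 0 renders as "Speaker 1" are assumptions made for the example, not the spec's confirmed design.

```swift
// Hypothetical sketch of the reshaped enum. Case names follow the PR
// description; everything else is an assumption for illustration.
enum MeetingSpeaker: Hashable {
    case you
    case others(speakerIndex: Int?)

    /// Label shown next to a transcript line.
    /// Assumes indices are stored 0-based and displayed 1-based,
    /// and that a missing index falls back to plain "Others".
    var displayLabel: String {
        switch self {
        case .you:
            return "You"
        case .others(.some(let index)):
            return "Speaker \(index + 1)"
        case .others(.none):
            return "Others"
        }
    }
}
```

With this shape, `MeetingSpeaker.others(speakerIndex: 0).displayLabel` yields "Speaker 1", while a chunk with no assigned speaker keeps the pre-diarization "Others" badge.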

Extends the v1.10.0 meeting feature so the system-audio track is split
into Speaker 1..N via FluidAudio's SortformerDiarizer. Mic stays "You";
diarization is real-time, auto-detects speakers, no UI input.
Copilot AI review requested due to automatic review settings May 29, 2026 13:34

Copilot AI (Contributor) left a comment

Pull request overview

Documentation-only PR adding a design spec for extending the v1.10.0 meeting transcription feature with real speaker diarization on the system-audio ("Others") track using FluidAudio's SortformerDiarizer. No code changes are introduced.

Changes:

  • Adds a new design document under .kiro/specs/meeting-speaker-diarization/design.md following the existing per-feature spec convention.
  • Describes data-model reshape (MeetingSpeaker enum with associated value), a new MeetingDiarizer actor, wire-in points in MeetingStateManager, UI updates, settings toggle, and test plan.
  • Lists representative files to edit and a manual verification procedure.

@amazon-q-developer Bot (Contributor) left a comment

Summary

This design spec for speaker diarization on meeting transcripts is well-structured and follows the existing .kiro/specs/ convention. The approach of using FluidAudio's SortformerDiarizer for system-audio-only diarization is sound. However, three critical issues must be addressed before implementation:

Critical Issues (Must Fix)

  1. Index-to-label mapping inconsistency (line 36): The spec shows both 0-based internal indices and 1-based display labels ("Speaker 1" for index 0) without clear documentation of the transform, risking debugging confusion.

  2. Diarization cold-start race condition (lines 67-69): Transcription can complete before the diarizer has processed the same audio, so early chunks show a nil speakerIndex. This needs explicit handling or a documented grace period.

  3. Hardcoded sample rate assumption (line 73): The 16000 Hz sample rate is hardcoded but may not match actual MeetingAudioEngine output, causing timeline misalignment.

Additional Concern

  • Auto-enabling toggle behavior (line 95): Automatically changing meetingDiarizationEnabled from false→true after model download creates surprising UX.

Once these issues are resolved, the spec provides a solid foundation for implementation. The test strategy, verification plan, and out-of-scope boundaries are all well-defined.
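
To make the sample-rate concern concrete, the arithmetic can be sketched as follows. The 16000 Hz figure is the one quoted in the review; the 48000 Hz capture rate and the helper name `seconds(forSampleOffset:sampleRate:)` are hypothetical and chosen only to show the size of the drift.

```swift
/// Converts a running sample offset into seconds on the audio timeline.
func seconds(forSampleOffset offset: Int, sampleRate: Double) -> Double {
    Double(offset) / sampleRate
}

// Hypothetical scenario: the engine actually delivers 48 kHz audio,
// but the consumer divides by a hardcoded 16 kHz.
let samplesDelivered = 480_000
let actualSeconds = seconds(forSampleOffset: samplesDelivered, sampleRate: 48_000)
let assumedSeconds = seconds(forSampleOffset: samplesDelivered, sampleRate: 16_000)

// The consumer's timeline runs three times faster than the real one,
// so speaker segments would be matched against the wrong transcript chunks.
let driftFactor = assumedSeconds / actualSeconds
```

Under these assumed rates, ten seconds of real audio would be stamped as thirty seconds, which is why a single shared constant (or a timestamp supplied by the producer) is safer than a literal in the consumer.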


4 comment threads on .kiro/specs/meeting-speaker-diarization/design.md (all outdated)

- Make 0-based vs 1-based index convention explicit (storage 0-based,
  display layer adds +1).
- Document the legitimate first-chunks `nil` window from Sortformer's
  warmup; renderer handles `.others(nil)` as plain "Others".
- Move sample rate to a single MeetingAudioEngine constant and have the
  engine attach `startTime` directly to each yielded chunk so the
  consumer no longer divides sample counts.
- Default `meetingDiarizationEnabled` to false with explicit opt-in;
  drop the surprising auto-enable on model download.
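
The second and third points of this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the spec's code: `MeetingAudioFormat`, `TimedChunk`, `SpeakerSegment`, and `speakerIndex(for:in:)` are hypothetical names, the segment shape is not FluidAudio's actual output type, and picking the speaker with the largest overlap is one plausible matching rule among several.

```swift
import Foundation

// Hypothetical single source of truth for the sample rate.
enum MeetingAudioFormat {
    static let sampleRate: Double = 16_000
}

// Hypothetical chunk type: the engine attaches startTime when it yields
// the chunk, so consumers never derive time from sample counts themselves.
struct TimedChunk {
    let samples: [Float]
    let startTime: TimeInterval
    var endTime: TimeInterval {
        startTime + Double(samples.count) / MeetingAudioFormat.sampleRate
    }
}

// Hypothetical diarizer output: one 0-based speaker slot per time range.
struct SpeakerSegment {
    let speakerIndex: Int
    let start: TimeInterval
    let end: TimeInterval
}

/// Returns the 0-based speaker with the largest overlap with the chunk,
/// or nil when no segment overlaps it (for example during diarizer warmup).
func speakerIndex(for chunk: TimedChunk, in segments: [SpeakerSegment]) -> Int? {
    var overlapBySpeaker: [Int: TimeInterval] = [:]
    for segment in segments {
        let overlap = min(chunk.endTime, segment.end) - max(chunk.startTime, segment.start)
        if overlap > 0 {
            overlapBySpeaker[segment.speakerIndex, default: 0] += overlap
        }
    }
    return overlapBySpeaker.max { $0.value < $1.value }?.key
}
```

A nil result here corresponds to the `.others(nil)` case the commit describes: the renderer shows plain "Others" until the diarizer has produced a segment covering that chunk.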