feat: add meeting transcription feature for macOS #789

Open
aj47 wants to merge 5 commits into main from feature/meeting-transcription-macos

aj47 commented Dec 26, 2025

Owner

Summary

Add meeting transcription using system audio capture via ScreenCaptureKit. Meetings can be recorded with mic and/or desktop audio and are transcribed in near real time, every 30 seconds, using the existing STT providers (OpenAI/Groq).

Features

  • System audio capture using macos-system-audio-recorder npm package (Swift/ScreenCaptureKit)
  • Flexible audio sources: Mic-only, system-only, or both
  • Real-time transcription: Audio buffered and transcribed every 30 seconds
  • Meeting history: View, search, rename, and delete past meetings
  • Transcript viewer: Full transcript with time-stamped segments showing audio source

Technical Details

New Files

  • apps/desktop/src/main/meeting-recorder.ts - Main process service for audio capture and transcription
  • apps/desktop/src/renderer/src/pages/meetings.tsx - React UI for meeting recording

Modified Files

  • apps/desktop/src/shared/types.ts - Added Meeting-related types
  • apps/desktop/src/main/tipc.ts - Added IPC handlers for meeting operations
  • apps/desktop/src/renderer/src/router.tsx - Added /meetings route
  • apps/desktop/src/renderer/src/components/app-layout.tsx - Added Meetings nav link

Dependencies

  • Added macos-system-audio-recorder@0.0.1 - Swift binary for ScreenCaptureKit audio capture

Platform Support

⚠️ macOS only - Requires macOS 12.3+ for ScreenCaptureKit. Users on other platforms will see a warning message.

Testing

  • TypeScript compiles without errors
  • All existing tests pass (38/38)

Screenshots

The Meetings page is accessible from the sidebar under Settings and provides:

  1. Audio source selector (Both/Mic/System)
  2. Start/Stop recording with live timer
  3. Meeting history grouped by date
  4. Detail dialog with full transcript and segments

Add meeting transcription capability using system audio capture via ScreenCaptureKit.
This allows recording meetings with mic and/or desktop audio, transcribing in real-time
every 30 seconds using existing STT providers (OpenAI/Groq).

Features:
- System audio capture using macos-system-audio-recorder npm package
- Support for mic-only, system-only, or both audio sources
- Real-time transcription with time-stamped segments
- Meeting history with search, rename, and delete
- Full transcript view with segment details

Note: This feature is macOS-only (requires macOS 12.3+ for ScreenCaptureKit)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

augmentcode Bot commented Dec 26, 2025

🤖 Augment PR Summary

Summary: This PR adds a macOS-only “Meetings” feature that records meeting audio (mic, system audio, or both) and transcribes it in near real-time using the app’s existing STT providers.

Changes:

  • Introduces a new main-process MeetingRecorderService that captures system audio via ScreenCaptureKit (through macos-system-audio-recorder), buffers audio, and transcribes every 30 seconds.
  • Adds IPC procedures in apps/desktop/src/main/tipc.ts for starting/stopping recording, streaming microphone PCM chunks from the renderer, and CRUD operations on meeting history.
  • Adds a new Meetings page (/meetings) with recording controls, meeting history list, rename/delete actions, and a transcript viewer dialog.
  • Extends shared types (Meeting, MeetingTranscriptSegment, state/config types) to support the new feature.
  • Registers the new route and adds a navigation entry in the desktop sidebar.
  • Adds dependency macos-system-audio-recorder@0.0.1 for system audio capture on macOS 12.3+.

Technical Notes: Transcription requests are guarded against overlap, use request timeouts, and audio buffers are bounded to prevent unbounded growth; meeting data is persisted as JSON under the app’s data directory.

augmentcode Bot left a comment

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

// Mutations
const startMutation = useMutation({
  mutationFn: (audioSource: MeetingAudioSource) =>
    tipcClient.startMeetingRecording({ audioSource }),

The UI offers microphone/both sources, but this page never captures mic audio nor calls tipcClient.addMeetingMicrophoneData, so those modes likely won’t include microphone input in the transcript.

}

private startTranscriptionLoop(): void {
  this.transcriptionTimer = setInterval(async () => {

setInterval(async () => ...) can overlap if transcribeBufferedAudio() takes longer than 30s, which can lead to concurrent writes/segment ordering issues (and potentially race with stopRecording()). Consider ensuring transcription runs are non-reentrant.
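
One way to make the loop non-reentrant is a simple in-flight flag around the async callback. The sketch below is illustrative only: `makeNonReentrant` is a hypothetical helper, not code from this PR, and the usage lines assume a `transcribeBufferedAudio()` method like the one named in the comment above.

```typescript
// Hypothetical helper (not from the PR): wraps an async task so that a new
// invocation is skipped while the previous one is still in flight.
function makeNonReentrant(task: () => Promise<void>): () => Promise<boolean> {
  let running = false
  return async () => {
    if (running) return false // previous run has not finished; skip this tick
    running = true
    try {
      await task()
      return true
    } finally {
      running = false // always release the guard, even if the task throws
    }
  }
}

// Usage sketch (the 30-second interval comes from the PR description):
// const tick = makeNonReentrant(() => recorder.transcribeBufferedAudio())
// setInterval(() => void tick().catch(console.error), 30_000)
```

Skipping a tick rather than queueing it keeps at most one transcription request active; audio that arrives in the meantime simply stays buffered for the next run.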

- Make transcription loop non-reentrant to prevent overlapping runs
- Add microphone capture in renderer for mic/both audio sources
- Wait for in-progress transcription before stopping recording

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

aj47 commented Dec 26, 2025

Owner Author

augment review

augmentcode Bot left a comment

Review completed. 3 suggestions posted.

)}
</Button>

{process.platform !== "darwin" && (

process.platform may be undefined in the renderer (this app generally uses build-time process.env.IS_MAC), which would crash this page when rendered. Consider switching this check to process.env.IS_MAC (or another renderer-safe platform flag).

const audioData = Buffer.concat(chunks)

// Create WAV header
const wavHeader = this.createWavHeader(audioData.length, SAMPLE_RATE, CHANNELS, BYTES_PER_SAMPLE * 8)

The WAV header is always written as 48kHz/mono/16-bit, but macos-system-audio-recorder can output different sample rates / channel counts / bit depths via getAudioDetails(). If these don’t match the actual PCM stream, the WAV will be invalid and transcription quality can degrade or fail.
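
For reference, a minimal sketch of building the canonical 44-byte RIFF/WAVE header from reported audio details rather than constants. The `AudioDetails` shape here is an assumption made for illustration; the actual return value of `getAudioDetails()` may use different field names. The sketch also assumes integer PCM (format tag 1); a float stream would need format tag 3.

```typescript
// Hypothetical shape; the real getAudioDetails() result may differ.
interface AudioDetails {
  sampleRate: number
  channels: number
  bitsPerSample: number
}

// Builds a 44-byte WAV header for `dataLength` bytes of integer PCM audio.
function createWavHeader(dataLength: number, details: AudioDetails): Uint8Array {
  const { sampleRate, channels, bitsPerSample } = details
  const blockAlign = channels * (bitsPerSample / 8) // bytes per sample frame
  const byteRate = sampleRate * blockAlign // bytes per second of audio
  const header = new Uint8Array(44)
  const view = new DataView(header.buffer)
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i))
  }
  writeAscii(0, "RIFF")
  view.setUint32(4, 36 + dataLength, true) // RIFF chunk size (file size minus 8)
  writeAscii(8, "WAVE")
  writeAscii(12, "fmt ")
  view.setUint32(16, 16, true) // fmt chunk size for PCM
  view.setUint16(20, 1, true) // audio format 1 = integer PCM (assumed)
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, "data")
  view.setUint32(40, dataLength, true) // size of the PCM payload in bytes
  return header
}
```

Because byte rate and block align are derived from the same three inputs, the header stays internally consistent whenever the recorder reports a different format.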

const groqBaseUrl = config.groqBaseUrl || "https://api.groq.com/openai/v1"
const openaiBaseUrl = config.openaiBaseUrl || "https://api.openai.com/v1"

const response = await fetch(

fetch() here has no timeout/abort; if the STT request hangs, isTranscribing can remain true and stopRecording() will wait indefinitely in its while (this.isTranscribing) loop. Consider adding an AbortController with a reasonable timeout for transcription requests.
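
A rough sketch of the suggested pattern, assuming a runtime where `AbortController` and `fetch` are global (as in Electron and recent Node versions). `withTimeout` is a hypothetical helper name; the URL and request options in the usage lines are placeholders, not the PR's actual request.

```typescript
// Hypothetical helper: runs an abortable operation and aborts it after
// `timeoutMs`. The operation must pass the signal on (e.g. to fetch) for
// the abort to take effect.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    return await run(controller.signal)
  } finally {
    clearTimeout(timer) // avoid a stray timer after success or failure
  }
}

// Usage sketch (placeholder URL and options):
// const response = await withTimeout(
//   (signal) => fetch(`${baseUrl}/audio/transcriptions`, { method: "POST", body: form, signal }),
//   60_000,
// )
```

When the timer fires, `fetch` rejects with an abort error, so the caller's `finally` block can reset `isTranscribing` and `stopRecording()` is no longer able to wait forever.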

- Replace process.platform with process.env.IS_MAC in renderer for platform detection
- Use actual audio details from macos-system-audio-recorder getAudioDetails() instead of hardcoded values
- Add AbortController with 60s timeout for transcription API requests to prevent hanging

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

aj47 commented Dec 26, 2025

Owner Author

augment review

augmentcode Bot left a comment

Review completed. 3 suggestions posted.

logApp(`[MeetingRecorder] Started recording meeting ${meetingId}`)
return meeting
} catch (error) {

If an error happens after startSystemAudioRecording() succeeds, this catch resets flags but doesn’t stop this.systemRecorder, so system capture could continue running in the background. Consider adding cleanup for the recorder/buffers on this failure path.

startTime: this.systemAudioBuffer.startTime,
audioDetails: { ...this.systemAudioBuffer.audioDetails },
})
this.systemAudioBuffer.data = []

This clears the buffered audio before attempting transcription; if transcribeAudio() fails, that chunk is dropped and the meeting transcript can become permanently incomplete. Consider retaining the buffer until transcription succeeds (or otherwise enabling retry) to avoid silent data loss.
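
One possible shape for "retain until success" is a snapshot-and-commit buffer: the chunks sent for transcription are only removed once the request has succeeded. Everything below (`PendingAudioBuffer`, `snapshot`, `commit`, `flushBuffer`) is a hypothetical sketch, not the PR's implementation.

```typescript
// Hypothetical buffer: chunks are removed only when a snapshot is committed.
class PendingAudioBuffer {
  private chunks: Uint8Array[] = []

  push(chunk: Uint8Array): void {
    this.chunks.push(chunk)
  }

  get length(): number {
    return this.chunks.length
  }

  // Returns the chunks buffered so far plus a commit() that drops exactly
  // those chunks. Assumes nothing else removes chunks from the front in the
  // meantime.
  snapshot(): { chunks: Uint8Array[]; commit: () => void } {
    const taken = this.chunks.slice()
    let committed = false
    const commit = (): void => {
      if (committed) return // make a double commit harmless
      committed = true
      this.chunks.splice(0, taken.length)
    }
    return { chunks: taken, commit }
  }
}

// Sends the snapshot to `transcribe`; on failure the buffer is left intact
// so the same audio is retried on the next interval.
async function flushBuffer(
  buffer: PendingAudioBuffer,
  transcribe: (chunks: Uint8Array[]) => Promise<string>,
): Promise<string | null> {
  const { chunks, commit } = buffer.snapshot()
  if (chunks.length === 0) return null
  try {
    const text = await transcribe(chunks)
    commit()
    return text
  } catch {
    return null
  }
}
```

Audio that arrives while a request is in flight is appended after the snapshot, so committing does not discard it.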

}

// Send to main process
tipcClient.addMeetingMicrophoneData({ audioData: int16Data.buffer })

tipcClient.addMeetingMicrophoneData(...) returns a Promise but it’s not awaited/handled; if IPC rejects, this can surface as an unhandled rejection in the renderer during recording. Consider attaching a .catch (or otherwise handling errors) to keep failures contained.

- Add cleanupRecordingResources() to properly clean up system recorder on error
- Retain audio buffer until transcription succeeds to avoid data loss
- Add .catch() to tipcClient.addMeetingMicrophoneData to handle promise rejections

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

aj47 commented Dec 26, 2025

Owner Author

augment review

augmentcode Bot left a comment

Review completed. 2 suggestions posted.

}

source.connect(processor)
processor.connect(audioContext.destination)

processor.connect(audioContext.destination) will typically route the microphone signal to the user’s speakers, which can be distracting and can create an echo/feedback loop (especially if “system audio” capture is enabled). Consider ensuring the mic capture path is “silent” while still keeping the processing callback running.
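
A common way to keep the processing callback alive without audible output is to route the processor through a gain node set to zero. The sketch below uses small structural interfaces (`NodeLike`, `GainLike`, `ContextLike`) as stand-ins so it can be read and run outside a browser; they are assumptions for illustration and only mirror the Web Audio members used here (`createGain()`, `gain.value`, `connect()`, `destination`).

```typescript
// Stand-in types mirroring the parts of the Web Audio API used below.
interface NodeLike {
  connect(destination: NodeLike): unknown
}
interface GainLike extends NodeLike {
  gain: { value: number }
}
interface ContextLike {
  destination: NodeLike
  createGain(): GainLike
}

// Wires source -> processor -> muted gain -> destination, so the processor
// keeps being pulled by the graph while nothing is heard on the speakers.
function connectSilently(ctx: ContextLike, source: NodeLike, processor: NodeLike): GainLike {
  const mute = ctx.createGain()
  mute.gain.value = 0 // silence the path to the speakers
  source.connect(processor)
  processor.connect(mute)
  mute.connect(ctx.destination)
  return mute
}
```

In the renderer, the same three `connect` calls would be made on the real `AudioContext`, microphone source node, and processor node.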

}
} catch (error) {
// Keep the buffer data for retry on next interval
logApp(`[MeetingRecorder] Transcription error for ${source} (will retry):`, error)

Since the buffer is retained on failures, a prolonged outage (or missing API key) can cause the WAV payload to grow without bound and eventually exceed upstream STT size limits, making retries fail indefinitely. Consider bounding/splitting buffered audio when retrying to avoid permanent transcript loss/memory growth.
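
A minimal sketch of bounding the retained audio by discarding the oldest chunks first. The names `MAX_BUFFER_SIZE_BYTES` and `getBufferSize` match the follow-up commit message, but the bodies here are assumptions, as is reading "25MB" as 25 * 1024 * 1024 bytes; `appendBounded` is a hypothetical helper.

```typescript
// Assumed interpretation of the 25MB limit named in the follow-up commit.
const MAX_BUFFER_SIZE_BYTES = 25 * 1024 * 1024

// Total number of bytes currently held in the chunk list.
function getBufferSize(chunks: Uint8Array[]): number {
  return chunks.reduce((total, chunk) => total + chunk.byteLength, 0)
}

// Appends a chunk, then drops the oldest chunks until the buffer fits.
// Returns how many bytes were discarded. The newest chunk is always kept,
// even if it alone exceeds the limit.
function appendBounded(
  chunks: Uint8Array[],
  chunk: Uint8Array,
  maxBytes: number = MAX_BUFFER_SIZE_BYTES,
): number {
  chunks.push(chunk)
  let size = getBufferSize(chunks)
  let dropped = 0
  while (size > maxBytes && chunks.length > 1) {
    const oldest = chunks.shift()
    if (!oldest) break
    size -= oldest.byteLength
    dropped += oldest.byteLength
  }
  return dropped
}
```

Dropping from the front trades a gap at the start of the backlog for bounded memory and a payload that stays under the upstream size limit, so retries can succeed again once the outage ends.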

…audio echo

- Add MAX_BUFFER_SIZE_BYTES (25MB) to prevent unbounded audio buffer growth
- Discard oldest audio chunks when buffer limit is reached
- Route microphone audio through silent gain node to prevent echo/feedback
- Add getBufferSize helper for calculating buffer usage

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

aj47 commented Dec 27, 2025

Owner Author

augment review

augmentcode Bot left a comment

Review completed. No suggestions at this time.
