feat(CORE-236): add voice input (STT) and read-aloud (TTS) by nraffa · Pull Request #91 · tabiya-tech/compass-zambia-fork

nraffa · 2026-04-08T00:37:40Z

Summary

Voice Input (STT): Mic button on chat input — speak instead of type, with real-time transcription
Read Aloud (TTS): Speaker button on AI messages — reads them aloud using Google Cloud TTS (Chirp 3 HD voices), with browser speechSynthesis as fallback
Both features behind config toggles (speechToText.enabled, textToSpeech.enabled), disabled by default
IaC: both GLOBAL_ENABLE_SPEECH_TO_TEXT and GLOBAL_ENABLE_TEXT_TO_SPEECH wired to Cloud Run for deployment
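
For illustration only, the "disabled by default" behaviour of the two toggles could look roughly like the sketch below. The `VoiceFeatureFlags` shape, the helper names, and the rule that only the string "true" enables a feature are assumptions, not the code in this PR; only the two environment variable names come from the description.

```typescript
// Hypothetical shape for the two voice feature toggles.
interface VoiceFeatureFlags {
  speechToTextEnabled: boolean;
  textToSpeechEnabled: boolean;
}

// Assumption: only the string "true" (case-insensitive) enables a feature,
// so a missing or malformed variable leaves it off.
function parseToggle(raw: string | undefined): boolean {
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}

function readVoiceFeatureFlags(
  env: Record<string, string | undefined>
): VoiceFeatureFlags {
  return {
    speechToTextEnabled: parseToggle(env["GLOBAL_ENABLE_SPEECH_TO_TEXT"]),
    textToSpeechEnabled: parseToggle(env["GLOBAL_ENABLE_TEXT_TO_SPEECH"]),
  };
}
```

With an empty environment both flags come back `false`, which matches the default-off behaviour described above.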

Voice Input (STT)

Hybrid approach: browser Speech Recognition for real-time text display + Google Cloud STT v2 for final transcription
Backend endpoint: POST /speech-to-text/transcribe (stateless proxy, multipart/form-data)
Browser support: real-time text in Chrome, Edge, Safari (~90% of users). Firefox falls back to cloud-only with brief "Transcribing..." state
60-second max recording with auto-stop, security filter on transcribed text
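
A minimal sketch of how a client might assemble the multipart upload for the transcription endpoint. Only the path and the HTTP method are taken from the description; the form field name `file`, the file name `recording.webm`, the `baseUrl` parameter, and the helper itself are hypothetical.

```typescript
// Hypothetical request description; not the frontend service in this PR.
interface TranscribeRequest {
  url: string;
  init: { method: "POST"; body: FormData };
}

function buildTranscribeRequest(baseUrl: string, audio: Blob): TranscribeRequest {
  const form = new FormData();
  // Assumed field name and file name. Passing FormData as the body lets
  // fetch generate the multipart/form-data boundary header itself.
  form.append("file", audio, "recording.webm");
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/speech-to-text/transcribe`,
    init: { method: "POST", body: form },
  };
}

// Usage (not executed here):
//   const { url, init } = buildTranscribeRequest(apiBaseUrl, recordedBlob);
//   const response = await fetch(url, init);
```

Keeping request construction separate from the `fetch` call makes it easy to check the request without a network.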

Read Aloud (TTS)

Backend endpoint: POST /text-to-speech/synthesize (JSON {text, language} → MP3 audio)
Google Cloud TTS with Chirp 3 HD voices for natural-sounding speech
Browser speechSynthesis as fallback if backend is unavailable (marked with TODO for future removal)
Loading spinner while audio is being synthesized, LRU audio cache (20 entries) to avoid redundant API calls
MP3 format for universal browser compatibility (including Safari)
Language fallback: ny-ZM → en-US (Chichewa not supported by any TTS provider)
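
The language fallback and the JSON request body can be sketched as follows. This is an assumption-laden illustration: the description does not say whether the fallback is applied in the frontend or the backend, and `resolveTtsLanguage`, `buildSynthesizeBody`, and the lookup table are hypothetical names. Only the `ny-ZM` to `en-US` mapping and the `{text, language}` body shape come from the description.

```typescript
// Only the ny-ZM -> en-US entry is stated in the PR description.
const TTS_LANGUAGE_FALLBACKS = new Map<string, string>([["ny-ZM", "en-US"]]);

// Return the language to synthesize in, substituting a fallback where one
// is registered and passing every other language through unchanged.
function resolveTtsLanguage(requested: string): string {
  return TTS_LANGUAGE_FALLBACKS.get(requested) ?? requested;
}

interface SynthesizeBody {
  text: string;
  language: string;
}

// Build the JSON payload for POST /text-to-speech/synthesize.
function buildSynthesizeBody(text: string, uiLanguage: string): SynthesizeBody {
  return { text, language: resolveTtsLanguage(uiLanguage) };
}
```

A message shown in a `ny-ZM` session would therefore be sent with `language: "en-US"`, while other language codes are left as they are.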

Add a new POST /speech-to-text/transcribe endpoint that accepts audio uploads and returns transcribed text via Google Cloud Speech-to-Text v2. The feature is behind a configuration toggle (disabled by default), following the same pattern as CV upload.

Add mic button to chat input that uses Web Speech API for real-time interim text display while recording audio via MediaRecorder. On stop, audio is sent to the backend STT endpoint for final transcription. Existing text in the input is preserved and new transcription is appended. Feature is behind the GLOBAL_ENABLE_SPEECH_TO_TEXT toggle.
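
The "preserve existing text and append" rule could be expressed as a small pure function. This is a sketch, not the code in the commit: the function name and the choice of a single space as separator are assumptions.

```typescript
// Append a final transcription to whatever the user has already typed.
function appendTranscription(existing: string, transcription: string): string {
  const addition = transcription.trim();
  // Nothing was recognized: leave the input untouched.
  if (addition === "") return existing;
  // Empty (or whitespace-only) input: the transcription becomes the text.
  if (existing.trim() === "") return addition;
  // Avoid a double space when the existing text already ends in whitespace.
  const separator = /\s$/.test(existing) ? "" : " ";
  return existing + separator + addition;
}
```

For example, `appendTranscription("Hello", " world ")` yields `"Hello world"`, and an empty transcription returns the existing input unchanged.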

Add speaker button to AI chat messages that reads them aloud using Google Cloud TTS (Chirp 3 HD voices), with browser speechSynthesis as fallback if the backend is unavailable.
- Backend: POST /text-to-speech/synthesize endpoint mirroring the STT module.
- Frontend: TextToSpeechService with LRU audio cache, loading spinner state.
- IaC: wire GLOBAL_ENABLE_SPEECH_TO_TEXT and GLOBAL_ENABLE_TEXT_TO_SPEECH env vars to Cloud Run for full deployment of voice-assisted features.
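
One common way to build a small LRU cache in TypeScript relies on `Map` preserving insertion order. The sketch below is not the `TextToSpeechService` implementation from this PR; the class name, the cache key format, and the choice of `ArrayBuffer` as the cached value are assumptions. Only the 20-entry limit comes from the description.

```typescript
// Generic least-recently-used cache backed by a Map.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Assumed usage: cache synthesized audio per language and message text.
const audioCache = new LruCache<ArrayBuffer>(20);
const audioCacheKey = (language: string, text: string): string =>
  `${language}:${text}`;
```

Replaying a message that was already synthesized then becomes a cache hit and avoids another call to the synthesize endpoint.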

nraffa added 5 commits April 9, 2026 11:03

fix(frontend): fix review findings — security filter, race conditions…

a639378

…, formatting

chore(iac): add Speech-to-Text API enablement and IAM to IaC

c3605ad

nraffa force-pushed the feat/CORE-236-voice-input-stt branch from 3951c8c to b10fc79 April 9, 2026 01:08

nraffa changed the title ~~feat(CORE-236): add voice input with real-time transcription~~ feat(CORE-236): add voice input (STT) and read-aloud (TTS) Apr 9, 2026

nraffa marked this pull request as draft May 28, 2026 07:12

nraffa wants to merge 5 commits into main from feat/CORE-236-voice-input-stt
