feat(CORE-236): add voice input (STT) and read-aloud (TTS) #91
Draft
nraffa wants to merge 5 commits into
Add a new POST /speech-to-text/transcribe endpoint that accepts audio uploads and returns transcribed text via Google Cloud Speech-to-Text v2. Feature is behind a configuration toggle (disabled by default), following the same pattern as CV upload.
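A minimal sketch of a disabled-by-default toggle check, for illustration only. The PR description does not show how the toggle is parsed, so the helper name `isFeatureEnabled`, the `Env` type, and the rule that only the string "true" enables the feature are all assumptions. The variable name GLOBAL_ENABLE_SPEECH_TO_TEXT is the one named elsewhere in this PR.

```typescript
// Hypothetical helper: the actual toggle parsing in this PR is not shown.
type Env = Record<string, string | undefined>;

function isFeatureEnabled(env: Env, name: string): boolean {
  const raw = env[name];
  // Assumption: only the literal "true" (case-insensitive) enables a feature.
  // An unset variable or any other value keeps it off, i.e. disabled by default.
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}

// Usage: a handler could refuse requests while the toggle is off.
const sttEnabled = isFeatureEnabled(
  { GLOBAL_ENABLE_SPEECH_TO_TEXT: "true" },
  "GLOBAL_ENABLE_SPEECH_TO_TEXT",
);
```

Taking the environment as a parameter rather than reading a global keeps the check easy to unit test.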
Add mic button to chat input that uses Web Speech API for real-time interim text display while recording audio via MediaRecorder. On stop, audio is sent to the backend STT endpoint for final transcription. Existing text in the input is preserved and new transcription is appended. Feature is behind the GLOBAL_ENABLE_SPEECH_TO_TEXT toggle.
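The "preserve existing text, append new transcription" behaviour could look roughly like the following. This is a sketch under assumptions: the function name `appendTranscript` and the single-space separator rule are hypothetical, not taken from the PR's code.

```typescript
// Hypothetical helper illustrating the append behaviour described above.
function appendTranscript(existing: string, transcript: string): string {
  const addition = transcript.trim();
  // Nothing was recognised: leave the input untouched.
  if (addition === "") return existing;
  // Empty (or whitespace-only) input: the transcript becomes the whole text.
  if (existing.trim() === "") return addition;
  // Insert one space unless the existing text already ends in whitespace.
  const separator = /\s$/.test(existing) ? "" : " ";
  return existing + separator + addition;
}
```

For example, calling `appendTranscript("Hello", " world ")` yields `"Hello world"`, and an empty transcript returns the existing text unchanged.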
Enable speech.googleapis.com in backend required services, grant roles/speech.client to the backend service account, and pass GOOGLE_CLOUD_PROJECT env var to Cloud Run.
Add speaker button to AI chat messages that reads them aloud using Google Cloud TTS (Chirp 3 HD voices), with browser speechSynthesis as fallback if the backend is unavailable.
- Backend: POST /text-to-speech/synthesize endpoint mirroring the STT module.
- Frontend: TextToSpeechService with LRU audio cache, loading spinner state.
- IaC: wire GLOBAL_ENABLE_SPEECH_TO_TEXT and GLOBAL_ENABLE_TEXT_TO_SPEECH env vars to Cloud Run for full deployment of voice-assisted features.
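For reviewers unfamiliar with the LRU audio cache mentioned for TextToSpeechService, here is a minimal sketch of one common way to build it on top of `Map` insertion order. The class name `LruCache`, its capacity parameter, and the idea of keying entries by language and text are assumptions for illustration. The PR's actual cache may differ.

```typescript
// Hypothetical LRU cache: a Map iterates in insertion order, so the first
// key is always the least recently used one.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: cache synthesized audio bytes, keyed by an assumed "language:text" string.
const audioCache = new LruCache<string, Uint8Array>(20);
audioCache.set("en-US:Hello", new Uint8Array([1, 2, 3]));
```

Bounding the cache avoids holding every synthesized clip in memory while still skipping a network round trip when a message is replayed.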
3951c8c to b10fc79
Summary
- Feature toggles (speechToText.enabled, textToSpeech.enabled), disabled by default
- GLOBAL_ENABLE_SPEECH_TO_TEXT and GLOBAL_ENABLE_TEXT_TO_SPEECH wired to Cloud Run for deployment

Voice Input (STT)
- POST /speech-to-text/transcribe (stateless proxy, multipart/form-data)

Read Aloud (TTS)
- POST /text-to-speech/synthesize (JSON {text, language} → MP3 audio)
- Browser speechSynthesis as fallback if backend is unavailable (marked with TODO for future removal)
- ny-ZM → en-US (Chichewa not supported by any TTS provider)
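The ny-ZM to en-US language fallback could be expressed as a small lookup applied before building the synthesize request. This is a sketch under assumptions: the names `TTS_LANGUAGE_FALLBACKS`, `resolveTtsLanguage`, and `buildSynthesizeBody` are hypothetical, and only the ny-ZM entry and the {text, language} body shape come from the PR description.

```typescript
// Hypothetical fallback table; only the ny-ZM -> en-US entry is from the PR.
const TTS_LANGUAGE_FALLBACKS: Readonly<Record<string, string>> = {
  "ny-ZM": "en-US",
};

// Return the fallback language if one is defined, otherwise the requested one.
function resolveTtsLanguage(requested: string): string {
  return TTS_LANGUAGE_FALLBACKS[requested] ?? requested;
}

// Build the JSON body described for POST /text-to-speech/synthesize.
function buildSynthesizeBody(text: string, language: string): string {
  return JSON.stringify({ text, language: resolveTtsLanguage(language) });
}
```

Keeping the mapping in one table makes it easy to drop the entry if a provider later adds Chichewa support.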