Talk to Claude Code, Gemini CLI, or Antigravity CLI (aka `agy`) and hear them talk back. This project adds Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities via a Model Context Protocol (MCP) server.

Read the story: True voice mode for Claude Code
- Speech-to-Text (STT): Dictate your prompts instead of typing.
- Text-to-Speech (TTS): Hear the model's responses read aloud.
- Conversational Loop: Use the `/stts` command for a continuous voice-driven session.
- Persistent Daemon: Fast startup using a reusable Chrome window.
- Cross-Platform: Works with Claude Code, Gemini CLI, and Antigravity CLI.
- History: Recall past prompts and responses from a dropdown above each panel, or with `Alt+↑`/`Alt+↓`.

stts uses a background daemon to manage a persistent Chrome/Chromium window:
- MCP Server: Exposes `stt` and `tts` tools to the AI model. Talks to the daemon over plain HTTP: one short request per call, no polling, no per-call subprocess spawn.
- Daemon: A local HTTP + WebSocket server on port `15986` that controls a Chrome instance in "app mode". Stores its profile under `$TMPDIR/cc-gc-stts-user-data-dir`.
- Browser UI ↔ Daemon: A single persistent WebSocket at `/ws` carries every per-turn message. The daemon pushes a `request` frame the moment the model calls `stt` or `tts`; the page pushes back `complete` / `cancel` / `close` when the user is done.
- Browser UI: Uses the native Web Speech API for recognition and synthesis. It costs nothing to use, but note that on Linux Chrome routes recognition audio through Google's servers, so this is not a fully offline pipeline.
- Smart Auto-Advance: In the `/stts` voice loop, if you simply listen through the response without touching anything, the loop advances automatically the moment speech ends. Only if you press Stop or Play (or say "stop it" / "play it") does the page wait for a manual Got it! so you stay in control of replays.
- Automatic Lifecycle: The daemon starts on demand and shuts down when the Chrome window is closed.
- Port-collision aware: If port `15986` is held by a non-stts process, the launcher fails fast with a clear error instead of timing out.
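
The Smart Auto-Advance rule above comes down to one flag per response. The following TypeScript sketch only illustrates that rule: the names `PlaybackState`, `newPlayback`, `onUserControl`, and `onSpeechEnd` are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical model of the auto-advance rule; all names are illustrative.
type Control = "stop" | "play";
type EndAction = "advance" | "wait-for-got-it";

interface PlaybackState {
  inVoiceLoop: boolean;    // true when the response belongs to the /stts loop
  userInteracted: boolean; // set once Stop or Play is used (button or voice)
}

// A fresh response starts with no recorded interaction.
function newPlayback(inVoiceLoop: boolean): PlaybackState {
  return { inVoiceLoop, userInteracted: false };
}

// Stop and Play (or "stop it" / "play it") both mark the turn as user-driven.
function onUserControl(state: PlaybackState, _control: Control): PlaybackState {
  return { ...state, userInteracted: true };
}

// When speech ends, advance only if the user never touched the controls.
function onSpeechEnd(state: PlaybackState): EndAction {
  return state.inVoiceLoop && !state.userInteracted
    ? "advance"
    : "wait-for-got-it";
}
```

Keeping the decision in a pure function like this makes the rule easy to check without a browser; in a real page the same check would presumably run when the speech-synthesis utterance finishes.
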

Build the project:

    npm install
    npm run build

Claude Code:

    claude plugins marketplace add https://github.com/sandipchitale/cc-gc-stts.git
    claude plugin install stts

Gemini CLI:

    gemini extensions install --consent https://github.com/sandipchitale/cc-gc-stts.git

Antigravity CLI:

    agy plugin install --consent https://github.com/sandipchitale/cc-gc-stts.git

Run the voice-driven loop where you speak, the model processes, and the response is read back to you:
- Claude Code: `/stts`
- Gemini CLI: `/stts`
- Antigravity CLI: `/stts`

You can also ask the model to "use the stt tool" or "speak this using tts" directly in your prompts.
Both STT and TTS modes support voice-activated commands for a hands-free experience.
| Command | Action |
|---|---|
| `send prompt` | Submits your dictated text |
| `cancel prompt` | Aborts the current recording |
| `new paragraph` | Inserts a line break |
| `got it` | (TTS mode) Acknowledges the response and continues. Only required if you used Stop or Play during playback; otherwise the loop auto-advances |
| `stop it` | (TTS mode) Stops the current playback (after this, Got it! is required to advance) |
| `play it` | (TTS mode) Replays the response (after this, Got it! is required to advance) |

Note: Many more punctuation and formatting commands are supported (e.g., `insert comma`, `select all`, `undo it`). Toggle the side panel to see the full list.
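
To make the command table concrete, here is a minimal TypeScript sketch of how a spoken command could be separated from dictated text. The README does not say how the UI matches commands, so the strategy shown (a case-insensitive phrase at the end of the transcript) and the names `VoiceAction`, `COMMANDS`, and `parseSpeech` are assumptions for illustration.

```typescript
type VoiceAction = "send" | "cancel" | "newParagraph";

// Phrases from the table above, mapped to hypothetical action names.
const COMMANDS: ReadonlyArray<readonly [string, VoiceAction]> = [
  ["send prompt", "send"],
  ["cancel prompt", "cancel"],
  ["new paragraph", "newParagraph"],
];

interface ParsedSpeech {
  text: string;               // transcript with the command phrase removed
  action: VoiceAction | null; // null when no command was recognized
}

// Assumption: a command counts only when it ends the transcript.
function parseSpeech(transcript: string): ParsedSpeech {
  // Drop trailing sentence punctuation that a recognizer may append.
  const trimmed = transcript.trim().replace(/[.!?]+$/, "");
  const lower = trimmed.toLowerCase();
  for (const [phrase, action] of COMMANDS) {
    if (lower === phrase || lower.endsWith(" " + phrase)) {
      const text = trimmed.slice(0, trimmed.length - phrase.length).trimEnd();
      return { text, action };
    }
  }
  return { text: transcript.trim(), action: null };
}
```

For example, `parseSpeech("Refactor the parser send prompt")` yields the text `Refactor the parser` with the `send` action, while a transcript without a trailing command is returned unchanged with `action: null`.
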

Keyboard Shortcuts:
- `Ctrl+R`: Toggle recording/playback side panel.
- `Enter`: Send prompt (Talk side).
- `Escape`: Stop recording or close the commands panel.
- `Alt+↑`/`Alt+↓`: Cycle through prompt or response history when the textarea is focused.

Each panel has a History bar above its textarea:
- Talk stores every submitted prompt; Listen stores every response received from the model.
- Pick an entry from the dropdown to load it into the textarea, where it is fully editable. Hit Enter / Send to resubmit a prompt, or Play to replay a response.
- `Alt+↑` walks back through history; `Alt+↓` walks forward (your in-progress draft is preserved and restored at the bottom of the stack).
- History persists across sessions in `localStorage`, capped at 50 entries per side. Consecutive duplicates are not stored.
- Use the Clear button to wipe one side's history.
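
The history rules above (a cap of 50, no consecutive duplicates, a draft restored at the bottom of the stack) can be sketched as a small class. This is an illustrative TypeScript sketch, not the project's code: `PromptHistory`, `KeyValueStore`, and the storage key are hypothetical, and the store interface is a subset of the Web Storage API so that `localStorage` could be passed in directly.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const MAX_ENTRIES = 50; // cap stated above

class PromptHistory {
  private store: KeyValueStore;
  private key: string;
  private entries: string[];
  private cursor: number; // entries.length means "at the draft"
  private draft: string;

  constructor(store: KeyValueStore, key: string) {
    this.store = store;
    this.key = key;
    this.entries = JSON.parse(store.getItem(key) ?? "[]") as string[];
    this.cursor = this.entries.length;
    this.draft = "";
  }

  // Append an entry, skipping consecutive duplicates and trimming to the cap.
  add(text: string): void {
    if (text !== this.entries[this.entries.length - 1]) {
      this.entries.push(text);
      this.entries = this.entries.slice(-MAX_ENTRIES);
      this.store.setItem(this.key, JSON.stringify(this.entries));
    }
    this.cursor = this.entries.length;
  }

  // Alt+Up: step back, remembering the in-progress draft on the first step.
  back(current: string): string {
    if (this.cursor === this.entries.length) this.draft = current;
    if (this.cursor > 0) this.cursor--;
    return this.entries[this.cursor] ?? current;
  }

  // Alt+Down: step forward; past the newest entry, restore the draft.
  forward(current: string): string {
    if (this.cursor >= this.entries.length) return current;
    this.cursor++;
    return this.cursor === this.entries.length
      ? this.draft
      : this.entries[this.cursor];
  }

  // Clear button: wipe this side's history.
  clear(): void {
    this.entries = [];
    this.cursor = 0;
    this.store.setItem(this.key, "[]");
  }
}
```

In a browser, `new PromptHistory(localStorage, "talk")` would give one side its own persistent list; a second instance with a different key would serve the other panel.
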
Claude Code:

    claude plugins marketplace add "$PWD"
    claude plugin install stts

Gemini CLI:

    gemini extensions install --consent "$PWD"

Antigravity CLI:

    agy plugin install "$PWD"

The daemon usually runs automatically, but you can manually stop it by closing the Chrome window or:

    curl -X POST http://127.0.0.1:15986/api/shutdown

- `src/stts-mcp-server.ts`: MCP server exposing the `stt` and `tts` tools. Calls the daemon HTTP API directly.
- `src/stts-daemon.ts`: local HTTP + WebSocket server on port `15986` that owns the Chrome window.
- `src/daemon-client.ts`: shared HTTP client used by the MCP server and the CLI.
- `src/stts.ts`: standalone CLI (`stts stt` / `stts tts`) for manual use and diagnostics.
- `src/stts_ui.html`: the Web Speech API UI rendered inside the Chrome window. Connects to the daemon over WebSocket at `/ws`.

| Path | Method | Used by | Purpose |
|---|---|---|---|
| `/` | GET | Chrome | Serves the UI HTML |
| `/api/ping` | GET | daemon-client | Health check (`ok` body confirms it's our daemon, not a foreign process) |
| `/request` | POST | daemon-client | MCP/CLI submits an `stt` or `tts` request; response body carries the result |
| `/api/shutdown` | POST | UI / CLI | Cleanly stops the daemon and Chrome |
| `/ws` | WebSocket | Browser UI | Single persistent channel: daemon pushes `request` frames; browser pushes `ready` / `complete` / `cancel` / `close` |
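
As a rough illustration of how a client might use `/api/ping` and `/request` from the table above, here is a TypeScript sketch built on the global `fetch` available in Node.js 18+. It is not the project's `daemon-client`: the function names, the JSON response shape with a `text` field, the optional `text` request field, and treating any network error as "nothing listening" are all assumptions.

```typescript
const BASE = "http://127.0.0.1:15986"; // port from the table above

type PortState = "ours" | "foreign" | "absent";

// Classify a ping outcome: only a 200 response with body "ok" counts as our daemon.
function classifyPing(result: { status: number; body: string } | null): PortState {
  if (result === null) return "absent"; // no response at all
  return result.status === 200 && result.body.trim() === "ok" ? "ours" : "foreign";
}

// Probe the health endpoint. Simplification: every network error maps to "absent".
async function ping(): Promise<PortState> {
  try {
    const res = await fetch(`${BASE}/api/ping`);
    return classifyPing({ status: res.status, body: await res.text() });
  } catch {
    return "absent";
  }
}

// Submit one request and return the result text.
// Assumptions: JSON in, JSON out, result carried in a `text` field.
async function submit(mode: "stt" | "tts", text?: string): Promise<string> {
  const state = await ping();
  if (state !== "ours") {
    throw new Error(`stts daemon not usable on ${BASE} (port state: ${state})`);
  }
  const res = await fetch(`${BASE}/request`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ mode, text }),
  });
  if (!res.ok) throw new Error(`daemon returned HTTP ${res.status}`);
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

A caller would use it as `const transcript = await submit("stt")`. Separating `classifyPing` from the network call keeps the "ours vs. foreign" decision testable without a running daemon.
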
- Node.js: v18 or higher.
- Chrome/Chromium: Must be installed and discoverable.
- Microphone: Required for STT functionality.
MIT - Sandip Chitale
The following prompt is self-contained: handed to a capable coding agent (Claude Code, Gemini CLI, etc.) in an empty repository, it should produce an implementation equivalent to the one in this project.
Build a voice-loop plugin for Claude Code and Gemini CLI called `stts`. It exposes two MCP tools, `stt` (capture a spoken prompt and return the transcript) and `tts` (read a string aloud), plus a slash command `/stts` that loops `stt → answer → tts` until the user is silent. Target Node.js 18+, TypeScript, esbuild for bundling.

Architecture. Three processes:
- MCP server (stdio): registers `stt` and `tts` with the official `@modelcontextprotocol/sdk`. Each tool call POSTs JSON to a local daemon and returns the daemon's response.
- Daemon: a single Node process listening on a fixed loopback port (use `15986`). It serves both an HTTP API and a WebSocket endpoint at `/ws` from the same `http.Server`, and it owns one persistent Chrome/Chromium window launched via `chrome-launcher` in `--app=` mode pointed at `http://127.0.0.1:15986/`. The daemon's HTTP routes are: `GET /` (the UI HTML), `GET /api/ping` (health probe; body `ok` identifies "our" daemon vs. a foreign process holding the port), `POST /request` (the MCP/CLI submission, body is `{ mode: 'stt' | 'tts', ... }`, response body is the result), `POST /api/shutdown`. Long-poll endpoints are explicitly not used. The daemon keeps at most one in-flight `Pending` request; a second `/request` while another is open returns `409`.
- Browser page: a single HTML file the daemon serves. On load it opens a WebSocket to `/ws` and sends `{ "type": "ready" }`. The daemon pushes `{ "type": "request", "config": {...} }` frames; the page replies with `{ "type": "complete", "text": "..." }`, `{ "type": "cancel" }`, or `{ "type": "close" }`. The browser auto-reconnects on socket close. The page must reset to an idle UI on connect and re-activate when a `request` frame arrives.

Daemon-client. Provide a shared module used by both the MCP server and a small CLI. It must: (a) ping the daemon; (b) if absent, spawn it detached with `unref()`; (c) if the port is held by something foreign, fail with a clear error; (d) POST `/request` and return the parsed `text` field.

Browser UI. A two-panel page, Talk (STT) and Listen (TTS), using only the browser's Web Speech API. Behaviors:
- STT panel uses `webkitSpeechRecognition` with `continuous = true`, `interimResults = true`. Buttons: Send, Cancel, Dictate (toggle), Commands (panel), End conversation. Voice commands inserted into recognized text trigger UI actions: `send prompt`, `cancel prompt`, plus punctuation/formatting helpers (`new line`, `new paragraph`, `insert comma`, `select all`, `undo it`, etc.).
- TTS panel uses `speechSynthesis`. Buttons: Play, Stop, Got it!, Refresh. Voice commands while listening: `play it`, `stop it`, `got it`. On a fresh request set `userInteracted = false`. Stop and Play set `userInteracted = true`.
- Smart auto-advance: when `currentUtterance.onend` fires in oneshot mode (`config.oneshot === true`) and `userInteracted === false`, the page treats it like Got it! and immediately sends `{ type: 'close' }`. Otherwise it shows "Finished. Play again or click Got it! to continue." and waits for an explicit gesture.
- On first load, `speechSynthesis.getVoices()` may return empty; nudge it with a zero-volume dummy `SpeechSynthesisUtterance` and listen for `voiceschanged`. When activating TTS, retry `synth.speak(...)` up to ~20×100 ms while voices are still loading, then fall back to a 5-second polling window before giving up.
- Maintain per-side history (prompts, responses) in `localStorage`, capped at 50 entries each, with `Alt+↑`/`Alt+↓` cycling and a Clear button. Consecutive duplicates are not stored.

`/stts` slash command. A markdown command file whose body instructs the model: "Call the `stt` tool. If the response is empty, output `Done.` and stop. Otherwise treat the response as a prompt, answer it, pass the answer to the `tts` tool. Repeat. While the loop runs, do not output anything else." The MCP `tts` tool sends `oneshot: true` so the page applies smart auto-advance.

Plugin packaging. Provide `.claude-plugin/plugin.json` registering `stts-mcp` as an stdio MCP server pointing at the bundled daemon entrypoint, plus a Gemini extension manifest mirroring it. `package.json` declares dependencies `@modelcontextprotocol/sdk`, `chrome-launcher`, `commander`, `ws`. Build with esbuild: `bundle: true`, `platform: 'node'`, `format: 'esm'`, `target: 'node18'`, output to `dist/` as `.mjs`, copy `stts_ui.html` alongside.

Cross-cutting requirements. Persist Chrome under `${tmpdir}/cc-gc-stts-user-data-dir` so cookies/voices/microphone permissions survive restarts. Disable the daemon's HTTP timeouts (`requestTimeout`, `headersTimeout`, `timeout`, `keepAliveTimeout` all `0`) so a parked `/request` cannot be killed by Node. On `EADDRINUSE` exit cleanly. On Chrome process exit, null out the WebSocket reference and resolve any pending request with `''`. Write everything in TypeScript with strict typing for the request/config shapes.
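
The WebSocket frames named in the prompt above lend themselves to a discriminated union, which is one way to meet its "strict typing" requirement. The sketch below is illustrative TypeScript, not the project's code: the type and function names are hypothetical, and because the prompt elides the contents of `config`, only the `oneshot` flag it mentions is typed explicitly.

```typescript
// `config` is elided in the prompt; only `oneshot` is named there, the rest stays open.
interface RequestConfig {
  oneshot?: boolean;
  [key: string]: unknown;
}

// Daemon -> page.
type DaemonFrame = { type: "request"; config: RequestConfig };

// Page -> daemon.
type PageFrame =
  | { type: "ready" }
  | { type: "complete"; text: string }
  | { type: "cancel" }
  | { type: "close" };

// Serialize the frame the daemon pushes when a tool call arrives.
function buildRequestFrame(config: RequestConfig): string {
  const frame: DaemonFrame = { type: "request", config };
  return JSON.stringify(frame);
}

// Validate a raw message from the page; returns null for anything malformed.
function parsePageFrame(raw: string): PageFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const frame = value as { type?: unknown; text?: unknown };
  switch (frame.type) {
    case "ready":
      return { type: "ready" };
    case "cancel":
      return { type: "cancel" };
    case "close":
      return { type: "close" };
    case "complete":
      return typeof frame.text === "string"
        ? { type: "complete", text: frame.text }
        : null;
    default:
      return null;
  }
}
```

With the union in place, a `switch` on `frame.type` lets the compiler check that every frame kind is handled, and `frame.text` is only accessible on `complete` frames.
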


