Not a promise list — a running record of what's in, what's queued, and what's explicitly out.
- Kokoro TTS with voice/speed/language config
- faster-whisper STT with `listen`, `listen --copy`, `listen --paste`
- Claude Code `Stop` hook with per-session pinning
- `readback` CLI: `say`, `stop`, `listen`, `voices`, `models`, `pin`, `unpin`, `list`, `config`, `install-hook`, `uninstall-hook`, `update-hook`, `doctor`
- Markdown / code-block / URL stripping before TTS
- Config at `~/.config/readback/config`
- `install.sh` that doesn't touch existing installations elsewhere on disk
- Cursor hook support. Cursor uses a similar hook system to Claude Code. Should be a 50-line addition.
- Codex CLI hook support. Same — another LLM CLI with a hook mechanism.
- More voice presets. `readback preset reading` / `preset glance` / `preset focus`: named combinations of voice + speed + volume.
- `readback say --voice X --speed Y`: per-invocation overrides without touching config.
- Per-session config overrides. Pin session X with voice A, session Y with voice B.
- A test suite. Right now there isn't one. A few bats-core tests for the CLI and pytest tests for the Python engines would cover most regressions.
- Global push-to-talk hotkey for `listen`: press a key anywhere in macOS, speak, release, get the transcript pasted into the focused app. This is what Superwhisper spent years on. I'd need Hammerspoon or Karabiner integration. Unclear if it fits the "small and auditable" scope.
- Streaming STT. Text appears as you speak instead of at the end of recording. `faster-whisper` supports this. Needs a UI decision: print to stderr as partial? Update a TUI?
- Smarter markdown cleaning. Current regex-based stripper misses some edge cases. A proper markdown parser would handle nested structures better.
- Audio output routing. Pick which output device to use (for people with multiple speakers / headphones).
- Whisper model auto-fallback. Try small, fall back to base, fall back to tiny if loading fails.
- GUI application. This is a CLI. Adding a GUI would change the maintenance burden, the audit surface, and the audience.
- Proprietary cloud voices. ElevenLabs, Google TTS, Azure — all great, but they take audio off your machine. Not what this is for.
- Full voice command system. "Open this file, scroll down, run the tests." That's Talon. Use Talon.
- Windows support. The whole stack is macOS-shaped right now (`pbcopy`, `osascript`, Homebrew). Porting would double the maintenance surface.
- Linux support. Would be valuable, but I don't daily-drive Linux. Accepting PRs from people who do.
- Publishing as a Python package on PyPI. Would require more packaging ceremony. If the project grows, revisit.
- A Homebrew formula. Same: nice to have, but not worth the overhead while the project has a user count of one.
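
The Whisper model auto-fallback item above (try small, then base, then tiny) could look roughly like the sketch below. This is not code from the repo: `load_with_fallback` and `fake_loader` are hypothetical names, and the loader is passed in as a parameter so the example runs without `faster-whisper` installed. In real use the loader would presumably wrap whatever call the STT engine already makes to construct its model.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(
    loader: Callable[[str], T],
    sizes: Sequence[str] = ("small", "base", "tiny"),
) -> Tuple[str, T]:
    """Try each model size in order; return (size, model) for the first that loads."""
    errors = []
    for size in sizes:
        try:
            return size, loader(size)
        except Exception as exc:  # loading may fail on memory, download, etc.
            errors.append(f"{size}: {exc}")
    # Every size failed: surface all the reasons at once.
    raise RuntimeError("no Whisper model could be loaded: " + "; ".join(errors))


# Hypothetical stand-in loader: pretends only "tiny" fits in memory.
def fake_loader(size: str) -> str:
    if size != "tiny":
        raise MemoryError(f"not enough memory for {size}")
    return f"<model {size}>"


if __name__ == "__main__":
    size, model = load_with_fallback(fake_loader)
    print(size, model)
```

Collecting the per-size errors rather than swallowing them keeps a failure debuggable, which matters if `doctor` is ever expected to explain why a smaller model was picked.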
Open an issue on the repo or fork and send a PR. The bar for new features is: does it stay small, does it preserve the "read it in an afternoon" audit budget, and does it help the accessibility core use case?