Roadmap

Not a promise list — a running record of what's in, what's queued, and what's explicitly out.

V1 (shipped)

  • Kokoro TTS with voice/speed/language config
  • faster-whisper STT with listen, listen --copy, listen --paste
  • Claude Code Stop hook with per-session pinning
  • readback CLI: say, stop, listen, voices, models, pin, unpin, list, config, install-hook, uninstall-hook, update-hook, doctor
  • Markdown / code-block / URL stripping before TTS
  • Config at ~/.config/readback/config
  • install.sh that doesn't touch existing installations elsewhere on disk
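
For a sense of what the regex-based stripping step involves, here is a minimal sketch. It is not readback's actual stripper: the function name `strip_for_tts`, the pattern names, and the specific rules are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; readback's real rules may differ.
FENCED_BLOCK = re.compile(r"`{3}.*?`{3}", re.DOTALL)   # fenced code blocks
INLINE_CODE = re.compile(r"`([^`]*)`")                  # `inline code`
MD_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")          # [text](target)
BARE_URL = re.compile(r"https?://\S+")                  # bare http(s) URLs
HEADING = re.compile(r"^\s{0,3}#{1,6}\s+", re.MULTILINE)
EMPHASIS = re.compile(r"(\*\*|__|\*|_)(.+?)\1")         # bold / italic markers


def strip_for_tts(text: str) -> str:
    """Reduce markdown to plain text that is reasonable to read aloud."""
    text = FENCED_BLOCK.sub(" ", text)    # drop fenced code blocks entirely
    text = MD_LINK.sub(r"\1", text)       # keep link text, drop the target
    text = BARE_URL.sub(" ", text)        # drop any remaining bare URLs
    text = INLINE_CODE.sub(r"\1", text)   # keep inline code text, drop backticks
    text = HEADING.sub("", text)          # drop leading heading markers
    text = EMPHASIS.sub(r"\2", text)      # unwrap bold / italic
    return re.sub(r"\s+", " ", text).strip()
```

A stripper like this is short and easy to audit, but it has blind spots: the emphasis pattern, for example, also eats the underscores in snake_case identifiers.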

Queued for V1.1

  • Cursor hook support. Cursor uses a hook system similar to Claude Code's. Should be roughly a 50-line addition.
  • Codex CLI hook support. Same — another LLM CLI with a hook mechanism.
  • More voice presets. readback preset reading / preset glance / preset focus — named combinations of voice + speed + volume.
  • readback say --voice X --speed Y — per-invocation overrides without touching config.
  • Per-session config overrides. Pin session X with voice A, session Y with voice B.
  • A test suite. Right now there isn't one. A few bats-core tests for the CLI and pytest tests for the Python engines would cover most regressions.
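
The preset, per-invocation, and per-session items above all imply a precedence order between sources of settings. One way the layering could work, as a sketch only: the keys, preset values, voice names, and the order itself (flags over session pin over preset over config file) are assumptions, not a settled design.

```python
from collections import ChainMap

# Illustrative values only; not readback's real config schema or voices.
CONFIG = {"voice": "voice-a", "speed": 1.0, "volume": 1.0}

PRESETS = {
    "reading": {"speed": 0.9},
    "glance": {"speed": 1.4, "volume": 0.8},
    "focus": {"speed": 1.1, "volume": 0.6},
}


def resolve(config, preset=None, session=None, cli=None):
    """Merge settings; earlier layers win: cli > session > preset > config."""
    # Flags the user did not pass arrive as None and must not mask lower layers.
    cli_layer = {k: v for k, v in (cli or {}).items() if v is not None}
    # An unknown preset name raises KeyError rather than being ignored.
    preset_layer = PRESETS[preset] if preset else {}
    return dict(ChainMap(cli_layer, session or {}, preset_layer, config))
```

With these assumptions, `resolve(CONFIG, preset="glance", cli={"speed": 2.0})` keeps the preset's volume but takes the speed from the command line, and nothing is written back to the config file.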

Considered for V2

  • Global push-to-talk hotkey for listen — press a key anywhere in macOS, speak, release, get the transcript pasted into the focused app. This is what Superwhisper spent years on. I'd need Hammerspoon or Karabiner integration. Unclear if it fits the "small and auditable" scope.
  • Streaming STT. Text appears as you speak instead of at the end of recording. faster-whisper supports this. Needs a UI decision: print partial transcripts to stderr, or update a TUI?
  • Smarter markdown cleaning. The current regex-based stripper misses some edge cases. A proper markdown parser would handle nested structures better.
  • Audio output routing. Pick which output device to use (for people with multiple speakers / headphones).
  • Whisper model auto-fallback. Try small, fall back to base, fall back to tiny if loading fails.
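
The auto-fallback item above could be as small as a loop over model sizes. A minimal sketch, assuming a hypothetical `load_with_fallback` helper that takes the loader as a parameter, so the fallback logic can be exercised without downloading any model:

```python
from typing import Callable, Iterable, Tuple, TypeVar

M = TypeVar("M")


def load_with_fallback(
    loader: Callable[[str], M],
    sizes: Iterable[str] = ("small", "base", "tiny"),
) -> Tuple[str, M]:
    """Try each model size in order; return the first that loads.

    Hypothetical helper, not part of readback. With faster-whisper the
    loader might be `lambda size: WhisperModel(size)`; that wiring is an
    assumption about how it would be used here.
    """
    errors = []
    for size in sizes:
        try:
            return size, loader(size)
        except Exception as exc:  # broad on purpose: any load failure triggers fallback
            errors.append(f"{size}: {exc}")
    raise RuntimeError("no Whisper model could be loaded: " + "; ".join(errors))
```

Returning the size alongside the model lets the CLI report which model it ended up with, which matters because a silent drop from small to tiny changes transcription quality.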

Explicitly out of scope (probably forever)

  • GUI application. This is a CLI. Adding a GUI would change the maintenance burden, the audit surface, and the audience.
  • Proprietary cloud voices. ElevenLabs, Google TTS, Azure — all great, but they take audio off your machine. Not what this is for.
  • Full voice command system. "Open this file, scroll down, run the tests." That's Talon. Use Talon.
  • Windows support. The whole stack is macOS-shaped right now (pbcopy, osascript, Homebrew). Porting would double the maintenance surface.
  • Linux support. Would be valuable, but I don't daily-drive Linux. Accepting PRs from people who do.
  • Publishing as a Python package on PyPI. Would require more packaging ceremony. If the project grows, revisit.
  • A Homebrew formula. Same — nice-to-have but not worth the overhead for a single-user-count project.

How to suggest something

Open an issue on the repo, or fork and send a PR. The bar for new features is: does it stay small, does it preserve the "read it in an afternoon" audit budget, and does it help the core accessibility use case?