Why readback exists

This is not a neutral-framing doc. It's the "why."

The immediate trigger

I spend four to eight hours a day reading LLM responses. Long ones. Thousand-word technical explanations, code reviews, architecture discussions, debugging sessions. I love the work. My eyes do not.

At some point the eye strain crossed a threshold where the cost of reading one more 2,000-word Claude response outweighed the value of the response. I started skimming. Skimming costs me accuracy and costs me learning. What I wanted was to listen to responses while I did something else with my eyes — look at code, close my eyes, walk around the room, stare out a window.

The first thing I tried was macOS's built-in "Speak Selection." It's fine for a paragraph. It's bad for a chapter. The default voices are flat, and the premium voices are better but still not great for long-form technical content. And it requires manual selection every single time — highlight, shortcut, wait, un-highlight, repeat. That's not a flow; that's another kind of fatigue.

The second thing I tried was paid tools (Superwhisper, ElevenLabs via scripts, etc.). They work, but the local voice quality wasn't there, the LLM integration wasn't there, and my budget for yet another subscription was already exhausted.

The third thing — this repo — is what I actually wanted: an open, local, scriptable voice layer that integrates with the LLM tools I already use, that costs nothing beyond the initial install, and that I can audit line by line.

Who this helps beyond me

If you're here because you landed on the repo searching for "LLM text to speech" or "eye strain screen reader," here's the honest list of who I think this helps and who it doesn't.

This helps:

  • People with chronic eye strain or visual fatigue from long-form screen reading. Not "my eyes feel a bit tired" — I mean the kind of pain that costs you work hours. Listening to responses while looking at a neutral target (a wall, your editor's structure, your keyboard) genuinely gives your eyes a break.
  • People with low vision who can read but find it effortful. Voice output removes the work of decoding the text, so your attention goes to the content itself.
  • People with RSI or typing limits who want voice input specifically for typing long prompts. readback listen --paste lets you speak a paragraph and have it appear in whatever text field has focus.
  • People who process audio faster than text. Some people genuinely retain more when listening. If that's you, this is a speed-up, not a slow-down.
  • People who work better moving around. Reading a long response pins you to a screen. Listening to one lets you pace, stretch, look at a whiteboard, or just sit back.

This does not help (and this matters):

  • People who are blind or severely visually impaired. You need a real screen reader (VoiceOver, JAWS, NVDA) that understands the full state of your OS, your editor, your browser — not a CLI that reads LLM responses. readback is a complement to real accessibility tools, never a replacement.
  • People who need dictation as their primary input method. You need Dragon, Talon, Superwhisper, or a full voice-control system with custom commands and grammars. readback listen is a convenience tool for occasional voice-to-text; it's not a voice-driven OS.
  • People who want cloud-quality voices that match ElevenLabs or Google WaveNet. Kokoro is the best local option I've found, and it's excellent, but a cloud voice will still beat it on natural prosody for some speakers.

If you're in the second list, please use the right tool for your actual need. readback is a productivity helper for sighted and partially sighted people who want voice as an option. It is not a substitute for proper accessibility software.

What I want this to become

Honest scope: I want readback to stay small, auditable, and focused on the LLM-workflow niche. I don't want to build a screen reader. I don't want to build a voice OS. I want a tiny CLI that:

  1. Makes long LLM responses audible without friction.
  2. Makes long prompts typable-by-voice without friction.
  3. Never sends audio off your machine.
  4. Can be read end to end by one person in an afternoon.

If it stays that size and helps ten people with chronic eye strain ship more work, it's successful.

What would make this not the right tool

If you need any of these, readback is wrong for you:

  • A graphical interface.
  • Real-time streaming speech-to-text (STT), where transcripts appear as you speak.
  • Custom voice commands that trigger actions in other apps.
  • Accessibility compliance for a production application.
  • A voice that matches a specific brand or persona.
  • Language support beyond what Kokoro ships with.

These are all real needs. They're just not this project's scope, and pretending otherwise would make readback worse at the thing it is good at.