feat(voice): streamed gapless PCM assistant speech via AudioWorklet #1644

Merged
oscharko merged 3 commits into dev from feat/keiko-voice-streaming-playout on Jun 28, 2026

@oscharko

Summary

Replaces the buffered whole-clip <audio> assistant-speech playout (nothing audible until the entire
clip is synthesized + transferred) with a start-on-first-chunk AudioWorklet PCM path, keeping the
buffered path as a strict fallback so it can never regress. Inspired by the canonical Azure realtime
worklet pattern, adapted and hardened for Keiko (not copied). Also closes the audit's outstanding
no-webaudio-scheduling-architecture finding.

What changed

  • keiko-playback-worklet.js (new, packages/keiko-ui/public/): a hardened playback
    AudioWorkletProcessor — pre-allocated Float32 ring buffer (no per-message spread, no per-quantum
    allocation), a prime/jitter threshold so the first chunks never underrun, underrun→silence (never a
    glitch), `null`→instant flush (sub-frame barge-in), and a frames-played position report.
  • Gateway: requestTextToSpeechStream returns the provider audio as a bounded ReadableStream
    (same auth / egress seam / error coding / size cap), not a fully-buffered clip.
  • BFF: POST /api/voice/speak/stream streams raw PCM (audio/pcm) via the STREAMING sentinel with
    abort-on-disconnect + write backpressure. The buffered /api/voice/speak route is untouched.
  • Client: an injectable AudioWorklet PCM sink (AudioContext @ 24 kHz). The engine tries it first and
    falls back to the buffered opus path when WebAudio is unavailable (e.g. under test) or it fails to
    start. Barge-in flushes the worklet immediately. PCM little-endian decoding carries a sample split
    across network chunks.
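
The worklet behaviour listed above (a pre-allocated ring buffer, a prime threshold, underrun to silence, instant flush, a frames-played counter) can be sketched as a plain class. This is a minimal illustration, not the contents of keiko-playback-worklet.js: the class name `PcmRingBuffer`, its method names, and the capacity and threshold values are all assumptions, and the `end()` method reflects the short-stream hardening described in the second commit of this PR. In a real worklet, `pull` would presumably be called from the processor's `process()` callback with the output channel.

```typescript
// Sketch only: names and sizes are hypothetical, not taken from the PR.
class PcmRingBuffer {
  private readonly buf: Float32Array;
  private readonly primeThreshold: number;
  private readPos = 0;
  private writePos = 0;
  private size = 0; // samples currently buffered
  private primed = false;
  private ended = false;
  framesPlayed = 0; // basis for a position report

  constructor(capacity: number, primeThreshold: number) {
    this.buf = new Float32Array(capacity); // allocated once, reused forever
    this.primeThreshold = primeThreshold;
  }

  // Copy samples into the ring; returns how many were dropped because it was full.
  push(samples: Float32Array): number {
    const n = Math.min(this.buf.length - this.size, samples.length);
    for (let i = 0; i < n; i++) {
      this.buf[this.writePos] = samples[i];
      this.writePos = (this.writePos + 1) % this.buf.length;
    }
    this.size += n;
    if (this.size >= this.primeThreshold) this.primed = true;
    return samples.length - n;
  }

  // Mark end of stream so a remainder below the prime threshold still plays out.
  end(): void {
    this.ended = true;
  }

  // Barge-in: discard everything that is buffered.
  flush(): void {
    this.readPos = this.writePos = this.size = 0;
    this.primed = false;
    this.ended = false;
  }

  // Fill one output block; returns true once an ended stream has fully drained.
  pull(out: Float32Array): boolean {
    let written = 0;
    if (this.primed || this.ended) {
      const n = Math.min(this.size, out.length);
      for (; written < n; written++) {
        out[written] = this.buf[this.readPos];
        this.readPos = (this.readPos + 1) % this.buf.length;
      }
      this.size -= n;
      this.framesPlayed += n;
    }
    out.fill(0, written); // not primed yet, or underrun: emit silence
    return this.ended && this.size === 0;
  }
}
```

Until the threshold is crossed, `pull` writes only zeros, so early network jitter costs a little latency rather than an audible gap.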

Verification (reproduced locally)

  • Full typecheck (incl. tests) + lint: PASS. Root export-surface contract updated → zero drift.
  • build:ui ships the worklet to dist/ui/static/keiko-playback-worklet.js (served same-origin under
    script-src 'self', no inline hash; intact + no leaked window/document).
  • gateway 844 · ui 3620 tests pass; new tests cover the adapter stream (bounded + error mapping),
    the BFF stream route (bytes + abort + error-before-headers + capability gate), the PCM byte→sample
    decoder (incl. split-sample carry), and the useAssistantSpeech streaming/fallback/barge-in wiring.
  • Live Azure Foundry e2e (real creds): streams raw 24 kHz mono PCM with ~0.85s time-to-first-audio
    vs the buffered path waiting for the full ~1.1s+ clip; gapless + instant barge-in.
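
The split-sample carry covered by the decoder tests can be illustrated as follows. This is a sketch under one stated assumption: the PR says the stream is raw 24 kHz mono little-endian PCM but does not give the bit depth, so signed 16-bit samples are assumed here. The names `Pcm16Decoder` and `toFloat` are hypothetical.

```typescript
// Sketch only: assumes signed 16-bit little-endian PCM; names are hypothetical.
function toFloat(lo: number, hi: number): number {
  const unsigned = (hi << 8) | lo; // little-endian: low byte arrives first
  const signed = unsigned >= 0x8000 ? unsigned - 0x10000 : unsigned;
  return signed / 0x8000; // scale to [-1, 1)
}

class Pcm16Decoder {
  // Low byte of a sample whose high byte has not arrived yet, if any.
  private carry: number | null = null;

  decode(chunk: Uint8Array): Float32Array {
    const totalBytes = chunk.length + (this.carry === null ? 0 : 1);
    const out = new Float32Array(totalBytes >> 1);
    let i = 0; // read index into chunk
    let o = 0; // write index into out

    // Complete the sample that was split across the previous chunk boundary.
    if (this.carry !== null && chunk.length > 0) {
      out[o++] = toFloat(this.carry, chunk[i++]);
      this.carry = null;
    }
    // Decode all whole samples in this chunk.
    for (; i + 1 < chunk.length; i += 2) {
      out[o++] = toFloat(chunk[i], chunk[i + 1]);
    }
    // Keep a trailing odd byte for the next call.
    if (i < chunk.length) this.carry = chunk[i];
    return out;
  }
}
```

Without the carry, an odd-length network chunk would shift every later byte pair by one and turn the rest of the stream into noise.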

Invariants

Model Gateway / BFF boundary intact · no new runtime npm deps (AudioWorklet is native) · no raw audio
persisted · browser never receives the provider key · capability-gated · no globals.css change. The
buffered path is the universal fallback, so unsupported browsers/tests are unaffected.

Deferred follow-ups (documented)

Precise interrupt offset via the worklet position (barge-in is already instant via flush), and STT
verbose_json — both orthogonal and minor; deferred to keep this PR focused on the playout path.

🤖 Generated with Claude Code

oscharko and others added 3 commits June 28, 2026 01:20
Replace the buffered whole-clip <audio> playout (nothing audible until the
entire clip is synthesized + transferred) with a start-on-first-chunk path,
keeping the buffered path as a strict fallback so this can never regress.

- keiko-playback-worklet.js: a hardened playback AudioWorkletProcessor — a
  pre-allocated Float32 ring buffer (no per-message spread, no per-quantum
  allocation), a prime/jitter threshold so the first chunks never underrun,
  underrun→silence (never a glitch), `null`→instant flush (sub-frame
  barge-in), and a frames-played position report.
- gateway: requestTextToSpeechStream returns the provider audio as a bounded
  ReadableStream (same auth/egress/error-coding/size cap), not a buffered clip.
- BFF: POST /api/voice/speak/stream streams raw PCM (audio/pcm) via the
  STREAMING sentinel with abort-on-disconnect + write backpressure; the
  buffered /api/voice/speak route is unchanged.
- client: an injectable AudioWorklet PCM sink (AudioContext @ 24kHz). The
  engine tries it first and falls back to the buffered opus path when WebAudio
  is unavailable (e.g. under test) or it fails to start. Barge-in flushes the
  worklet immediately. PCM little-endian decoding carries a sample split
  across network chunks.
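
One way a size-capped ("bounded") stream like the gateway's might be built is with a byte budget applied inside a `TransformStream`. This is a rough sketch, not the actual requestTextToSpeechStream implementation: `ByteBudget`, `boundStream`, and the error message are hypothetical, and the real adapter's error coding is not shown in this PR.

```typescript
// Sketch only: names and the error message are hypothetical.
class ByteBudget {
  private seen = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Returns the chunk unchanged, or throws once the cumulative size exceeds the cap.
  admit(chunk: Uint8Array): Uint8Array {
    this.seen += chunk.byteLength;
    if (this.seen > this.maxBytes) {
      throw new Error("TTS stream exceeded size cap");
    }
    return chunk;
  }
}

function boundStream(
  source: ReadableStream<Uint8Array>,
  maxBytes: number,
): ReadableStream<Uint8Array> {
  const budget = new ByteBudget(maxBytes);
  return source.pipeThrough(
    new TransformStream<Uint8Array, Uint8Array>({
      transform(chunk, controller) {
        // A throw from admit() errors the stream, so the consumer's read() rejects.
        controller.enqueue(budget.admit(chunk));
      },
    }),
  );
}
```

Chunks are forwarded as they arrive, so the cap is enforced without buffering the whole clip first.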

Verified: gateway/server/ui suites pass; build:ui ships the worklet to
dist/ui/static (served same-origin under script-src 'self', no inline hash);
root export-surface contract updated (zero drift); live Azure e2e streams raw
24kHz PCM with ~0.85s time-to-first-audio-byte. Precise interrupt offset and
STT verbose_json are documented follow-ups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ailure

Make the streaming sink strictly fallback-safe: an error before audio starts
(provider/network error, empty 200) now returns false so useAssistantSpeech
runs the buffered path, instead of surfacing a failure. This keeps a turn from
being lost when only the buffered route works (e.g. a smoke test that stubs
/api/voice/speak but not the stream route) and is the safer production default.
Only a mid-stream failure after playback has begun degrades to text.
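
The fallback rule in this commit message reduces to a small decision function. The sketch below is illustrative only: the type and function names are hypothetical and do not come from useAssistantSpeech.

```typescript
// Sketch only: types and names are hypothetical, not the PR's actual API.
type StreamOutcome =
  | { kind: "completed" } // stream played to the end
  | { kind: "unavailable" } // WebAudio missing, or the sink failed to start
  | { kind: "failed"; audioStarted: boolean }; // provider/network error or empty 200

type NextAction = "done" | "run-buffered-path" | "degrade-to-text";

function nextAction(outcome: StreamOutcome): NextAction {
  switch (outcome.kind) {
    case "completed":
      return "done";
    case "unavailable":
      return "run-buffered-path";
    case "failed":
      // Nothing was heard yet: the buffered route can still speak the whole turn.
      // Audio already began: replaying from the start would repeat speech, so show text.
      return outcome.audioStarted ? "degrade-to-text" : "run-buffered-path";
  }
}
```

An empty 200 response counts as a failure with `audioStarted: false`, which is why it ends up on the buffered path rather than surfacing an error.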

Harden the worklet for short/empty streams: the "end" marker forces any
sub-prime remainder to play out and completes once drained (no dependence on
having crossed the prime threshold); a no-audio stream is reported as an error
by the client rather than hanging.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The prior push did not trigger a CI run (GitHub synchronize glitch); the
parent commit's full suite is green and this only adds an empty commit to
re-attach the required checks to HEAD. Squashed away on merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@oscharko oscharko merged commit b28276e into dev Jun 28, 2026
13 checks passed
@oscharko oscharko deleted the feat/keiko-voice-streaming-playout branch June 28, 2026 00:08