feat(voice): streamed gapless PCM assistant speech via AudioWorklet by oscharko · Pull Request #1644 · oscharko-dev/Keiko

oscharko · 2026-06-27T23:20:58Z

Summary

Replaces the buffered whole-clip <audio> assistant-speech playout (nothing audible until the entire
clip is synthesized + transferred) with a start-on-first-chunk AudioWorklet PCM path, keeping the
buffered path as a strict fallback so it can never regress. Inspired by the canonical Azure realtime
worklet pattern, adapted and hardened for Keiko (not copied). Also closes the audit's outstanding
no-webaudio-scheduling-architecture finding.

What changed

- keiko-playback-worklet.js (new, packages/keiko-ui/public/): a hardened playback
  AudioWorkletProcessor with a pre-allocated Float32 ring buffer (no per-message spread, no
  per-quantum allocation), a prime/jitter threshold so the first chunks never underrun,
  underrun→silence (never a glitch), null→instant flush (sub-frame barge-in), and a
  frames-played position report.
- Gateway: requestTextToSpeechStream returns the provider audio as a bounded ReadableStream
  (same auth / egress seam / error coding / size cap), not a fully-buffered clip.
- BFF: POST /api/voice/speak/stream streams raw PCM (audio/pcm) via the STREAMING sentinel with
  abort-on-disconnect + write backpressure. The buffered /api/voice/speak route is untouched.
- Client: an injectable AudioWorklet PCM sink (AudioContext @ 24 kHz). The engine tries it first and
  falls back to the buffered opus path when WebAudio is unavailable (e.g. under test) or it fails to
  start. Barge-in flushes the worklet immediately. PCM little-endian decoding carries a sample split
  across network chunks.
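
The worklet behaviours listed above (pre-allocated ring, prime threshold, underrun→silence, flush, frames-played report, and the "end" marker) can be sketched outside the audio thread as a plain class. This is a minimal illustration, not the code in keiko-playback-worklet.js: the class name `PlaybackRing`, its method names, and the capacity and prime sizes are all assumptions made for the example.

```typescript
// Hypothetical sketch: names and sizes are illustrative, not taken from the PR.
class PlaybackRing {
  private readonly buf: Float32Array; // allocated once, never resized
  private readonly primeFrames: number;
  private readPos = 0;
  private writePos = 0;
  private available = 0; // samples buffered and not yet played
  private primed = false;
  private ended = false;
  framesPlayed = 0; // position report: samples actually rendered

  constructor(capacity: number, primeFrames: number) {
    this.buf = new Float32Array(capacity);
    this.primeFrames = primeFrames;
  }

  /** Copy samples into the ring; returns how many fit (any excess is dropped). */
  write(samples: Float32Array): number {
    const n = Math.min(samples.length, this.buf.length - this.available);
    const first = Math.min(n, this.buf.length - this.writePos);
    this.buf.set(samples.subarray(0, first), this.writePos);
    this.buf.set(samples.subarray(first, n), 0); // wrapped remainder, may be empty
    this.writePos = (this.writePos + n) % this.buf.length;
    this.available += n;
    if (this.available >= this.primeFrames) this.primed = true;
    return n;
  }

  /** "end" marker: play out the remainder even if it is below the prime threshold. */
  end(): void {
    this.ended = true;
    this.primed = true;
  }

  /** Barge-in: drop everything queued so the next quantum is silent. */
  flush(): void {
    this.readPos = this.writePos = this.available = 0;
    this.primed = false;
    this.ended = false;
  }

  /** Fill one output quantum; returns false once ended and fully drained. */
  render(out: Float32Array): boolean {
    const n = this.primed ? Math.min(out.length, this.available) : 0;
    const first = Math.min(n, this.buf.length - this.readPos);
    out.set(this.buf.subarray(this.readPos, this.readPos + first), 0);
    out.set(this.buf.subarray(0, n - first), first); // wrapped remainder, may be empty
    out.fill(0, n); // not primed or underrun: pad the rest with silence
    this.readPos = (this.readPos + n) % this.buf.length;
    this.available -= n;
    this.framesPlayed += n;
    return !(this.ended && this.available === 0);
  }
}
```

In a real AudioWorkletProcessor, `render` would correspond to the body of `process()` writing into the output channel, and `write`, `flush`, and `end` would be driven by messages on the processor's port.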

Verification (reproduced locally)

- Full typecheck (incl. tests) + lint: PASS. Root export-surface contract updated → zero drift.
- build:ui ships the worklet to dist/ui/static/keiko-playback-worklet.js (served same-origin under
  script-src 'self', no inline hash; intact + no leaked window/document).
- gateway 844 · ui 3620 tests pass; new tests cover the adapter stream (bounded + error mapping),
  the BFF stream route (bytes + abort + error-before-headers + capability gate), the PCM byte→sample
  decoder (incl. split-sample carry), and the useAssistantSpeech streaming/fallback/barge-in wiring.
- Live Azure Foundry e2e (real creds): streams raw 24 kHz mono PCM with ~0.85s time-to-first-audio
  vs the buffered path waiting for the full ~1.1s+ clip; gapless + instant barge-in.
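
The split-sample carry that the decoder tests exercise can be illustrated as follows. This is a sketch under assumptions: the PR does not show the decoder, so the class name `Pcm16Decoder`, the 16-bit signed sample width, and the division by 32768 to reach the [-1, 1) float range are all illustrative choices, not the PR's implementation.

```typescript
// Hypothetical sketch: assumes 16-bit signed little-endian mono PCM.
class Pcm16Decoder {
  // Bytes left over from the previous chunk (at most one for 16-bit samples).
  private carry: Uint8Array = new Uint8Array(0);

  /** Decode one network chunk into float samples, carrying any odd trailing byte. */
  decode(chunk: Uint8Array): Float32Array {
    // Prepend the carried byte so a sample split across chunks is reassembled.
    const bytes = new Uint8Array(this.carry.length + chunk.length);
    bytes.set(this.carry, 0);
    bytes.set(chunk, this.carry.length);

    const sampleCount = bytes.length >> 1; // two bytes per sample
    const view = new DataView(bytes.buffer);
    const out = new Float32Array(sampleCount);
    for (let s = 0; s < sampleCount; s++) {
      out[s] = view.getInt16(s * 2, true) / 32768; // true = little-endian
    }

    // Keep the unpaired trailing byte (if any) for the next call.
    this.carry = bytes.subarray(sampleCount * 2);
    return out;
  }
}
```

The sketch allocates a merged buffer per chunk for clarity; a production decoder could avoid that copy by handling the carried byte separately.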

Invariants

Model Gateway / BFF boundary intact · no new runtime npm deps (AudioWorklet is native) · no raw audio
persisted · browser never receives the provider key · capability-gated · no globals.css change. The
buffered path is the universal fallback, so unsupported browsers/tests are unaffected.

Deferred follow-ups (documented)

Precise interrupt offset via the worklet position (barge-in is already instant via flush), and STT
verbose_json — both orthogonal and minor; deferred to keep this PR focused on the playout path.

🤖 Generated with Claude Code

Replace the buffered whole-clip <audio> playout (nothing audible until the entire clip is synthesized + transferred) with a start-on-first-chunk path, keeping the buffered path as a strict fallback so this can never regress.

- keiko-playback-worklet.js: a hardened playback AudioWorkletProcessor with a pre-allocated Float32 ring buffer (no per-message spread, no per-quantum allocation), a prime/jitter threshold so the first chunks never underrun, underrun→silence (never a glitch), `null`→instant flush (sub-frame barge-in), and a frames-played position report.
- gateway: requestTextToSpeechStream returns the provider audio as a bounded ReadableStream (same auth/egress/error-coding/size cap), not a buffered clip.
- BFF: POST /api/voice/speak/stream streams raw PCM (audio/pcm) via the STREAMING sentinel with abort-on-disconnect + write backpressure; the buffered /api/voice/speak route is unchanged.
- client: an injectable AudioWorklet PCM sink (AudioContext @ 24kHz). The engine tries it first and falls back to the buffered opus path when WebAudio is unavailable (e.g. under test) or it fails to start. Barge-in flushes the worklet immediately. PCM little-endian decoding carries a sample split across network chunks.

Verified: gateway/server/ui suites pass; build:ui ships the worklet to dist/ui/static (served same-origin under script-src 'self', no inline hash); root export-surface contract updated (zero drift); live Azure e2e streams raw 24kHz PCM with ~0.85s time-to-first-audio-byte.

Precise interrupt offset and STT verbose_json are documented follow-ups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
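
The size cap behind the "bounded ReadableStream" can be pictured as a running byte counter that rejects the stream once a limit is crossed. The sketch below shows only that check; `makeByteCap`, `maxBytes`, and the error message are hypothetical names for illustration, and the actual limit and error coding used by requestTextToSpeechStream are not shown in this PR text.

```typescript
// Hypothetical sketch of a per-stream byte cap; not the gateway's actual code.
function makeByteCap(maxBytes: number): (chunk: Uint8Array) => Uint8Array {
  let seen = 0; // total bytes observed on this stream so far
  return (chunk: Uint8Array): Uint8Array => {
    seen += chunk.byteLength;
    if (seen > maxBytes) {
      // Throwing lets the caller error the stream instead of forwarding more audio.
      throw new Error(`audio stream exceeded ${maxBytes} bytes`);
    }
    return chunk; // within budget: pass the chunk through unchanged
  };
}
```

One way to apply it would be to call the returned function from the `transform()` callback of a `TransformStream` that the provider body is piped through, so exceeding the cap errors the stream rather than buffering unbounded audio.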

…ailure

Make the streaming sink strictly fallback-safe: an error before audio starts (provider/network error, empty 200) now returns false so useAssistantSpeech runs the buffered path, instead of surfacing a failure. This keeps a turn from being lost when only the buffered route works (e.g. a smoke test that stubs /api/voice/speak but not the stream route) and is the safer production default. Only a mid-stream failure after playback has begun degrades to text.

Harden the worklet for short/empty streams: the "end" marker forces any sub-prime remainder to play out and completes once drained (no dependence on having crossed the prime threshold); a no-audio stream is reported as an error by the client rather than hanging.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
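
The fallback rule in this commit reduces to a small decision table, sketched below. The types and the function name `nextStep` are hypothetical and exist only to make the rule concrete; the real wiring lives in useAssistantSpeech and the streaming sink, where "fall back" is signalled by the sink returning false.

```typescript
// Hypothetical model of the fallback decision; not the PR's actual types.
type StreamOutcome =
  | { kind: "completed"; framesPlayed: number }
  | { kind: "failed"; audioStarted: boolean };

type NextStep = "done" | "fallback-to-buffered" | "degrade-to-text";

function nextStep(outcome: StreamOutcome): NextStep {
  if (outcome.kind === "completed") {
    // A stream that ends without any audio (e.g. an empty 200) counts as an
    // error before audio started, so the buffered path still gets a chance.
    return outcome.framesPlayed > 0 ? "done" : "fallback-to-buffered";
  }
  // Failure before the first audible sample: retry via the buffered route.
  // Failure after playback began: the turn degrades to text.
  return outcome.audioStarted ? "degrade-to-text" : "fallback-to-buffered";
}
```

The point of the asymmetry is that replaying a clip from the start through the buffered route is only safe when the user has heard nothing yet.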

The prior push did not trigger a CI run (GitHub synchronize glitch); the parent commit's full suite is green and this only adds an empty commit to re-attach the required checks to HEAD. Squashed away on merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

oscharko and others added 3 commits June 28, 2026 01:20

oscharko merged commit b28276e into dev Jun 28, 2026
13 checks passed

oscharko deleted the feat/keiko-voice-streaming-playout branch June 28, 2026 00:08

oscharko merged 3 commits into dev from feat/keiko-voice-streaming-playout
