
feat(build): word-level transcript → per-beat compose prompt (#204) #237

Merged: kiyeonjeon21 merged 2 commits into main from feat/word-sync-transcript on Jun 22, 2026


@kiyeonjeon21 (Contributor)

Closes the `vibe build` half of #204: word-synced caption / kinetic-type animations.

What

The LLM composer now receives Whisper word-level narration timings so it can sync word reveals to speech. The deterministic `vibe scene add` emit path already did this; `vibe build` did not.

How

  • New `_shared/transcribe-narration.ts`: `transcribeNarrationWords()` (provider-agnostic Whisper word-level transcription of generated narration), `readBeatTranscript()`, and `beatTranscriptRelPath()`. The inline transcribe block in `vibe scene add` is refactored to reuse it (single source of truth).
  • Asset stage `dispatchTranscript()`: after narration is generated, transcribe it to `assets/transcript-<beat>.json` when narration exists and an OpenAI key is configured. The step is cached (skipped when the file exists and narration wasn't freshly regenerated) and best-effort (no key or no words → skip; it never fails the build). Opt out with `--skip-transcript`.
  • Prompt injection (`formatTranscriptSection`): a compact, deterministic timing section with a token-budget guard, emitting a word-level `[start, "word"]` table at or below 120 words and phrase-level approximate anchors above that. Because the section is folded into the user prompt, the compose cache key invalidates when timings change. It carries the existing visual-sync-only guard (no `<audio>`/SFX).
  • Host-agent path (`compose-prompts`): exposes `transcriptPath` and feeds the same timings into its prompt.
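A minimal sketch of the budget guard described for `formatTranscriptSection`, assuming a hypothetical `WordTiming` shape (`word`, `start`, `end` in seconds) and a hypothetical phrase-grouping rule (at most 8 words per phrase, or a new phrase after a 0.5 s pause). Only the 120-word threshold, the `[start, "word"]` row form, clamping/rounding, and the visual-sync-only guard come from this PR; the header text and grouping knobs are illustrative.

```typescript
// Hypothetical shape of one Whisper word timing (the PR does not show its type).
interface WordTiming {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

interface Phrase {
  start: number;
  text: string[];
}

// Threshold from the PR description: word-level table at or below 120 words.
const WORD_LEVEL_MAX = 120;
// Hypothetical phrase-grouping knobs, not taken from the PR.
const PHRASE_MAX_WORDS = 8;
const PHRASE_GAP_SECONDS = 0.5;

// Clamp negative/non-finite times to 0 and round to 2 decimals so output is deterministic.
function fmt(t: number): string {
  const clamped = Number.isFinite(t) ? Math.max(0, t) : 0;
  return clamped.toFixed(2);
}

function formatTranscriptSectionSketch(words: WordTiming[] | undefined): string {
  // No transcript: omit the section entirely.
  if (!words || words.length === 0) return "";

  const header = "## Narration timing (visual sync only; do not add <audio> or SFX)";

  if (words.length <= WORD_LEVEL_MAX) {
    // Word-level table: one [start, "word"] row per word.
    const rows = words.map((w) => `[${fmt(w.start)}, ${JSON.stringify(w.word.trim())}]`);
    return [header, "Word-level timings (seconds):", ...rows].join("\n");
  }

  // Oversized transcript: collapse into approximate phrase-level anchors.
  const phrases: Phrase[] = [];
  let prevEnd = 0;
  for (const w of words) {
    const word = w.word.trim();
    const last = phrases[phrases.length - 1];
    const gap = w.start - prevEnd;
    if (!last || last.text.length >= PHRASE_MAX_WORDS || gap > PHRASE_GAP_SECONDS) {
      phrases.push({ start: w.start, text: [word] }); // start a new phrase
    } else {
      last.text.push(word); // extend the current phrase
    }
    prevEnd = w.end;
  }
  const rows = phrases.map((p) => `~${fmt(p.start)} ${JSON.stringify(p.text.join(" "))}`);
  return [header, "Approximate phrase anchors (seconds):", ...rows].join("\n");
}
```

Grouping on pauses plus a word cap keeps the oversized case roughly aligned with speech rhythm while bounding token cost; the merged implementation may group differently.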

Behavior

  • On by default when narration and an OpenAI key both exist; `--skip-transcript` opts out. Without a key the step is skipped gracefully (narration still plays, just without word sync). Transcription cost is negligible (~$0.002/beat).
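The gating rules above, together with the caching rule from the asset-stage bullet, can be read as a small decision function. This is a sketch with a hypothetical `decideTranscript()` and input shape, not the actual `dispatchTranscript()` code; the order of the checks is an assumption.

```typescript
// Hypothetical inputs; the real dispatchTranscript() signature is not shown in the PR.
interface TranscriptGateInput {
  skipTranscript: boolean; // --skip-transcript was passed
  hasNarration: boolean; // narration audio exists for this beat
  hasOpenAIKey: boolean; // an OpenAI key is configured
  transcriptExists: boolean; // assets/transcript-<beat>.json is already on disk
  narrationRegenerated: boolean; // narration was freshly produced in this run
}

type TranscriptDecision =
  | { action: "transcribe" }
  | { action: "skip"; reason: "opted-out" | "no-narration" | "no-key" | "cached" };

function decideTranscript(input: TranscriptGateInput): TranscriptDecision {
  if (input.skipTranscript) return { action: "skip", reason: "opted-out" };
  if (!input.hasNarration) return { action: "skip", reason: "no-narration" };
  // A missing key is a graceful skip, never a build failure.
  if (!input.hasOpenAIKey) return { action: "skip", reason: "no-key" };
  // Cache hit: reuse the existing file unless the narration changed underneath it.
  if (input.transcriptExists && !input.narrationRegenerated) {
    return { action: "skip", reason: "cached" };
  }
  return { action: "transcribe" };
}
```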

Tests

  • `formatTranscriptSection`: no / short / oversized transcript, plus clamping/rounding.
  • `buildUserPrompt`: section omitted without a transcript, word-level table with one, cache key changes when the transcript changes.
  • Asset stage: writes `transcript-<beat>.json` when words exist; `--skip-transcript` writes nothing.
  • `readBeatTranscript` / path helper / non-fatal failure path.
  • `vibe build --describe` schema snapshot updated for `--skip-transcript`.
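The cache-key test relies on the transcript section being part of the prompt text. A sketch, assuming (hypothetically) that the compose cache key is a SHA-256 over the system and user prompts; `composeCacheKey()` and `buildUserPromptSketch()` are illustrative names, not this PR's API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: the compose cache key is a hash over the full prompt text.
// A NUL separator keeps "ab" + "c" distinct from "a" + "bc".
function composeCacheKey(systemPrompt: string, userPrompt: string): string {
  return createHash("sha256").update(systemPrompt).update("\u0000").update(userPrompt).digest("hex");
}

// Hypothetical stand-in for buildUserPrompt(): the transcript section is appended
// only when present, so any timing change alters the prompt and therefore the key.
function buildUserPromptSketch(beatBrief: string, transcriptSection: string): string {
  return transcriptSection ? `${beatBrief}\n\n${transcriptSection}` : beatBrief;
}
```

Folding timings into the prompt rather than tracking them as a separate cache input means no extra invalidation logic is needed: whatever the model sees is, by construction, what the key covers.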

Notes / follow-ups

  • Dry-run does not itemise the transcription cost (~$0.002/beat) as a separate line; this is intentional, since it falls below the plan's rounding noise.
  • LLM prompt-tuning for how aggressively it uses the timings is a separate quality pass.

Closes the `vibe build` half of word-sync animations (#204): the LLM
composer now receives Whisper word-level narration timings so it can sync
caption / kinetic-type reveals to speech. The deterministic `scene add`
emit path already supported this; build did not.

- Add `_shared/transcribe-narration.ts`: `transcribeNarrationWords()`
  (provider-agnostic Whisper word-level transcription of generated
  narration) + `readBeatTranscript()` + `beatTranscriptRelPath()`. Refactor
  the inline transcribe block in `vibe scene add` to reuse it.
- Asset stage gains `dispatchTranscript()`: after narration, transcribe to
  `assets/transcript-<beat>.json` when narration exists and an OpenAI key is
  configured. Cached (skipped when the file exists and narration was not
  freshly regenerated); best-effort (missing key / no words → skip, never
  fails the build). Gated by `--skip-transcript`.
- `buildUserPrompt` injects a compact, deterministic timing section with a
  token-budget guard (`formatTranscriptSection`): word-level table at/below
  120 words, phrase-level "approximate" anchors above. Folded into the user
  prompt, so the compose cache key invalidates when timings change. Carries
  the visual-sync-only guard (no `<audio>`/SFX).
- Host-agent path (`compose-prompts`) exposes `transcriptPath` and feeds the
  same timings into its prompt.
- Tests: no/short/oversized transcript, cache invalidation, asset-stage
  generation + `--skip-transcript`, transcript read/validation.
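The read/validation path can be sketched as follows, assuming a hypothetical on-disk shape of `{ "words": [{ "word", "start", "end" }] }`. The `*Sketch` helpers only mirror the roles of `readBeatTranscript()` and `beatTranscriptRelPath()` and are not their real signatures. Any read or validation failure returns `undefined`, matching the non-fatal behaviour described above.

```typescript
import { readFileSync } from "node:fs";
import * as path from "node:path";

interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Hypothetical: mirrors the assets/transcript-<beat>.json layout named in the PR.
function beatTranscriptRelPathSketch(beatId: string): string {
  return path.posix.join("assets", `transcript-${beatId}.json`);
}

// Accept only well-formed entries with non-decreasing start/end times.
function isWordTiming(x: unknown): x is WordTiming {
  if (typeof x !== "object" || x === null) return false;
  const w = x as Record<string, unknown>;
  return typeof w.word === "string" && typeof w.start === "number" && typeof w.end === "number" && w.end >= w.start;
}

// Hypothetical file shape: { "words": [{ "word", "start", "end" }, ...] }.
function parseBeatTranscript(raw: string): WordTiming[] | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return undefined; // corrupt JSON: treat as "no transcript"
  }
  if (typeof data !== "object" || data === null) return undefined;
  const words = (data as { words?: unknown }).words;
  if (!Array.isArray(words) || words.length === 0 || !words.every(isWordTiming)) return undefined;
  return words;
}

function readBeatTranscriptSketch(projectDir: string, beatId: string): WordTiming[] | undefined {
  try {
    const raw = readFileSync(path.join(projectDir, beatTranscriptRelPathSketch(beatId)), "utf8");
    return parseBeatTranscript(raw);
  } catch {
    return undefined; // missing or unreadable file: the caller simply omits the timing section
  }
}
```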

kiyeonjeon21 merged commit 6972357 into main on Jun 22, 2026
6 checks passed
kiyeonjeon21 deleted the feat/word-sync-transcript branch on June 22, 2026 at 11:12