Skip to content

refactor(producer): extract extractVideosStage + add materializeSymlinks param#725

Merged
jrusso1020 merged 1 commit into
mainfrom
05-12-refactor_producer_extract_extractvideosstage_add_materializesymlinks_param
May 12, 2026
Merged

refactor(producer): extract extractVideosStage + add materializeSymlinks param#725
jrusso1020 merged 1 commit into
mainfrom
05-12-refactor_producer_extract_extractvideosstage_add_materializesymlinks_param

Conversation

@jrusso1020
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 commented May 12, 2026

What

Phase 1 PR 1.4 of the distributed-render refactor. Moves the video frame extraction sub-stage of executeRenderJob into packages/producer/src/services/render/stages/extractVideosStage.ts. The sequencer now calls runExtractVideosStage at the same code point with identical inputs and outputs.

Source range moved: renderOrchestrator.ts:1979-2100 from before this PR (the block between the "── Stage 2: Video frame extraction ──" comment and the start of "── HDR auto-detection ──").

Also adds a new materializeSymlinks: boolean parameter (default false) on extractVideosStage AND on materializeExtractedFramesForCompiledDir. When true, the helper invokes cpSync(recursive) instead of symlinkSync so the staged frames are real files inside compiledDir — required for distributed plan() output where the planDir must be self-contained across machines (symlinks don't survive S3 / GCS round-trips). Default false preserves the in-process renderer's symlink behavior.

Why

Continues the Phase 1 mechanical extraction that splits the ~2000-line executeRenderJob into 8 stage functions plus a thin sequencer. Phase 1 ships zero new functionality — it's purely a reviewable refactor that sets up the codebase for follow-on determinism hardening and the new distributed primitives.

The materializeSymlinks parameter is additive — there are no in-tree callers that pass true yet, so in-process behavior is preserved exactly. The implementation is tested so it's not dead code (one of the lessons from the previous stack's reviews).

How

  • New file extractVideosStage.ts exports runExtractVideosStage(input) → ExtractVideosStageResult. The function body is the existing video-extraction code lifted verbatim. Returns 11 outputs that the downstream HDR auto-detection block, capture stages, and HDR composite path all consume.
  • materializeExtractedFramesForCompiledDir:
    • Adds cpSync to the injected MaterializeFileSystem type and to the module-scope materializeFileSystem const.
    • Adds materializeSymlinks?: boolean to MaterializeExtractedFramesOptions.
    • Branches between cpSync(..., { recursive: true }) and symlinkSync(...) based on the option.
  • Sequencer (renderOrchestrator.ts):
    • Calls runExtractVideosStage(...) at the same code point.
    • Destructures the 11 result fields into the same local names the downstream code expects.
    • Removes the now-orphaned imports extractAllVideoFrames, resolveProjectRelativeSrc, createFrameLookupTable, FrameLookupTable, detectTransfer, isHdrColorSpace, extractMediaMetadata, VideoColorSpace (all flagged by oxlint after the extraction).

Preserved invariants

  • composition.audios is still mutated in place to add audio entries auto-discovered from video files via ffprobe.
  • perfStages.videoExtractMs is set at the same end-of-stage point with the same value.
  • materializeExtractedFramesForCompiledDir is called exactly once when extractionResult.extracted is non-empty.
  • force-sdr mode still skips ALL ffprobe overhead (both video and image color-space probes).
  • The HDR auto-detection block (the "── HDR auto-detection ──" section that consumes the stage's outputs) remains inline in the sequencer unchanged.

Test plan

  • bunx oxlint packages/producer/src/services/render/stages/extractVideosStage.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/renderOrchestrator.test.ts — clean.
  • bunx oxfmt --check on the same files — clean.
  • bun run --filter @hyperframes/producer typecheck — clean.
  • bun run --filter @hyperframes/producer build — clean.
  • bun test packages/producer/src/services/ — 176 pass, 1 pre-existing unrelated failure (writeCompiledArtifacts — rejects a maliciously crafted key) that also fails on main.
  • New unit test for the materializeSymlinks: true cpSync code path; existing symlink test updated with a parallel guard that cpSync is NOT invoked when the option is false.
  • docker run hyperframes-producer:test --sequential font-variant-numeric many-cuts variables-prod sub-composition-video — 4/4 PASS with audio correlations 1.000 / 0.994 / 0.975 / 0.947. sub-composition-video is the fixture that exercises the video-extraction path end-to-end.
  • Full regression matrix on CI via the regression workflow.

🤖 Generated with Claude Code

…nks param

Move the video frame extraction sub-stage out of `executeRenderJob` into
`services/render/stages/extractVideosStage.ts`. The stage covers HDR
color-space pre-detection for videos and images, the
`extractAllVideoFrames` call, frame-lookup-table construction, video
readiness skip-id collection, video metadata hints, and the auto-detect
of audio tracks from video files.

Hard constraints preserved verbatim:
- `composition.audios` is still mutated in place to add audio entries
  auto-discovered from video files via ffprobe.
- `perfStages.videoExtractMs` is set at the same end-of-stage point.
- `materializeExtractedFramesForCompiledDir` is still called once when
  `extractionResult.extracted` is non-empty.
- `force-sdr` mode still skips ALL ffprobe overhead.

New for distributed mode (`materializeSymlinks: boolean`, default false):
- Plumbs through to `materializeExtractedFramesForCompiledDir` via a new
  option of the same name. When `true`, the helper invokes
  `cpSync(recursive)` instead of `symlinkSync` so the staged frames are
  real files inside `compiledDir`. Symlinks don't survive S3 / GCS
  round-trips, so distributed `plan()` will pass `true` once it lands.
  Default `false` preserves the in-process renderer's symlink behavior.
- New unit test covers the copy path and asserts `symlinkSync` is NOT
  invoked; the existing symlink test was updated with the parallel
  guard that `cpSync` is NOT invoked.

Removes the imports the orchestrator no longer needs after the
extraction: `extractAllVideoFrames`, `resolveProjectRelativeSrc`,
`createFrameLookupTable`, `FrameLookupTable`, `detectTransfer`,
`isHdrColorSpace`, `extractMediaMetadata`, `VideoColorSpace` (oxlint
flagged each).

Verified:
- `bunx oxlint` + `bunx oxfmt --check` clean
- `bun run --filter @hyperframes/producer typecheck` + `build` clean
- `bun test packages/producer/src/services/` — 176 pass, 1 pre-existing
  unrelated failure
- `docker run hyperframes-producer:test` against `font-variant-numeric`,
  `many-cuts`, `variables-prod`, `sub-composition-video` — 4/4 PASS with
  correlations 1.000 / 0.994 / 0.947 / 0.975 (sub-composition-video is
  the one that exercises video frame extraction end-to-end)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: approve.

PR 1.4 of the stack. Pure extraction of the video-extract sub-stage plus an additive materializeSymlinks param for distributed mode.

Verified the extraction is byte-clean against renderOrchestrator.ts pre-#725:

  • Probe ordering preserved: video probe → image probe → extractAllVideoFramesmaterializeExtractedFramesForCompiledDir → frame lookup → readiness skip IDs → metadata hints → composition.audios mutation. All inside the same if (composition.videos.length > 0) guard.
  • composition.audios mutation preserved verbatim — the auto-discovered audio entries still mutate the orchestrator-owned object. (This is the existing mutation that mirrors the cfg.forceScreenshot smell flagged on the foundation stack; not new debt.)
  • force-sdr short-circuit on both probe loops preserved.
  • perfStages.videoExtractMs semantics preserved: the stage's stage2Start covers the full window (probe loops + extract + materialize), same as pre-extraction. Both the videos-present and videos-absent branches now collapse to a single Date.now() - stage2Start at the end of the stage, which matches the pre-#725 behavior exactly (the old else branch did the same thing).
  • 11 destructured return fields line up with the 11 locals the downstream HDR auto-detection block consumes — verified by name match on the sequencer side.

Tests: the new cpSync path test asserts symlinkSync is NOT invoked (and vice versa for the symlink path) — that's exactly the parity-guard pattern the foundation stack established. Good.

Important (non-blocking, carry-forward):

  • packages/producer/src/services/render/stages/extractVideosStage.ts:213-232 — the stage still mutates composition.audios in place via composition.audios.push(...). The stage interface declares composition: CompositionMetadata and the JSDoc on ExtractVideosStageInput acknowledges the mutation, but for retry/replay safety this should eventually move into a result field (e.g. discoveredAudios: AudioElement[]) the sequencer merges. Same shape as the cfg.forceScreenshot debt on the foundation stack — flagged for the chunk-rendering PR where retry-safety becomes load-bearing, not blocking here.

Nits:

  • extractVideosStage.ts:33-44 — imports RenderJob, collectVideoMetadataHints, collectVideoReadinessSkipIds, materializeExtractedFramesForCompiledDir from ../../renderOrchestrator.js. Same runtime cycle pattern the capture stages (#730/#731/#733) introduce; #737 only solves the updateJobStatus end of it. Worth a tracking line in the stage's header comment so the next PR consolidates materializeExtractedFramesForCompiledDir + the two collect* helpers into shared.ts alongside updateJobStatus.
  • materializeSymlinks defaults to false. If a single composition's frames could ever be cached and re-materialized with different values, the cached outputDir would collide. Not a problem today (one renderer, one mode per render), but worth noting in the planHash invariant once distributed plan-cache lands — the param is a stage input that affects on-disk layout.

Praise: the new param is gated behind a default so in-tree callers are unchanged, the new code path is covered by a real test (avoiding the "added but unused" landmine from earlier stacks), and the 8 dropped orchestrator imports (extractAllVideoFrames, resolveProjectRelativeSrc, createFrameLookupTable, FrameLookupTable, detectTransfer, isHdrColorSpace, extractMediaMetadata, VideoColorSpace) are all flagged by oxlint — the diff hygiene is clean.

— Vai

@jrusso1020 jrusso1020 merged commit 34d732e into main May 12, 2026
42 of 51 checks passed
@jrusso1020 jrusso1020 deleted the 05-12-refactor_producer_extract_extractvideosstage_add_materializesymlinks_param branch May 12, 2026 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants