
feat(laguna): pager-blob warm-restore survives eviction #451

Draft: dusterbloom wants to merge 1 commit into Luce-Org:main from dusterbloom:feat/laguna-warm-restore

dusterbloom (Collaborator) commented Jun 25, 2026

Port map (Step 1)

qwen35 is the source of truth; each row maps a qwen35 piece to the corresponding laguna gap and the action taken:

| qwen35 | laguna gap | action |
|---|---|---|
| PrefixSnapshot.is_pooled + kvflash_blob | LagunaCacheSnapshot had neither | Add both fields |
| snapshot_save_pooled_at / snapshot_save (pooled branch): kvflash_pager_.serialize() → snap.kvflash_blob | snapshot_save refused with a !is_identity() guard | Replace the refusal with the serialize path |
| restore_and_generate_impl pooled branch: kvflash_pager_.deserialize() + suffix-only prefill | restore_and_generate_impl always did a full diff re-prefill over identity-restored GPU tensors | Add a pooled branch with deserialize + restore-consume |
| common/kvflash_pager.h: serialize(max_chunks) / deserialize() / ledger / pinning | origin/main kvflash_pager.h lacked these (added in perf/qwen35-decode-fuse-elementwise) | Forward-port the header; it is arch-agnostic |
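The save-side change (pooled saves serialize the pager instead of refusing) can be sketched as below. This is a minimal model under stated assumptions, not the laguna code: `SnapshotSketch`, `PagerSketch`, and `snapshot_save_sketch` are hypothetical stand-ins, and only the `pool_relocated` condition (`cur_pos > pool_tokens || !is_identity()`) and the `is_pooled` / `kvflash_blob` field names come from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for LagunaCacheSnapshot: only the fields relevant to
// the save decision are modeled. The real struct also carries the
// identity-path GPU tensor copies.
struct SnapshotSketch {
    int snap_pos = 0;                   // tokens covered by the snapshot
    bool is_pooled = false;             // true => restore from kvflash_blob
    std::vector<uint8_t> kvflash_blob;  // serialized pager state (pooled only)
};

// Hypothetical pager stand-in. is_identity() and serialize() borrow the names
// used in the PR; their signatures and bodies here are assumptions.
struct PagerSketch {
    int pool_tokens = 0;   // token capacity of the GPU pool
    bool identity = true;  // no chunk has been relocated or evicted yet
    bool is_identity() const { return identity; }
    std::vector<uint8_t> serialize() const { return {0x01, 0x02, 0x03}; }  // placeholder bytes
};

// The pool_relocated condition from the PR description.
inline bool pool_relocated(int cur_pos, const PagerSketch& pager) {
    return cur_pos > pager.pool_tokens || !pager.is_identity();
}

// Pooled saves serialize the pager instead of refusing; identity saves leave
// the blob empty and keep using the existing GPU-tensor copy (not modeled).
inline SnapshotSketch snapshot_save_sketch(int cur_pos, const PagerSketch& pager) {
    assert(cur_pos >= 0);
    SnapshotSketch snap;
    snap.snap_pos = cur_pos;
    if (pool_relocated(cur_pos, pager)) {
        snap.kvflash_blob = pager.serialize();
        snap.is_pooled = true;
    }
    return snap;
}
```

With pool_tokens = 1024 and an identity pager, a save at cur_pos = 512 stays on the identity path, while a save at cur_pos = 2048 (or any save once the pager is no longer identity) produces a pooled snapshot.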

The QK scorer (KvFlashQkPool) is phase 2: laguna has no KvFlashQkPool member, and the attention-layer path (laguna_target_graph.cpp) does not capture Q/K vectors. This PR provides only the prerequisite snapshot/restore infrastructure.

What is implemented

  • LagunaCacheSnapshot: is_pooled + kvflash_blob fields.
  • LagunaBackend::snapshot_save: when pool_relocated (i.e. cur_pos > pool_tokens or !is_identity()), serializes the pager blob into snap.kvflash_blob and sets is_pooled=true. The non-pooled (identity) path is unchanged.
  • generate_impl inline snapshot: same pool-relocated branch, saving the blob with a chunk-aligned max_chunks.
  • restore_and_generate_impl: the pooled branch calls kvflash_pager_.deserialize() and then runs a suffix-only laguna_step over [snap_pos, N); on an exact hit (empty suffix), the last token is re-embedded. The non-pooled branch is untouched.
  • common/kvflash_pager.h: forward-port serialize(max_chunks), deserialize(), ledger section (per-chunk was_resident + score), critical-chunk pinning, and chunk_host_ptr / k_seg_bytes / v_seg_bytes accessors.
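To make the blob-with-ledger idea concrete, here is a round-trip sketch. The blob layout (header, then a ledger of per-chunk `was_resident` + `score`, then K/V payload) is an assumption for illustration; the actual kvflash_pager.h format is not shown in this PR. `ChunkSketch`, `serialize_sketch`, `deserialize_sketch`, and `chunk_aligned_max_chunks` are hypothetical names, and critical-chunk pinning is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side chunk record; the real pager's layout may differ.
struct ChunkSketch {
    bool was_resident = false;  // ledger: chunk was GPU-resident at save time
    float score = 0.0f;         // ledger: per-chunk score
    std::vector<uint8_t> kv;    // K segment followed by V segment (host copy)
};

constexpr uint32_t kBlobMagic = 0x4B56464Cu;  // hypothetical format tag

// Chunk-aligned cap: round cur_pos up to a whole number of chunks.
inline uint32_t chunk_aligned_max_chunks(uint32_t cur_pos, uint32_t chunk_tokens) {
    assert(chunk_tokens > 0);
    return (cur_pos + chunk_tokens - 1) / chunk_tokens;
}

template <typename T>
void put_pod(std::vector<uint8_t>& out, const T& v) {
    const auto* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

template <typename T>
T get_pod(const std::vector<uint8_t>& in, std::size_t& off) {
    if (off + sizeof(T) > in.size()) throw std::runtime_error("blob truncated");
    T v;
    std::memcpy(&v, in.data() + off, sizeof(T));
    off += sizeof(T);
    return v;
}

// Assumed layout: [magic][n][seg_bytes] | ledger: n x (was_resident, score) | n x kv bytes
std::vector<uint8_t> serialize_sketch(const std::vector<ChunkSketch>& chunks,
                                      uint32_t max_chunks, uint32_t seg_bytes) {
    const uint32_t n = std::min(max_chunks, static_cast<uint32_t>(chunks.size()));
    std::vector<uint8_t> out;
    put_pod(out, kBlobMagic);
    put_pod(out, n);
    put_pod(out, seg_bytes);
    for (uint32_t i = 0; i < n; ++i) {  // ledger section
        put_pod(out, static_cast<uint8_t>(chunks[i].was_resident ? 1 : 0));
        put_pod(out, chunks[i].score);
    }
    for (uint32_t i = 0; i < n; ++i) {  // payload section
        if (chunks[i].kv.size() != seg_bytes) throw std::invalid_argument("chunk size mismatch");
        out.insert(out.end(), chunks[i].kv.begin(), chunks[i].kv.end());
    }
    return out;
}

std::vector<ChunkSketch> deserialize_sketch(const std::vector<uint8_t>& blob) {
    std::size_t off = 0;
    if (get_pod<uint32_t>(blob, off) != kBlobMagic) throw std::runtime_error("bad magic");
    const uint32_t n = get_pod<uint32_t>(blob, off);
    const uint32_t seg_bytes = get_pod<uint32_t>(blob, off);
    std::vector<ChunkSketch> chunks(n);
    for (auto& c : chunks) {  // ledger section
        c.was_resident = get_pod<uint8_t>(blob, off) != 0;
        c.score = get_pod<float>(blob, off);
    }
    for (auto& c : chunks) {  // payload section
        if (off + seg_bytes > blob.size()) throw std::runtime_error("blob truncated");
        const auto first = blob.begin() + static_cast<std::ptrdiff_t>(off);
        c.kv.assign(first, first + seg_bytes);
        off += seg_bytes;
    }
    return chunks;
}
```

Putting the ledger ahead of the payload lets a reader inspect residency and scores without touching K/V bytes; whether the real header does the same is not stated in the PR.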

Validated

  • -fsyntax-only clean on laguna_backend.cpp (single-TU, g++-11, all production flags).

TODO (deferred due to resource contention)

  • Full CUDA build + relink.
  • NIAH recall gate under warm-restore (same phase3 harness as qwen35moe).
  • laguna_internal.h snapshot free: clear kvflash_blob in laguna_snapshot_free to avoid holding stale host bytes after free (2-line follow-up).
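A sketch of the free-path follow-up, assuming `kvflash_blob` is a `std::vector<uint8_t>` (its type is not stated in the PR). `FreeableSnapshotSketch` and `snapshot_free_sketch` are hypothetical names; the real change would live in laguna_snapshot_free.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical snapshot shape with only the fields this follow-up touches.
struct FreeableSnapshotSketch {
    bool is_pooled = false;
    std::vector<uint8_t> kvflash_blob;
};

// Sketch of the follow-up in a laguna_snapshot_free-style function.
// kvflash_blob.clear() alone keeps the heap buffer allocated; swapping with a
// temporary empty vector hands the buffer to the temporary, which frees it
// when it goes out of scope.
inline void snapshot_free_sketch(FreeableSnapshotSketch& snap) {
    // ... existing identity-path frees would remain here ...
    std::vector<uint8_t>().swap(snap.kvflash_blob);
    snap.is_pooled = false;
}
```

The swap idiom is used instead of `shrink_to_fit()` because the latter is only a non-binding request.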

Composition note

This PR is a prerequisite for adding the QK scorer (phase 2). The QK pool_chunk_host fix from PR #446 does not apply to laguna directly: laguna has no KvFlashQkPool, which is the phase-2 addition once this blob infrastructure is validated.

…ter eviction

LagunaCacheSnapshot gains is_pooled + kvflash_blob fields mirroring
PrefixSnapshot in qwen35. snapshot_save() serializes the pager blob when
the pool has relocated chunks instead of refusing. restore_and_generate_impl()
deserializes the blob and prefills only the suffix [snap_pos, prompt_len) —
the restore-consume path that survives eviction. Non-pooled (identity) saves
use the existing GPU-tensor copy unchanged.
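The restore-consume range can be sketched as a small planning function. `SuffixPlan` and `plan_suffix_prefill` are hypothetical; `snap_pos` and `prompt_len` follow the commit message, and the exact-hit handling follows the PR description. The reason given for re-embedding the last token is an assumption (the snapshot is taken to hold KV state but no logits).

```cpp
#include <cassert>

// Hypothetical plan for the pooled restore branch: after deserialize(), only
// positions the snapshot does not cover are run through laguna_step.
struct SuffixPlan {
    int begin = 0;           // first position to prefill
    int end = 0;             // one past the last position (prompt_len)
    bool exact_hit = false;  // snapshot already covers the whole prompt
};

inline SuffixPlan plan_suffix_prefill(int snap_pos, int prompt_len) {
    assert(prompt_len > 0 && snap_pos >= 0 && snap_pos <= prompt_len);
    SuffixPlan p;
    p.end = prompt_len;
    if (snap_pos == prompt_len) {
        // Exact hit: nothing left to prefill, but next-token logits are still
        // needed (assumption: not stored in the snapshot), so the last prompt
        // token is re-embedded and stepped once.
        p.exact_hit = true;
        p.begin = prompt_len - 1;
    } else {
        p.begin = snap_pos;  // suffix-only prefill over [snap_pos, prompt_len)
    }
    return p;
}
```

For a 130-token prompt restored from a 100-token snapshot, only positions 100 through 129 are prefilled, instead of the full diff re-prefill the identity path performs.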

kvflash_pager.h: forward-port serialize/deserialize/ledger/pinning from the
perf/qwen35-decode-fuse-elementwise branch so laguna can call them. The header
is arch-agnostic; the impl is shared with qwen35 once merged to main.

Validated: -fsyntax-only clean (laguna_backend.cpp, single-TU).
Full CUDA build + NIAH recall gate deferred (resource contention).