feat(laguna): pager-blob warm-restore survives eviction #451
Draft
dusterbloom wants to merge 1 commit into
…ter eviction

LagunaCacheSnapshot gains is_pooled + kvflash_blob fields mirroring PrefixSnapshot in qwen35. snapshot_save() serializes the pager blob when the pool has relocated chunks instead of refusing. restore_and_generate_impl() deserializes the blob and prefills only the suffix [snap_pos, prompt_len), which is the restore-consume path that survives eviction. Non-pooled (identity) saves use the existing GPU-tensor copy unchanged.

kvflash_pager.h: forward-port serialize/deserialize/ledger/pinning from the perf/qwen35-decode-fuse-elementwise branch so laguna can call them. The header is arch-agnostic; the impl is shared with qwen35 once merged to main.

Validated: -fsyntax-only clean (laguna_backend.cpp, single-TU). Full CUDA build + NIAH recall gate deferred (resource contention).
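The save-side branch described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MockPager`, `Snapshot`, and `snapshot_save_sketch` are hypothetical stand-ins, and the real `kvflash_pager.h` signatures may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pager; the real kvflash_pager.h API may differ.
struct MockPager {
    std::vector<std::uint8_t> host_bytes;  // pretend pooled KV payload
    bool identity = true;                  // true until a chunk is relocated

    bool is_identity() const { return identity; }
    std::vector<std::uint8_t> serialize() const { return host_bytes; }
};

// Only the two fields the commit adds, plus the saved position.
struct Snapshot {
    int cur_pos = 0;
    bool is_pooled = false;
    std::vector<std::uint8_t> kvflash_blob;
};

// Save path: serialize the pager blob when chunks were relocated,
// otherwise leave the snapshot on the identity (GPU-tensor copy) path.
inline Snapshot snapshot_save_sketch(const MockPager& pager, int cur_pos) {
    assert(cur_pos >= 0);
    Snapshot snap;
    snap.cur_pos = cur_pos;
    if (!pager.is_identity()) {
        // Pool has relocated chunks: serialize instead of refusing.
        snap.kvflash_blob = pager.serialize();
        snap.is_pooled = true;
    }
    // Identity case: the existing GPU-tensor copy (not modeled here) applies.
    return snap;
}
```

The point of the sketch is that the former refusal becomes a second save mode, so callers can branch on `is_pooled` at restore time instead of handling a failed save.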
Port map (Step 1)
| qwen35 source of truth | laguna gap |
| --- | --- |
| `PrefixSnapshot.is_pooled` + `kvflash_blob` | `LagunaCacheSnapshot` had neither |
| `snapshot_save_pooled_at` / `snapshot_save` (pooled branch): `kvflash_pager_.serialize()` → `snap.kvflash_blob` | `snapshot_save` refused with `!is_identity()` guard |
| `restore_and_generate_impl` pooled branch: `kvflash_pager_.deserialize()` + suffix-only prefill | `restore_and_generate_impl` always did full diff re-prefill over identity-restored GPU tensors |
| `common/kvflash_pager.h`: `serialize(max_chunks)` / `deserialize()` / ledger / pinning | `kvflash_pager.h` lacked these (added in `perf/qwen35-decode-fuse-elementwise`) |

QK scorer (`KvFlashQkPool`) is phase 2: laguna has no `KvFlashQkPool` member and the attention layer path (`laguna_target_graph.cpp`) does not capture Q/K vectors. This PR is the prerequisite snapshot/restore infra only.

What is implemented
- `LagunaCacheSnapshot`: `is_pooled` + `kvflash_blob` fields.
- `LagunaBackend::snapshot_save`: when `pool_relocated` (i.e. `cur_pos > pool_tokens` or `!is_identity()`), serialize blob into `snap.kvflash_blob` and set `is_pooled=true`. Non-pooled (identity) path unchanged.
- `generate_impl` inline snap: same pool-relocated branch, with blob save using chunk-aligned `max_chunks`.
- `restore_and_generate_impl`: pooled branch uses `kvflash_pager_.deserialize()` + suffix-only `laguna_step` for `[snap_pos, N)`; exact-hit re-embeds last token. Non-pooled branch untouched.
- `common/kvflash_pager.h`: forward-port `serialize(max_chunks)`, `deserialize()`, ledger section (per-chunk `was_resident` + `score`), critical-chunk pinning, and `chunk_host_ptr` / `k_seg_bytes` / `v_seg_bytes` accessors.

Validated
- `-fsyntax-only` clean on `laguna_backend.cpp` (single-TU, g++-11, all production flags).

TODO (deferred: resource contention)
- `laguna_internal.h` snapshot free: clear `kvflash_blob` in `laguna_snapshot_free` to avoid holding stale host bytes after free (2-line follow-up).

Composition note
This PR is a prerequisite for adding the QK scorer (phase 2). The QK `pool_chunk_host` fix from PR #446 does not apply to laguna directly: laguna has no `KvFlashQkPool`; that is the phase-2 addition once this blob infra is validated.
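
For reviewers, here is a rough sketch of two pieces mentioned above: the suffix-only prefill range on the pooled restore path, and the deferred blob cleanup in the snapshot free. All names (`Snapshot`, `suffix_to_prefill`, `snapshot_free_sketch`) are hypothetical. The exact-hit handling assumes "re-embeds last token" means re-running only position `N - 1`, which may not match the real `restore_and_generate_impl`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical snapshot shape; only the fields relevant to this PR.
struct Snapshot {
    int snap_pos = 0;
    bool is_pooled = false;
    std::vector<std::uint8_t> kvflash_blob;
};

// Half-open token range [first, last) to feed through the step function
// after the blob has been deserialized.
inline std::pair<int, int> suffix_to_prefill(int snap_pos, int prompt_len) {
    assert(prompt_len > 0 && snap_pos >= 0 && snap_pos <= prompt_len);
    if (snap_pos == prompt_len) {
        // Exact hit (assumption): re-run only the final token so that
        // logits for the next position are available.
        return std::make_pair(prompt_len - 1, prompt_len);
    }
    // Normal case: everything before snap_pos comes from the blob.
    return std::make_pair(snap_pos, prompt_len);
}

// Sketch of the TODO: release the host bytes when the snapshot is freed.
// Swapping with an empty vector releases the buffer, whereas clear()
// alone would keep the capacity allocated.
inline void snapshot_free_sketch(Snapshot& snap) {
    std::vector<std::uint8_t>().swap(snap.kvflash_blob);
    snap.is_pooled = false;
}
```

The swap idiom is used here because `clear()` keeps the allocation and `shrink_to_fit()` is a non-binding request; whether the real follow-up needs that guarantee depends on how long freed snapshot slots stay alive.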