feat(laguna): pager-blob warm-restore survives eviction by dusterbloom · Pull Request #451 · Luce-Org/lucebox-hub

dusterbloom · 2026-06-25T16:27:00Z

Port map (Step 1)

qwen35 source of truth → laguna gap:

| qwen35 | laguna gap | action |
| --- | --- | --- |
| `PrefixSnapshot.is_pooled` + `kvflash_blob` | `LagunaCacheSnapshot` had neither | Add both fields |
| `snapshot_save_pooled_at` / `snapshot_save` (pooled branch): `kvflash_pager_.serialize()` → `snap.kvflash_blob` | `snapshot_save` refused with `!is_identity()` guard | Replace refusal with serialize path |
| `restore_and_generate_impl` pooled branch: `kvflash_pager_.deserialize()` + suffix-only prefill | `restore_and_generate_impl` always did full diff re-prefill over identity-restored GPU tensors | Add pooled branch with deserialize + restore-consume |
| `common/kvflash_pager.h`: `serialize(max_chunks)` / `deserialize()` / ledger / pinning | origin/main `kvflash_pager.h` lacked these (added in `perf/qwen35-decode-fuse-elementwise`) | Forward-port the header; it is arch-agnostic |
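For readers unfamiliar with the blob round-trip, here is a minimal sketch of a serialize/deserialize pair that carries a per-chunk ledger (`was_resident` + score) next to the chunk bytes. The byte layout, `ToyChunk`, `ToyLedgerEntry`, `toy_serialize`, and `toy_deserialize` are all hypothetical and do not reflect the real `kvflash_pager.h` format; only the idea of capping the blob at `max_chunks` and rejecting truncated input is illustrated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical ledger entry: the PR names was_resident + score, the layout is assumed.
struct ToyLedgerEntry {
    std::uint8_t was_resident;
    float score;
};

// Hypothetical chunk: ledger entry plus the K and V segment bytes, concatenated.
struct ToyChunk {
    ToyLedgerEntry ledger;
    std::vector<std::uint8_t> kv;
};

// Append the raw bytes of a trivially copyable value.
template <typename T>
void put(std::vector<std::uint8_t>& out, const T& v) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

// Read a value back; returns false instead of reading past the end.
template <typename T>
bool get(const std::vector<std::uint8_t>& in, std::size_t& off, T& v) {
    if (in.size() - off < sizeof(T)) return false;  // off <= in.size() always holds
    std::memcpy(&v, in.data() + off, sizeof(T));
    off += sizeof(T);
    return true;
}

// Assumed layout: [u32 count] then per chunk [u8 was_resident][f32 score][u32 len][bytes].
std::vector<std::uint8_t> toy_serialize(const std::vector<ToyChunk>& chunks,
                                        std::size_t max_chunks) {
    const auto n = static_cast<std::uint32_t>(std::min(chunks.size(), max_chunks));
    std::vector<std::uint8_t> blob;
    put(blob, n);
    for (std::uint32_t i = 0; i < n; ++i) {
        const ToyChunk& c = chunks[i];
        put(blob, c.ledger.was_resident);
        put(blob, c.ledger.score);
        put(blob, static_cast<std::uint32_t>(c.kv.size()));
        blob.insert(blob.end(), c.kv.begin(), c.kv.end());
    }
    return blob;
}

// Rebuilds the chunks; leaves `out` untouched and returns false on a truncated blob.
bool toy_deserialize(const std::vector<std::uint8_t>& blob, std::vector<ToyChunk>& out) {
    std::size_t off = 0;
    std::uint32_t n = 0;
    if (!get(blob, off, n)) return false;
    std::vector<ToyChunk> chunks;
    for (std::uint32_t i = 0; i < n; ++i) {
        ToyChunk c{};
        std::uint32_t len = 0;
        if (!get(blob, off, c.ledger.was_resident) || !get(blob, off, c.ledger.score) ||
            !get(blob, off, len)) {
            return false;
        }
        if (blob.size() - off < len) return false;
        c.kv.assign(blob.data() + off, blob.data() + off + len);
        off += len;
        chunks.push_back(std::move(c));
    }
    out = std::move(chunks);
    return true;
}
```

The sketch writes native-endian bytes, which is only reasonable when the blob is saved and restored by the same process or machine, as a warm-restore would be.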

QK scorer (KvFlashQkPool) is phase 2 — laguna has no KvFlashQkPool member and the attention layer path (laguna_target_graph.cpp) does not capture Q/K vectors. This PR is the prerequisite snapshot/restore infra only.

What is implemented

LagunaCacheSnapshot: is_pooled + kvflash_blob fields.
LagunaBackend::snapshot_save: when pool_relocated (i.e. cur_pos > pool_tokens or !is_identity()), serialize blob into snap.kvflash_blob and set is_pooled=true. Non-pooled (identity) path unchanged.
generate_impl inline snap: same pool-relocated branch — blob save with chunk-aligned max_chunks.
restore_and_generate_impl: pooled branch uses kvflash_pager_.deserialize() + suffix-only laguna_step for [snap_pos, N); exact-hit re-embeds last token. Non-pooled branch untouched.
common/kvflash_pager.h: forward-port serialize(max_chunks), deserialize(), ledger section (per-chunk was_resident + score), critical-chunk pinning, and chunk_host_ptr / k_seg_bytes / v_seg_bytes accessors.
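The save-side branch described above can be sketched as follows. This is a toy model, not the backend code: `ToySnapshot`, `toy_snapshot_save`, `identity_copy`, and the byte-vector parameters are hypothetical stand-ins, with `pager_blob` playing the role of the output of `kvflash_pager_.serialize()` and `kv_bytes` the GPU-tensor copy. Only the `pool_relocated` condition is taken from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for LagunaCacheSnapshot; the PR only names
// is_pooled and kvflash_blob, everything else here is assumed.
struct ToySnapshot {
    int cur_pos = 0;
    bool is_pooled = false;
    std::vector<std::uint8_t> kvflash_blob;   // filled on the pooled path
    std::vector<std::uint8_t> identity_copy;  // stand-in for the GPU-tensor copy
};

// Condition as stated in the description: cur_pos > pool_tokens or !is_identity().
inline bool pool_relocated(int cur_pos, int pool_tokens, bool pager_is_identity) {
    return cur_pos > pool_tokens || !pager_is_identity;
}

// Pooled path stores the pager blob; identity path keeps the plain copy.
inline void toy_snapshot_save(ToySnapshot& snap, int cur_pos, int pool_tokens,
                              bool pager_is_identity,
                              const std::vector<std::uint8_t>& pager_blob,
                              const std::vector<std::uint8_t>& kv_bytes) {
    assert(cur_pos >= 0 && pool_tokens > 0);
    snap = ToySnapshot{};  // drop whatever an earlier save left behind
    snap.cur_pos = cur_pos;
    if (pool_relocated(cur_pos, pool_tokens, pager_is_identity)) {
        snap.kvflash_blob = pager_blob;
        snap.is_pooled = true;
    } else {
        snap.identity_copy = kv_bytes;
    }
}
```

Resetting the snapshot before filling it keeps the two paths mutually exclusive, so a restore can branch on `is_pooled` alone.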

Validated

-fsyntax-only clean on laguna_backend.cpp (single-TU, g++-11, all production flags).

TODO (deferred — resource contention)

Full CUDA build + relink.
NIAH recall gate under warm-restore (same phase3 harness as qwen35moe).
laguna_internal.h snapshot free: clear kvflash_blob in laguna_snapshot_free to avoid holding stale host bytes after free (2-line follow-up).
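The snapshot-free follow-up might look roughly like the sketch below. `SnapshotSketch` and `snapshot_free_sketch` are hypothetical names; the real `laguna_snapshot_free` and the rest of the snapshot struct are not shown in this PR. Swapping with an empty vector is used here because `clear()` alone keeps the allocation alive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of the snapshot state relevant to the follow-up.
struct SnapshotSketch {
    int cur_pos = 0;
    bool is_pooled = false;
    std::vector<std::uint8_t> kvflash_blob;
};

// Release the host blob and reset the pooled flag so a freed snapshot
// cannot be mistaken for a restorable pooled one.
inline void snapshot_free_sketch(SnapshotSketch& snap) {
    std::vector<std::uint8_t>().swap(snap.kvflash_blob);  // frees the buffer, unlike clear()
    snap.is_pooled = false;
    snap.cur_pos = 0;
}
```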

Composition note

This PR is a prerequisite for adding the QK scorer (phase 2). The QK pool_chunk_host fix from PR #446 does not apply to laguna directly — laguna has no KvFlashQkPool; that is the phase-2 addition once this blob infra is validated.

…ter eviction

LagunaCacheSnapshot gains is_pooled + kvflash_blob fields mirroring PrefixSnapshot in qwen35. snapshot_save() serializes the pager blob when the pool has relocated chunks instead of refusing. restore_and_generate_impl() deserializes the blob and prefills only the suffix [snap_pos, prompt_len), the restore-consume path that survives eviction. Non-pooled (identity) saves use the existing GPU-tensor copy unchanged.

kvflash_pager.h: forward-port serialize/deserialize/ledger/pinning from the perf/qwen35-decode-fuse-elementwise branch so laguna can call them. The header is arch-agnostic; the impl is shared with qwen35 once merged to main.

Validated: -fsyntax-only clean (laguna_backend.cpp, single-TU). Full CUDA build + NIAH recall gate deferred (resource contention).
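As an illustration of the suffix-only prefill, the range of prompt positions that still need a step after a pooled restore could be computed as below. `StepRange` and `suffix_step_range` are hypothetical names, and treating an exact hit (`snap_pos == prompt_len`) as re-running position `prompt_len - 1` is an assumption based on the description's "exact-hit re-embeds last token".

```cpp
#include <cassert>

// Half-open range [begin, end) of prompt positions to feed through the model.
struct StepRange {
    int begin;
    int end;
};

// After a pooled restore the cache already covers [0, snap_pos), so only the
// suffix needs prefill. On an exact hit there is no suffix; the last token is
// stepped again (assumed) so that logits exist for sampling.
inline StepRange suffix_step_range(int snap_pos, int prompt_len) {
    assert(prompt_len > 0 && snap_pos >= 0 && snap_pos <= prompt_len);
    if (snap_pos == prompt_len) {
        return {prompt_len - 1, prompt_len};
    }
    return {snap_pos, prompt_len};
}
```

With a 100-token prompt and a snapshot taken at position 40, this yields 60 positions to step instead of a full re-prefill.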

dusterbloom wants to merge 1 commit into Luce-Org:main from dusterbloom:feat/laguna-warm-restore
