Summary
The embedded-runtime extractor trusts a cache directory based solely on the presence of its .complete stamp, without verifying that the expected binaries are actually in it. A version-keyed cache dir populated by a build whose manifest lacked boxlite-guest (e.g. a partial/variant build, or an SDK built in a different worktree) is then reused by a later process that does need the guest. The missing-binary failure surfaces much later, at box start, instead of at extraction.
Where (as of 07fa30f9)
Fast path returns on stamp existence with no content check:
    // src/boxlite/src/runtime/embedded.rs:104
    let stamp = dir.join(".complete");
    if stamp.exists() {
        let now = filetime::FileTime::now();
        let _ = filetime::set_file_mtime(&stamp, now);
        return Ok(Self { dir }); // <- trusts the dir without verifying boxlite-guest is present
    }
Release builds key the cache dir by version only (v{VERSION}/, see the module header at src/boxlite/src/runtime/embedded.rs:9-13), so two builds sharing a version but differing in embedded contents collide on the same path. The first writer's .complete then satisfies every later reader.
The stamp is written after all manifest files within a single extraction (embedded.rs:118-132), so a same-build extraction is internally consistent — the gap is specifically cross-build reuse of a version-keyed dir.
Impact
Low for published releases (all users on a version have identical binaries — the stated assumption). It bites dev / multi-worktree setups where SDKs built from different trees share ~/.local/share/boxlite/runtimes/v{VERSION}/: a guest-less .complete dir is trusted and box.start() later fails to find boxlite-guest.
Expected
Have the fast path verify the dir actually contains the expected runtime binaries (at minimum boxlite-guest) before trusting .complete; on mismatch, re-extract (or fail loudly at get() with a clear message) rather than returning a known-incomplete dir.
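One possible shape for that check, as a minimal std-only sketch rather than a patch against embedded.rs. The names REQUIRED_BINARIES, missing_binaries, CacheState and check_cache_dir are hypothetical and do not exist in the codebase; the real required list would presumably be derived from the embedded manifest instead of being hard-coded. The sketch takes the "re-extract" option: a stamp whose directory lacks a required binary is removed so the caller falls through to the normal extraction path.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical list of binaries the fast path insists on.
/// Assumption: the real list would come from the embedded manifest.
const REQUIRED_BINARIES: &[&str] = &["boxlite-guest"];

/// Returns the names in `required` that are not regular files in `dir`.
fn missing_binaries(dir: &Path, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .filter(|name| !dir.join(**name).is_file())
        .map(|name| name.to_string())
        .collect()
}

/// Outcome of inspecting an existing cache dir (hypothetical type).
#[derive(Debug, PartialEq)]
enum CacheState {
    /// Stamp present and every required binary found: safe to reuse.
    Usable,
    /// No stamp, or a stale stamp that was just removed: extract again.
    NeedsExtraction,
}

/// Trust `.complete` only if the required binaries are actually present.
fn check_cache_dir(dir: &Path, required: &[&str]) -> io::Result<CacheState> {
    let stamp = dir.join(".complete");
    if !stamp.exists() {
        return Ok(CacheState::NeedsExtraction);
    }

    let missing = missing_binaries(dir, required);
    if missing.is_empty() {
        return Ok(CacheState::Usable);
    }

    // The stamp was written by a build with different embedded contents.
    // Say so loudly, then drop the stamp so the slow path re-extracts.
    eprintln!(
        "runtime cache {} has .complete but is missing {:?}; re-extracting",
        dir.display(),
        missing
    );
    fs::remove_file(&stamp)?;
    Ok(CacheState::NeedsExtraction)
}
```

In the extractor, the existing `if stamp.exists()` branch would only return early on `CacheState::Usable`. Removing the stamp can race with another process that is reading the same directory, so a real fix would need to reuse whatever locking the extraction path already relies on; returning an error from get() instead of removing the stamp avoids that question at the cost of requiring manual cleanup.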
Current workaround (downstream)
apps/infra-local pre-scans the runtime cache and skips any .complete dir missing boxlite-guest, pinning a known-good one via BOXLITE_RUNTIME_DIR:
apps/infra-local/boxlite_local/config.py — pick_runtime_dir / resolve_runtime_dir.
This is exactly the verification we think belongs in the extractor's fast path.