
runtime: embedded-runtime fast path trusts .complete without verifying boxlite-guest is present #786


@DorianZheng

Summary

The embedded-runtime extractor trusts a cache directory on the presence of its .complete stamp alone, without verifying that the expected binaries are actually in it. A version-keyed cache dir populated by a build whose manifest lacked boxlite-guest (e.g. a partial/variant build, or an SDK built in a different worktree) is then reused by a later process that does need the guest. The missing-binary failure surfaces much later, at box start, instead of at extraction.

Where (as of 07fa30f9)

Fast path returns on stamp existence with no content check:

// src/boxlite/src/runtime/embedded.rs:104
let stamp = dir.join(".complete");
if stamp.exists() {
    let now = filetime::FileTime::now();
    let _ = filetime::set_file_mtime(&stamp, now);
    return Ok(Self { dir });   // <- trusts the dir without verifying boxlite-guest is present
}

Release builds key the cache dir by version only (v{VERSION}/, see the module header at src/boxlite/src/runtime/embedded.rs:9-13), so two builds sharing a version but differing in embedded contents collide on the same path. The first writer's .complete then satisfies every later reader.

The stamp is written after all manifest files within a single extraction (embedded.rs:118-132), so a same-build extraction is internally consistent — the gap is specifically cross-build reuse of a version-keyed dir.

Impact

Low for published releases (all users on a version have identical binaries — the stated assumption). It bites dev / multi-worktree setups where SDKs built from different trees share ~/.local/share/boxlite/runtimes/v{VERSION}/: a guest-less .complete dir is trusted and box.start() later fails to find boxlite-guest.

Expected

Have the fast path verify the dir actually contains the expected runtime binaries (at minimum boxlite-guest) before trusting .complete; on mismatch, re-extract (or fail loudly at get() with a clear message) rather than returning a known-incomplete dir.
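A minimal sketch of what such a check could look like, using only the standard library. The names `REQUIRED_BINARIES`, `missing_binaries` and `cache_dir_is_usable` are hypothetical and do not exist in embedded.rs; the real required set would presumably come from the embedded manifest rather than a hard-coded list.

```rust
use std::path::Path;

/// Hypothetical required set for illustration only. The real manifest in
/// embedded.rs may list more files than this.
const REQUIRED_BINARIES: &[&str] = &["boxlite-guest"];

/// Returns the names from `required` that are not regular files in `dir`.
fn missing_binaries<'a>(dir: &Path, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

/// True only if the `.complete` stamp exists AND every required binary is
/// present. A stamp on its own is not treated as sufficient.
fn cache_dir_is_usable(dir: &Path, required: &[&str]) -> bool {
    dir.join(".complete").exists() && missing_binaries(dir, required).is_empty()
}
```

Under this sketch, the fast path would return early only when `cache_dir_is_usable(&dir, REQUIRED_BINARIES)` holds. Otherwise it would fall through to re-extraction, or return an error naming the entries reported by `missing_binaries`.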

Current workaround (downstream)

apps/infra-local pre-scans the runtime cache and skips any .complete dir missing boxlite-guest, pinning a known-good one via BOXLITE_RUNTIME_DIR:

  • apps/infra-local/boxlite_local/config.py: pick_runtime_dir / resolve_runtime_dir.

This is exactly the verification we think belongs in the extractor's fast path.
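For illustration, the pre-scan described above might look roughly like this. It is a Rust sketch of the idea only, not a port of the actual Python in config.py. The function name is reused for readability, and the choice of the lexicographically last matching directory is an arbitrary assumption; the downstream code may rank candidates differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Scans `root` (e.g. the runtimes cache directory) and returns one child
/// directory that has both a `.complete` stamp and a `boxlite-guest` file.
/// Directories with a stamp but no guest binary are skipped.
fn pick_runtime_dir(root: &Path) -> io::Result<Option<PathBuf>> {
    let mut candidates: Vec<PathBuf> = fs::read_dir(root)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|p| p.join(".complete").exists() && p.join("boxlite-guest").is_file())
        .collect();
    // Assumption: prefer the lexicographically last path among valid ones.
    candidates.sort();
    Ok(candidates.pop())
}
```

A caller could then export the returned path through BOXLITE_RUNTIME_DIR before starting the SDK, which is the pinning step the workaround describes.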

Labels: bug
