Skip to content

Fix Windows daemon crash during v1 to v2 local migration#1209

Merged
RhysSullivan merged 1 commit into
mainfrom
fix/windows-migration-fsync-eperm
Jun 29, 2026
Merged

Fix Windows daemon crash during v1 to v2 local migration#1209
RhysSullivan merged 1 commit into
mainfrom
fix/windows-migration-fsync-eperm

Conversation

@RhysSullivan

Copy link
Copy Markdown
Owner

Problem

The bundled CLI/desktop daemon crashed on first launch on Windows with a fatal Unknown error whenever a v1 local database was present. macOS and Linux were unaffected. This is the v1.5.23 Windows startup regression (the desktop Smoke test bundled executor job fails on Windows; mac/linux pass).

Root cause

The v1 to v2 data migration performs file operations on libSQL SQLite files (the canonical data.db, the .source-* copy, and the .building-* staging db, each with -wal/-shm sidecars). On Windows, libSQL releases its OS file handle in a native finalizer that only runs on GC, so the handle lingers past client.close(). POSIX releases immediately, which is why this never reproduced off Windows. The lingering handle caused:

  • EPERM from fsyncSync on a read-only ("r") fd: Windows FlushFileBuffers requires a writable handle.
  • EBUSY from rename/rm of just-closed files. The existing renameWithRetry helped some sites, but its fixed backoff could not outlast a handle that was never finalized, and several rm sites had no retry at all.

The error surfaced only as Unknown error because the CLI error renderer collapses it; --log-level debug reveals the EPERM: ... fsync / EBUSY: ... rename chain.

Fix (apps/local/src/db/v1-v2-migration.ts)

  1. fsyncFileIfExists opens the file read-write so the flush is permitted on Windows, and treats fsync as best-effort durability hardening (the rename/copy already wrote the bytes).
  2. The .source-*/.building-* removals now retry EBUSY/EPERM the same way renameWithRetry already did.
  3. renameWithRetry and rmWithRetry force a GC pass on each retry (Bun.gc(true), no-op off Bun) so libSQL's native finalizer releases the handle before the next attempt.

Verification

Reproduced and fixed on a real Windows VM by running the published bundle against a seeded v1 database:

  • Before: EPERM: operation not permitted, fsync then EBUSY on rm/rename, daemon never ready.
  • After: Migrated local Executor data to v2; moved old DB to ...data.db.v1-v2-<ts>, then EXECUTOR_READY:<port> / Daemon ready. The v2 db and the v1-v2-* backup are present and temp files cleaned up.

typecheck passes; the migration unit tests pass except 3 pre-existing crash-recovery tests that fail identically on main in this environment (unrelated to this change). The Windows desktop smoke job in CI exercises this exact path.

The v1->v2 data migration ran fsync/rename/remove on libSQL SQLite files
whose native OS handle lingers past close() on Windows, crashing the daemon
at boot with a fatal "Unknown error". POSIX never hits this, so it only
reproduced on Windows (the v1.5.23 desktop/CLI startup regression).

- fsyncFileIfExists opens read-write, since Windows FlushFileBuffers needs a
  writable handle (a read-only fd throws EPERM), and treats fsync as
  best-effort durability hardening.
- remove of the source/staging file sets now retries EBUSY/EPERM the same
  way rename already did.
- rename/remove retries force a GC pass first so libSQL's native finalizer
  releases the handle before the next attempt.

Verified on a Windows VM: a seeded v1 database now migrates to v2 and the
daemon announces ready.
@cloudflare-workers-and-pages

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
executor-marketing 86da524 Commit Preview URL

Branch Preview URL
Jun 29 2026, 07:22 AM

@cloudflare-workers-and-pages

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
executor-cloud 86da524 Jun 29 2026, 07:23 AM

@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a Windows-only daemon crash during the v1→v2 local database migration, caused by libSQL's OS file handles lingering past client.close() on Windows (they are released only on GC, unlike POSIX). The fix introduces retry-with-GC-nudge helpers (rmWithRetry, removeSqliteFileSetWithRetry, nudgeNativeHandleRelease) and opens files read-write in fsyncFileIfExists so FlushFileBuffers succeeds.

  • nudgeNativeHandleRelease calls Bun.gc(true) before each retry, flushing libSQL's native finalizer so the handle is released before the next attempt; it is a no-op outside the Bun runtime.
  • rmWithRetry mirrors the existing renameWithRetry pattern — retries EBUSY/EPERM up to ~8 s total — and is wired in everywhere a just-closed libSQL file set is removed.
  • fsyncFileIfExists now opens with "r+" instead of "r" so Windows FlushFileBuffers has a writable handle, and treats the whole fsync as best-effort (silently skips if the open or flush fails).

Confidence Score: 4/5

Safe to merge; directly addresses a verified Windows startup crash with a targeted, well-commented fix.

The migration logic is carefully structured: the retry helpers mirror the pre-existing renameWithRetry pattern, nudgeNativeHandleRelease is gated behind optional chaining so it's a no-op outside Bun, and fsyncFileIfExists is consistently best-effort. The one remaining unretried removal in copySqliteFileSet is safe in its current call site but is inconsistent with the rest of the file.

apps/local/src/db/v1-v2-migration.ts — the copySqliteFileSet function still uses the non-retry removal helpers internally.

Important Files Changed

Filename Overview
apps/local/src/db/v1-v2-migration.ts Adds nudgeNativeHandleRelease (Bun.gc), rmWithRetry, removeSqliteFileSetWithRetry, and fixes fsyncFileIfExists to open r+ on Windows; replaces all post-close bare fs.rmSync calls with the retried variants. One remaining unretried removeSqliteFileSet call lives inside copySqliteFileSet, but that target path is a fresh nonce-based name that has not been opened by a libSQL client at that point, so it is safe.
.changeset/windows-migration-handle-release.md Standard changeset entry describing the Windows startup regression fix; no code concerns.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant M as migrateLocalV1ToV2IfNeeded
    participant C as libSQL client
    participant GC as Bun.gc (nudgeNativeHandleRelease)
    participant FS as File System

    M->>C: client.close()
    Note over C,FS: Windows: OS handle lingers in native finalizer
    M->>GC: Bun.gc(true) — flush finalizer queue
    GC-->>FS: libSQL finalizer releases OS handle
    M->>FS: rmWithRetry / renameWithRetry (retry on EBUSY/EPERM)
    FS-->>M: success

    Note over M,FS: fsyncFileIfExists path
    M->>FS: "openSync(path, "r+")  <- writable handle for FlushFileBuffers"
    FS-->>M: fd
    M->>FS: "fsyncSync(fd)  <- best-effort, errors swallowed"
    M->>FS: closeSync(fd)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant M as migrateLocalV1ToV2IfNeeded
    participant C as libSQL client
    participant GC as Bun.gc (nudgeNativeHandleRelease)
    participant FS as File System

    M->>C: client.close()
    Note over C,FS: Windows: OS handle lingers in native finalizer
    M->>GC: Bun.gc(true) — flush finalizer queue
    GC-->>FS: libSQL finalizer releases OS handle
    M->>FS: rmWithRetry / renameWithRetry (retry on EBUSY/EPERM)
    FS-->>M: success

    Note over M,FS: fsyncFileIfExists path
    M->>FS: "openSync(path, "r+")  <- writable handle for FlushFileBuffers"
    FS-->>M: fd
    M->>FS: "fsyncSync(fd)  <- best-effort, errors swallowed"
    M->>FS: closeSync(fd)
Loading

Comments Outside Diff (1)

  1. apps/local/src/db/v1-v2-migration.ts, line 1126-1139 (link)

    P2 copySqliteFileSet calls the non-retried removeSqliteFileSet and a bare fs.rmSync on the target path. In practice this is safe today because the target is a nonce-named .source-* path that no libSQL client has ever touched, so lingering handles are not present. If the call site ever changes (e.g. re-using a known path after a client closes), these two removal calls would silently fail on Windows with EBUSY/EPERM. Switching them to the retry variants keeps the function self-consistent with the rest of the file.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "fix(local): survive Windows libSQL handl..." | Re-trigger Greptile

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Cloudflare preview

Torn down — the PR is closed.

@pkg-pr-new

pkg-pr-new Bot commented Jun 29, 2026

Copy link
Copy Markdown

Open in StackBlitz

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1209

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1209

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1209

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1209

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1209

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1209

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1209

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1209

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1209

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1209

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1209

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1209

executor

npm i https://pkg.pr.new/executor@1209

commit: 86da524

@RhysSullivan RhysSullivan merged commit ffa4f70 into main Jun 29, 2026
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant