
fix(extract): account for Offsets struct in Linux dataStart #1

Open
xav-ie wants to merge 1 commit into vicnaum:master from xav-ie:fix/linux-dataStart-byte-count

xav-ie commented Apr 20, 2026

Problem

Running extract.mjs on a Bun-compiled Linux binary currently fails with an ERR_INVALID_ARG_VALUE error when writing the first extracted module: the parsed "name" is actually a slice of JavaScript source, not a path.

Reproduces on any Linux ELF binary compiled with recent Bun. I hit it on @anthropic-ai/claude-code-linux-x64@2.1.114 (Bun 1.3.13).

Root cause

StringPointer offsets inside each CompiledModuleGraphFile are relative to the raw_bytes region Bun passes to StandaloneModuleGraph.fromBytes. That region contains [data .. module table .. Offsets struct .. trailer], so its total size is byte_count + sizeof(Offsets) + trailer.length.
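To make the dependency on the raw_bytes start concrete, here is a minimal sketch of resolving a StringPointer. It assumes a StringPointer is serialized as two little-endian u32 values (offset, then length) with the offset relative to raw_bytes; the helper names readStringPointer and resolveStringPointer are hypothetical and not taken from extract.mjs.

```javascript
// Sketch only (assumption): a StringPointer is serialized as two
// little-endian u32 values, offset then length, and the offset is
// relative to the start of raw_bytes. Helper names are hypothetical.

function readStringPointer(buf, at) {
  return { offset: buf.readUInt32LE(at), length: buf.readUInt32LE(at + 4) };
}

// Resolve a StringPointer against raw_bytes, which begins at dataStart in buf.
function resolveStringPointer(buf, dataStart, ptr) {
  const start = dataStart + ptr.offset;
  const end = start + ptr.length;
  if (start < 0 || end > buf.length) {
    throw new RangeError(`StringPointer [${start}, ${end}) is outside the buffer`);
  }
  return buf.subarray(start, end);
}

// Synthetic raw_bytes: 8 filler bytes, a module name, then some JS source.
const raw = Buffer.from("xxxxxxxxsrc/entrypoints/cli.js;var a=1;");
const ptrBytes = Buffer.alloc(8);
ptrBytes.writeUInt32LE(8, 0);  // offset of the name inside raw_bytes
ptrBytes.writeUInt32LE(22, 4); // byte length of "src/entrypoints/cli.js"
const ptr = readStringPointer(ptrBytes, 0);

console.log(resolveStringPointer(raw, 0, ptr).toString()); // prints src/entrypoints/cli.js
console.log(resolveStringPointer(raw, 3, ptr).toString()); // prints /entrypoints/cli.js;va
```

A dataStart that is off by even a few bytes still returns a slice of the right length, just from the wrong place, so the failure only surfaces later when that slice is used as a path.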

The byte_count field in the Offsets struct excludes both the Offsets struct and the trailer, per the comment in bun/src/StandaloneModuleGraph.zig:

"the length of the module graph with padding, excluding the trailer and offsets"

The Linux/ELF fallback was computing:

dataStart = trailerOffset + trailerBuf.length - byteCount;

which lands 48 bytes (sizeof(Offsets) + trailer.length) past the real start of raw_bytes. Every subsequent lookup reads from the wrong offset, and module names come back as random minified JS.
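The 48-byte error can be checked with plain arithmetic. This sketch assumes a 16-byte trailer and a 32-byte Offsets struct, one split consistent with the 48 bytes stated above; only their sum matters here, and TRAILER_LENGTH and OFFSETS_SIZE are illustrative constants rather than values read from extract.mjs.

```javascript
// Illustrative constants (assumption): the PR only states that their sum is 48.
const TRAILER_LENGTH = 16; // assumed length of Bun's trailer string
const OFFSETS_SIZE = 32;   // assumed sizeof(Offsets)

// trailerOffset: index in the file where the trailer begins.
// byteCount: the byte_count field read from the Offsets struct.
function oldDataStart(trailerOffset, byteCount) {
  return trailerOffset + TRAILER_LENGTH - byteCount; // previous Linux/ELF formula
}

function newDataStart(trailerOffset, byteCount) {
  return trailerOffset - byteCount - OFFSETS_SIZE; // corrected formula
}

// The error is the same for every input: OFFSETS_SIZE + TRAILER_LENGTH bytes.
for (const [trailerOffset, byteCount] of [[1000, 400], [5000, 4096]]) {
  const drift = oldDataStart(trailerOffset, byteCount) - newDataStart(trailerOffset, byteCount);
  console.log(`trailerOffset=${trailerOffset} byteCount=${byteCount} drift=${drift}`); // drift=48
}
```

Because the drift is constant, every StringPointer lookup is shifted by the same amount, which is why all module names come back wrong rather than only some of them.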

The Mach-O path derives dataStart from the section header's own length (not byte_count), so it's unaffected.

Fix

dataStart = trailerOffset - byteCount - OFFSETS_SIZE;

With trailerEnd = trailerOffset + trailer.length, this is equivalent to trailerEnd - (byteCount + sizeof(Offsets) + trailer.length), which is the actual start of raw_bytes.
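Putting the pieces together, here is a minimal sketch of the corrected Linux/ELF computation on a whole-file Buffer. It rests on several assumptions not stated in this PR: the trailer is the 16-byte string "\n---- Bun! ----\n", the Offsets struct (taken as 32 bytes, consistent with the 48-byte figure above) sits immediately before the trailer, and byte_count is its first field, stored as a little-endian u64. findLinuxDataStart is a hypothetical name, not the function in extract.mjs.

```javascript
// Sketch under stated assumptions; not the code from extract.mjs.
const TRAILER = Buffer.from("\n---- Bun! ----\n"); // assumed Bun trailer (16 bytes)
const OFFSETS_SIZE = 32;                            // assumed sizeof(Offsets)

// Hypothetical helper: locate raw_bytes in a Linux/ELF Bun binary.
function findLinuxDataStart(buf) {
  // Search from the end, since the appended module graph follows the runtime.
  const trailerOffset = buf.lastIndexOf(TRAILER);
  if (trailerOffset < 0) throw new Error("Bun trailer not found");

  const offsetsStart = trailerOffset - OFFSETS_SIZE;
  if (offsetsStart < 0) throw new Error("file too small for Offsets struct");

  // Assumed: byte_count is the first Offsets field (u64, little-endian).
  // It excludes both the Offsets struct and the trailer.
  const byteCount = Number(buf.readBigUInt64LE(offsetsStart));

  // raw_bytes = [data .. module table .. Offsets .. trailer]
  const dataStart = trailerOffset - byteCount - OFFSETS_SIZE;
  if (dataStart < 0) throw new Error(`byte_count ${byteCount} points before file start`);
  return { dataStart, byteCount, trailerOffset };
}

// Synthetic file: 10 prefix bytes, 24 bytes of module graph data,
// a 32-byte Offsets struct whose first field is byte_count = 24, then the trailer.
const prefix = Buffer.alloc(10, 0xaa);
const graph = Buffer.alloc(24, 0xbb);
const offsets = Buffer.alloc(OFFSETS_SIZE);
offsets.writeBigUInt64LE(BigInt(graph.length), 0);
const file = Buffer.concat([prefix, graph, offsets, TRAILER]);

console.log(findLinuxDataStart(file)); // { dataStart: 10, byteCount: 24, trailerOffset: 66 }
```

The explicit dataStart < 0 check turns a bad byte_count into a clear error instead of a silently misaligned read.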

Testing

Tested locally against @anthropic-ai/claude-code-linux-x64@2.1.114 (Bun 1.3.13, 225 MB binary):

  • 5 modules extracted with correct names: src/entrypoints/cli.js (12.4 MB), image-processor.js, audio-capture.js, and two .node ELF addons
  • cli.js passes node --check
  • cli.js starts with the expected // @bun @bytecode @bun-cjs wrapper and ends cleanly (}Qt1();}), with no truncation or overrun
  • Extracted .node files parse as ELF 64-bit shared objects
  • Downstream resplit.mjs processes the extracted bundle end-to-end: 4,987 module files, 18,954 dependency edges
  • Mach-O path unchanged (only the Linux/ELF line, the comment, and a constant extraction differ in the diff)
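The ".node files parse as ELF 64-bit shared objects" check can be approximated without external tools by reading the ELF header. This sketch inspects only the standard identification bytes and e_type; isElf64SharedObject is a hypothetical helper, not part of this PR.

```javascript
// Minimal ELF header check (sketch; isElf64SharedObject is a hypothetical helper).
// Field positions follow the ELF specification:
//   e_ident[0..3]  magic 0x7f 'E' 'L' 'F'
//   e_ident[4]     EI_CLASS: 2 = ELFCLASS64
//   e_ident[5]     EI_DATA: 1 = little-endian, 2 = big-endian
//   bytes 16..17   e_type: 3 = ET_DYN (shared object; PIE executables also use it)
function isElf64SharedObject(buf) {
  if (buf.length < 18) return false;
  const magicOk = buf[0] === 0x7f && buf[1] === 0x45 && buf[2] === 0x4c && buf[3] === 0x46;
  if (!magicOk || buf[4] !== 2) return false;
  let eType;
  if (buf[5] === 1) eType = buf.readUInt16LE(16);
  else if (buf[5] === 2) eType = buf.readUInt16BE(16);
  else return false;
  return eType === 3;
}

// Synthetic 64-byte header: ELFCLASS64, little-endian, e_type = ET_DYN.
const header = Buffer.alloc(64);
header.set([0x7f, 0x45, 0x4c, 0x46, 2, 1], 0);
header.writeUInt16LE(3, 16);

console.log(isElf64SharedObject(header));                 // true
console.log(isElf64SharedObject(Buffer.from("// @bun"))); // false
```

Since ET_DYN also covers position-independent executables, this is a sanity check rather than proof that a file is a loadable Node addon.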

Known related issue (not addressed here)

MODULE_SIZE = 52 is hardcoded. In Bun 1.3.5 the struct is 36 bytes (no module_info or bytecode_origin_path fields yet); Bun added those somewhere between 1.3.5 and 1.3.13. On older binaries the current code silently reports "Modules: 0". Happy to submit a follow-up if desired.
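One possible direction for that follow-up, sketched purely as a hypothetical heuristic: instead of a fixed MODULE_SIZE, keep only the record sizes that evenly divide the module table's byte length (assumed here to be available, for example from the modules StringPointer's length). The 52- and 36-byte sizes come from the note above; the function name and the approach are assumptions. A length divisible by both (any multiple of 468) stays ambiguous and would need another signal.

```javascript
// Hypothetical heuristic, not Bun's or extract.mjs's logic.
// Per-module record sizes in bytes, newest layout first.
const CANDIDATE_MODULE_SIZES = [52, 36]; // Bun 1.3.13-era and Bun 1.3.5-era layouts

// Return every candidate size that divides the module table length exactly.
function candidateModuleSizes(tableLength) {
  if (tableLength <= 0) return [];
  return CANDIDATE_MODULE_SIZES.filter((size) => tableLength % size === 0);
}

console.log(candidateModuleSizes(5 * 52)); // [ 52 ]
console.log(candidateModuleSizes(5 * 36)); // [ 36 ]
console.log(candidateModuleSizes(468));    // [ 52, 36 ] (ambiguous)
```

An empty result would at least let the extractor fail loudly instead of silently reporting "Modules: 0".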

byte_count excludes the Offsets struct and trailer (per
StandaloneModuleGraph.zig), so raw_bytes on Linux/ELF starts at
trailerOffset - byteCount - sizeof(Offsets), not trailerOffset - byteCount.
The previous formula landed 48 bytes past the real start, breaking every
name/content lookup.