fix(extract): account for Offsets struct in Linux dataStart #1
Open
xav-ie wants to merge 1 commit into
byte_count excludes the Offsets struct and trailer (per StandaloneModuleGraph.zig), so raw_bytes on Linux/ELF starts at trailerOffset - byteCount - sizeof(Offsets), not trailerOffset - byteCount. The previous formula landed 48 bytes past the real start, breaking every name/content lookup.
Problem
Running extract.mjs on a Bun-compiled Linux binary currently fails with an ERR_INVALID_ARG_VALUE when writing the first extracted module: the parsed "name" is actually a slice of JavaScript source, not a path.

Reproduces on any Linux ELF binary compiled with recent Bun. I hit it on @anthropic-ai/claude-code-linux-x64@2.1.114 (Bun 1.3.13).

Root cause
StringPointer offsets inside each CompiledModuleGraphFile are relative to the raw_bytes region Bun passes to StandaloneModuleGraph.fromBytes. That region contains [data .. module table .. Offsets struct .. trailer], so its total size is byte_count + sizeof(Offsets) + trailer.length.

The byte_count field in the Offsets struct excludes the Offsets struct and the trailer, per the comment in bun/src/StandaloneModuleGraph.zig.

The Linux/ELF fallback was computing dataStart from byte_count alone, which lands 48 bytes (sizeof(Offsets) + trailer.length) past the real start of raw_bytes. Every subsequent lookup reads from the wrong offset, and module names come back as random minified JS.

The Mach-O path derives dataStart from the section header's own length (not byte_count), so it's unaffected.

Fix
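As a rough sketch of the corrected arithmetic (not the actual diff): the helper name `rawBytesStart`, its parameters, and the 32 + 16 byte split of the 48 bytes quoted above are all assumptions made for illustration.

```javascript
// Hypothetical helper; names and default sizes are illustrative assumptions,
// not the code in extract.mjs.
const ASSUMED_OFFSETS_SIZE = 32;   // assumed sizeof(Offsets)
const ASSUMED_TRAILER_LENGTH = 16; // assumed trailer length; 32 + 16 = the 48 bytes quoted above

// trailerEnd: file offset one past the last trailer byte.
// byteCount:  the byte_count field read from the Offsets struct.
function rawBytesStart(
  trailerEnd,
  byteCount,
  offsetsSize = ASSUMED_OFFSETS_SIZE,
  trailerLength = ASSUMED_TRAILER_LENGTH,
) {
  // byte_count excludes the Offsets struct and the trailer, so both are subtracted as well.
  const start = trailerEnd - (byteCount + offsetsSize + trailerLength);
  if (!Number.isInteger(start) || start < 0) {
    throw new RangeError(`computed raw_bytes start ${start} is out of range`);
  }
  return start;
}

// Synthetic layout: 100 bytes of unrelated data, then raw_bytes.
const byteCount = 40;
const trailerEnd = 100 + byteCount + ASSUMED_OFFSETS_SIZE + ASSUMED_TRAILER_LENGTH;
console.log(rawBytesStart(trailerEnd, byteCount)); // 100
```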
The corrected dataStart is equivalent to trailerEnd - (byteCount + Offsets + trailer), which is the actual start of raw_bytes.

Testing
Tested locally against @anthropic-ai/claude-code-linux-x64@2.1.114 (Bun 1.3.13, 225 MB binary):

- src/entrypoints/cli.js (12.4 MB), image-processor.js, audio-capture.js, and two .node ELF addons
- cli.js passes node --check
- cli.js starts with the expected // @bun @bytecode @bun-cjs wrapper and ends cleanly with }Qt1();}) (no truncation or overrun)
- .node files parse as ELF 64-bit shared objects
- resplit.mjs processes the extracted bundle end-to-end: 4,987 module files, 18,954 dependency edges

Known related issue (not addressed here)
MODULE_SIZE = 52 is hardcoded. In Bun 1.3.5 the struct is 36 bytes (no module_info or bytecode_origin_path fields yet); Bun added those somewhere between 1.3.5 and 1.3.13. On older binaries the current code silently reports "Modules: 0". Happy to submit a follow-up if desired.
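One possible shape for that follow-up, as a rough sketch only: pick the record size from the two sizes mentioned above by checking which one divides the module table length evenly, and fail loudly instead of reporting zero modules. The helper name `detectModuleSize` is hypothetical, and the assumption that the table length is an exact multiple of the record size has not been checked against Bun's source.

```javascript
// Hypothetical helper; not part of extract.mjs.
// Candidate sizes come from the description above: 52 bytes (Bun 1.3.13), 36 bytes (Bun 1.3.5).
const CANDIDATE_MODULE_SIZES = [52, 36];

// tableLength: byte length of the module table (how it is read is outside this sketch).
function detectModuleSize(tableLength, candidates = CANDIDATE_MODULE_SIZES) {
  if (!Number.isInteger(tableLength) || tableLength <= 0) {
    throw new RangeError(`invalid module table length: ${tableLength}`);
  }
  // Keep only the sizes that divide the table into whole records.
  const matches = candidates.filter((size) => tableLength % size === 0);
  if (matches.length !== 1) {
    // Zero matches: unknown layout. Several matches: ambiguous (e.g. 468 = 9 * 52 = 13 * 36).
    throw new Error(
      `cannot determine module size for table length ${tableLength} (matches: [${matches.join(", ")}])`,
    );
  }
  return matches[0];
}

console.log(detectModuleSize(5 * 52)); // 52
console.log(detectModuleSize(5 * 36)); // 36
```

Divisibility alone is ambiguous for some lengths, so a real fix would likely need a second signal, such as validating that the first record's name pointer resolves to a plausible path.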