
Add file-based toolchain detection #4990

Open
wagoodman wants to merge 1 commit into main from add-file-based-toolchain-detector


@wagoodman wagoodman commented Jun 18, 2026

Contributor

Expands the executable cataloger to detect the build tools, such as compilers and linkers, that actually produced a binary. Results show up in a new toolchains field on the executable file metadata, so you can tell a GCC-built binary from a Clang-built one, or see which linker was used, straight from the file itself.

Each detected entry records a name, a version (when we can pull one), and a component (compiler vs linker today, with room to grow into assemblers and runtimes later).

Example entry in the .files section of the syft json output:

{
  "id": "dae2c7632f4d4a42",
  "location": {
    "path": "/usr/lib/x86_64-linux-gnu/libubsan.so.1.0.0",
    "layerID": "sha256:eb20a6a75cbe349fec27d8830233974f8d1bfd356878792060042e7b48b31811"
  },
  ...
  "executable": {
    ...
    "toolchains": [
      {
        "name": "gcc",
        "version": "14.2.0",
        "component": "compiler"
      }
    ]
  }
}
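For readers who want a mental model of the entry above, here is a minimal sketch of how the shape could be expressed in Go. Only the Toolchain and ToolchainComponent type names come from this PR; the field names, JSON tags, constants, and the `encode` helper are assumptions for illustration and may not match the real definitions on file.Executable.

```go
package main

import "encoding/json"

// ToolchainComponent names the role a tool played in producing a binary.
type ToolchainComponent string

// Hypothetical constants; the PR only says "compiler vs linker today".
const (
	ComponentCompiler ToolchainComponent = "compiler"
	ComponentLinker   ToolchainComponent = "linker"
)

// Toolchain is one detected build tool. The version is optional because
// it is only recorded when it can be extracted (omitempty is an assumption).
type Toolchain struct {
	Name      string             `json:"name"`
	Version   string             `json:"version,omitempty"`
	Component ToolchainComponent `json:"component"`
}

// encode renders a single entry as compact JSON (hypothetical helper).
func encode(t Toolchain) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}
```

Encoding `Toolchain{Name: "gcc", Version: "14.2.0", Component: ComponentCompiler}` with this sketch yields the same three keys shown in the example entry.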

Note: this feature was extracted from the now-defunct PR #4454.

What it detects today:

  • compilers: Go, GCC, Clang, Intel oneAPI (icx), Rust, and the GCC frontends gfortran, gdc (GNU D), gccgo, and gnat (GNU Ada)
  • linkers: LLD, mold, gold

How it works (no new catalogers, this all rides on the existing executable cataloger):

  • Go: read the embedded build info; this works across ELF, Mach-O, and PE
  • GCC / Clang / Intel / Rust: scan the ELF .comment section for the producer strings these compilers leave behind. icx is checked before clang since it's a clang fork that carries both strings, and clang is checked before gcc for the same reason
  • GCC frontends: gfortran, gdc, gccgo, and gnat all share the same GCC: (...) <version> comment, so we disambiguate them by the language-specific runtime symbols they pull in (e.g. MAIN__/_gfortran_*, _Dmain, __go_go, adainit/__gnat_*). This also keeps a cgo-enabled gc binary from being misread as gccgo. If none of these symbols is present, the entry falls back to plain gcc
  • linkers: LLD and mold write a version into .comment; gold drops a .note.gnu.gold-version note section. GNU ld (BFD) leaves no marker, so it's intentionally not detected
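The ordering and fallback rules above can be sketched roughly as follows. This is not the code in toolchains.go: the regular expressions are guesses at the producer strings, the rule and function names are hypothetical, and the position of Rust in the priority list is an assumption (the description only fixes icx before clang before gcc).

```go
package main

import (
	"regexp"
	"strings"
)

// commentRule pairs a tool name with a pattern for its .comment string.
// All patterns here are illustrative assumptions.
type commentRule struct {
	name    string
	pattern *regexp.Regexp
}

// Order matters: more specific compilers come first because their binaries
// can also carry the more generic strings.
var compilerRules = []commentRule{
	{"icx", regexp.MustCompile(`Intel\(R\) oneAPI DPC\+\+/C\+\+ Compiler (\d+(?:\.\d+)+)`)},
	{"clang", regexp.MustCompile(`clang version (\d+(?:\.\d+)+)`)},
	{"rust", regexp.MustCompile(`rustc version (\d+(?:\.\d+)+)`)},
	{"gcc", regexp.MustCompile(`GCC: \([^)]*\) (\d+(?:\.\d+)+)`)},
}

// Linkers that write a version into .comment (gold and GNU ld are handled
// differently, as described above).
var linkerRules = []commentRule{
	{"lld", regexp.MustCompile(`Linker: LLD (\d+(?:\.\d+)+)`)},
	{"mold", regexp.MustCompile(`^mold (\d+(?:\.\d+)+)`)},
}

// firstMatch walks the rules in priority order and returns the first rule
// whose pattern matches any of the comment strings.
func firstMatch(rules []commentRule, comments []string) (name, version string, ok bool) {
	for _, rule := range rules {
		for _, c := range comments {
			if m := rule.pattern.FindStringSubmatch(c); m != nil {
				return rule.name, m[1], true
			}
		}
	}
	return "", "", false
}

// gccFrontend refines a GCC match using runtime symbol names and falls
// back to plain gcc when no frontend-specific symbol is present.
func gccFrontend(symbols []string) string {
	for _, s := range symbols {
		switch {
		case s == "MAIN__" || strings.HasPrefix(s, "_gfortran_"):
			return "gfortran"
		case s == "_Dmain":
			return "gdc"
		case s == "__go_go":
			return "gccgo"
		case s == "adainit" || strings.HasPrefix(s, "__gnat_"):
			return "gnat"
		}
	}
	return "gcc"
}
```

With this sketch, a comment list holding both a GCC string and a clang string resolves to clang, because the clang rule is consulted first; a GCC match is then only refined to a frontend when one of the listed runtime symbols is present.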

Detailed changes:

  • .comment strings and the static + dynamic symbol tables are parsed once per binary and shared across detectors rather than re-read per check
  • new Toolchain and ToolchainComponent types on file.Executable
  • testdata with Makefiles and Dockerfiles that build real sample binaries for each toolchain, so the tests run against genuine compiler/linker output rather than fixtures we hand-rolled
  • a TODO map in toolchains.go documenting the next detectors we'd want and where their signal lives (rustc commit hashes, Swift, Haskell GHC, GNU as/ld, MSVC rich header, .NET CLR, Mach-O Swift, etc.)
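To illustrate the "parse once, share across detectors" point, here is a rough sketch using the standard library's debug/elf package. The `binaryFacts` type, the `detector` signature, and the function names are hypothetical and are not taken from this PR; the real implementation may structure this differently.

```go
package main

import (
	"bytes"
	"debug/elf"
)

// binaryFacts holds everything the detectors need, gathered in one pass.
type binaryFacts struct {
	Comments    []string            // NUL-separated strings from .comment
	Symbols     map[string]struct{} // union of static and dynamic symbol names
	HasGoldNote bool                // .note.gnu.gold-version is present
}

// splitNUL breaks a .comment payload into its non-empty strings.
func splitNUL(data []byte) []string {
	var out []string
	for _, part := range bytes.Split(data, []byte{0}) {
		if len(part) > 0 {
			out = append(out, string(part))
		}
	}
	return out
}

// gatherFacts reads the relevant ELF data once. Missing sections or symbol
// tables (for example in a stripped binary) are skipped, not treated as errors.
func gatherFacts(f *elf.File) binaryFacts {
	facts := binaryFacts{Symbols: map[string]struct{}{}}
	if sec := f.Section(".comment"); sec != nil {
		if data, err := sec.Data(); err == nil {
			facts.Comments = splitNUL(data)
		}
	}
	if syms, err := f.Symbols(); err == nil {
		for _, s := range syms {
			facts.Symbols[s.Name] = struct{}{}
		}
	}
	if syms, err := f.DynamicSymbols(); err == nil {
		for _, s := range syms {
			facts.Symbols[s.Name] = struct{}{}
		}
	}
	facts.HasGoldNote = f.Section(".note.gnu.gold-version") != nil
	return facts
}

// detector inspects the shared facts and reports a tool name if it
// recognizes one.
type detector func(binaryFacts) (string, bool)

// runAll feeds the same facts to every detector, so no detector has to
// reopen or re-read the binary.
func runAll(facts binaryFacts, detectors []detector) []string {
	var found []string
	for _, d := range detectors {
		if name, ok := d(facts); ok {
			found = append(found, name)
		}
	}
	return found
}
```

A caller would open the file with `elf.Open`, call `gatherFacts` once, and pass the result to `runAll`; each detector then only does string or map lookups.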

Schema bumped to 16.1.5 for the new field.

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@oss-housekeeper

This comment was marked as outdated.

@wagoodman wagoodman changed the title add file-based toolchain detection Add file-based toolchain detection Jun 18, 2026
@wagoodman wagoodman self-assigned this Jun 18, 2026
@wagoodman wagoodman added this to OSS Jun 18, 2026
@wagoodman wagoodman moved this to In Review in OSS Jun 18, 2026
@wagoodman wagoodman requested a review from a team June 18, 2026 17:32
