fix: add config option to exclude language packages with file ownership overlap #4905
kimjune01 wants to merge 4 commits into
|
Pushed two test-only follow-ups (b4844d0, 64376c2): the original tests built |
…ip overlap

Adds a new configuration option `exclude-language-overlap-by-ownership` that allows users to exclude language packages (Python, NPM, Ruby, etc.) from the SBOM when they overlap with OS packages (deb, rpm, apk). This prevents duplicate entries for packages installed via system package managers that are also detected by language-specific catalogers.

Example: python3-django deb package vs. django Python package

The feature is disabled by default to maintain backward compatibility.

Resolves anchore#4760

Signed-off-by: June Kim <kimjune01@gmail.com>
Previously the table-driven test built pkg.Package literals without
calling SetID(), leaving every package ID as the empty string. The
collection generated its own ID on Add(), but the original literals
remained empty, so the relationship From/To IDs were all "" and
c.Package("") returned nil. The function exited at the nil check
without ever evaluating identifyOverlappingLanguageRelationship, and
the assertion compared two empty strings, so the test passed
unconditionally.

Injecting panic("reached") before the return left the tests passing,
which proves the function was a no-op under test. Now that SetID() is
called and the dynamic logic-mirror is dropped from the assertion, the
injected panic triggers, so the function is actually exercised.
Signed-off-by: June Kim <kimjune01@gmail.com>
Same root cause as the prior commit: pkg.Package literals had no IDs, so child.ID() == "" and the assertion compared empty to empty. The test passed trivially both when the function correctly returned "" for the no-match case and when it should have returned the child ID.

Verified by panic injection: prior to this fix the test passed even when the function panicked at entry.

Signed-off-by: June Kim <kimjune01@gmail.com>
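The vacuous-pass failure mode described in these two commits can be reproduced in a few lines. The sketch below does not use syft's real types: `Package`, `Collection`, and `overlappingChild` are hypothetical stand-ins that only mimic the behavior described above (IDs empty until `SetID()` is called, the collection assigning its own ID to its copy on add, and an early return on a nil lookup).

```go
package main

import "fmt"

// Package is a hypothetical stand-in for pkg.Package: its ID stays empty
// until SetID is called.
type Package struct {
	Name string
	id   string
}

func (p *Package) SetID()     { p.id = "id-" + p.Name }
func (p *Package) ID() string { return p.id }

// Collection is a hypothetical stand-in for the package collection. It
// assigns an ID to its own copy on add; the caller's literal is untouched.
type Collection struct{ byID map[string]Package }

func NewCollection(pkgs ...Package) *Collection {
	c := &Collection{byID: map[string]Package{}}
	for _, p := range pkgs {
		p.SetID() // mutates the loop copy only
		c.byID[p.ID()] = p
	}
	return c
}

// Package returns nil when no package is stored under id.
func (c *Collection) Package(id string) *Package {
	p, ok := c.byID[id]
	if !ok {
		return nil
	}
	return &p
}

// overlappingChild returns the child ID for a parent/child pair, or ""
// when either lookup fails (the early exit the old tests always hit).
func overlappingChild(c *Collection, fromID, toID string) string {
	parent, child := c.Package(fromID), c.Package(toID)
	if parent == nil || child == nil {
		return ""
	}
	return child.ID()
}

func main() {
	parent := Package{Name: "python3-django"}
	child := Package{Name: "django"}
	c := NewCollection(parent, child)

	// Vacuous: both literal IDs are "", the lookup returns nil, the result
	// is "", and the expected value child.ID() is also "".
	got := overlappingChild(c, parent.ID(), child.ID())
	fmt.Println("without SetID, assertion passes:", got == child.ID())

	// Meaningful: with IDs set, the lookup succeeds and the comparison
	// is against a non-empty ID.
	parent.SetID()
	child.SetID()
	got = overlappingChild(c, parent.ID(), child.ID())
	fmt.Println("with SetID, got:", got)
}
```

Comparing against a hard-coded expected value rather than one computed from the same inputs is what lets the assertion fail when the function under test never runs.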
64376c2 to a0b55b8
|
The
This could be intentional — e.g. Java/Go/Rust have different overlap semantics worth handling separately — or just an early-iteration scoping. If the latter, it might be worth either expanding the list to cover all language types, or codifying a criterion for inclusion ( |
…types

languageCatalogerTypes was hand-maintained and missed installed-package language types that an OS package can subsume on file-ownership overlap: cocoapods, conan, dart-pub, hackage, hex, opam, php-pear, swift, swipl. Define the inclusion rule explicitly and enforce it with a test that derives the expected set from cataloger capabilities, so a new language cataloger fails CI until it is classified.

Exclude types whose catalogers extract many components from a single OS-owned binary or fat archive (go-module, rust-crate, dotnet, graalvm-native-image, java-archive): OS ownership of the container does not make the embedded components redundant, so deleting them would drop distinct packages from the SBOM.

The rule keys on pkg.Type, a coarse proxy; a follow-up could key on per-package installed-vs-declared evidence rather than type.

Signed-off-by: June Kim <kimjune01@gmail.com>
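A minimal sketch of the "fails CI until classified" idea, assuming a plain list of type names stands in for whatever syft derives from cataloger capabilities. The two sets hold only the type names mentioned in this commit message, so they are partial; `unclassified` and `brand-new-lang` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupable lists installed-package language types that an OS package can
// subsume (partial: only the types named in the commit message).
var dedupable = map[string]bool{
	"cocoapods": true, "conan": true, "dart-pub": true, "hackage": true,
	"hex": true, "opam": true, "php-pear": true, "swift": true, "swipl": true,
}

// embedded lists types extracted from a single OS-owned binary or fat
// archive; these are deliberately never deleted.
var embedded = map[string]bool{
	"go-module": true, "rust-crate": true, "dotnet": true,
	"graalvm-native-image": true, "java-archive": true,
}

// unclassified returns, in sorted order, every type that appears in
// neither set. A test can fail whenever this is non-empty.
func unclassified(all []string) []string {
	var missing []string
	for _, t := range all {
		if !dedupable[t] && !embedded[t] {
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical input standing in for the set derived from cataloger
	// capabilities; "brand-new-lang" plays the role of a new cataloger.
	all := []string{"hex", "go-module", "opam", "brand-new-lang"}
	fmt.Println("unclassified types:", unclassified(all))
}
```

Keeping an explicit exclusion set alongside the inclusion set means every type needs a deliberate decision, so nothing is left out by omission.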
|
I derived the set from syft's own
Tradeoff: excluding whole types means safe go.mod/Cargo.lock overlaps won't be deduped either — better to under-delete than lose a real component. Limits: the rule keys on |
|
The extractor-vs-installed split is the right framing; my initial cut missed it. Deriving the set from Spot-checked one #4974 item locally: |
Closes #4760.
Cause
When OS packages (deb, apk, rpm) install language packages (Python, Ruby, npm, etc.) via system package managers, syft catalogs both the OS package and the language package. The file ownership overlap relationship exists between them, but there was no mechanism to deduplicate.
The existing `exclude-binary-overlap-by-ownership` option handles binary packages, but an equivalent for language packages was missing.
Fix
Adds `exclude-language-overlap-by-ownership` config option that removes language packages when they have a file ownership overlap relationship with an OS package. This mirrors the existing binary exclusion logic. The option defaults to `false` to avoid changing existing behavior.
The implementation follows the same pattern as `ExcludeBinaryPackagesByFileOwnershipOverlap`: iterate relationships, identify OS-parent/language-child pairs, and delete the child package.
Tests
Unit tests cover:
Signed-off-by: June Kim <kimjune01@gmail.com>
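The pattern described under Fix (iterate relationships, identify OS-parent/language-child pairs, delete the child) can be sketched roughly as follows. This is not the PR's code: `Package`, `Relationship`, `excludeLanguageOverlap`, and the two type sets are simplified, hypothetical stand-ins, and the relationship-type string is assumed for illustration.

```go
package main

import "fmt"

// Package and Relationship are simplified, hypothetical stand-ins for
// syft's package and relationship types.
type Package struct{ ID, Name, Type string }
type Relationship struct{ From, To, Type string }

// ownershipByFileOverlap is an assumed name for the overlap relationship.
const ownershipByFileOverlap = "ownership-by-file-overlap"

// Illustrative subsets only; the real lists are defined in the PR.
var osTypes = map[string]bool{"deb": true, "rpm": true, "apk": true}
var languageTypes = map[string]bool{"python": true, "npm": true, "gem": true}

// excludeLanguageOverlap deletes every language package that is the child
// of an OS package in a file-ownership-overlap relationship.
func excludeLanguageOverlap(pkgs map[string]Package, rels []Relationship) {
	for _, r := range rels {
		if r.Type != ownershipByFileOverlap {
			continue
		}
		parent, okParent := pkgs[r.From]
		child, okChild := pkgs[r.To]
		if !okParent || !okChild {
			continue // unknown IDs: nothing to decide
		}
		if osTypes[parent.Type] && languageTypes[child.Type] {
			delete(pkgs, child.ID)
		}
	}
}

func main() {
	pkgs := map[string]Package{
		"a": {ID: "a", Name: "python3-django", Type: "deb"},
		"b": {ID: "b", Name: "django", Type: "python"},
		"c": {ID: "c", Name: "lodash", Type: "npm"}, // no overlap: kept
	}
	rels := []Relationship{{From: "a", To: "b", Type: ownershipByFileOverlap}}

	excludeLanguageOverlap(pkgs, rels)
	for id, p := range pkgs {
		fmt.Println(id, p.Name, p.Type)
	}
}
```

In this sketch the language package is removed only when both ends of the relationship resolve and the parent is an OS package, which matches the conservative "under-delete" stance taken in the discussion above.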