Performance-Improvements for BDBA scans #720

@8R0WNI3

Context / Motivation

Gardenlinux features an ever-growing number of flavours (22 at the time of writing). Scanning a single Gardenlinux image (rootfs tar) with BDBA takes several hours, so this is an increasing concern: feedback loops for Gardenlinux assessments become lengthy, and Gardenlinux scans tend to block other scans. This should motivate us to take measures to improve performance.

Deduplicate redundancies between flavours

Gardenlinux flavours are expected to differ in only a small number of files, while sharing many byte-identical files. While this may at first appear to be a special case, it is reasonable to assume that similar groups of flavours exist for other filesystem-tree-based artefacts (such as VM and OCI images).

Hence, ODG's BDBA-Extension should be extended such that groups of "artefact-flavours" can be configured to be scanned with deduplication between them.

Implementation Notes

Using streaming semantics and deduplicating incrementally (to limit both main-memory and storage consumption), files from artefact-flavour groups could be written into a directory tree like so:

blobs.d/<hex-digest> # write each file here, using its hex-digest as fname; hardlink from here into the other directories
shared/<original-fname> # write files that are identical across all flavours here - use hardlinks to blobs.d
<flavour-id>/<original-fname> # write files that are _not_ identical across all flavours here - use hardlinks to blobs.d
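
The layout above could be populated roughly as in the following sketch. It is only an illustration under assumptions, not part of ODG: the function names `ingest_file` and `promote_shared`, the in-memory `index` (relative path to digest per flavour), and the choice of SHA-256 are all hypothetical. It also assumes that relative paths carry no leading slash and that all directories live on one filesystem, since hardlinks cannot cross filesystems.

```python
import hashlib
import os
import tempfile


def ingest_file(root, flavour_id, rel_path, fileobj, index, chunk_size=1 << 20):
    """Stream one file into blobs.d and hardlink it into <flavour-id>/<rel_path>.

    `index` (hypothetical helper structure) maps rel_path -> {flavour_id: hex-digest}.
    """
    blobs_dir = os.path.join(root, 'blobs.d')
    os.makedirs(blobs_dir, exist_ok=True)

    # Hash while writing, so the file content is never held in memory as a whole.
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=blobs_dir)
    with os.fdopen(fd, 'wb') as tmp:
        while chunk := fileobj.read(chunk_size):
            digest.update(chunk)
            tmp.write(chunk)

    blob_path = os.path.join(blobs_dir, digest.hexdigest())
    if os.path.exists(blob_path):
        os.unlink(tmp_path)  # identical content was already stored
    else:
        os.rename(tmp_path, blob_path)

    target = os.path.join(root, flavour_id, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.link(blob_path, target)
    index.setdefault(rel_path, {})[flavour_id] = digest.hexdigest()


def promote_shared(root, flavour_ids, index):
    """Move files that are identical in every flavour from <flavour-id>/ to shared/."""
    for rel_path, digests in index.items():
        in_all_flavours = set(digests) == set(flavour_ids)
        if not in_all_flavours or len(set(digests.values())) != 1:
            continue
        hex_digest = next(iter(digests.values()))
        shared = os.path.join(root, 'shared', rel_path)
        os.makedirs(os.path.dirname(shared), exist_ok=True)
        os.link(os.path.join(root, 'blobs.d', hex_digest), shared)
        for flavour_id in flavour_ids:
            os.unlink(os.path.join(root, flavour_id, rel_path))
```

In this sketch, files are first linked into their flavour directory and only promoted to `shared/` once all flavours have been ingested, because whether a file is identical across all flavours is not known earlier. Only the small path-to-digest index is kept in memory.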

There are two possible approaches for triggering the resulting scans (either way, blobs.d should of course be discarded after all artefacts have been retrieved):

  • either upload shared and flavours individually (caveat: this will result in a "virtual" artefact not present in OCM)
  • upload as one single archive (paths within archive can be used to map files to flavours)
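
For the single-archive variant, the path-to-flavour mapping might look like the sketch below. The helper names `pack_tree` and `flavours_for` are hypothetical, and the sketch assumes that scan results report file paths relative to the archive root, which would need to be verified against BDBA's actual output.

```python
import os
import tarfile


def pack_tree(root, archive_path):
    """Pack shared/ and all <flavour-id>/ directories into one tar, skipping blobs.d."""
    with tarfile.open(archive_path, 'w') as tar:
        for entry in sorted(os.listdir(root)):
            if entry == 'blobs.d':
                continue
            tar.add(os.path.join(root, entry), arcname=entry)


def flavours_for(archive_member, flavour_ids):
    """Map a path within the archive to the flavours that contain the file."""
    top_level, _, _ = archive_member.partition('/')
    if top_level == 'shared':
        return list(flavour_ids)  # shared files belong to every flavour
    if top_level in flavour_ids:
        return [top_level]
    return []  # unknown top-level directory
```

A finding reported for `shared/usr/bin/foo` would then be attributed to every flavour in the group, whereas a finding for `<flavour-id>/etc/bar` would be attributed to that flavour only.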

Labels: area/ipcei (Important Project of Common European Interest), kind/feature (new feature, enhancement, improvement, extension)

Status: 🛠️ Needs Refinement
