Context / Motivation
Gardenlinux features an ever-growing number of flavours (22 at the time of writing this issue). As scanning a single gardenlinux-image (rootfs-tar) with BDBA takes several hours, this is an increasing concern: it makes feedback loops for gardenlinux-assessments quite lengthy, and gardenlinux-scans tend to block other scans. This should motivate us to take measures to improve performance.
Deduplicate redundancies between flavours
Gardenlinux-Flavours are expected to differ only by a small number of contained files, while sharing a large number of exactly identical files. While this may at first appear to be a special case, it seems reasonable to assume that similar groups of flavours exist for other filesystem-tree-based artefacts (such as VM and OCI images).
Hence, ODG's BDBA-Extension should be extended such that groups of "artefact-flavours" can be configured to be scanned with deduplication between them.
Implementation Notes
Using streaming semantics and deduplicating incrementally (to limit both main-memory and storage consumption), files from artefact-flavour-groups could be written into a directory tree like so:
blobs.d/<hex-digest> # write each file here, using its hex-digest as filename; hardlink into the other directories
shared/<original-fname> # files that are identical across all flavours - hardlinks to blobs.d
<flavour-id>/<original-fname> # files that are _not_ identical across all flavours - hardlinks to blobs.d
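A minimal Python sketch of this layout, assuming each flavour is available as a local rootfs-tar and that SHA-256 is an acceptable digest (the hash choice and the names ingest_flavour, link and dedup_flavours are illustrative assumptions, not existing ODG code). Each tar is read in streaming mode and every regular file is stored at most once under blobs.d; only a small path-to-digest index per flavour is kept in memory. Whether a file is identical across *all* flavours can only be decided once the last flavour has been read, so the split into shared/ and <flavour-id>/ happens in a second pass that only creates hardlinks.

```python
import hashlib
import os
import shutil
import tarfile
import tempfile

CHUNK_SIZE = 1 << 20  # copy tar members in 1 MiB chunks to bound memory use


def ingest_flavour(tar_path: str, blobs_dir: str) -> dict[str, str]:
    """Stream one flavour's rootfs-tar, storing each regular file once under
    blobs_dir/<hex-digest>. Returns {original path: hex digest}."""
    index = {}
    # 'r|*' reads the tar as a non-seekable stream (any supported compression)
    with tarfile.open(tar_path, mode="r|*") as tar:
        for member in tar:
            path = os.path.normpath(member.name).lstrip("/")
            if not member.isfile() or path == ".." or path.startswith("../"):
                continue  # only regular files; skip names escaping the root
            src = tar.extractfile(member)
            digest = hashlib.sha256()
            fd, tmp_path = tempfile.mkstemp(dir=blobs_dir)
            with os.fdopen(fd, "wb") as tmp:
                while chunk := src.read(CHUNK_SIZE):
                    digest.update(chunk)
                    tmp.write(chunk)
            blob_path = os.path.join(blobs_dir, digest.hexdigest())
            if os.path.exists(blob_path):
                os.unlink(tmp_path)  # content already stored: drop the duplicate
            else:
                os.rename(tmp_path, blob_path)
            index[path] = digest.hexdigest()
    return index


def link(blob_path: str, target: str) -> None:
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.link(blob_path, target)  # hardlink: no file content is copied


def dedup_flavours(tar_paths: dict[str, str], out_dir: str) -> None:
    """tar_paths maps flavour-id -> rootfs-tar. Builds out_dir/shared and
    out_dir/<flavour-id>, then discards out_dir/blobs.d."""
    blobs_dir = os.path.join(out_dir, "blobs.d")
    os.makedirs(blobs_dir, exist_ok=True)
    indexes = {f: ingest_flavour(p, blobs_dir) for f, p in tar_paths.items()}

    for path in sorted(set().union(*indexes.values())):
        digests = {f: idx.get(path) for f, idx in indexes.items()}
        values = set(digests.values())
        if len(values) == 1 and None not in values:
            # identical in every flavour: goes to shared/ and is scanned once
            link(os.path.join(blobs_dir, values.pop()),
                 os.path.join(out_dir, "shared", path))
        else:
            for flavour, digest in digests.items():
                if digest is not None:
                    link(os.path.join(blobs_dir, digest),
                         os.path.join(out_dir, flavour, path))
    shutil.rmtree(blobs_dir)  # hardlinks in shared/ and <flavour-id>/ survive
```

Flavour-ids named shared or blobs.d would collide with the fixed directories. Hardlinks also require blobs.d to live on the same filesystem as the output tree, which is why the sketch places it inside out_dir.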
There are two possible approaches for triggering the resulting scans (either way, blobs.d should of course be discarded once all artefacts have been retrieved!):
- either upload shared and the flavours individually (caveat: this will result in a "virtual" artefact not present in OCM)
- upload as one single archive (paths within archive can be used to map files to flavours)
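For the single-archive approach, a minimal sketch of how the directory layout described above could be packed into one tar, and how a path reported for a file inside that archive could be mapped back to its flavour(s). It assumes that findings reference paths relative to the archive root; both function names are hypothetical, and the actual BDBA upload is not shown.

```python
import os
import tarfile


def pack_single_archive(tree_dir: str, archive_path: str) -> None:
    """Pack shared/ and every <flavour-id>/ into one tar for a single upload."""
    with tarfile.open(archive_path, "w") as tar:
        for entry in sorted(os.listdir(tree_dir)):
            # add() recurses into directories by default
            tar.add(os.path.join(tree_dir, entry), arcname=entry)


def flavours_for_path(archive_path: str, flavour_ids: list[str]) -> list[str]:
    """Map a path inside the single archive back to the flavour(s) it belongs
    to: shared/... belongs to every flavour, <flavour-id>/... to exactly one."""
    top, _, rest = archive_path.lstrip("/").partition("/")
    if not rest:
        return []  # not a file inside a known subtree
    if top == "shared":
        return list(flavour_ids)
    return [top] if top in flavour_ids else []
```

Since files under shared/ are attributed to every flavour in the group, a single finding there applies to all flavours, while the archive still corresponds to exactly one scan.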