You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Once kubescape/storage#319 (server-side merge of user-managed ug- ApplicationProfile / NetworkNeighborhood into the consolidated ContainerProfile) is deployed, node-agent's client-side ug- fetch + projection in containerprofilecache becomes redundant. storage#319 serves the already-merged CP on GET, so node-agent re-fetching and re-projecting the same ug- overlay is a duplicate of work the server now does.
This issue tracks removing that client-side ug- merge — gated on storage#319 being deployed — and the constraints that make it non-trivial.
Context: this is the node-agent half of the consolidation introduced by storage#319; the client-side merge was added in #788 (port of projection.go).
ContainerProfileCacheImpl.tryPopulateEntry (pkg/objectcache/containerprofilecache/containerprofilecache.go) performs two distinct overlay passes:
User-managed (ug-) pass — the one storage#319 subsumes.
Fetches userManagedAP / userManagedNN by the well-known name UserApplicationProfilePrefix + workloadName (ug-<workload>), ~lines 331–369.
Applied via projectUserProfiles(cp, userManagedAP, userManagedNN, …) (~line 462), the "user-managed projection pass".
Covered by Test_12_MergingProfilesTest / Test_13_MergingNetworkNeighborhoodTest.
User-defined overlay pass — NOT covered by storage#319.
Fetches userAP / userNN by the name in pod label UserDefinedProfileMetadataKey → overlayName (~lines 393–422).
Applied via buildEntry(cp, userAP, userNN, …) (~line 470).
storage#319 keys only off ug-<workloadSlug>, so it replaces pass (1) only.
Why a double-merge is currently harmless (so this is cleanup, not a fix)
With storage#319 live, node-agent GETs an already-merged CP and then re-projects ug- on top of it. I checked projection.go: capabilities/execs/opens/syscalls are raw-appended, but the effective semantics downstream are allow-list / set-union / user-wins, so duplicate entries don't change any enforcement decision. Net effect of the double-merge is redundant RPCs (two extra GETs per container populate/refresh tick) and minor bloat — not a correctness regression. Removing pass (1) eliminates that redundant work.
Proposed change
Remove only the user-managed (ug-) pass:
Drop the ug-<workload>userManagedAP / userManagedNN fetches in tryPopulateEntry and the corresponding refresh in reconciler.go.
Drop the projectUserProfiles(...) user-managed projection call; keep buildEntry(cp, userAP, userNN, …).
Keep the user-defined overlay pass (2) intact — it is a separate, still-client-side feature.
Clean up now-dead surface: UserManagedAPRV / UserManagedNNRV tracking fields, shouldLogOptionalUserManagedFetchError, user-managed branches of emitOverlayMetrics, the userManagedAP/NN arms of the "need SOMETHING to cache" / synthetic-CP fallback, and the reconciler's ug--RV-change re-projection trigger.
projection.go / projectUserProfiles itself likely still serves pass (2) — confirm before deleting; remove only if pass (2) doesn't use it.
Constraints / gotchas
Order dependency (release gate). Removing pass (1) while a cluster still runs pre-Feature/path #319 storage means the merged CP isn't served and ug- exceptions silently vanish → enforcement gap. This change must only ship after storage#319 is deployed. Recommend a minimum-storage-version floor (or a feature flag defaulting off until the storage version is confirmed) rather than relying on rollout ordering alone.
Don't remove the user-defined (label) overlay (pass 2). Open question for storage: is the label-driven UserDefinedProfileMetadataKey overlay also meant to move server-side eventually? If yes, that's a separate follow-up; if no, pass (2) stays in node-agent indefinitely.
storage#319 merges on GET only, not List/Watch. node-agent is safe because its CP cache is fully GET-driven (container callbacks + reconciler GetContainerProfile; no CP informer/watch). But this is the load-bearing assumption: if node-agent ever moves CP reads to a list-watch informer, the server-side merge would be bypassed and ug- overlays would disappear. Worth a code comment near the CP GET noting the dependency. (Separately flagged to the storage side as a request to make List/Watch merged-consistent or document the GET-only contract.)
Validation
Test_12_MergingProfilesTest / Test_13_MergingNetworkNeighborhoodTest should pass with ug- enforcement now coming from the server-merged CP (point them at a storage build with Feature/path #319, or stub a merged CP).
Confirm rules still alert on events absent from the merged base+ug- profile.
Confirm the user-defined (label) overlay path still works unchanged.
Confirm the redundant ug-GETs are gone (RPC-count / metrics).
Summary
Once kubescape/storage#319 (server-side merge of user-managed
ug-ApplicationProfile / NetworkNeighborhood into the consolidated ContainerProfile) is deployed, node-agent's client-sideug-fetch + projection incontainerprofilecachebecomes redundant. storage#319 serves the already-merged CP onGET, so node-agent re-fetching and re-projecting the sameug-overlay is a duplicate of work the server now does.This issue tracks removing that client-side
ug-merge — gated on storage#319 being deployed — and the constraints that make it non-trivial.Context: this is the node-agent half of the consolidation introduced by storage#319; the client-side merge was added in #788 (port of
projection.go).Current behavior (post-#788)
ContainerProfileCacheImpl.tryPopulateEntry(pkg/objectcache/containerprofilecache/containerprofilecache.go) performs two distinct overlay passes:User-managed (
ug-) pass — the one storage#319 subsumes.userManagedAP/userManagedNNby the well-known nameUserApplicationProfilePrefix + workloadName(ug-<workload>), ~lines 331–369.projectUserProfiles(cp, userManagedAP, userManagedNN, …)(~line 462), the "user-managed projection pass".Test_12_MergingProfilesTest/Test_13_MergingNetworkNeighborhoodTest.User-defined overlay pass — NOT covered by storage#319.
userAP/userNNby the name in pod labelUserDefinedProfileMetadataKey→overlayName(~lines 393–422).buildEntry(cp, userAP, userNN, …)(~line 470).storage#319 keys only off
ug-<workloadSlug>, so it replaces pass (1) only.Why a double-merge is currently harmless (so this is cleanup, not a fix)
With storage#319 live, node-agent
GETs an already-merged CP and then re-projectsug-on top of it. I checkedprojection.go: capabilities/execs/opens/syscalls are raw-appended, but the effective semantics downstream are allow-list / set-union / user-wins, so duplicate entries don't change any enforcement decision. Net effect of the double-merge is redundant RPCs (two extraGETs per container populate/refresh tick) and minor bloat — not a correctness regression. Removing pass (1) eliminates that redundant work.Proposed change
Remove only the user-managed (
ug-) pass:ug-<workload>userManagedAP/userManagedNNfetches intryPopulateEntryand the corresponding refresh inreconciler.go.projectUserProfiles(...)user-managed projection call; keepbuildEntry(cp, userAP, userNN, …).UserManagedAPRV/UserManagedNNRVtracking fields,shouldLogOptionalUserManagedFetchError, user-managed branches ofemitOverlayMetrics, theuserManagedAP/NNarms of the "need SOMETHING to cache" / synthetic-CP fallback, and the reconciler'sug--RV-change re-projection trigger.projection.go/projectUserProfilesitself likely still serves pass (2) — confirm before deleting; remove only if pass (2) doesn't use it.Constraints / gotchas
ug-exceptions silently vanish → enforcement gap. This change must only ship after storage#319 is deployed. Recommend a minimum-storage-version floor (or a feature flag defaulting off until the storage version is confirmed) rather than relying on rollout ordering alone.UserDefinedProfileMetadataKeyoverlay also meant to move server-side eventually? If yes, that's a separate follow-up; if no, pass (2) stays in node-agent indefinitely.GETonly, notList/Watch. node-agent is safe because its CP cache is fullyGET-driven (container callbacks + reconcilerGetContainerProfile; no CP informer/watch). But this is the load-bearing assumption: if node-agent ever moves CP reads to a list-watch informer, the server-side merge would be bypassed andug-overlays would disappear. Worth a code comment near the CPGETnoting the dependency. (Separately flagged to the storage side as a request to makeList/Watchmerged-consistent or document theGET-only contract.)Validation
Test_12_MergingProfilesTest/Test_13_MergingNetworkNeighborhoodTestshould pass withug-enforcement now coming from the server-merged CP (point them at a storage build with Feature/path #319, or stub a merged CP).ug-profile.ug-GETs are gone (RPC-count / metrics).References