test: enforce restricted PSS for CI user namespace #3487
During CI tests only, overwrite the default `baseline` namespace label that the profile controller creates. This makes test workloads run under restricted Pod Security Standards (PSS) enforcement without modifying the production default baseline.
Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
- Broaden katib triggers: tests/katib_install.sh -> tests/katib*
- Broaden pipeline triggers: individual files -> tests/pipeline*
- Add tests/pipeline* trigger to pipeline_run_from_notebook workflow
- Replace dead experimental/security/PSS/* path (directory no longer exists) with actual test files: tests/kubeflow_profile_install.sh and tests/PSS_enable.sh across all affected workflows

Note: Dashboard/profiles directory paths are intentionally kept as-is since applications/profiles/ does not exist yet on master. Those paths will be updated once the directory restructure lands.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
Signed-off-by: danish9039 <danishsiddiqui040@gmail.com>
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR updates CI to validate/enforce Kubernetes Pod Security Standards (PSS) settings for Kubeflow profile namespaces, and wires the related test scripts into multiple GitHub Actions path filters so relevant workflows run when these scripts change.
Changes:
- Update `kubeflow_profile_install.sh` to label the Kubeflow user namespace with `pod-security.kubernetes.io/enforce=restricted` and `enforce-version=latest`, and assert the labels are applied.
- Add `tests/kubeflow_profile_install.sh` and `tests/PSS_enable.sh` to workflow `on.pull_request.paths` filters across several test workflows.
- Broaden some workflow path filters (e.g., `tests/pipeline*`, `tests/katib*`) to cover more related changes.
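The label-then-assert step summarized in the first bullet might look roughly like the sketch below. The function names `enforce_restricted_pss` and `assert_ns_label` are hypothetical and are not taken from `tests/kubeflow_profile_install.sh`; only the two label keys and their values come from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not the PR's actual script.

# Label a namespace so Pod Security admission enforces the restricted profile.
enforce_restricted_pss() {
  local namespace="$1"
  kubectl label namespace "$namespace" \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/enforce-version=latest \
    --overwrite
}

# Fail unless the namespace carries the expected value for a label key.
assert_ns_label() {
  local namespace="$1" key="$2" expected="$3"
  local escaped_key actual
  # Dots inside a label key must be escaped in kubectl's JSONPath syntax.
  escaped_key="$(printf '%s' "$key" | sed 's/\./\\./g')"
  actual="$(kubectl get namespace "$namespace" \
    -o "jsonpath={.metadata.labels.${escaped_key}}")"
  if [ "$actual" != "$expected" ]; then
    echo "label ${key} on ${namespace}: expected '${expected}', got '${actual}'" >&2
    return 1
  fi
}
```

A script would call `enforce_restricted_pss "$KF_PROFILE"` and then `assert_ns_label "$KF_PROFILE" pod-security.kubernetes.io/enforce restricted`, so that a silently failed label update stops the test run early.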
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/kubeflow_profile_install.sh | Labels the profile namespace for PSS restricted + adds assertions for applied labels. |
| .github/workflows/workspaces_pipeline_run_test.yaml | Triggers workflow on profile/PSS test script changes. |
| .github/workflows/volumes_web_application_test.yaml | Triggers workflow on profile install script changes. |
| .github/workflows/training_operator_test.yaml | Triggers workflow on profile/PSS test script changes. |
| .github/workflows/trainer_test.yaml | Triggers workflow on profile/PSS test script changes. |
| .github/workflows/pipeline_test.yaml | Broadens pipeline path trigger + triggers on profile/PSS test script changes. |
| .github/workflows/pipeline_run_from_notebook.yaml | Adds pipeline wildcard + profile/PSS script triggers. |
| .github/workflows/kserve_test.yaml | Triggers workflow on profile/PSS test script changes. |
| .github/workflows/kserve_models_web_application_test.yaml | Triggers workflow on profile install script changes. |
| .github/workflows/katib_test.yaml | Broadens katib path trigger + triggers on profile/PSS test script changes. |
| .github/workflows/istio_validation.yaml | Triggers workflow on profile/PSS test script changes. |
| .github/workflows/dex_oauth2-proxy_test.yaml | Triggers workflow on profile/PSS test script changes. |
kubectl get pods -n kubeflow-system -l app.kubernetes.io/name=trainer
kubectl get clustertrainingruntimes torch-distributed
kubectl patch clustertrainingruntime torch-distributed --type=json -p='[
This must also be a proper overlay, and maybe upstreamed.
There might already be a recent PR for this.
Moved the torch-distributed runtime changes into applications/trainer/overlays/runtimes-restricted; upstream follow-up is noted separately.
Checked kubeflow/trainer#3066; it helps with override support, but this PR still needs the local runtime default overlay.
Please use a proper overlay. This has to survive the next release, and then we can remove it. Please raise a follow-up PR to remove it so that I can merge it after the 26.03.1 release.
Moved this into applications/training-operator/overlays/kubeflow-restricted-pss and noted the follow-up removal after 26.03.1.
Pull request overview
Copilot reviewed 37 out of 37 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
tests/workspaces_pipeline_run_test.sh:1
- This JSONPatch adds nested fields under `/spec/podTemplate/securityContext/...` and `/spec/podTemplate/containerSecurityContext/...`. JSONPatch `add` will fail if the parent object (e.g., `/spec/podTemplate/securityContext`) does not already exist. To make this robust across different base WorkspaceKind manifests, patch the parent objects in one operation (e.g., add/merge the full `securityContext`/`containerSecurityContext` objects) or use a strategic/merge patch (`--type=merge`) targeting the parent keys.
tests/kserve_test.sh:1
- The polling logic treats any response other than 403 as success (twice). This can still false-positive on 404/503 while routes/backends are not ready, which is exactly the failure mode mentioned in the comment above the loop. Prefer checking for the expected success code (typically 200) or at least explicitly excluding 404/503 (and possibly 000) from counting toward `STABLE_POLL_COUNT`, to prevent passing the policy wait while the service is still unreachable.
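The stricter polling suggested for `tests/kserve_test.sh` could be sketched as follows. This is not the loop from the PR: the function names, the `STATUS_PROBE` override, and the default counts are assumptions made for illustration, and 200 is assumed to be the expected success code.

```shell
#!/usr/bin/env bash
# Sketch only: names and defaults are hypothetical, not the PR's actual loop.

# Print the HTTP status code for a URL; curl reports 000 when no response arrives.
http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' --max-time 5 "$1" || true
}

# Succeed once the probe returns 200 on REQUIRED consecutive polls.
# Any other code (403, 404, 503, 000, ...) resets the streak.
wait_for_stable_200() {
  local url="$1" required="${2:-3}" max_polls="${3:-60}" delay="${4:-2}"
  local probe="${STATUS_PROBE:-http_status}"   # overridable for testing
  local stable=0 poll code
  for ((poll = 1; poll <= max_polls; poll++)); do
    code="$("$probe" "$url")"
    if [ "$code" = "200" ]; then
      stable=$((stable + 1))
      if [ "$stable" -ge "$required" ]; then return 0; fi
    else
      stable=0
    fi
    sleep "$delay"
  done
  echo "no stable 200 from ${url} after ${max_polls} polls (last code: ${code})" >&2
  return 1
}
```

Resetting the streak on every non-200 response means a brief 404 or 503 while routes are still converging cannot count toward the stability threshold.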
#!/bin/bash
Can you upstream this, or make sure to patch only the securityContext and not, for example, the image?
serviceAccountName: default-editor
containers:
- name: ray-worker
  image: rayproject/ray:2.44.1-py311-cpu
Could you please update Ray and KubeRay to the latest available versions?
So not just this file, but also the KubeRay operator.
cd applications/katib/upstream && kustomize build installs/katib-with-kubeflow | kubectl apply -f - && cd ../../../
kustomize build applications/katib/overlays/security | kubectl apply -f -
Is it possible to have two steps for this? These scripts are also used for the single-step install and would reach customers.
Maybe you have to use kustomize components instead of overlays to get a cleaner separation. If components do not work, you can also stay with the overlay.
You could put an `if` around it, as in our kind installation, so that the extra security settings are installed on top only when running in GHA. That goes for all installation scripts.
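One way to read the reviewer's request is a two-step install where the second step is guarded by a CI check, as in the sketch below. The function names are hypothetical. The two `kustomize build` paths are the ones shown in the diff excerpt above, and the guard relies on the `GITHUB_ACTIONS=true` variable that GitHub Actions sets on its runners. Whether the kind script in this repository uses the same variable is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; overlay paths come from the diff.

# GitHub Actions sets GITHUB_ACTIONS=true on its runners.
running_in_gha() {
  [ "${GITHUB_ACTIONS:-false}" = "true" ]
}

# Step 1 always runs; step 2 adds the security overlay only in CI.
install_katib() {
  # The subshell keeps the caller's working directory unchanged.
  (cd applications/katib/upstream \
    && kustomize build installs/katib-with-kubeflow | kubectl apply -f -)

  if running_in_gha; then
    kustomize build applications/katib/overlays/security | kubectl apply -f -
  fi
}
```

With this shape, a user running the single-step install locally gets only the regular Katib manifests, while CI layers the restricted settings on top.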
export KSERVE_M2M_TOKEN="$(kubectl -n ${NAMESPACE} create token default-editor)"
export KSERVE_TEST_NAMESPACE=${NAMESPACE}
kubectl patch clusterstoragecontainer default --type=merge --patch '{
Can you add this directly to the kserve installation as a patch? Maybe even upstream it?
kubectl label namespace "$KF_PROFILE" \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  --overwrite
Only when you are in the GHA environment, as in our kind script.
kubectl -n kubeflow rollout status deployment/tensorboard-controller-deployment --timeout=60s
kubectl -n kubeflow rollout status deployment/tensorboards-web-app-deployment --timeout=60s
kubectl -n kubeflow rollout status deployment/volumes-web-app-deployment --timeout=60s
kubectl -n kubeflow rollout status deployment/jupyter-web-app-deployment --timeout=180s
Why do you increase the timeouts so much?
kubectl wait --for=condition=Available deployment/jobset-controller-manager -n kubeflow-system --timeout=120s
kustomize build upstream/overlays/runtimes | kubectl apply --server-side --force-conflicts -f -
kustomize build overlays/runtimes-restricted | kubectl apply --server-side --force-conflicts -f -
The same comments as above apply: keep it as before for regular users, but add the extra security when running in GHA.
cd applications/training-operator/upstream
kustomize build overlays/kubeflow | kubectl apply --server-side --force-conflicts -f -
kustomize build applications/training-operator/overlays/kubeflow-restricted-pss | kubectl apply --server-side --force-conflicts -f -
Same here: only when you are in the GHA environment.
kubectl apply -f /tmp/pytorch-job.yaml
kubectl wait --for=jsonpath='{.status.conditions[0].type}=Created' pytorchjob.kubeflow.org/pytorch-simple -n $KF_PROFILE --timeout=60s
kubectl wait --for=condition=Created pytorchjob.kubeflow.org/pytorch-simple -n "$KF_PROFILE" --timeout=180s
Why do you increase the timeouts?
If really needed, this has to be upstreamed. CC @christian-heusel
kubectl get clustertrainingruntime torch-distributed -o yaml
What is the purpose of this line?
Why do you change the timeouts?
Summary of Changes
This PR makes CI validate Kubeflow user workloads under Kubernetes Pod Security Standards `restricted`.
- Sets `kubeflow-user-example-com` to `pod-security.kubernetes.io/enforce=restricted` after the Profile Controller creates it.
- … `kubectl patch` or generated temp-overlay logic.

Notes
This is intentionally CI/test focused. It does not change the default customer Profile namespace policy.
The new application changes are local Kubeflow manifests overlays under `applications/*/overlays`. They do not edit synchronized `applications/*/upstream` folders or `common/*` manifests.
Follow-up notes are tracked separately for keeping the copied Katib CI config in sync during future Katib syncs and for removing the temporary Training Operator overlay after the final `26.03.1` path is available.

Related
Follow-up to #3444.

Validation
Latest head `7adb2d55` is green on all technical checks.
- `bash -n`, `shellcheck`, YAML parsing, `kustomize build` for the new overlays, rendered field assertions, `git diff --check`, and a guard confirming no `applications/*/upstream/**` changes.
- `tide` is still waiting for review labels.
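The guard against touching synchronized upstream folders, mentioned in the validation notes, is not shown in this PR. A minimal sketch of such a check, assuming a hypothetical function name and a base ref of `origin/master`, might be:

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the base ref are hypothetical.

# Read changed file paths on stdin; fail if any lies under applications/*/upstream/.
assert_no_upstream_changes() {
  local offending
  offending="$(grep -E '^applications/[^/]+/upstream/' || true)"
  if [ -n "$offending" ]; then
    echo "synchronized upstream folders were modified:" >&2
    echo "$offending" >&2
    return 1
  fi
}

# Typical use against a base branch (the ref name is an assumption):
#   git diff --name-only origin/master...HEAD | assert_no_upstream_changes
```

Reading paths from stdin keeps the check independent of how the changed-file list is produced, so the same function works locally and in a workflow step.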