Skip to content

Add granular DRA status RBAC permissions for ResourceClaim driver updates#54

Open
qingketsing wants to merge 1 commit into
CoHDI:mainfrom
qingketsing:fix-dra-granular-status-authz
Open

Add granular DRA status RBAC permissions for ResourceClaim driver updates#54
qingketsing wants to merge 1 commit into
CoHDI:mainfrom
qingketsing:fix-dra-granular-status-authz

Conversation

@qingketsing

Copy link
Copy Markdown

Summary

This PR updates dynamic-device-scaler RBAC to align with the Kubernetes v1.36 DRA granular status authorization changes.

The controller updates ResourceClaim.status.devices, so in addition to the existing resourceclaims/status permission it also needs resourceclaims/driver permissions with the arbitrary-node verbs.

Changes

  • add resourceclaims/driver permissions to the controller RBAC
  • grant:
    • arbitrary-node:update
    • arbitrary-node:patch
  • keep the existing resourceclaims/status permissions
  • update the kubebuilder RBAC annotation to match the shipped manifest

Why

Starting in Kubernetes v1.36, DRA components that update ResourceClaim.status.devices need additional authorization on the synthetic resourceclaims/driver subresource.

This controller runs as a control-plane Deployment, so arbitrary-node:* is the appropriate permission model.

Verification

  • ran go test ./...

Signed-off-by: Qingke <ljw329426@163.com>
@qingketsing qingketsing force-pushed the fix-dra-granular-status-authz branch from cac08c2 to 3e2e9bf Compare May 24, 2026 03:53
@qingketsing

Copy link
Copy Markdown
Author

Adding context for reviewers: this change is to align the controller RBAC with the Kubernetes DRA granular status authorization update in v1.36.

Related upstream tracking issue:

This controller updates ResourceClaim.status.devices and runs as a control-plane Deployment, so resourceclaims/driver with arbitrary-node:update / arbitrary-node:patch is the intended permission model.

@KobayashiD27

KobayashiD27 commented May 25, 2026

Copy link
Copy Markdown
Collaborator

@qingketsing
This issue seems to be related to the following PR, which is currently under verification:
CoHDI/cohdi-chart#7

We will wait for the result of that verification before proceeding with the dds side, and will revisit this PR once it is confirmed.

@qingketsing

Copy link
Copy Markdown
Author

Thanks for the context.

That makes sense. I’ll wait for the verification result of CoHDI/cohdi-chart#7, and I’m happy to adjust this PR based on the confirmed direction for the chart side.

@Fresnel-Fabian

Fresnel-Fabian commented May 25, 2026

Copy link
Copy Markdown

Thanks for putting this together. I think there may be one additional piece needed based on the upstream granular authorization implementation.

The authorizer performs per-driver matching on resourceclaims/driver, so the RBAC rule likely also needs:

resourceNames:
  - gpu.nvidia.com

From the discussion on cohdi-chart#7, maintainers mentioned CoHDI currently only supports gpu.nvidia.com, so hardcoding it seems consistent with the current design.

Without resourceNames, I believe authorization may still fail once DRAResourceClaimGranularStatusAuthorization is enabled.

Happy to open a follow-up commit/PR if useful.

@qingketsing

Copy link
Copy Markdown
Author

@Fresnel-Fabian
Thanks for pointing this out.

That makes sense to me. If the upstream authorizer performs per-driver matching for resourceclaims/driver, then granting the synthetic subresource without a matching resourceNames entry may not be sufficient once DRAResourceClaimGranularStatusAuthorization is enabled.

In that case, the controller could still fail authorization when trying to update ResourceClaim.status.devices, even though it already has resourceclaims/driver with arbitrary-node:update / arbitrary-node:patch. The failure would likely show up at runtime as a forbidden status update for the driver-specific authorization check.

Since CoHDI currently only supports gpu.nvidia.com, adding:

resourceNames:
  - gpu.nvidia.com

to the resourceclaims/driver RBAC rule sounds consistent with the current design.

I’ll wait for the maintainers’ confirmation, especially while the related chart-side verification is still in progress, and I’m happy to update this PR accordingly if this is the intended RBAC shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants