Do not remove driver when gpu.deploy.operands label is set to false #2575

Draft
cdesiniotis wants to merge 1 commit into NVIDIA:main from cdesiniotis:disable-operands-label-does-not-disable-driver

cdesiniotis commented Jun 23, 2026

Contributor

Previously, setting the nvidia.com/gpu.deploy.operands label to 'false' removed all GPU Operator related pods from a node, including the driver pod. With this commit, the driver is no longer removed when the gpu.deploy.operands label is set to false. To remove the driver pod from a node, a user must explicitly label the node with nvidia.com/gpu.deploy.driver=false.
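To make the intended label semantics concrete, here is a minimal Python sketch of the behavior described above. It is only a model of the rules stated in this description, not the operator's actual code: the function names `driver_stays` and `other_operands_stay` are hypothetical, the node is assumed to be a GPU node that would otherwise run all operands, and the exact string comparison against "false" is an assumption.

```python
# Hypothetical model of the label semantics described in this PR.
# This is NOT the GPU Operator's implementation; helper names are illustrative.

OPERANDS_LABEL = "nvidia.com/gpu.deploy.operands"
DRIVER_LABEL = "nvidia.com/gpu.deploy.driver"


def driver_stays(node_labels: dict[str, str]) -> bool:
    """Return True if the driver pod remains on the node.

    With this change, only an explicit nvidia.com/gpu.deploy.driver=false
    removes the driver; the operands label no longer affects it.
    """
    return node_labels.get(DRIVER_LABEL) != "false"


def other_operands_stay(node_labels: dict[str, str]) -> bool:
    """Return True if the non-driver operands remain on the node.

    Assumption: the operands label keeps its existing meaning for every
    operand other than the driver.
    """
    return node_labels.get(OPERANDS_LABEL) != "false"


if __name__ == "__main__":
    node = {OPERANDS_LABEL: "false"}
    print("driver stays:", driver_stays(node))
    print("other operands stay:", other_operands_stay(node))
```

Under this model, a node labeled only with nvidia.com/gpu.deploy.operands=false keeps its driver pod while the other operands are removed, and the driver pod goes away only once nvidia.com/gpu.deploy.driver=false is also set.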

This change adds a guardrail for the driver pod, whose removal from a node is highly disruptive. It is also motivated by our future plans to integrate the NVIDIA DRA Driver for GPUs with the GPU Operator. In particular, it provides a possible migration path from the k8s-device-plugin software stack to the DRA driver software stack, using the nvidia.com/gpu.deploy.operands label to switch between the respective software components.

Code changes in this PR were drafted with the assistance of Claude Code.

Note, this section in our documentation would have to be updated: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#preventing-installation-of-operands-on-some-nodes

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
cdesiniotis force-pushed the disable-operands-label-does-not-disable-driver branch from eb26f58 to 47d5cda June 23, 2026 21:10
cdesiniotis self-assigned this Jun 23, 2026