
How to evaluate if the nvidia driver is available on the node #16

Description

@mgazz

With #14 we introduced a direct dependency on the nvidia-gpu-operator.

Dependabot flagged the dependency for a potential vulnerability:

The goal of this issue is to discuss how to proceed.

I looked at the source code: the operator checks the NVIDIA GPU Operator ClusterPolicy to determine whether the driver is enabled.

Here are some of the calls:

While checking the ClusterPolicy gives us hints about the GPU Operator configuration, it does not guarantee that the driver is actually installed. Errors during the installation could leave the node without a working driver.
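To make the distinction concrete, here is a minimal sketch of what a policy-based check boils down to. It assumes the ClusterPolicy object has already been fetched as a plain dict (for example through a Kubernetes client), and the helper name driver_enabled_in_policy is hypothetical, not the operator's actual code. It only reads spec.driver.enabled, i.e. the desired configuration:

```python
from typing import Any, Mapping


def driver_enabled_in_policy(cluster_policy: Mapping[str, Any]) -> bool:
    """Report whether a ClusterPolicy *requests* the driver (hypothetical helper).

    This reflects the desired configuration only; it says nothing about
    whether the driver was actually installed successfully on any node.
    """
    driver = cluster_policy.get("spec", {}).get("driver", {})
    # Assumption: an unset `enabled` field means "enabled", mirroring the
    # GPU Operator's default. Verify against the CRD version in use.
    return bool(driver.get("enabled", True))
```

A True result here can coexist with a node on which the driver installation failed, which is exactly the gap described above.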

Question: Should we instead verify that the driver is available by inspecting the current state of the node?
For instance, files such as /proc/driver/nvidia/version can give us insight into the current state. As a bonus, this would also remove the direct dependency on the NVIDIA GPU Operator.
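A minimal sketch of the node-level check, assuming it runs somewhere that can read the node's /proc (for instance a per-node DaemonSet; where exactly it should run is still an open question). The function names are hypothetical, and the parsing assumes the file contains a line of the form "NVRM version: NVIDIA UNIX ... Kernel Module  <version>  ...":

```python
import re
from pathlib import Path
from typing import Optional

# File exposed by the NVIDIA kernel module once it is loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")

# Matches a dotted version number such as "535.104.05".
_VERSION_RE = re.compile(r"\b(\d+(?:\.\d+)+)\b")


def parse_nvrm_version(text: str) -> Optional[str]:
    """Return the driver version from the 'NVRM version:' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = _VERSION_RE.search(line)
            return match.group(1) if match else None
    return None


def nvidia_driver_version(path: Path = NVIDIA_VERSION_FILE) -> Optional[str]:
    """Return the loaded driver version, or None if it cannot be determined."""
    try:
        return parse_nvrm_version(path.read_text())
    except OSError:
        # File missing or unreadable: treat the driver as unavailable.
        return None
```

Returning None for a missing or unparsable file keeps the check conservative: a node only reports a driver when the kernel module has actually exposed its version.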
